Skip to content

Analyze table function arguments #4#6

Open
mohsaka wants to merge 23 commits into
tvf_12910from
tvf_13602
Open

Analyze table function arguments #4#6
mohsaka wants to merge 23 commits into
tvf_12910from
tvf_13602

Conversation

@mohsaka
Copy link
Copy Markdown
Owner

@mohsaka mohsaka commented Mar 14, 2025

Changes adapted from trino/PR#13602

Description

Motivation and Context

Impact

Test Plan

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* ... 
* ... 

Hive Connector Changes
* ... 
* ... 

If release note is NOT required, use:

== NO RELEASE NOTE ==

Changes adapted from trino/PR#13602
Original commit: 21695a1dfb33d5cb8bb195397d769222da5a6345
Author: kasiafi

Co-authored-by: kasiafi <30203062+kasiafi@users.noreply.github.com>
@mohsaka mohsaka changed the title Analyze table function arguments Analyze table function arguments #4 Mar 14, 2025
Changes adapted from trino/PR#13602
Original commit: 8be4a486a525056ef6b361618280ae0185d7211b
Author: kasiafi

Co-authored-by: kasiafi <30203062+kasiafi@users.noreply.github.com>
@mohsaka
Copy link
Copy Markdown
Owner Author

mohsaka commented Mar 14, 2025

Completed up to 31b2b79751944eed42eaa4538a52dbf65c0399a6

…uments of table functions

Changes adapted from trino/PR#13649
Changes adapted from trino/PR#13602
Original commit: a63cc9c429cea7849cf723598719812cc8e5636c
Original commit: 31b2b79751944eed42eaa4538a52dbf65c0399a6
Author: kasiafi

Modifications were made to adapt to Presto including:
Updated tests to match Presto format.

Co-authored-by: kasiafi <30203062+kasiafi@users.noreply.github.com>
mohsaka and others added 5 commits March 17, 2025 10:13
Changes adapted from trino/PR#13602
Original commit: a93597ff72d98d577b32f5da6ef562ec85a7aad6
Author: kasiafi

Co-authored-by: kasiafi <30203062+kasiafi@users.noreply.github.com>
Changes adapted from trino/PR#13602
Original commit: 21acef5d028565b919e3eae2df9784fbc2317ae9
Author: kasiafi

Co-authored-by: kasiafi <30203062+kasiafi@users.noreply.github.com>
Prune when empty is enforced for a table argument with row semantics.
For a table argument with set semantocs, keep when empty is the default,
and it can be changed to prune when empty.

Changes adapted from trino/PR#13602
Original commit: 9771edb626ea47f2333988178107c7a2dc46ec6b
Author: kasiafi

Co-authored-by: kasiafi <30203062+kasiafi@users.noreply.github.com>
Changes adapted from trino/PR#13602
Original commit: 15664c845f4958ecffbb0dca2596226ee899c12e
Author: kasiafi

Co-authored-by: kasiafi <30203062+kasiafi@users.noreply.github.com>
According to the SQL standard, the only properties
of a table argument used during analysis are: the row type,
partitioning, and ordering. Other information, like the
prune / keep when empty property, are not involved in the
analysis.

Changes adapted from trino/PR#13602
Original commit: 18e97d406ae89528de5d04c868587f3c6bc2f1c4
Author: kasiafi

Co-authored-by: kasiafi <30203062+kasiafi@users.noreply.github.com>
mohsaka and others added 6 commits March 17, 2025 13:04
Changes adapted from trino/PR#13602
Original commit: 866f83c965500bfc962113db7c65b4c8a7931d0b
Author: kasiafi

Co-authored-by: kasiafi <30203062+kasiafi@users.noreply.github.com>
This is a change of approach on resolving arguments
of table functions.
Before this change, the mapping of descriptor arguments
to table arguments had to be determined during query
analysis. It was too limiting: in that model, input table
references were limited to the specified descriptor fields.

After this change, any input fields can be referenced,
and the required fields (channels) are to be determined
dynamically during function compilation.

One advantage of determining the required fields early
was the possibility to prune any columns that were
not required by the table function. This should be replaced
by adding a way for a table function to declare required
fields during query optimization.

Changes adapted from trino/PR#13602
Original commit: cbde4d7168efddf1d56f906d3d1e50d542d2f851
Author: kasiafi

Co-authored-by: kasiafi <30203062+kasiafi@users.noreply.github.com>
Changes adapted from trino/PR#13602
Original commit: dc3552f812be96211989b73d60f60a99ab943863
Author: kasiafi

Co-authored-by: kasiafi <30203062+kasiafi@users.noreply.github.com>
It should make using descriptor arguments easier for the
table function author, and allow to compare with
NULL_DESCRIPTOR by equality.

Changes adapted from trino/PR#13602
Original commit: 612ad9bc1a5a3c5b80d5e5aacf7d5f3529a11a19
Author: kasiafi

Co-authored-by: kasiafi <30203062+kasiafi@users.noreply.github.com>
Changes adapted from trino/PR#13602
Original commit: 8568ed11b94130c2a76ad1cf99c04d91fd2a0eca
Author: kasiafi

Co-authored-by: kasiafi <30203062+kasiafi@users.noreply.github.com>
Changes adapted from trino/PR#13602
Original commit: b7460da53871b9b6a77ed08c24afbee1d5f72dd9
Author: kasiafi

Co-authored-by: kasiafi <30203062+kasiafi@users.noreply.github.com>
@mohsaka
Copy link
Copy Markdown
Owner Author

mohsaka commented Mar 18, 2025

Latest update, almost finished the last commit. Needs to bring in the Statement Analyzer into AbstractAnalyzer test. Afterwards we can work test by test to fix the issues that occur.

@xin-zhang2
Copy link
Copy Markdown
Collaborator

Lastest update:
Adde the table functions in AbstractAnalyzerTest.
Fixed some failures in the tests.

The current failure is in testDescriptorArgument

java.lang.AssertionError: Expected error 'line 1:80: Unknown type: verybigint', but got 'verybigint'

	at org.testng.Assert.fail(Assert.java:98)
	at com.facebook.presto.sql.analyzer.AbstractAnalyzerTest.assertFails(AbstractAnalyzerTest.java:473)
	at com.facebook.presto.sql.analyzer.AbstractAnalyzerTest.assertFails(AbstractAnalyzerTest.java:425)
	at com.facebook.presto.sql.analyzer.TestAnalyzer.testDescriptorArgument(TestAnalyzer.java:2076)

Caused by: com.facebook.presto.sql.analyzer.SemanticException: verybigint
	at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.lambda$null$30(StatementAnalyzer.java:1594)
	at java.util.Optional.map(Optional.java:215)
	at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.lambda$null$31(StatementAnalyzer.java:1589)
	...

It is caused by a signature mismatch of SemanticException, and the code lies in

field.getType().map(type -> {
try {
return functionAndTypeResolver.getType(parseTypeSignature(type));
}
catch (IllegalArgumentException e) {
throw new SemanticException(TYPE_MISMATCH, type, "Unknown type: %s", type);
}

where type is a String in Presto but a DataType which extends Node in Trino.

The same issue also occurs intestCopartitioning

if (!referencedArguments.add(argument.getArgumentName())) {
// multiple references to argument in COPARTITION clause are implicitly prohibited by
// ISO/IEC TR REPORT 19075-7, p.33, Feature B203, “More than one copartition specification”
throw new SemanticException(INVALID_COPARTITIONING, name.getOriginalParts().get(0), "Multiple references to table argument: %s in COPARTITION clause", name);
}

where originalParts is a list of String in Presto but a list of Identifier in Trino.

In order to indicate the accurate location in the error message, I think we should choose the trino approach and refactor the type field in DescriptorField and originalParts in QualifiedName
@mohsaka What's your thought on this?

}

if (!e.getMessage().matches(message)) {
if (!e.getMessage().equals(message) && !e.getMessage().matches(message)) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mohsaka
The following message contains special characters in regular expression that results in a failure in the matches, so I temporarily added the equals comparison.

throw new SemanticException(INVALID_FUNCTION_ARGUMENT, argument, "Invalid descriptor argument %s. Descriptors should be formatted as 'DESCRIPTOR(name [type], ...)'", (Object) argumentSpecification.getName());

@mohsaka
Copy link
Copy Markdown
Owner Author

mohsaka commented Mar 18, 2025

Lastest update: Adde the table functions in AbstractAnalyzerTest. Fixed some failures in the tests.

The current failure is in testDescriptorArgument

java.lang.AssertionError: Expected error 'line 1:80: Unknown type: verybigint', but got 'verybigint'

	at org.testng.Assert.fail(Assert.java:98)
	at com.facebook.presto.sql.analyzer.AbstractAnalyzerTest.assertFails(AbstractAnalyzerTest.java:473)
	at com.facebook.presto.sql.analyzer.AbstractAnalyzerTest.assertFails(AbstractAnalyzerTest.java:425)
	at com.facebook.presto.sql.analyzer.TestAnalyzer.testDescriptorArgument(TestAnalyzer.java:2076)

Caused by: com.facebook.presto.sql.analyzer.SemanticException: verybigint
	at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.lambda$null$30(StatementAnalyzer.java:1594)
	at java.util.Optional.map(Optional.java:215)
	at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.lambda$null$31(StatementAnalyzer.java:1589)
	...

It is caused by a signature mismatch of SemanticException, and the code lies in

field.getType().map(type -> {
try {
return functionAndTypeResolver.getType(parseTypeSignature(type));
}
catch (IllegalArgumentException e) {
throw new SemanticException(TYPE_MISMATCH, type, "Unknown type: %s", type);
}

where type is a String in Presto but a DataType which extends Node in Trino.

The same issue also occurs intestCopartitioning

if (!referencedArguments.add(argument.getArgumentName())) {
// multiple references to argument in COPARTITION clause are implicitly prohibited by
// ISO/IEC TR REPORT 19075-7, p.33, Feature B203, “More than one copartition specification”
throw new SemanticException(INVALID_COPARTITIONING, name.getOriginalParts().get(0), "Multiple references to table argument: %s in COPARTITION clause", name);
}

where originalParts is a list of String in Presto but a list of Identifier in Trino.
In order to indicate the accurate location in the error message, I think we should choose the trino approach and refactor the type field in DescriptorField and originalParts in QualifiedName @mohsaka What's your thought on this?

For this one, we should just replace the first type with field since in Presto all of our types are stored as String instead of a DataType object.

                                                        catch (IllegalArgumentException e) {
                                                            throw new SemanticException(TYPE_MISMATCH, field, "Unknown type: %s", type);
                                                        }

Made that change as well as a location change, and it looks good now. Great work!

We did think about refactoring to DataType object but we thought it would be too invasive for this project Hopefully we can avoid making this change.

Changes adapted from trino/PR#13602
Original commit: dbef4d9be37494967496230573ab400e54aab0d9
Author: kasiafi

Co-authored-by: kasiafi <30203062+kasiafi@users.noreply.github.com>
Co-authored-by: Xin Zhang <desertsxin@gmail.com>
…alified Names

Modify error messages to use Types instead of DataType from Trino
Fix test cases to follow Presto syntax.

Co-authored-by: mohsaka <135669458+mohsaka@users.noreply.github.com>
Changes adapted from trino/PR#13653
Original commit: 4a0188ff8bab16bfd8f81449d12d7e7d96676f1c
Author: kasiafi

Co-authored-by: kasiafi <30203062+kasiafi@users.noreply.github.com>
@mohsaka
Copy link
Copy Markdown
Owner Author

mohsaka commented Mar 19, 2025

Update, started 13653 on the same PR since it is small. Will add 14115 to this PR as well since it is also small.

xin-zhang2 and others added 5 commits March 19, 2025 11:44
Changes adapted from trino/PR#13653
Original commit: 4827fa8d76ac51706b47c108cc25bdd3bc3b5767
Author: kasiafi

Co-authored-by: kasiafi <30203062+kasiafi@users.noreply.github.com>
Changes adapted from trino/PR#13653
Original commit: 02f3583d231fd7a89a18f48a6bfbec44b981c829
Author: kasiafi

Co-authored-by: kasiafi <30203062+kasiafi@users.noreply.github.com>
Comment out the MATCH_RECOGNIZE test cases.
Changes adapted from trino/PR#14115
Original commit: 6e95fb186fc3edc7fd6a5bdc385e68f8accff11e
Author: kasiafi

Co-authored-by: kasiafi <30203062+kasiafi@users.noreply.github.com>
Comment out the MATCH_RECOGNIZE test cases.
@xin-zhang2
Copy link
Copy Markdown
Collaborator

Finished 13653 and 14115.

I made the changes to use MISSING_ATTRIBUTE instead of COLUMN_NOT_FOUND in the test cases, as the SemanticErrorCode for missingAttributeException differ between Presto and Trino.

Also comment out some tests in testTableFunctionInvocationContext as MATCH_RECOGNIZE(trinodb/trino#6550) is not supported in Presto.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants