Skip to content

Implement table function source#6#8

Open
xin-zhang2 wants to merge 21 commits into
tvf_14175from
tvf_14566
Open

Implement table function source#6#8
xin-zhang2 wants to merge 21 commits into
tvf_14175from
tvf_14566

Conversation

@xin-zhang2
Copy link
Copy Markdown
Collaborator

@xin-zhang2 xin-zhang2 commented Mar 24, 2025

Description

Motivation and Context

Impact

Test Plan

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* ... 
* ... 

Hive Connector Changes
* ... 
* ... 

If release note is NOT required, use:

== NO RELEASE NOTE ==

@mohsaka
Copy link
Copy Markdown
Owner

mohsaka commented Mar 25, 2025

Currently on updating the TableFunctionMatcher for commit
05535286323470132dd5df0e88ee1e6aa49f4c34
There is an interesting situation with Symbol <-> VariableReferenceExpression that occurs

Expected Trino Output from here

                Set<Reference> expectedPassThrough = expectedTableArgument.passThroughVariables().stream()
                        .map(symbolAliases::get)
                        .collect(toImmutableSet());
                Set<Reference> actualPassThrough = argumentProperties.passThroughSpecification().columns().stream()
                        .map(PassThroughColumn::symbol)
                        .map(Symbol::toSymbolReference)
                        .collect(toImmutableSet());
expectedPassThrough = {SingletonImmutableSet@11437}  size = 1
 0 = {Reference@11479} "field::smallint"
  type = {SmallintType@11481} "smallint"
  name = "field"
actualPassThrough = {SingletonImmutableSet@11438}  size = 1
 0 = {Reference@11485} "field::smallint"
  type = {SmallintType@11481} "smallint"
  name = "field"

It seems like SymbolReference in Trino is == VariableReferenceExpression.

@mohsaka mohsaka force-pushed the tvf_14566 branch 3 times, most recently from 37bf5dc to 542b093 Compare March 25, 2025 02:58
mohsaka and others added 4 commits March 25, 2025 09:30
The new field allows the table function to declare during
analysis which columns from the input tables are necessary to
execute the function.

The required columns can be then validated by the analyzer.
This declaration can be also used by the optimizer to prune
any input columns that are not used by the table function.

Changes adapted from trino/PR#15256
Original commit: c77995bde37f79c44c9d74336f57b32a2b142570
Author: kasiafi

Co-authored-by: kasiafi <30203062+kasiafi@users.noreply.github.com>
Changes adapted from trino/PR#15256
Original commit: 0aaab67639b07cddc1ac7f3af6cc54af6a4de4c0
Author: kasiafi

Co-authored-by: kasiafi <30203062+kasiafi@users.noreply.github.com>
Change the builder for TableArgumentSpecification so that
there is no default for the empty behavior.

Changes adapted from trino/PR#14566
Original commit: 6e95fb186fc3edc7fd6a5bdc385e68f8accff11e
Author: kasiafi

Co-authored-by: kasiafi <30203062+kasiafi@users.noreply.github.com>
…tput

If a table function has multiple partitioned inputs, they might be
co-partitioned. Co-partitioning might require coercing of the
corresponding partitioning columns to common supertype.

Additionally, each partitioning source column should be produced
on table function's output in its original (uncoerced) form.

Before this change, the coerced columns were incorrectly passed to
output. After this change, the uncoerced columns are preserved for
output independently from the Specification, which is planned with
regard to the required co-partitioning coercions.

Changes adapted from trino/PR#14566
Original commit: 05535286323470132dd5df0e88ee1e6aa49f4c34
Author: kasiafi

Co-authored-by: kasiafi <30203062+kasiafi@users.noreply.github.com>
xin-zhang2 and others added 7 commits March 25, 2025 13:45
Adds the SchemaFunctionName, which can act as a unique
identifier of the function for a given catalog.

Changes adapted from trino/PR#14566
Original commit: cf0bd23dc457ba0cf6483e086cc342b071c32efb
Author: kasiafi

Co-authored-by: kasiafi <30203062+kasiafi@users.noreply.github.com>
the necessary pass-through information for a table function's source includes:
- whether the source was declared as pass-through
- an ordered list of pass-through columns
- for each column, information whether it is a partitioning column

Changes adapted from trino/PR#14566
Original commit: 795f46d93e3039c4872cfd2be842aeaa306f39f8
Author: kasiafi

Co-authored-by: kasiafi <30203062+kasiafi@users.noreply.github.com>
When the ordering scheme is mapped, some symbols might
be removed due to de-duplication. If de-duplication happens
within the pre-sorted prefix, the length of the prefix
should be updated.

This causes no issues currently, since UnaliasSymbolReferences,
and so the SymbolMapper, is never called after AddLocalExchanges,
where the pre-sorted symbols are determined.

Changes adapted from trino/PR#14566
Original commit: af620aaf426745e47f4a4d458cbc6b71d2684ddb
Author: kasiafi

Co-authored-by: kasiafi <30203062+kasiafi@users.noreply.github.com>
Changes adapted from trino/PR#14566
Original commit: 96be405333dd55ccd789aa5133799c492cf06dcf
Author: kasiafi
…ingNode

Changes adapted from trino/PR#14566
Original commit: 5d5eaf8e6b5737758589247fc106867482fb54d3
Author: kasiafi
Support arbitrary number of sources (including no sources),
involving row and set semantics, prune/keep when empty properties,
and co-partitioning.

Changes adapted from trino/PR#14566
Original commit: 68bbbca9cc32505badc343adab56f59a685a7187
Author: kasiafi
@mohsaka
Copy link
Copy Markdown
Owner

mohsaka commented Mar 26, 2025

Currently bringing in the last commit. Working on the ImplementTableFunctionSource file. Quite a large one. Feel free to pick it up if you fix the testing.

@mohsaka mohsaka force-pushed the tvf_14566 branch 3 times, most recently from 5d8a597 to 9b700e7 Compare March 27, 2025 02:20
@mohsaka
Copy link
Copy Markdown
Owner

mohsaka commented Mar 28, 2025

For one of the usages, we are able to use coalesce as we are passing in VariableReferenceExpressions. The others do not and are forms of Expressions.

I looked into SpecialFormExpressions but those require VariableReferenceExpressions as well.

Also attempted to bring in rowExpression from QueryPlanner/RelationPlanner, however we are missing a few of the objects required. They would need to be passed in from PlanOptimizers. But could not find a shared class between them to pass them all the way through.

@mohsaka
Copy link
Copy Markdown
Owner

mohsaka commented Mar 28, 2025

Current issue

Mismatch on ordering Type.

variable = {VariableReferenceExpression@6868} "d"
 name = "d"
 type = {UnknownType@6877} "unknown"
 sourceLocation = {Optional@5524} "Optional.empty"
that.variable = {VariableReferenceExpression@6874} "d"
 name = "d"
 type = {BigintType@6883} "bigint"
 sourceLocation = {Optional@5524} "Optional.empty"

Symbol Alias differences

Trino LATEST

symbolAliases = {SymbolAliases@10219} "SymbolAliases{{c=c::bigint, d=d::bigint}}"
 map = {RegularImmutableMap@10234}  size = 2
  "c" -> {Reference@10240} "c::bigint"
   key = "c"
    value = {byte[1]@10245} [99]
    coder = 0
    hash = 99
    hashIsZero = false
   value = {Reference@10240} "c::bigint"
    type = {BigintType@10246} "bigint"
     signature = {TypeSignature@10248} "bigint"
     javaType = {Class@10249} "long"
     valueBlockType = {Class@8145} "class io.trino.spi.block.LongArrayBlock"
    name = "c"
     value = {byte[1]@10245} [99]
     coder = 0
     hash = 99
     hashIsZero = false
  "d" -> {Reference@10241} "d::bigint"
   key = "d"
    value = {byte[1]@10131} [100]
    coder = 0
    hash = 100
    hashIsZero = false
   value = {Reference@10241} "d::bigint"
    type = {BigintType@10246} "bigint"
    name = "d"

Presto

symbolAliases = {SymbolAliases@5520} "SymbolAliases{{c="c", d="d"}}"
map = {RegularImmutableMap@6892}  size = 2
 "c" -> {SymbolReference@6938} ""c""
  key = "c"
  value = {SymbolReference@6938} ""c""
   name = "c"
    value = {char[1]@6946} [c]
    hash = 99
   location = {Optional@5524} "Optional.empty"
    value = null
 "d" -> {SymbolReference@6939} ""d""
  key = "d"
   value = {char[1]@6945} [d]
   hash = 100
  value = {SymbolReference@6939} ""d""
   name = "d"
    value = {char[1]@6945} [d]
    hash = 100
   location = {Optional@5524} "Optional.empty"
    value = null

Remembered Xin hit a similar issue. Found function matchSpecification and changed his custom function to this as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants