Fix DISJOIN emitting duplicate unaliased columns that break DataFusion #153

Description

@conradbzura

DISJOIN(...) transpiles to a WITH __giql_dj_cuts AS (…) CTE whose three UNION branches build cut points. Branch 1 aliases all four columns (AS kc, AS ks, AS ke, AS pos); branches 2 and 3 alias none (src/giql/generators/base.py:393-405). When the target's end de-canonicalizes to the bare physical column t."end" (the default 0-based half-open case, where de-canonicalization is the identity), branch 2 projects t."end" twice: as the third column (ke) and as the fourth column (the end cut).

DuckDB tolerates the duplicate names because it takes the UNION output names from branch 1, so the query runs there. DataFusion rejects it at plan time: "Projections require unique expression names but the expression "t.end" at position 2 and position 3 have the same name." The positions in that message are 0-based, so they refer to the third and fourth columns described above.

Reproduce by transpiling any DISJOIN query for a default (0-based half-open) table and executing the result on a DataFusion SessionContext.

Surfaced by the #139 cross-target oracle and pinned as a strict gap test in PR #152. It parallels #101 (literal references in DISJOIN), and fixing it unblocks DISJOIN coverage for #143 / #145.
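To illustrate the lenient half of this behaviour without installing either engine, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for DuckDB. SQLite is not one of the targets in this issue; it is used only because it also takes UNION output names from the first branch. The table layout and the two-branch query are simplified assumptions, not the SQL the generator actually emits, and the DataFusion rejection is not reproduced here.

```python
import sqlite3

# Simplified, hypothetical stand-in for the cut-point UNION: branch 1 is
# aliased, branch 2 is not and projects t."end" twice.
UNALIASED_UNION = '''
SELECT t.chrom AS kc, t.start AS ks, t."end" AS ke, t.start AS pos FROM t
UNION
SELECT t.chrom, t.start, t."end", t."end" FROM t
'''


def union_column_names(sql):
    """Run sql against a one-row table and return (column names, sorted rows)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute('CREATE TABLE t (chrom TEXT, start INTEGER, "end" INTEGER)')
        conn.execute("INSERT INTO t VALUES ('chr1', 10, 20)")
        cur = conn.execute(sql)
        names = [col[0] for col in cur.description]
        return names, sorted(cur.fetchall())
    finally:
        conn.close()


names, rows = union_column_names(UNALIASED_UNION)
print(names)
print(rows)
```

A lenient engine runs the query and reports the branch 1 aliases as the output names, which is why the duplicate inside branch 2 goes unnoticed until a stricter planner validates each branch's projection on its own.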

Expected behavior

DISJOIN transpiles to SQL that executes on DataFusion (and on any other engine that enforces unique projection names) and returns the same rows as on DuckDB. This unblocks DISJOIN cross-target identity in the #139 oracle.

Root cause

The three-branch UNION in giqldisjoin_sql (src/giql/generators/base.py:393-405) aliases its columns only in branch 1. Branches 2 and 3 are unaliased, so the ke column (t."end") and the end-cut expression (also t."end" in the identity case) collide on the output name.

Fix: alias all four columns in branches 2 and 3 to match branch 1 (AS kc, AS ks, AS ke, AS pos). The UNION output names already come from branch 1, so this is behaviour-preserving on DuckDB and makes each branch's projection internally unique for strict engines.
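The collision and the proposed fix can be sketched in a few lines of plain Python. This is an illustration only: output_name, duplicate_names, and alias_branch are hypothetical helpers, not functions from the giql code base, and the contents of branch_2 beyond the two t."end" entries are assumed. The naming rule modelled here (alias if present, otherwise the expression text) is a simplification of what a strict engine does.

```python
ALIASES = ("kc", "ks", "ke", "pos")


def output_name(expr):
    """Simplified naming rule: use the alias if present, else the expression text."""
    _, sep, alias = expr.rpartition(" AS ")
    return alias if sep else expr


def duplicate_names(projection):
    """Return output names that occur more than once in a single projection."""
    seen, dupes = set(), []
    for expr in projection:
        name = output_name(expr)
        if name in seen:
            dupes.append(name)
        seen.add(name)
    return dupes


def alias_branch(projection):
    """Attach the branch 1 aliases positionally to an unaliased projection."""
    return [f"{expr} AS {alias}" for expr, alias in zip(projection, ALIASES)]


# Assumed shape of branch 2 in the identity case: t."end" appears as both
# the ke column and the end cut.
branch_2 = ["t.chrom", "t.start", 't."end"', 't."end"']

before = duplicate_names(branch_2)
after = duplicate_names(alias_branch(branch_2))
print(before, after)
```

Aliasing positionally keeps the expressions themselves unchanged, so only the per-branch output names differ; the rows produced by the UNION are unaffected.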

Labels: bug (Something isn't working)
