Migrate DISJOIN to a registered expander and solve the star-REPLACE portability limitation — Closes #143#159
Draft
conradbzura wants to merge 4 commits into
Draft
Migrate DISJOIN to a registered expander and solve the star-REPLACE portability limitation — Closes #143#159conradbzura wants to merge 4 commits into
conradbzura wants to merge 4 commits into
Conversation
ExpanderRegistry only exposed clear() for resetting state, which forces an isolating test fixture to choose between leaving the registry empty afterward or knowing its prior contents. Once built-in expanders register themselves at import, clearing permanently drops them for the rest of the process. Add snapshot() to capture the current registrations as a fresh mapping and restore() to reinstall a captured baseline, giving fixtures and plugins a public save/restore seam that survives a clear() in the middle.
DISJOIN emitted its WITH-CTE subquery from the giqldisjoin_sql string emitter on BaseGIQLGenerator, an emit-time special case left by the operator-pass epic. Move it onto the ExpandOperators pass so a registered expander returns the subquery as a sqlglot AST and the active target's serializer renders it. Add the giql.expanders package, which auto-imports every submodule so each expander self-registers at import; wire that import into transpile so the registry is populated before the first transpile. Register expand_disjoin for the generic target as the portable fallback every target resolves to through the registry chain. Make the full-row passthrough capability-driven: a target that supports SELECT * REPLACE (DuckDB) keeps the REPLACE form, while a target without it (DataFusion, the generic baseline) emits the portable * EXCEPT projection plus the two recomputed interval columns. Flip GIQLDisjoin.GIQL_EXPAND on so the pass takes over, and delete the now-dead giqldisjoin_sql emitter along with its DISJOIN-only resolution and passthrough helpers. Update the tests to transpile for the executing engine's dialect, drive the expansion pass in the canonicalizer no-op checks, exercise both passthrough forms, and treat the import-time built-in registrations as the registry baseline rather than an empty registry.
The __giql_dj_cuts CTE built its cut positions from three UNION branches, but only the first branch aliased its four projected columns. In the default 0-based half-open identity case the end column de-canonicalizes to the bare physical column, so the unaliased branches projected t."end" alongside the end-cut expression under the same output name. DuckDB tolerated the duplicate; DataFusion rejected the projection for non-unique expression names, so DISJOIN could not run there. Alias all four columns in every branch. The output names still come from the first branch, so this is behaviour-preserving on DuckDB while making each branch internally unique for strict engines. Promote the oracle's previously expected-failure DISJOIN case to a real cross-target identity test now that the query runs and agrees on every target.
Add a non-canonical cross-target oracle case so the portable star-EXCEPT passthrough executes on DataFusion, plus an engine-free regression pinning the per-branch cuts-CTE aliases. Document the REPLACE-vs-EXCEPT column-order divergence, centralize the DISJOIN prefix in a constants module, parse with parse_one, type the expander node, and restore the dropped rationale comments. Apply the shared registry-docstring, restore-in-place, and auto-discovery fixes.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Migrate DISJOIN from the emit-time
BaseGIQLGenerator.giqldisjoin_sqlstring special-case (left by epic #114) to a generic registered AST expander, and make the full-row passthrough capability-driven so non-canonical encodings produce portable SQL on engines that lackSELECT * REPLACE.Register one
expand_disjoinagainstGenericTarget, so every target resolves to it through the registry's generic chain. The expander assembles the same__giql_dj_*WITH-CTE subquery, parses it back into a sqlglot expression, and returns that node for the active target's serializer to render — dissolving the emit-time string special-case. A single capability branch onctx.capabilities.supports_star_replaceselects the passthrough projection form: emitt.* REPLACE (...)on a target that supports it (DuckDB), and the portablet.* EXCEPT (start, end), <recomputed start>, <recomputed end>form otherwise, which every* EXCEPT-capable engine plans. The portable branch is what adds DataFusion support for non-canonical DISJOIN passthrough. Input canonicalization stays owned byCanonicalizeCoordinates(pass 2, #122) — the expander consumes already-canonical 0-based half-open columns and only round-trips the output back into the target's declared encoding.Fix #153 by aliasing all four projected columns (
kc/ks/ke/pos) in every__giql_dj_cutsUNION branch. Previously the bare de-canonicalizedendcolumn and the end-cut expression collided under one output name in the default 0-based half-open identity case; DuckDB tolerated the duplicate, but DataFusion rejected it as a non-unique projection name. With every branch aliased, the projection is internally unique on strict engines and behaviour-preserving on DuckDB. As a result, the cross-target oracle's previously-pinned_pending_153expected-failure is promoted to a real three-target identity test.Part of epic #137 wave 3; carries the shared
ExpanderRegistry.snapshot()/restore()seam that sibling wave-3 PRs also have (dedupe on merge).Closes #143
Fixes #153
Proposed changes
Registry save/restore seam (
src/giql/expander.py)Add the public
ExpanderRegistry.snapshot()/ExpanderRegistry.restore()methods, first introduced for these fixtures.snapshot()returns a fresh shallow copy of the(target, operator) → expanderregistrations;restore()drops all current entries and re-installs exactly the snapshot contents. This lets an isolating test fixture (or a plugin) capture the import-time baseline, mutate the process-wideREGISTRYaround a body, and hand the baseline back afterward so the built-in expanders survive a fixture that would otherwiseclear()them permanently.giql.expanderspackage + DISJOIN expander (src/giql/expanders/__init__.py,src/giql/expanders/disjoin.py)Add the
giql.expanderspackage whose__init__auto-imports every submodule viapkgutil.iter_modules, so dropping a<operator>.pyinto the package registers its expander as an import side effect without editing the package file. Adddisjoin.pywith the@register(GenericTarget, GIQLDisjoin)expander and its helpers (_build_disjoin_sql,_disjoin_passthrough,_disjoin_output_encoding,_disjoin_resolution), carrying over the original resolution-unpacking and historical diagnostics verbatim. The passthrough is the capability-driven form described in the summary; the identity 0-based half-open case stays a plaint.*fast path.#153 alias fix (isolated in
disjoin.py's__giql_dj_cutsassembly)Alias
kc/ks/ke/posin all three__giql_dj_cutsUNION branches. This is an isolated, cherry-pickable change: it only adds aliases to existing projections and does not depend on the migration or the capability branch.GIQLDisjoin.GIQL_EXPANDflip + legacy deletion (src/giql/expressions.py,src/giql/generators/base.py,src/giql/transpile.py)Flip
GIQLDisjoin.GIQL_EXPANDfrom the disabled sentinel toTrue, so theExpandOperatorspass replaces the node with the expander's AST. DeleteBaseGIQLGenerator.giqldisjoin_sqland the DISJOIN-only generator helpers (_disjoin_resolution,_disjoin_passthrough,_disjoin_output_encoding) plus the now-unusedGIQLDisjoinimport. Wireimport giql.expandersintranspile.pyso the registry is populated before the first transpile.Test updates
Update
test_disjoin_transpilation.py,test_canonicalizer.py, andtest_expander.pyfor the registry-driven path, and add the capability-passthrough andsnapshot()/restore()coverage. Two execute-on-engine harnesses now transpile with the engine dialect:test_usage_patterns.py(_execute) andcoordinate_space/conftest.py(giql_query) passdialect=engine/dialect="duckdb", because a non-canonical DISJOIN passthrough emits* EXCEPTfor the generic target and* EXCEPTis not DuckDB-runnable — the SQL must be shaped for the engine it executes on. Promote the cross-target oracle'stest_disjoin_on_datafusion_unsupported_pending_153expected-failure totest_disjoin_agrees_across_all_targets, a real three-target identity test.Test cases
TestDisjoinCanonicalization* EXCEPTprojection that drops and re-projects the interval columnsREPLACETestDisjoinCanonicalizationREPLACEon the final projectionREPLACEpassthrough on DuckDBTestDisjoinTranspilationdisjoin_chrom/disjoin_start/disjoin_endcolumnsTestDisjoinTranspilationEXISTSclauseTestDisjoinTranspilationEXISTSclause against the referenceTestDisjoinTranspilation__giql_dj_prefix, or an unknown reference nameTestExpanderRegistrysnapshot()snapshot()independenceTestExpanderRegistryrestore()is called with the snapshotrestore()semanticsTestExpandOperatorsPassGIQL_EXPANDdispatchTestNoOpWhenFlagsOffTestCrossTargetOracleDisjoinTestCrossTargetOracleDisjoin