Make the SQL target dialect a first-class concept that drives both engine-compatible emission and engine-specific optimization, by promoting GIQL operator expansion out of the string-emitting generator into a registry of per-dialect, AST-producing operator expanders. The end state: BaseGIQLGenerator becomes a stock sqlglot serializer, every operator emits standard sqlglot AST chosen per target, and a public registration hook lets users add or override targets.
Target model + capabilities. Each dialect is a class — GenericTarget, DuckDBTarget, DataFusionTarget — carrying a capability set (supports_lateral, supports_star_replace, supports_qualify, range_join_strategy, …) and the sqlglot output dialect for serialization. Portable choices become capability lookups, not scattered if dialect == ... branches: the star-REPLACE-vs-explicit-projection decision and the LATERAL-vs-window-function decision are driven by capabilities.
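As a rough illustration of the target model described above, the sketch below carries capabilities on each target class and makes one portable choice through a capability lookup. The capability field names come from the issue text; the `Capabilities` container, the default values, the `range_join_strategy` strings and the `projection_style` helper are assumptions made for the example, not the project's actual API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Capabilities:
    # Field names follow the issue; the default values are illustrative guesses.
    supports_lateral: bool = True
    supports_star_replace: bool = False
    supports_qualify: bool = False
    range_join_strategy: str = "binned"  # hypothetical value


class GenericTarget:
    # sqlglot dialect used for serialization; None stands for sqlglot's default.
    output_dialect: Optional[str] = None
    capabilities = Capabilities()


class DuckDBTarget(GenericTarget):
    output_dialect = "duckdb"
    capabilities = Capabilities(
        supports_star_replace=True,
        supports_qualify=True,
        range_join_strategy="iejoin",  # hypothetical value
    )


class DataFusionTarget(GenericTarget):
    # Placeholder: the issue does not say which sqlglot dialect is used here.
    output_dialect = None
    capabilities = Capabilities(supports_star_replace=False)


def projection_style(target: type) -> str:
    """Choose the canonicalization output form from a capability, not a dialect name."""
    if target.capabilities.supports_star_replace:
        return "star_replace"
    return "explicit_projection"
```

With this shape, adding a new engine means declaring one more class and its capability values; call sites such as `projection_style` do not change.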
Operator-expander protocol + registry. An OperatorExpander is expand(node, ctx) -> exp.Expression — it takes a GIQL operator node plus an ExpansionContext (pass-1 resolved metadata, the active target/capabilities, alias minting, tables) and returns standard sqlglot AST. The registry is keyed by (target, operator_type) with a fallback chain (target, op) → (generic, op) → legacy *_sql emitter. Write a generic expander once; override per-dialect only where the engine genuinely differs (a limitation or an optimization). The registry is the public extension hook — users register their own target or override (their_target, op) via a decorator, e.g. @register(DuckDBTarget, GIQLDisjoin).
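The registry and its fallback chain could be sketched roughly as follows. The target and operator classes are empty stand-ins, the expanders return placeholder strings where the real ones would return sqlglot AST, and `lookup` is a hypothetical helper name; only `register` and the `(target, op)` → `(generic, op)` → legacy order are taken from the issue.

```python
from typing import Callable, Dict, Optional, Tuple


# Empty stand-ins for the real target and operator classes.
class GenericTarget:
    pass


class DuckDBTarget(GenericTarget):
    pass


class DataFusionTarget(GenericTarget):
    pass


class GIQLDisjoin:
    pass


class GIQLDistance:
    pass


# An expander takes (node, ctx) and returns the expansion.
Expander = Callable[[object, object], object]
_REGISTRY: Dict[Tuple[type, type], Expander] = {}


def register(target: type, operator_type: type):
    """Decorator that stores an expander under the (target, operator_type) key."""
    def decorator(fn: Expander) -> Expander:
        _REGISTRY[(target, operator_type)] = fn
        return fn
    return decorator


def lookup(target: type, operator_type: type) -> Optional[Expander]:
    """Try (target, op), then (generic, op). None means: use the legacy *_sql emitter."""
    return (
        _REGISTRY.get((target, operator_type))
        or _REGISTRY.get((GenericTarget, operator_type))
    )


@register(GenericTarget, GIQLDisjoin)
def expand_disjoin_generic(node, ctx):
    return "generic disjoin expansion"  # placeholder for a sqlglot expression


@register(DuckDBTarget, GIQLDisjoin)
def expand_disjoin_duckdb(node, ctx):
    return "duckdb disjoin expansion"  # placeholder for a sqlglot expression
```

Here `lookup(DataFusionTarget, GIQLDisjoin)` falls back to the generic expander, and `lookup(DuckDBTarget, GIQLDistance)` returns `None`, which the caller would treat as a signal to use the legacy emitter.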
ExpandOperators pass. Runs after CanonicalizeCoordinates; walks the AST and replaces each GIQL operator node with the expansion the registry returns for the active target.
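The walk-and-replace behaviour of the pass can be illustrated on a toy tree. This is only a sketch: `Node` is a stand-in for a sqlglot expression, the expander mapping is keyed by a plain string rather than by `(target, operator_type)`, and the bottom-up order is an assumption, since the issue does not specify traversal order. In the real code base a traversal helper such as sqlglot's `Expression.transform` would likely play this role.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    """Toy AST node standing in for a sqlglot expression."""
    kind: str
    children: List["Node"] = field(default_factory=list)


def expand_operators(node: Node, expanders: Dict[str, Callable[[Node], Node]]) -> Node:
    # Assumption: rewrite children first, so nested operators are already
    # expanded by the time their parent is visited.
    node.children = [expand_operators(child, expanders) for child in node.children]
    expander = expanders.get(node.kind)
    # Nodes without a registered expander are left untouched.
    return expander(node) if expander else node


# Hypothetical expander: swap a DISTANCE operator node for a CASE node
# that keeps the operator's operands as its children.
toy_expanders = {"DISTANCE": lambda n: Node("CASE", n.children)}

tree = Node("SELECT", [Node("DISTANCE", [Node("a"), Node("b")]), Node("FROM")])
expanded = expand_operators(tree, toy_expanders)
```

After the call, the `DISTANCE` child has been replaced by a `CASE` node with the same two operands, while `SELECT` and `FROM` are unchanged.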
transpile() wiring + feature flag. The dialect param resolves to a Target (backward-compatible: None→Generic, "duckdb"→DuckDB; add "datafusion"). Incremental migration uses a per-operator GIQL_EXPAND class attribute, mirroring the proven GIQL_CANONICALIZE pattern from epic Introduce pre-generation AST normalization pipeline for operator resolution and coordinate canonicalization #114: an operator takes the new path only when it is flagged AND an expander is registered; otherwise it goes through the legacy emitter. Each migration PR flips one operator, behaviour-preserving until the last.
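A minimal sketch of the wiring described above, assuming targets are plain classes and the flag is a class attribute read with `getattr`. `resolve_target` and `uses_new_path` are hypothetical helper names, and raising `ValueError` for an unknown dialect is an assumption; the issue only fixes the `None`, `"duckdb"` and `"datafusion"` mappings and the "flagged AND registered" rule.

```python
class GenericTarget:
    pass


class DuckDBTarget(GenericTarget):
    pass


class DataFusionTarget(GenericTarget):
    pass


# Mapping from the existing dialect param to a target class.
_TARGETS = {
    None: GenericTarget,
    "duckdb": DuckDBTarget,
    "datafusion": DataFusionTarget,
}


def resolve_target(dialect):
    """Resolve transpile()'s dialect argument to a target class."""
    try:
        return _TARGETS[dialect]
    except KeyError:
        # Assumption: unknown dialects are rejected rather than silently
        # mapped to the generic target.
        raise ValueError(f"unknown dialect: {dialect!r}") from None


def uses_new_path(operator_type: type, expander_registered: bool) -> bool:
    """New path only when the operator is flagged AND an expander is registered."""
    return bool(getattr(operator_type, "GIQL_EXPAND", False)) and expander_registered


class Distance:  # stand-in operator that has been migrated
    GIQL_EXPAND = True


class Nearest:  # stand-in operator that has not been flagged yet
    pass
```

With nothing flagged, `uses_new_path` is false for every operator, which matches the "strict no-op" property the scaffolding step asks for.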
Motivation
Dialect portability. sqlglot serializes standard AST per target for free (identifier quoting, function spelling, supported-construct syntax). The two dialect walls already hit in Introduce pre-generation AST normalization pipeline for operator resolution and coordinate canonicalization #114 — SUPPORTS_LATERAL/SQLite for NEAREST, and SELECT * REPLACE portability for non-canonical canonicalization — move from f-string special-cases into capability-driven expanders. Caveat: AST expansion makes the syntactic layer cheap; semantic fallbacks (e.g. LATERAL→window-function) still need real work, but that work lives in one centralized place per target rather than tangled into templates.
Engine-specific optimization is a first-class slot. The existing hardcoded DuckDB IEJoin path is the precedent: IntersectsDuckDBIEJoinTransformer becomes the (DuckDBTarget, Intersects) expander and IntersectsBinnedJoinTransformer the (GenericTarget, Intersects) one — the dialect="duckdb" early-return in transpile() dissolves into the registry.
Each step is a child sub-issue; each lands independently and is behaviour-preserving (guarded by the per-operator GIQL_EXPAND flag and checked against a result-snapshot oracle) until step 10 removes the flag and the legacy path.
Expander protocol + registry + ExpandOperators pass scaffolding + GIQL_EXPAND flag. Land the registration decorator, the registry with the (target → generic → legacy) fallback chain, the ExpansionContext, the pass, and the per-operator flag. With nothing flagged it is a strict no-op. Includes registry unit tests and an extension-hook test (register a fake custom target, assert dispatch).
Result-oracle test harness + DataFusion integration lane. A cross-target result-identity helper and a DataFusion integration suite (deps already present; only test_binned_join.py exercises it today) so every later migration proves DuckDB≡DataFusion≡expected.
DISTANCE (proof-of-concept — single CASE). Generic expander; flip GIQL_EXPAND; verify cross-target identity; delete its *_sql method.
INTERSECTS / CONTAINS / WITHIN + set predicates. Generic expanders; fold IntersectsDuckDBIEJoinTransformer → (DuckDB, Intersects) and IntersectsBinnedJoinTransformer → (Generic, Intersects); remove the dialect="duckdb" early-return in transpile().
NEAREST. Generic LATERAL expander + capability-driven window-function fallback for no-lateral targets; row passthrough as AST.
DISJOIN. WITH-CTE expansion as AST; full-row passthrough and output de-canonicalization as AST; capability-driven canonicalization output (explicit portable projection for no-REPLACE targets like DataFusion, * REPLACE for DuckDB) — this resolves the star-REPLACE portability limitation documented in Port DISJOIN from in-emitter canonicalization to CanonicalizeCoordinates output #122.
CLUSTER / MERGE. Relocate ClusterTransformer/MergeTransformer into the registry as generic expanders for consistency.
DataFusion target completion + dialect-aware canonicalization finalization. Verify every operator has a DataFusion path; move the canonicalizer's REPLACE-vs-explicit decision fully behind capabilities; complete DataFusion integration coverage.
Generator reduction + remove feature flag + extension-hook docs. Delete the now-dead *_sql methods; reduce/replace BaseGIQLGenerator with per-target stock serializers; remove the migration flag and legacy path; document the registration extension hook (registering a custom target, overriding an operator). Closes the epic.
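The cross-target result-identity helper from the test-harness step could look roughly like the sketch below. It assumes each engine's result has already been fetched as a sequence of rows and that row order is not significant, so rows are compared as multisets; the function name and the error format are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence


def assert_results_identical(
    results: Dict[str, Iterable[Sequence]],
    expected: Iterable[Sequence],
) -> None:
    """Check that every target returned the expected rows, ignoring row order."""
    want = Counter(tuple(row) for row in expected)
    for target, rows in results.items():
        got = Counter(tuple(row) for row in rows)
        if got != want:
            # Counter subtraction keeps only positive counts, giving the
            # rows that are missing from or extra in this target's result.
            missing = list((want - got).elements())
            extra = list((got - want).elements())
            raise AssertionError(f"{target}: missing={missing} extra={extra}")


# Same rows in a different order on each engine: passes.
assert_results_identical(
    {
        "duckdb": [("chr1", 1, 5), ("chr1", 7, 9)],
        "datafusion": [("chr1", 7, 9), ("chr1", 1, 5)],
    },
    expected=[("chr1", 1, 5), ("chr1", 7, 9)],
)
```

Comparing multisets rather than sets keeps duplicate rows significant, which matters for operators that can emit the same interval more than once.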
Non-goals
Re-implementing CLUSTER/MERGE/binned-join logic — those already produce AST and are relocated, not rewritten.
Adding SQLite/Postgres targets in this epic (the registry is designed so they are pure additions later: register a target + capability set, override only divergent operators).
Operator semantic changes — expansions are behaviour-preserving against the result oracle.
Existing per-dialect precedent: IntersectsDuckDBIEJoinTransformer / IntersectsBinnedJoinTransformer in src/giql/transformer.py; the dialect param in src/giql/transpile.py.
Description
Architecture (four seams): target model + capabilities; operator-expander protocol + registry; ExpandOperators pass; transpile() wiring + feature flag.
Motivation
[…] decanonical_* on synthesized columns) dissolves into AST. The registration pattern is exposed as a supported extension hook for users targeting their own engines.
Staged migration plan
dialect param → Target. (Make dialect a first-class target selector driving engine-specific optimization and compatible SQL emission #132) Define GenericTarget / DuckDBTarget / DataFusionTarget, the capability descriptors, and resolve transpile()'s dialect param to a Target. Backward compatible; no expansion yet.
References
[…] *_sql expansion methods and output de-canonicalization for this epic.
[…] src/giql/generators/base.py; canonicalizer output pattern: src/giql/canonicalizer.py.