Migrate INTERSECTS, CONTAINS, WITHIN, and set predicates to registered expanders — Closes #141#157
Draft
conradbzura wants to merge 5 commits into
Draft
Migrate INTERSECTS, CONTAINS, WITHIN, and set predicates to registered expanders — Closes #141#157conradbzura wants to merge 5 commits into
conradbzura wants to merge 5 commits into
Conversation
Add a public save/restore pair to ExpanderRegistry so a caller can capture the current registrations and later return the registry to that exact baseline, regardless of what it registered or cleared in between. This is the seam test fixtures need to isolate the process-wide registry without permanently dropping the built-in expanders registered at import: snapshot the baseline, clear, run the body, then restore. Both halves go through public methods rather than touching private registry state.
Move INTERSECTS, CONTAINS, WITHIN, and the ANY/ALL set predicates off the legacy generator emitters and onto the operator-expander registry. Each predicate now expands to standard boolean SQL AST built from the pass-1 resolved column metadata, so the emitted SQL is byte-identical to what the old emitters produced. The new expanders package imports every submodule at import time, so its generic-target expanders register themselves as a side effect; the spatial and set predicate classes flip GIQL_EXPAND on to route through the pass. The corresponding intersects_sql, contains_sql, within_sql, and spatialsetpredicate_sql emitters and their helpers are deleted from the base generator. These are node-local predicate rewrites only. The column-to-column INTERSECTS join rewrites remain separate pre-pass transformers, so an expander only ever sees a literal-range or residual predicate, exactly as the old emitters did.
Restructure the transpile pipeline so the column-to-column INTERSECTS join rewrites are gated on the target's range_join_strategy capability, and add registry deferral for a target-specific override. Previously the DuckDB IEJoin path took an unconditional early return that emitted whole-query SQL and skipped the rest of the pipeline, including the expansion pass. That made it impossible for a migrated operator to expand on an IEJoin-eligible query. Now the join rewrite is skipped when a target-specific Intersects expander is registered, letting the INTERSECTS node flow into expansion instead. The built-in generic predicate expander does not count as an override, so it never disables the join strategy. Also wire the expanders package import into the pipeline so the migrated predicate expanders are registered before the first transpile, and update the pass-3 comment to reflect that the spatial and set predicates now expand here.
Adapt the expander tests to a registry that ships with built-in expanders and operators that ship opted into expansion. The leak guards and clean_registry fixture now treat the import-time registry contents as the baseline, restoring it through the public snapshot and restore seam rather than asserting emptiness. The flag leak guard compares each operator against its shipped GIQL_EXPAND default instead of a blanket False, and an opted-out helper lets a control test hold a migrated operator as if unflagged. Add coverage for snapshot independence and restore. Replace the strict-xfail that pinned the old IEJoin early-return gap with tests proving the IEJoin path now defers to a target-specific Intersects override while the default path still emits the built-in IEJoin SQL. The inert-pass test now uses an unmigrated operator so it still demonstrates a no-op.
Remove unreachable dispatch branches in the predicate expanders, add ExpanderRegistry.has_override and route the join-deferral gate through it, guard intersects_bin_size under a target override, and preserve tracebacks on the parse-error wrap. Add direct expander tests, binned-target deferral coverage, and error-message characterization. Make the registry docstrings mechanistic and node-local, restore the registry in place, harden auto-discovery, and key the opt-out control on a dynamically derived migrated operator.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Migrate the spatial predicates (INTERSECTS / CONTAINS / WITHIN) and the quantified set predicates (ANY / ALL) off the legacy
*_sqlemitters onBaseGIQLGeneratorand onto the generic operator-expander registry (epic #137, wave 3). Each predicate now expands to standard boolean sqlglot AST in the newgiql.expanderspackage during pass 3 (ExpandOperators), built from the pass-1ResolvedColumnmetadata already canonicalized to 0-based half-open by pass 2, so the emitted predicate SQL is byte-identical to what the deleted emitters produced.Restructure
transpile.pyso the join-strategy rewrites are capability-gated on the target'srange_join_strategyand defer to the registry. The binned equi-join and DuckDB IEJoin transformers run as pre-pass transformers that consume a column-to-column INTERSECTS join before expansion; a literal-range or residual column-to-column INTERSECTS predicate survives to pass 3 and is rendered by the new expander exactly as the legacy emitter rendered it. Add a registry-deferral seam: a target-specific(target, Intersects)registry entry overrides the built-in join strategy entirely, letting the INTERSECTS node flow untouched intoExpandOperators. This removes the olddialect="duckdb"IEJoin early-return that skipped the expansion pipeline.Deferred: folding the whole-query IEJoin string emitter itself into a node-level expander. Per the issue's design note,
IntersectsDuckDBIEJoinTransformer.transform_to_sqlemits a whole-querySET VARIABLE …; SELECT …string and rewrites the top-level statement, which a node-replacingOperatorExpander.expand(node, ctx) -> exp.Expressioncontract cannot express. The join-strategy rewrites therefore stay as capability-gated pre-pass transformers rather than(target, op)registry entries; only the registry-deferral hook is added so a future target-specific override can supersede them.This is epic #137 wave 3; it carries the shared
ExpanderRegistry.snapshot()/restore()seam that sibling wave-3 PRs also introduce (dedupe on merge).Closes #141
Proposed changes
ExpanderRegistrysave/restore seam (src/giql/expander.py)Add
snapshot()andrestore()toExpanderRegistry.snapshot()returns a shallow copy of the current(target, operator) -> expanderregistrations;restore()drops every current entry and re-installs exactly the snapshot contents. This is the public seam an isolating test fixture (or a plugin) uses to mutate the process-wideREGISTRYaround a body and return it to a captured baseline — so the built-in expanders registered at import survive a fixture that would otherwiseclear()them permanently. Shared with sibling wave-3 PRs.giql.expanderspackage and predicate expanders (src/giql/expanders/__init__.py,src/giql/expanders/intersects.py)Add the
giql.expanderspackage. Importing it registers every built-in expander as a side effect:__init__.pyusespkgutil.iter_modulesto import each submodule, which decorates its expanders with@register(...)at import time, so new operator modules are picked up by dropping a file in without editing the package.Add
giql.expanders.intersectswith fourGenericTargetexpanders:expand_intersects,expand_contains,expand_within, andexpand_spatial_set(ANY / ALL). Each turns one predicate node into a parenthesized boolean built fromResolvedColumnfragments parsed through the GIQL dialect, reproducing the deleted emitter helpers as AST:_range_predicate(literal-range form, including the point-query special case for CONTAINS),_column_join(column-to-column residual form), and the dispatch-on-right-operand logic of the old_generate_spatial_op. The literal-range path reproduces the legacy parse-and-wrap-error behavior verbatim (the historical "Could not parse genomic range" message). Only generic expanders are registered, since spatial-predicate emission is portable SQL-92 and does not vary by engine.Capability-gated join transformers and registry-deferral (
src/giql/transpile.py)Import
giql.expandersonce so the registry is populated before the first transpile. Computetarget_overrides_intersects— true only for an exact non-generic(target, Intersects)entry, deliberately excluding the built-in(GenericTarget, Intersects)predicate expander so it does not disable the join rewrite. Gate both the IEJoin path (if uses_iejoin and not target_overrides_intersects) and the binned-join transformer on this flag: when a target-specific override is registered, the join rewrite is skipped and the INTERSECTS node flows intoExpandOperators. Remove thedialect="duckdb"early-return's pipeline-skip warning block — the IEJoin transformer still short-circuits with a whole-query string when it produces output (safe, since an IEJoin-eligible query carries exactly one INTERSECTS and leaves no residual predicate), but the registry is now consulted on the deferral path it used to preclude.GIQL_EXPANDflips and emitter deletion (src/giql/expressions.py,src/giql/generators/base.py)Flip
GIQL_EXPANDfrom the shared inert default toTrueonIntersects,Contains,Within, andSpatialSetPredicateso the four predicates opt into pass 3. Delete theintersects_sql/contains_sql/within_sql/spatialsetpredicate_sqlemitters fromBaseGIQLGeneratorand their_generate_spatial_op/_generate_spatial_set/_generate_range_predicate/_generate_column_join/_predicate_operandhelpers, plus the now-unused imports.Test updates (
tests/test_expander.py,tests/generators/test_base.py)Rework the registry/flag leak guards to compare against a captured baseline (
REGISTRY.snapshot()) rather than asserting emptiness, since the registry now ships built-in expanders at import;clean_registrysaves and restores that baseline through the new seam. Add_SHIPPED_EXPAND_FLAGSand derive_MIGRATED_OPERATORS/_UNMIGRATED_OPERATORSdynamically so the flag-leak guard restores each operator to its shipped default and the opt-out parametrization stays merge-stable across wave-3 branches. Add an_opted_outcontext manager (complement of_opted_in) for control tests that need a migrated operator to behave as unflagged. Replace the old strict-xfailTestIEJoinEarlyReturnSkipsExpansionwithTestIEJoinRegistryDeferral, addsnapshot/restorecoverage, and route the generator-level spatial tests through pass 3 via the updated_generate_through_passeshelper (now runs passes 1-3).Test cases
TestExpanderRegistryFallbackGapssnapshot()is a copy, not a live viewTestExpanderRegistryFallbackGapsrestore()replaces entries with snapshot contentsTestIEJoinRegistryDeferral(DuckDBTarget, Intersects)override registereddialect='duckdb'SET VARIABLEIEJoin SQL is emittedTestIEJoinRegistryDeferraldialect='duckdb'SET VARIABLESQL is emittedTestOperatorOptOutGIQL_EXPANDclass attributeFalseTestOperatorOptOutGIQL_EXPANDclass attributeTrueTestExpandOperatorsPass(GenericTarget, GIQLDisjoin)but the operator's flag held off via_opted_outGIQL_EXPANDgate isolates dispatchTestNoOpWhenInertTestExpandOperatorsWalkTestBaseGIQLGeneratorValueErrormatching "Could not parse genomic range" is raisedTestBaseGIQLGenerator'chr:a-b') in INTERSECTSValueErroris raised