
Migrate INTERSECTS, CONTAINS, WITHIN, and set predicates to registered expanders (Closes #141) #157

Draft: conradbzura wants to merge 5 commits into main from 141-migrate-spatial-predicate-expanders

Migrate INTERSECTS, CONTAINS, WITHIN, and set predicates to registered expanders — Closes #141#157
conradbzura wants to merge 5 commits into
mainfrom
141-migrate-spatial-predicate-expanders

Conversation

@conradbzura (Collaborator)

Summary

Migrate the spatial predicates (INTERSECTS / CONTAINS / WITHIN) and the quantified set predicates (ANY / ALL) off the legacy *_sql emitters on BaseGIQLGenerator and onto the generic operator-expander registry (epic #137, wave 3). During pass 3 (ExpandOperators), each predicate now expands into standard boolean sqlglot AST through the new giql.expanders package. The AST is built from the pass-1 ResolvedColumn metadata, which pass 2 has already canonicalized to 0-based half-open coordinates, and the expanders reproduce the deleted emitters' logic, so the emitted predicate SQL is byte-identical to what those emitters produced.
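Once pass 2 has canonicalized both sides to 0-based half-open [start, end) intervals, a column-to-column INTERSECTS reduces to the standard overlap test. The sketch below renders that test as a plain SQL string and evaluates the same condition in Python. ResolvedCols, intersects_join_sql, and overlaps are hypothetical names; the exact column names and formatting giql emits may differ, and the real expanders build sqlglot AST rather than strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResolvedCols:
    """Hypothetical stand-in for pass-1 ResolvedColumn metadata: qualified
    chrom/start/end column names for one side, already 0-based half-open."""
    chrom: str
    start: str
    end: str


def intersects_join_sql(left: ResolvedCols, right: ResolvedCols) -> str:
    # Half-open [start, end) intervals overlap iff they share a chromosome
    # and each one starts before the other ends.
    return (
        f"({left.chrom} = {right.chrom}"
        f" AND {left.start} < {right.end}"
        f" AND {left.end} > {right.start})"
    )


def overlaps(a: tuple[str, int, int], b: tuple[str, int, int]) -> bool:
    # The same condition evaluated in Python on (chrom, start, end) tuples.
    return a[0] == b[0] and a[1] < b[2] and a[2] > b[1]
```

Adjacent intervals such as [100, 200) and [200, 300) do not overlap under the half-open convention, which is why both comparisons are strict.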

Restructure transpile.py so the join-strategy rewrites are capability-gated on the target's range_join_strategy and defer to the registry. The binned equi-join and DuckDB IEJoin transformers run as pre-pass transformers that consume a column-to-column INTERSECTS join before expansion; a literal-range or residual column-to-column INTERSECTS predicate survives to pass 3 and is rendered by the new expander exactly as the legacy emitter rendered it. Add a registry-deferral seam: a target-specific (target, Intersects) registry entry overrides the built-in join strategy entirely, letting the INTERSECTS node flow untouched into ExpandOperators. This replaces the old unconditional dialect="duckdb" IEJoin early return, which skipped the expansion pipeline and so made it impossible for a migrated operator to expand on an IEJoin-eligible query.

Deferred: folding the whole-query IEJoin string emitter itself into a node-level expander. Per the issue's design note, IntersectsDuckDBIEJoinTransformer.transform_to_sql emits a whole-query SET VARIABLE …; SELECT … string and rewrites the top-level statement, which a node-replacing OperatorExpander.expand(node, ctx) -> exp.Expression contract cannot express. The join-strategy rewrites therefore stay as capability-gated pre-pass transformers rather than (target, op) registry entries; only the registry-deferral hook is added so a future target-specific override can supersede them.

This is epic #137 wave 3; it carries the shared ExpanderRegistry.snapshot()/restore() seam that sibling wave-3 PRs also introduce (dedupe on merge).

Closes #141

Proposed changes

ExpanderRegistry save/restore seam (src/giql/expander.py)

Add snapshot() and restore() to ExpanderRegistry. snapshot() returns a shallow copy of the current (target, operator) -> expander registrations; restore() drops every current entry and re-installs exactly the snapshot contents. This is the public seam an isolating test fixture (or a plugin) uses to mutate the process-wide REGISTRY around a body and return it to a captured baseline — so the built-in expanders registered at import survive a fixture that would otherwise clear() them permanently. Shared with sibling wave-3 PRs.
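A minimal sketch of the snapshot()/restore() contract, assuming the registry is a plain dict keyed by (target, operator) classes. The register()/get() method forms and the isolated() helper are hypothetical (giql registers through a @register(...) decorator); only the copy-on-snapshot and replace-on-restore semantics follow the description above.

```python
from contextlib import contextmanager
from typing import Callable, Dict, Iterator, Tuple

Key = Tuple[type, type]  # (target, operator)
Expander = Callable[..., object]


class ExpanderRegistry:
    """Sketch of a (target, operator) -> expander registry."""

    def __init__(self) -> None:
        self._entries: Dict[Key, Expander] = {}

    def register(self, target: type, operator: type, fn: Expander) -> None:
        self._entries[(target, operator)] = fn

    def get(self, target: type, operator: type):
        return self._entries.get((target, operator))

    def clear(self) -> None:
        self._entries.clear()

    def snapshot(self) -> Dict[Key, Expander]:
        # Shallow copy: later register()/clear() calls do not change it.
        return dict(self._entries)

    def restore(self, saved: Dict[Key, Expander]) -> None:
        # Replace contents in place, so every holder of this registry
        # object sees exactly the saved entries afterwards.
        self._entries.clear()
        self._entries.update(saved)


@contextmanager
def isolated(registry: ExpanderRegistry) -> Iterator[ExpanderRegistry]:
    """Hypothetical fixture helper: clear for the body, then restore."""
    baseline = registry.snapshot()
    registry.clear()
    try:
        yield registry
    finally:
        registry.restore(baseline)
```

Because restore() runs in a finally clause, the import-time baseline comes back even if the body raises.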

giql.expanders package and predicate expanders (src/giql/expanders/__init__.py, src/giql/expanders/intersects.py)

Add the giql.expanders package. Importing it registers every built-in expander as a side effect: __init__.py uses pkgutil.iter_modules to import each submodule, which decorates its expanders with @register(...) at import time, so new operator modules are picked up by dropping a file in without editing the package.
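The discovery step can be sketched with pkgutil.iter_modules and importlib.import_module. The function name import_submodules is hypothetical, and skipping modules whose name starts with an underscore is an assumption of this sketch, not something the PR specifies. Inside the package's own __init__.py, the same loop would iterate over __path__ and __name__ instead of a looked-up package.

```python
import importlib
import pkgutil


def import_submodules(package_name: str) -> list[str]:
    """Import every direct submodule of a package for its side effects
    (such as decorator-based registration) and return the dotted names."""
    package = importlib.import_module(package_name)
    imported = []
    # iter_modules over __path__ lists direct children only; the prefix
    # turns bare names into importable dotted names.
    for info in pkgutil.iter_modules(package.__path__, prefix=package_name + "."):
        if info.name.rsplit(".", 1)[-1].startswith("_"):
            continue  # assumption: private modules are not expander modules
        importlib.import_module(info.name)
        imported.append(info.name)
    return sorted(imported)
```

Skipping underscore-prefixed names also keeps a package's __main__ module, if it has one, from being executed as a side effect of discovery.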

Add giql.expanders.intersects with four GenericTarget expanders: expand_intersects, expand_contains, expand_within, and expand_spatial_set (ANY / ALL). Each turns one predicate node into a parenthesized boolean built from ResolvedColumn fragments parsed through the GIQL dialect, reproducing the deleted emitter helpers as AST: _range_predicate (literal-range form, including the point-query special case for CONTAINS), _column_join (column-to-column residual form), and the dispatch-on-right-operand logic of the old _generate_spatial_op. The literal-range path reproduces the legacy parse-and-wrap-error behavior verbatim (the historical "Could not parse genomic range" message). Only generic expanders are registered, since spatial-predicate emission is portable SQL-92 and does not vary by engine.
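The literal-range path can be sketched as a parser plus a per-operator predicate builder. Several details here are assumptions: the chrom:start-end / chrom:pos literal syntax, treating the parsed numbers as already 0-based half-open, the column names, and rejecting point literals outside CONTAINS. Only the "Could not parse genomic range" message prefix comes from the PR. GenomicRange, parse_range, and range_predicate are hypothetical names, and raise ... from exc is one way to keep the original traceback on the wrapped error.

```python
from typing import NamedTuple, Optional


class GenomicRange(NamedTuple):
    """Parsed literal; end is None for a point literal such as "chr1:150"."""
    chrom: str
    start: int
    end: Optional[int]


def parse_range(text: str) -> GenomicRange:
    """Parse "chrom:start-end" or "chrom:pos" (assumed literal syntax)."""
    try:
        chrom, coords = text.split(":", 1)
        if "-" in coords:
            lo, hi = coords.split("-", 1)
            return GenomicRange(chrom, int(lo), int(hi))
        return GenomicRange(chrom, int(coords), None)
    except ValueError as exc:
        # Keep the historical message prefix and chain the original error
        # so its traceback is preserved.
        raise ValueError(f"Could not parse genomic range: {text}") from exc


def range_predicate(op: str, rng: GenomicRange, chrom: str = "chrom",
                    start: str = "start", end: str = "end") -> str:
    """Render the literal-range predicate for one operator."""
    same_chrom = f"{chrom} = '{rng.chrom}'"
    if rng.end is None:
        if op != "CONTAINS":
            raise ValueError(f"{op} requires a start-end range")
        # Point query: [start, end) contains p iff start <= p < end.
        body = f"{start} <= {rng.start} AND {end} > {rng.start}"
    elif op == "INTERSECTS":
        body = f"{start} < {rng.end} AND {end} > {rng.start}"
    elif op == "CONTAINS":
        body = f"{start} <= {rng.start} AND {end} >= {rng.end}"
    elif op == "WITHIN":
        body = f"{start} >= {rng.start} AND {end} <= {rng.end}"
    else:
        raise ValueError(f"unsupported operator: {op}")
    return f"({same_chrom} AND {body})"
```

Building sqlglot AST, as the real expanders do, also avoids the quoting hazard of interpolating the chromosome name into a string as this sketch does.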

Capability-gated join transformers and registry-deferral (src/giql/transpile.py)

Import giql.expanders once so the registry is populated before the first transpile. Compute target_overrides_intersects, which is true only for an exact non-generic (target, Intersects) entry; it deliberately excludes the built-in (GenericTarget, Intersects) predicate expander so that expander does not disable the join rewrite. Gate both the IEJoin path (if uses_iejoin and not target_overrides_intersects) and the binned-join transformer on this flag: when a target-specific override is registered, the join rewrite is skipped and the INTERSECTS node flows into ExpandOperators. Remove the pipeline-skip warning block from the dialect="duckdb" early return. The IEJoin transformer still short-circuits with a whole-query string when it produces output, which is safe because an IEJoin-eligible query carries exactly one INTERSECTS and leaves no residual predicate; the difference is that the registry is now consulted on the deferral path the early return used to preclude.
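The gate can be sketched as a pure function over a dict registry. has_override mirrors the method name added in a later commit on this PR, but its signature here, the choose_intersects_path helper, the toy target classes, and the "iejoin"/"binned" strategy values are hypothetical. The point taken from the PR is that only an exact, non-generic (target, Intersects) entry counts as an override.

```python
from typing import Callable, Dict, Optional, Tuple


class GenericTarget: ...
class DuckDBTarget: ...
class Intersects: ...


Registry = Dict[Tuple[type, type], Callable]


def has_override(registry: Registry, target: type, operator: type) -> bool:
    # Exact-key lookup only: no fallback to GenericTarget, and the built-in
    # (GenericTarget, Intersects) entry never counts as an override.
    return target is not GenericTarget and (target, operator) in registry


def choose_intersects_path(registry: Registry, target: type,
                           range_join_strategy: Optional[str],
                           column_join: bool) -> str:
    """Decide how an INTERSECTS node is handled (hypothetical helper)."""
    if not column_join or has_override(registry, target, Intersects):
        # Literal-range predicates, and any node a target-specific
        # override claims, flow untouched into ExpandOperators (pass 3).
        return "expand"
    if range_join_strategy == "iejoin":
        return "iejoin"  # whole-query SET VARIABLE ...; SELECT ... string
    if range_join_strategy == "binned":
        return "binned"  # binned equi-join pre-pass transformer
    return "expand"      # residual column-to-column predicate via expander
```

Keeping the check an exact-key lookup is what stops the always-present generic expander from silently turning off the join rewrite on every target.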

GIQL_EXPAND flips and emitter deletion (src/giql/expressions.py, src/giql/generators/base.py)

Flip GIQL_EXPAND from the shared inert default to True on Intersects, Contains, Within, and SpatialSetPredicate so the four predicates opt into pass 3. Delete the intersects_sql / contains_sql / within_sql / spatialsetpredicate_sql emitters from BaseGIQLGenerator and their _generate_spatial_op / _generate_spatial_set / _generate_range_predicate / _generate_column_join / _predicate_operand helpers, plus the now-unused imports.
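The per-type opt-in can be sketched as a class attribute consulted during a bottom-up tree walk. Node, Expanded, and expand_operators are hypothetical stand-ins for the sqlglot node classes and the ExpandOperators pass, and falling back from a target-specific entry to the GenericTarget entry is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Node:
    # Class-level flag (unannotated, so not a dataclass field); inert default.
    GIQL_EXPAND = False
    children: List["Node"] = field(default_factory=list)


class Intersects(Node):
    GIQL_EXPAND = True  # migrated: opts into pass 3


class Distance(Node):
    pass  # unmigrated: keeps the inert default


@dataclass
class Expanded(Node):
    label: str = ""


class GenericTarget:
    pass


Registry = Dict[Tuple[type, type], Callable[[Node], Node]]


def expand_operators(node: Node, registry: Registry, target: type) -> Node:
    """Bottom-up walk; rewrites children in place and returns the node
    (or its replacement)."""
    node.children = [expand_operators(c, registry, target) for c in node.children]
    if not type(node).GIQL_EXPAND:
        return node  # unflagged types are never dispatched, even if registered
    expander = registry.get((target, type(node))) or registry.get(
        (GenericTarget, type(node))
    )
    return expander(node) if expander is not None else node
```

Gating on the flag before the registry lookup is what lets a registered expander sit unused until its operator class opts in.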

Test updates (tests/test_expander.py, tests/generators/test_base.py)

Rework the registry/flag leak guards to compare against a captured baseline (REGISTRY.snapshot()) rather than asserting emptiness, since the registry now ships built-in expanders at import; clean_registry saves and restores that baseline through the new seam. Add _SHIPPED_EXPAND_FLAGS and derive _MIGRATED_OPERATORS / _UNMIGRATED_OPERATORS dynamically so the flag-leak guard restores each operator to its shipped default and the opt-out parametrization stays merge-stable across wave-3 branches. Add an _opted_out context manager (complement of _opted_in) for control tests that need a migrated operator to behave as unflagged. Replace the old strict-xfail TestIEJoinEarlyReturnSkipsExpansion with TestIEJoinRegistryDeferral, add snapshot/restore coverage, and route the generator-level spatial tests through pass 3 via the updated _generate_through_passes helper (now runs passes 1-3).

Test cases

| # | Test Suite | Given | When | Then | Coverage Target |
|---|---|---|---|---|---|
| 1 | TestExpanderRegistryFallbackGaps | A registry with one entry captured by snapshot | A second entry is registered after the snapshot | The snapshot still holds only the first entry | snapshot() is a copy, not a live view |
| 2 | TestExpanderRegistryFallbackGaps | A snapshot taken, then the registry cleared and a different entry registered | Restoring the snapshot | The original entry resolves again and the post-snapshot entry is gone | restore() replaces entries with snapshot contents |
| 3 | TestIEJoinRegistryDeferral | A column-to-column INTERSECTS join eligible for the duckdb IEJoin path with a (DuckDBTarget, Intersects) override registered | Transpiling with dialect='duckdb' | The override expander's sentinel appears and no SET VARIABLE IEJoin SQL is emitted | IEJoin path defers to a target-specific override |
| 4 | TestIEJoinRegistryDeferral | The same IEJoin-eligible duckdb query with no target-specific override registered | Transpiling with dialect='duckdb' | The built-in IEJoin SET VARIABLE SQL is emitted | Default duckdb path keeps the built-in IEJoin strategy |
| 5 | TestOperatorOptOut | A GIQL operator class not migrated onto the pass | Reading its GIQL_EXPAND class attribute | It is False | Unmigrated operators stay on the legacy emitter |
| 6 | TestOperatorOptOut | A GIQL operator class migrated onto the pass | Reading its GIQL_EXPAND class attribute | It is True | Migrated operators opt into expansion |
| 7 | TestExpandOperatorsPass | An expander registered for (GenericTarget, GIQLDisjoin) but the operator's flag held off via _opted_out | Running the pass | The operator node is left unexpanded | Per-type GIQL_EXPAND gate isolates dispatch |
| 8 | TestNoOpWhenInert | A DISTANCE query (unmigrated operator) with the default registry | Transpiling with the wired-in pass versus a pass-bypassed reference | The SQL matches exactly with no expander alias prefix | Pass is inert for any unmigrated operator |
| 9 | TestExpandOperatorsWalk | A query with INTERSECTS opted in and DISJOIN opted out as the control | Running the pass | Only the flagged operator is expanded | Pass walks and expands per opted-in type |
| 10 | TestBaseGIQLGenerator | An invalid genomic range string in INTERSECTS | The INTERSECTS predicate is expanded through passes 1-3 | A ValueError matching "Could not parse genomic range" is raised | Expander reproduces the legacy parse-error message |
| 11 | TestBaseGIQLGenerator | A malformed range string ('chr:a-b') in INTERSECTS | The predicate is expanded through passes 1-3 | A ValueError is raised | Malformed-range error surfaces via the expander |

Add a public save/restore pair to ExpanderRegistry so a caller can
capture the current registrations and later return the registry to that
exact baseline, regardless of what it registered or cleared in between.

This is the seam test fixtures need to isolate the process-wide registry
without permanently dropping the built-in expanders registered at import:
snapshot the baseline, clear, run the body, then restore. Both halves go
through public methods rather than touching private registry state.

Move INTERSECTS, CONTAINS, WITHIN, and the ANY/ALL set predicates off the
legacy generator emitters and onto the operator-expander registry. Each
predicate now expands to standard boolean SQL AST built from the pass-1
resolved column metadata, so the emitted SQL is byte-identical to what the
old emitters produced.

The new expanders package imports every submodule at import time, so its
generic-target expanders register themselves as a side effect; the spatial
and set predicate classes flip GIQL_EXPAND on to route through the pass.
The corresponding intersects_sql, contains_sql, within_sql, and
spatialsetpredicate_sql emitters and their helpers are deleted from the
base generator.

These are node-local predicate rewrites only. The column-to-column
INTERSECTS join rewrites remain separate pre-pass transformers, so an
expander only ever sees a literal-range or residual predicate, exactly as
the old emitters did.

Restructure the transpile pipeline so the column-to-column INTERSECTS join
rewrites are gated on the target's range_join_strategy capability, and add
registry deferral for a target-specific override.

Previously the DuckDB IEJoin path took an unconditional early return that
emitted whole-query SQL and skipped the rest of the pipeline, including the
expansion pass. That made it impossible for a migrated operator to expand
on an IEJoin-eligible query. Now the join rewrite is skipped when a
target-specific Intersects expander is registered, letting the INTERSECTS
node flow into expansion instead. The built-in generic predicate expander
does not count as an override, so it never disables the join strategy.

Also wire the expanders package import into the pipeline so the migrated
predicate expanders are registered before the first transpile, and update
the pass-3 comment to reflect that the spatial and set predicates now
expand here.

Adapt the expander tests to a registry that ships with built-in expanders
and operators that ship opted into expansion.

The leak guards and clean_registry fixture now treat the import-time
registry contents as the baseline, restoring it through the public
snapshot and restore seam rather than asserting emptiness. The flag leak
guard compares each operator against its shipped GIQL_EXPAND default
instead of a blanket False, and an opted-out helper lets a control test
hold a migrated operator as if unflagged. Add coverage for snapshot
independence and restore.

Replace the strict-xfail that pinned the old IEJoin early-return gap with
tests proving the IEJoin path now defers to a target-specific Intersects
override while the default path still emits the built-in IEJoin SQL. The
inert-pass test now uses an unmigrated operator so it still demonstrates a
no-op.
conradbzura self-assigned this Jun 28, 2026
Remove unreachable dispatch branches in the predicate expanders, add ExpanderRegistry.has_override and route the join-deferral gate through it, guard intersects_bin_size under a target override, and preserve tracebacks on the parse-error wrap. Add direct expander tests, binned-target deferral coverage, and error-message characterization. Make the registry docstrings mechanistic and node-local, restore the registry in place, harden auto-discovery, and key the opt-out control on a dynamically derived migrated operator.
