Skip to content

Migrate INTERSECTS, CONTAINS, WITHIN, and set predicates to registered expanders #141

Description

@conradbzura

Description

Implement generic expanders for the spatial predicates and set predicates, and fold the existing engine-specific join strategies into the registry: IntersectsDuckDBIEJoinTransformer becomes the (DuckDBTarget, Intersects) expander and IntersectsBinnedJoinTransformer becomes the (GenericTarget, Intersects) expander. Remove the dialect="duckdb" early-return in transpile(), whose behaviour now lives in the registry.

Motivation

This is the first case of engine-specific optimization expressed through the registry rather than a hardcoded dialect branch, validating that optimization and baseline strategies coexist cleanly as registered expanders.

Expected Outcome

The predicate operators transpile through the registry; the DuckDB IEJoin and generic binned-join paths are registered expanders selected by target; the early-return is gone; results are identical across targets and unchanged, including under the existing binned-join and IEJoin suites.


Design constraints surfaced by the #138 review (PR #151)

The operator-expander registry/pass landed in #138 (PR #151). Reviewing it surfaced constraints this step must account for — "fold the IEJoin transformer into a (DuckDBTarget, Intersects) expander" is materially more than a relocation:

  • The DuckDB IEJoin early-return runs before ExpandOperators. transpile() returns the IEJoin SQL at the dialect="duckdb" early-return before the expansion pass is constructed, so a flagged DuckDB operator on an IEJoin-eligible query is silently skipped. This step must move expansion ahead of the early-return (or have the IEJoin transformer defer to the registry). It is pinned by the strict xfail TestIEJoinEarlyReturnSkipsExpansion in tests/test_expander.py, which flips to a hard failure (XPASS) once this is fixed — delete it as part of this step.
  • The IEJoin rewrite is whole-query, not node-local. IntersectsDuckDBIEJoinTransformer.transform_to_sql returns a whole-query SET VARIABLE …; SELECT … string and rewrites the top-level statement. A node-replacing OperatorExpander.expand(node, ctx) -> exp.Expression (which node.replaces a single Intersects predicate) cannot express that. This step must define a contract for query-level expanders — e.g. allow returning a replacement for the enclosing Select, or keep join-strategy rewrites as capability-gated pre-pass transformers (keyed on capabilities.range_join_strategy) rather than (target, op) registry entries.
  • The IEJoin path runs on the raw parsed AST (before resolve_operator_refs), so a registry expander would need ctx.resolution it never had. The port must consume resolved + canonicalized AST.
  • ExpansionContext needs extending. It carries no intersects_bin_size (today a transpile() parameter the (Generic, Intersects) binned expander needs) and no registry handle (a per-target expander may need to delegate to the generic one). Plan the context extension here.
  • Nested-operator expansion: see Fix ExpandOperators dropping inner expansions when operators nest #154 — the pass currently drops inner expansions when flagged operators nest; ensure it is fixed before this step flags operators that can appear inside other operands.

Metadata

Metadata

Assignees

Labels

refactorCode restructuring without behavior change

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions