Merged
2 changes: 1 addition & 1 deletion docs/language/reference/dataset_methods.md
@@ -19,7 +19,7 @@ The Substrait helper surface behind these methods is split by semantic role:
| `with_column` | `def with_column(self, name: str, expr: ColumnExpr) -> Self` | Add or replace one projected column using a scalar expression. |
| `group_by` | `def group_by(self, columns: list[ColumnExpr]) -> Self` | Define grouping keys using scalar expressions. |
| `agg` | `def agg(self, measures: list[AggregateMeasure]) -> Self` | Apply aggregate measures over the current relation or current grouping. |
| `order_by` | `def order_by(self) -> Self` | Preserve order-planning shape for the package sort boundary. |
| `order_by` | `def order_by(self, columns: list[ColumnExpr]) -> Self` | Sort rows by scalar expressions or ordering helpers such as `asc(...)` and `desc(...)`. |
| `limit` | `def limit(self, n: int) -> Self` | Cap row count. |
| `explode` | `def explode(self) -> Self` | Expand a nested list column into rows. |

5 changes: 4 additions & 1 deletion docs/language/reference/execution_context.md
@@ -7,6 +7,8 @@ This page documents the public execution surface in the InQL package. Normative
- `Session` is the public execution context for registration, binding, execution, collection, and writes.
- `SessionBuilder` configures a `Session` before construction.
- `SessionError` is the typed error surface for registration, planning, execution, materialization, and sink failures.
- `BackendSelection` is the portable backend selection envelope stored by a session.
- `BackendOption` carries adapter-specific configuration without adding one field per backend to `Session`.
- `backends.DataFusion()` is the current reference backend configuration entry point.

## Construction
@@ -15,6 +17,7 @@ This page documents the public execution surface in the InQL package. Normative
| ------------------------------------------------------------------ | ------------------------------------------------------------------- |
| `Session.default()` | Create a session with the default backend and default configuration |
| `Session.builder()` | Create a builder for backend selection and configuration |
| `Session.builder().with_backend(selection).build()` | Build a session from a portable backend-selection envelope |
| `Session.builder().with_datafusion(backends.DataFusion()).build()` | Build an explicit DataFusion-backed session |

## Read and registration surface
@@ -74,7 +77,7 @@ If no active session exists when a convenience API needs one, the operation fail

## Backend note

DataFusion is the implemented execution backend. The public builder/configuration surface is designed so additional backends can be added without changing the `Session` entry point.
DataFusion is the implemented execution backend. `Session` stores a backend kind plus encoded options, lowers work to Substrait, and dispatches through an internal backend adapter boundary. DataFusion is the first adapter behind that boundary; it is not the shape of the `Session` state.
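
The adapter boundary described in this note can be pictured with a small Python sketch. It is not InQL code: every name below (`SketchOption`, `SketchSelection`, `SketchAdapter`, `FakeDataFusionAdapter`, `SketchSession`, `ADAPTERS`) is a hypothetical stand-in, and the option key in the usage lines is invented for illustration. The assumption it illustrates is only what the note states: the session keeps a backend kind plus encoded options, and execution of an already-lowered plan is dispatched to whichever adapter is registered for that kind.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class SketchOption:
    # Hypothetical: one adapter-specific setting, already encoded as text.
    key: str
    value: str


@dataclass(frozen=True)
class SketchSelection:
    # Hypothetical: the portable envelope, a backend kind plus encoded options.
    kind: str
    options: tuple[SketchOption, ...] = ()


class SketchAdapter(Protocol):
    # The boundary: an adapter only sees a serialized plan and its own options.
    def execute(self, plan: bytes, options: tuple[SketchOption, ...]) -> str: ...


class FakeDataFusionAdapter:
    """Stand-in adapter; a real one would hand the plan to DataFusion."""

    def execute(self, plan: bytes, options: tuple[SketchOption, ...]) -> str:
        return f"datafusion ran {len(plan)} plan bytes with {len(options)} option(s)"


# Hypothetical adapter table keyed by backend kind.
ADAPTERS: dict[str, SketchAdapter] = {"datafusion": FakeDataFusionAdapter()}


class SketchSession:
    def __init__(self, selection: SketchSelection) -> None:
        # Session state holds only the selection, never backend-specific fields.
        self.selection = selection

    def execute(self, plan: bytes) -> str:
        adapter = ADAPTERS.get(self.selection.kind)
        if adapter is None:
            raise LookupError(f"no adapter for backend kind {self.selection.kind!r}")
        return adapter.execute(plan, self.selection.options)


session = SketchSession(
    SketchSelection("datafusion", (SketchOption("target_partitions", "4"),))
)
result = session.execute(b"\x00\x01\x02")
```

In this sketch, adding a second backend means registering another entry in `ADAPTERS`; the session class itself does not change, which is the property the note attributes to the real boundary.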

## Related docs

15 changes: 11 additions & 4 deletions docs/language/reference/functions/index.md
@@ -10,18 +10,25 @@ Today the concrete shipped surfaces are documented here:

The canonical scalar literal helper is `lit(...)`. Typed literal helpers construct the same scalar-expression representation.

The current public helper surface is also registered in the package-owned function registry. Registry types live in `src/function_registry.incn`, while the concrete public helper entries are produced by `FUNCTION_REGISTRY.add(...)` decorators in `src/functions.incn`. Each entry exposes a stable function reference such as `inql.functions.col`, canonical name, typed lifecycle metadata (`since`, versioned changes, and optional deprecation), signature facts, function class, null behavior, alias policy, and Substrait mapping metadata.
The current registry-backed helper surface is registered in the package-owned function registry. Registry types live in `src/function_registry.incn`, the shared package registry lives in `src/functions/registry.incn`, and concrete public helper entries are produced by `function_registry.add(...)` decorators in individual `src/functions/<family>/<name>.incn` modules. The registry-backed families are references, literals, casts, operators, predicates, conditionals, ordering, and aggregates. Each runtime entry exposes a stable function reference such as `inql.functions.col`, canonical name, typed lifecycle metadata (`since`, versioned changes, and optional deprecation), function class, null behavior, alias policy, and Substrait mapping metadata. Checked function signatures come from the public helper declaration, not from a second hand-written registry signature.

The registry is the source for machine-readable function facts. Docstrings remain human-facing explanation, while argument names, type rules, lifecycle facts, and Substrait mappings come from typed registry metadata and public helper signatures. The `registry-metadata` check validates that runtime registry entries produced by decorators still agree with checked API metadata for decorator canonical names, argument names, argument types, and return types. This matters for generated docs, diagnostics, Prism lowering, and backend capability checks as the catalog grows.
The registry is the source for non-derivable machine facts. Public helper declarations are the source for argument names, argument types, and return types. Docstrings remain human-facing explanation, examples, and parameter intent. The `registry-metadata` check validates the checked API metadata projections produced from public facade aliases, registry decorators, and decorated callable signatures. Runtime registry entries are lazy and process-local: they support helper execution and lowering for loaded helpers, while the complete public catalog comes from checked metadata. This matters for generated docs, diagnostics, Prism lowering, and backend capability checks as the catalog grows.
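
The split between registry facts and declaration-derived signatures can be illustrated with a rough Python analogue. This is a sketch, not the `.incn` implementation: `SketchRegistry`, `SketchEntry`, the `sketch.functions.` reference prefix, and the `since` value are all hypothetical. The decorator records only facts that cannot be derived (canonical name, lifecycle, function class), while argument names and the return type are read back from the decorated callable instead of being written a second time.

```python
import inspect
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SketchEntry:
    # Only non-derivable facts live here; no hand-written signature.
    function_ref: str
    canonical_name: str
    since: str
    function_class: str


class SketchRegistry:
    def __init__(self) -> None:
        self.entries: dict[str, SketchEntry] = {}
        self._callables: dict[str, Callable] = {}

    def add(self, *, canonical_name: str, since: str, function_class: str):
        """Return a decorator that registers the helper at its declaration site."""

        def decorate(fn: Callable) -> Callable:
            ref = f"sketch.functions.{fn.__name__}"
            self.entries[ref] = SketchEntry(ref, canonical_name, since, function_class)
            self._callables[ref] = fn
            return fn  # the helper itself is left unchanged

        return decorate

    def signature_facts(self, ref: str):
        """Derive argument names and return type from the declaration."""
        sig = inspect.signature(self._callables[ref])
        return list(sig.parameters), sig.return_annotation


registry = SketchRegistry()


@registry.add(canonical_name="mod", since="0.1", function_class="scalar")
def modulo(left: int, right: int) -> int:
    return left % right


entry = registry.entries["sketch.functions.modulo"]
arg_names, return_type = registry.signature_facts("sketch.functions.modulo")
```

Because the signature is read from the callable, renaming an argument of `modulo` changes `arg_names` automatically; a drift check then only has to compare the decorator's non-derivable facts against the published metadata.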

The first registered helpers are:
The registered helper surface currently includes:

| Function | Registry class | Mapping |
| --- | --- | --- |
| `col(...)` | scalar | deterministic field-reference rewrite |
| `lit(...)`, `int_expr(...)`, `float_expr(...)`, `str_expr(...)`, `bool_expr(...)`, `int_lit(...)`, `str_lit(...)`, `bool_lit(...)` | scalar | deterministic literal rewrites |
| `add(...)`, `mul(...)`, `eq(...)`, `gt(...)` | scalar | registered Substrait extension functions |
| `always_true()`, `always_false()` | scalar | deterministic boolean-literal rewrites |
| `cast(...)`, `try_cast(...)` | scalar | built-in Substrait `Cast` Rex shapes; `try_cast` uses return-null failure behavior |
| `add(...)`, `sub(...)`, `mul(...)`, `div(...)`, `modulo(...)`, `neg(...)` | scalar | registered Substrait scalar mappings; `modulo(...)` registers canonical `mod` |
| `eq(...)`, `ne(...)`, `lt(...)`, `lte(...)`, `gt(...)`, `gte(...)`, `equal_null(...)` | scalar | registered Substrait scalar mappings; `equal_null(...)` lowers as null-safe equality |
| `and_(...)`, `or_(...)`, `not_(...)` | scalar | registered Substrait boolean mappings |
| `is_null(...)`, `is_not_null(...)`, `is_nan(...)`, `is_not_nan(...)` | scalar | registered predicate mappings; `is_not_nan(...)` lowers as `not(is_nan(...))` |
| `coalesce(...)`, `nullif(...)`, `case_when(...)` | scalar | registered Substrait mappings; `case_when(...)` lowers as built-in `IfThen` |
| `in_(...)`, `between(...)` | scalar | built-in membership/range lowering (`SingularOrList` and `between`) |
| `asc(...)`, `desc(...)`, `asc_nulls_first(...)`, `asc_nulls_last(...)`, `desc_nulls_first(...)`, `desc_nulls_last(...)` | ordering | structural sort-field helpers consumed by `order_by(...)` and lowered to Substrait `SortRel.sorts` |
| `sum(...)`, `count()` | aggregate | registered Substrait extension functions |

Future ANSI-style families should grow under this section instead of bloating `dataset_types` or `dataset_methods`.
5 changes: 3 additions & 2 deletions docs/release_notes/v0_1.md
@@ -12,11 +12,12 @@ Entries will be filled in as work lands (link RFCs and PRs when applicable).
- **Authoring:** method-chain lowering into a real Substrait boundary today, with `query {}` work still ahead.
- **Aggregates:** builder-based `col`, `sum`, and `count` helpers now lower grouped and global aggregates through Prism, Substrait, and Session execution.
- **Scalar expressions:** RFC 012 unifies filter predicates, computed projection values, grouping keys, and aggregate inputs around one `ColumnExpr` surface with canonical `lit(...)` and typed literal helpers.
- **Function registry:** RFC 014 adds declaration-site registry decorators for the current public helper surface, including stable function references, signature facts, lifecycle metadata, behavior categories, alias policy, Substrait mapping categories, and checked API metadata drift validation.
- **Core scalar functions:** RFC 015 adds registry-backed scalar function applications and the first core helper slice for casts, comparisons, boolean logic, null/NaN predicates, arithmetic, conditionals, membership/range predicates, and ordering expressions. Implemented helpers lower to Substrait IR through registry metadata, built-in Rex shapes, or structural sort-field lowering; DataFusion remains the first execution adapter rather than the semantic boundary.
- **Function registry:** RFC 014 adds declaration-site registry decorators for the current public helper surface, including stable function references, checked signature projection, lifecycle metadata, behavior categories, alias policy, Substrait mapping categories, and checked API metadata drift validation.
- **Projection:** builder-based `with_column`, `add`, `mul`, and literal expression helpers now lower derived columns through Prism, Substrait, and Session execution.
- **Substrait internals:** RFC 002 helpers are now split into focused owner modules for relation building, plan assembly, inspection, schema registry, extension bookkeeping, and expression lowering instead of one `substrait.plan` godmodule.
- **Prism:** `LazyFrame` lowering applies safe canonical rewrites (`Filter(true)` elimination and adjacent `Limit`/`Project`/`OrderBy` collapse) before RFC 002 plan emission.
- **Execution:** Session-oriented read, execute, and write (reference backend per RFC 004), with `collect(...)` now producing structured `DataFrame` materialization metadata plus preview text instead of treating rendered text as the canonical contract.
- **Execution:** Session-oriented read, execute, and write (reference backend per RFC 004), with `collect(...)` now producing structured `DataFrame` materialization metadata plus preview text instead of treating rendered text as the canonical contract. Session execution dispatch now routes through a backend adapter boundary over Substrait plans; DataFusion remains the first adapter rather than being encoded directly into Session state.
- **Documentation:** Current package behavior is documented under `docs/language/`, while RFCs remain design records rather than implementation diaries.

Pipe-forward (`|>`) is specified in RFC 005 but **out of scope** for v0.1.