Commit aa80834

feature - modernize InQL against incan v0.3 (#25) (#29)
1 parent 3a78237 commit aa80834

73 files changed

Lines changed: 2656 additions & 2541 deletions

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -45,6 +45,9 @@ pnpm-debug.log*
 # Local AI agent instructions
 .github/copilot-instructions.md
 
+# Local agent state
+.agents/state/
+
 # MkDocs build output
 site/
 
docs/architecture.md

Lines changed: 24 additions & 24 deletions
@@ -129,40 +129,40 @@ Session is intentionally outside RFC 002’s normative emitted contract. It cons
 
 For current package behavior, see [Execution context (Reference)][execution-ref] and [Execution context (Explanation)][execution-expl].
 
-## Current implementation shape
+## Package implementation shape
 
-The package currently uses the following implementation shape:
+The package uses the following implementation shape:
 
 - author-facing carrier types live in [mod.incn](../src/dataset/mod.incn)
 - canonical relational operator helpers live in [ops.incn](../src/dataset/ops.incn)
 - Substrait emission lives under [substrait/](../src/substrait/)
 - Prism internals live under [prism/](../src/prism/)
-- `LazyFrame[T]` currently routes through a backend-native `PrismCursor[T]`
-- `DataFrame[T]` and `DataStream[T]` are not yet fully converged on the same internal backing model as `LazyFrame[T]`
+- `LazyFrame[T]` routes through a backend-native `PrismCursor[T]`
+- `DataFrame[T]` and `DataStream[T]` keep their carrier-specific backing shapes while sharing the public dataset surface
 
-This is enough to explain the package architecture while keeping current API behavior in language docs and follow-on gaps in RFCs, issues, and release notes.
+This keeps package architecture in this document while detailed API behavior lives in language docs and future surface expansion stays in RFCs, issues, and release notes.
 
 ## Repository layout
 
-| Path | Role |
-| -------------------------------- | -------------------------------------------------- |
-| `incan.toml` | Package metadata and Rust dependency declarations |
-| `src/lib.incn` | Public package exports |
-| `src/dataset/mod.incn` | Carrier types and trait surface |
-| `src/dataset/ops.incn` | Canonical relational operator helpers |
-| `src/prism/mod.incn` | Internal Prism graph, cursor, and lowering logic |
-| `src/substrait/relations.incn` | Concrete `Rel` builders and relation lowering |
-| `src/substrait/plans.incn` | Top-level `Plan` assembly helpers |
-| `src/substrait/inspect.incn` | Relation/plan inspection and output-column helpers |
-| `src/substrait/schema_registry.incn` | Named-table schema registration and lookup |
-| `src/substrait/extensions.incn` | Extension anchors, URIs, and declaration helpers |
-| `src/substrait/expr_lowering.incn` | Builder-to-Substrait expression lowering |
-| `src/substrait/conformance.incn` | Typed conformance facade over catalog + validators |
-| `src/substrait/schema.incn` | Model/schema to Substrait type bridging |
-| `tests/` | Package tests run through `incan test` |
-| `docs/language/` | Current package docs |
-| `docs/rfcs/` | Normative RFC series |
-| `docs/release_notes/` | Release-facing notes |
+| Path | Role |
+| ------------------------------------ | -------------------------------------------------- |
+| `incan.toml` | Package metadata and Rust dependency declarations |
+| `src/lib.incn` | Public package exports |
+| `src/dataset/mod.incn` | Carrier types and trait surface |
+| `src/dataset/ops.incn` | Canonical relational operator helpers |
+| `src/prism/mod.incn` | Internal Prism graph, cursor, and lowering logic |
+| `src/substrait/relations.incn` | Concrete `Rel` builders and relation lowering |
+| `src/substrait/plans.incn` | Top-level `Plan` assembly helpers |
+| `src/substrait/inspect.incn` | Relation/plan inspection and output-column helpers |
+| `src/substrait/schema_registry.incn` | Named-table schema registration and lookup |
+| `src/substrait/extensions.incn` | Extension anchors, URIs, and declaration helpers |
+| `src/substrait/expr_lowering.incn` | Builder-to-Substrait expression lowering |
+| `src/substrait/conformance.incn` | Typed conformance facade over catalog + validators |
+| `src/substrait/schema.incn` | Model/schema to Substrait type bridging |
+| `tests/` | Package tests run through `incan test` |
+| `docs/language/` | Current package docs |
+| `docs/rfcs/` | Normative RFC series |
+| `docs/release_notes/` | Release-facing notes |
 
 Normative behavior lives in the RFC series first. Current package behavior and usage belong in the language docs. If code and RFCs disagree, treat that as a bug or transition state to resolve explicitly.
 
docs/language/explanation/dataset_carriers.md

Lines changed: 11 additions & 11 deletions
@@ -56,11 +56,11 @@ Use `LazyFrame[T]` when you want to compose operations before execution:
 
 ```incan
 from pub::inql import LazyFrame
-from pub::inql.functions import col, gt, int_lit
+from pub::inql.functions import col, gt, lit
 from models import Order
 
 def high_value_orders(orders: LazyFrame[Order]) -> LazyFrame[Order]:
-    return orders.filter(gt(col("amount"), int_lit(100)))
+    return orders.filter(gt(col("amount"), lit(100)))
 ```
 
 ### `DataStream[T]` — streaming
@@ -69,11 +69,11 @@ Use `DataStream[T]` for streaming/unbounded data:
 
 ```incan
 from pub::inql import DataStream
-from pub::inql.functions import col, eq, str_lit
+from pub::inql.functions import col, eq, lit
 from models import Event
 
 def important_events(events: DataStream[Event]) -> DataStream[Event]:
-    return events.filter(eq(col("severity"), str_lit("critical")))
+    return events.filter(eq(col("severity"), lit("critical")))
 ```
 
 `DataStream[T]` shares the same operation API as batch carriers, but signals that its source is unbounded. Static streaming constraints are specified in RFC 001 and enforced as the compiler gains analysis for `UnboundedDataSet[T]`.
@@ -122,9 +122,9 @@ Current relational authoring is explicit and builder-based. That is deliberate:
 
 Today there are three concrete builder families:
 
-- filters: `eq(...)`, `gt(...)`, `int_lit(...)`, `str_lit(...)`
+- filters: `eq(...)`, `gt(...)`, `lit(...)`
 - aggregates: `col(...)`, `sum(...)`, `count()`
-- projections: `with_column(...)`, `add(...)`, `mul(...)`, `int_expr(...)`, `str_expr(...)`, `bool_expr(...)`
+- projections: `with_column(...)`, `add(...)`, `mul(...)`, `lit(...)`
 
 ### Aggregate helpers
 
@@ -148,15 +148,15 @@ That is the current semantic target for future sugar such as `.customer_id` or `
 Computed columns now have one real entrypoint: `with_column(name, expr)`.
 
 ```incan
-from pub::inql.functions import add, col, int_expr, mul
+from pub::inql.functions import add, col, lit, mul
 from pub::inql import LazyFrame
 from models import Order
 
 def enrich_orders(orders: LazyFrame[Order]) -> LazyFrame[Order]:
     return (
         orders
-        .with_column("amount_x2", mul(col("amount"), int_expr(2)))
-        .with_column("amount_plus_one", add(col("amount"), int_expr(1)))
+        .with_column("amount_x2", mul(col("amount"), lit(2)))
+        .with_column("amount_plus_one", add(col("amount"), lit(1)))
     )
 ```
 
@@ -177,14 +177,14 @@ The most useful way to read the current surface is to separate:
 This is real current InQL, not aspirational pseudocode:
 
 ```incan
-from pub::inql.functions import add, col, count, int_expr, sum
+from pub::inql.functions import add, col, count, lit, sum
 from pub::inql import LazyFrame
 from models import Order
 
 def summarize_orders(orders: LazyFrame[Order]) -> LazyFrame[Order]:
     grouped = (
         orders
-        .with_column("amount_plus_one", add(col("amount"), int_expr(1)))
+        .with_column("amount_plus_one", add(col("amount"), lit(1)))
         .group_by([col("customer_id")])
         .agg([sum(col("amount")), count()])
     )

docs/language/explanation/execution_context.md

Lines changed: 12 additions & 12 deletions
@@ -52,36 +52,36 @@ This is the boundary where deferred relational work becomes local data in hand.
 
 Some convenience APIs are nicer when they do not force the session parameter through every call site. `lazy.collect()` is one of those cases.
 
-That convenience still needs a real execution context underneath, so it resolves through the active session at call time.
+That convenience needs a real execution context underneath, so it resolves through the active session at call time.
 
 - `session.activate()` sets the current active session
 - `lazy.collect()` uses that active session
 
 If there is no active session, the convenience API fails clearly instead of pretending execution context can be ambient without definition.
 
-## Writing is still Session-owned
+## Writing is Session-owned
 
 `session.write_csv(...)` and `session.write_parquet(...)` remain explicit Session methods because writing is not just a carrier concern. It requires binding, execution, and sink ownership.
 
-So the current ergonomic split is:
+The ergonomic split is:
 
 - convenience materialization: `lazy.collect()`
 - explicit writes: `session.write_csv(...)`, `session.write_parquet(...)`
 
-This is a current package ergonomics choice, not a statement that all future convenience APIs must keep the same shape.
+This keeps materialization convenient while leaving sink ownership explicit at the session boundary.
 
 ## Typical flow
 
 ```incan
-from pub::inql import Session
-from pub::inql.functions import col, gt, int_expr, int_lit, mul
+from pub::inql import LazyFrame, Session
+from pub::inql.functions import col, gt, lit, mul
 from models import Order
 
 session = Session.default()
 
-orders = session.read_csv[Order]("orders", "orders.csv")?
-enriched = orders.with_column("amount_x2", mul(col("amount"), int_expr(2)))
-filtered = enriched.filter(gt(col("amount"), int_lit(100))).limit(10)
+orders: LazyFrame[Order] = session.read_csv("orders", "orders.csv")?
+enriched = orders.with_column("amount_x2", mul(col("amount"), lit(2)))
+filtered = enriched.filter(gt(col("amount"), lit(100))).limit(10)
 
 session.activate()
 preview = filtered.collect()?
@@ -98,14 +98,14 @@ This pattern is intentionally simple:
 
 For the exact method surface, see [Dataset methods (Reference)](../reference/dataset_methods.md).
 
-## Current limitation
+## Materialized carrier shape
 
-`DataFrame[T]` is already the materialized carrier, but its row-level user API is still intentionally narrow. The important current semantic distinction is already in place:
+`DataFrame[T]` is the materialized carrier. The important semantic distinction is:
 
 - `LazyFrame[T]` = deferred
 - `DataFrame[T]` = local materialized
 
-Today that materialized carrier exposes structured collection metadata first:
+The materialized carrier exposes structured collection metadata:
 
 - resolved columns
 - row count

Lines changed: 10 additions & 9 deletions
@@ -1,24 +1,25 @@
 # Aggregate builders (Reference)
 
-Current aggregate authoring is explicit and builder-based.
+Current aggregate authoring is explicit and scalar-expression-based.
 
 ## Functions
 
-| Builder | Signature                                       | Meaning                                                                |
-| ------- | ----------------------------------------------- | ---------------------------------------------------------------------- |
-| `col`   | `def col(name: str) -> ColumnExpr`              | Column reference builder used by aggregates, filters, and projections. |
-| `sum`   | `def sum(expr: ColumnExpr) -> AggregateMeasure` | Sum one selected numeric column.                                       |
-| `count` | `def count() -> AggregateMeasure`               | Count rows in the current relation or group.                           |
+| Builder | Signature                                                   | Meaning                                                                |
+| ------- | ----------------------------------------------------------- | ---------------------------------------------------------------------- |
+| `col`   | `def col(name: str) -> ColumnExpr`                          | Column reference builder used by aggregates, filters, and projections. |
+| `lit`   | `def lit(value: int \| float \| str \| bool) -> ColumnExpr` | Canonical scalar literal helper.                                       |
+| `sum`   | `def sum(expr: ColumnExpr) -> AggregateMeasure`             | Sum one scalar expression.                                             |
+| `count` | `def count() -> AggregateMeasure`                           | Count rows in the current relation or group.                           |
 
 ## Example
 
 ```incan
-from pub::inql.functions import col, count, sum
+from pub::inql.functions import add, col, count, lit, sum
 
-grouped = orders.group_by([col("customer_id")]).agg([sum(col("amount")), count()])
+grouped = orders.group_by([col("customer_id")]).agg([sum(add(col("amount"), lit(5))), count()])
 ```
 
 ## Notes
 
-- The current package slice requires explicit `col(...)` builders.
+- Aggregate inputs use the same scalar-expression model as filters, projections, and grouping keys.
 - Future `.column` sugar and scoped aggregate symbols should lower to this same surface rather than replacing its semantics.

Lines changed: 17 additions & 15 deletions
@@ -1,32 +1,34 @@
 # Filter builders (Reference)
 
-Current filter authoring is explicit and builder-based.
+Current filter authoring uses the shared scalar-expression builder model.
 
 ## Functions
 
-| Builder        | Signature                                                               | Meaning                                                |
-| -------------- | ----------------------------------------------------------------------- | ------------------------------------------------------ |
-| `always_true`  | `def always_true() -> FilterPredicate`                                  | Trivial predicate; canonical rewrite can eliminate it. |
-| `always_false` | `def always_false() -> FilterPredicate`                                 | Predicate that rejects every row.                      |
-| `eq`           | `def eq(column: ColumnExpr, literal: FilterLiteral) -> FilterPredicate` | Equality predicate.                                    |
-| `gt`           | `def gt(column: ColumnExpr, literal: FilterLiteral) -> FilterPredicate` | Greater-than predicate.                                |
-| `int_lit`      | `def int_lit(value: int) -> FilterLiteral`                              | Integer literal for filter predicates.                 |
-| `str_lit`      | `def str_lit(value: str) -> FilterLiteral`                              | String literal for filter predicates.                  |
-| `bool_lit`     | `def bool_lit(value: bool) -> FilterLiteral`                            | Boolean literal for filter predicates.                 |
+| Builder        | Signature                                                   | Meaning                                                                |
+| -------------- | ----------------------------------------------------------- | ---------------------------------------------------------------------- |
+| `always_true`  | `def always_true() -> ColumnExpr`                           | Trivial boolean scalar expression; canonical rewrite can eliminate it. |
+| `always_false` | `def always_false() -> ColumnExpr`                          | Boolean scalar expression that rejects every row.                      |
+| `eq`           | `def eq(left: ColumnExpr, right: ColumnExpr) -> ColumnExpr` | Equality predicate scalar expression.                                  |
+| `gt`           | `def gt(left: ColumnExpr, right: ColumnExpr) -> ColumnExpr` | Greater-than predicate scalar expression.                              |
+| `lit`          | `def lit(value: int \| float \| str \| bool) -> ColumnExpr` | Canonical scalar literal helper.                                       |
+| `int_lit`      | `def int_lit(value: int) -> ColumnExpr`                     | Typed integer literal helper.                                          |
+| `str_lit`      | `def str_lit(value: str) -> ColumnExpr`                     | Typed string literal helper.                                           |
+| `bool_lit`     | `def bool_lit(value: bool) -> ColumnExpr`                   | Typed boolean literal helper.                                          |
 
 ## Example
 
 ```incan
-from pub::inql.functions import col, eq, gt, int_lit, str_lit
+from pub::inql.functions import col, eq, gt, lit
 
 filtered = (
     orders
-    .filter(gt(col("amount"), int_lit(100)))
-    .filter(eq(col("status"), str_lit("open")))
+    .filter(gt(col("amount"), lit(100)))
+    .filter(eq(col("status"), lit("open")))
 )
 ```
 
 ## Notes
 
-- Filter predicates currently operate on one explicit column builder plus one explicit literal.
-- Rich boolean composition is follow-up work.
+- Filter predicates are scalar expressions, not a separate predicate-only builder hierarchy.
+- The typed `*_lit(...)` helpers construct the same scalar-literal representation as `lit(...)`.
+- Boolean composition belongs to the broader scalar-function surface.

Lines changed: 11 additions & 8 deletions
@@ -1,18 +1,21 @@
 # Projection builders (Reference)
 
-Projection builders are the current semantic target for computed columns.
+Projection builders are the current semantic target for scalar expressions in computed columns and other row-level positions.
 
 ## Functions
 
 | Builder      | Signature                                                    | Meaning                     |
 | ------------ | ------------------------------------------------------------ | --------------------------- |
 | `col`        | `def col(name: str) -> ColumnExpr`                           | Named column reference.     |
+| `lit`        | `def lit(value: int \| float \| str \| bool) -> ColumnExpr`  | Canonical scalar literal.   |
 | `int_expr`   | `def int_expr(value: int) -> ColumnExpr`                     | Integer literal expression. |
 | `float_expr` | `def float_expr(value: float) -> ColumnExpr`                 | Float literal expression.   |
 | `str_expr`   | `def str_expr(value: str) -> ColumnExpr`                     | String literal expression.  |
 | `bool_expr`  | `def bool_expr(value: bool) -> ColumnExpr`                   | Boolean literal expression. |
 | `add`        | `def add(left: ColumnExpr, right: ColumnExpr) -> ColumnExpr` | Binary addition.            |
 | `mul`        | `def mul(left: ColumnExpr, right: ColumnExpr) -> ColumnExpr` | Binary multiplication.      |
+| `eq`         | `def eq(left: ColumnExpr, right: ColumnExpr) -> ColumnExpr`  | Equality predicate.         |
+| `gt`         | `def gt(left: ColumnExpr, right: ColumnExpr) -> ColumnExpr`  | Greater-than predicate.     |
 
 ## Dataset entrypoint
 
@@ -26,17 +29,17 @@ def with_column(self, name: str, expr: ColumnExpr) -> Self
 ## Example
 
 ```incan
-from pub::inql.functions import add, col, int_expr, mul
+from pub::inql.functions import add, col, lit, mul
 
 projected = (
     orders
-    .with_column("amount_x2", mul(col("amount"), int_expr(2)))
-    .with_column("amount_plus_one", add(col("amount"), int_expr(1)))
+    .with_column("amount_x2", mul(col("amount"), lit(2)))
+    .with_column("amount_plus_one", add(col("amount"), lit(1)))
 )
 ```
 
-## Current limits
+## Capability notes
 
-- No argument-bearing `select(...)` yet.
-- No query-block projection sugar yet.
-- No alias-free symbolic surface like `.amount * 2` yet.
+- `with_column(...)` is the explicit computed-column entrypoint.
+- Projection-list selection, query-block projection sugar, and alias-free symbolic surfaces lower to this scalar-expression model when exposed.
+- The typed literal helpers construct the same scalar-literal representation as `lit(...)`.
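
The sketch below is not part of this commit. It is a minimal Python model, under stated assumptions, of the idea the updated docs describe: filters, projections, and literals all share one `ColumnExpr` type, and the typed literal helpers produce the same representation as `lit(...)`. The dataclass fields (`kind`, `value`, `args`) and the `_binary` helper are hypothetical and are not InQL's actual internals; only the builder names and their argument shapes come from the reference tables in the diff.

```python
from dataclasses import dataclass
from typing import Union

Scalar = Union[int, float, str, bool]


@dataclass(frozen=True)
class ColumnExpr:
    """Hypothetical stand-in for InQL's ColumnExpr: one node type for all scalar expressions."""
    kind: str             # "col", "lit", or an operator name such as "gt"
    value: object = None  # column name or literal value on leaf nodes
    args: tuple = ()      # child expressions on operator nodes


def col(name: str) -> ColumnExpr:
    return ColumnExpr("col", name)


def lit(value: Scalar) -> ColumnExpr:
    return ColumnExpr("lit", value)


def int_lit(value: int) -> ColumnExpr:
    # Typed helper: builds the same literal node as lit(...).
    return lit(value)


def str_lit(value: str) -> ColumnExpr:
    return lit(value)


def _binary(op: str, left: ColumnExpr, right: ColumnExpr) -> ColumnExpr:
    # Assumed shared constructor for two-argument builders.
    return ColumnExpr(op, args=(left, right))


def add(left: ColumnExpr, right: ColumnExpr) -> ColumnExpr:
    return _binary("add", left, right)


def mul(left: ColumnExpr, right: ColumnExpr) -> ColumnExpr:
    return _binary("mul", left, right)


def eq(left: ColumnExpr, right: ColumnExpr) -> ColumnExpr:
    return _binary("eq", left, right)


def gt(left: ColumnExpr, right: ColumnExpr) -> ColumnExpr:
    return _binary("gt", left, right)


# A predicate and a projection are values of the same type.
predicate = gt(col("amount"), lit(100))
projection = mul(col("amount"), lit(2))
```

In this model, a predicate such as `gt(col("amount"), lit(100))` can be passed anywhere a `ColumnExpr` is accepted, which mirrors the signature change from `FilterPredicate` and `FilterLiteral` to `ColumnExpr` shown in the filter builders table.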
