feat(RFC): A richer Expr IR#2572
Conversation
|
- Mentioned in (#2391 (comment)) - Needed again for #2572
at the moment it looks like this adds a self-standing |
* chore(typing): Add `_typing_compat.py` - Mentioned in (#2391 (comment)) - Needed again for #2572 * refactor: Reuse `TypeVar` import * refactor: Reuse `@deprecated` import * refactor: Reuse `Protocol38` import * docs: Add module-level docstring
Still need: - reprs - fix the hierarchy issue (#2572 (comment)) - Flag summing (#2572 (comment))
- 1 step closer to the understanding for (#2572 (comment)) - There's still some magic going on when `polars` serializes - Need to track down where `'collect_groups': 'ElementWise'` and `'collect_groups': 'GroupWise'` first appear - Seems like the flags get reduced
|
Thanks for peeking @MarcoGorelli
That is definitely the eventual goal! 🤞 Despite how quickly things have progressed, I still feel I'm a few steps behind being ready for that just yet. General overviewI'm trying to focus on modeling these structures and how they interact:
My thought was that So like what I have in Current
|
This comment was marked as resolved.
This comment was marked as resolved.
Can't tell if this means `FirstT` will match the entry `firstt`, but preserve the `firstt` fix (https://github.com/codespell-project/codespell#ignoring-words) (#2572 (comment))
|
I should've expected this, but it was a nice suprise to find we get hashable selectors for free 😄 from narwhals._plan import selectors as ndcs
>>> ndcs.matches("[^z]a")._ir == ndcs.matches("[^z]a")._ir
True
>>> ndcs.matches("[^z]a")._ir == ndcs.matches("abc")._ir
False@MarcoGorelli regarding (#2291) from narwhals._plan import selectors as ndcs
>>> ndcs.all()._ir == ndcs.all()._ir
True
lhs = ndcs.all()
rhs = ndcs.all().mean()
>>> lhs._ir == rhs._ir
False
>>> lhs._ir == rhs._ir.expr
TrueAnd the same holds for the non-selectors from narwhals._plan import demo as nwd
lhs = nwd.all()
rhs = nwd.all().mean()
>>> lhs._ir == rhs._ir
False
>>> lhs._ir == rhs._ir.expr
True
>>> type(rhs._ir)
narwhals._plan.aggregation.Mean |
An experiment towards (#2572 (comment))
| def test_valid_windows() -> None: | ||
| """Was planning to test this matched, but we seem to allow elementwise horizontal? | ||
|
|
||
| https://github.com/narwhals-dev/narwhals/blob/63c8e4771a1df4e0bfeea5559c303a4a447d5cc2/tests/expression_parsing_test.py#L10-L45 | ||
| """ | ||
| ELEMENTWISE_ERR = re.compile(r"cannot use.+over.+elementwise", re.IGNORECASE) # noqa: N806 | ||
| a = nwd.col("a") | ||
| assert a.cum_sum() | ||
| assert a.cum_sum().over(order_by="id") | ||
| with pytest.raises(InvalidOperationError, match=ELEMENTWISE_ERR): | ||
| assert a.cum_sum().abs().over(order_by="id") | ||
|
|
||
| assert (a.cum_sum() + 1).over(order_by="id") | ||
| assert a.cum_sum().cum_sum().over(order_by="id") | ||
| assert a.cum_sum().cum_sum() | ||
| assert nwd.sum_horizontal(a, a.cum_sum()) | ||
| with pytest.raises(InvalidOperationError, match=ELEMENTWISE_ERR): | ||
| assert nwd.sum_horizontal(a, a.cum_sum()).over(order_by="a") | ||
|
|
||
| assert nwd.sum_horizontal(a, a.cum_sum().over(order_by="i")) | ||
| assert nwd.sum_horizontal(a.diff(), a.cum_sum().over(order_by="i")) | ||
| with pytest.raises(InvalidOperationError, match=ELEMENTWISE_ERR): | ||
| assert nwd.sum_horizontal(a.diff(), a.cum_sum()).over(order_by="i") | ||
|
|
||
| with pytest.raises(InvalidOperationError, match=ELEMENTWISE_ERR): | ||
| assert nwd.sum_horizontal(a.diff().abs(), a.cum_sum()).over(order_by="i") |
There was a problem hiding this comment.
@MarcoGorelli quick question
This is adapted from an existing test:
tests.expression_parsing_test.test_window_kind
narwhals/tests/expression_parsing_test.py
Lines 10 to 45 in 63c8e47
AFAICT, all of the expressions I've needed a InvalidOperationError for shouldn't be valid.
But they aren't raising in current narwhals 🤔
1
import narwhals as nw
a = nw.col("a")
a.cum_sum().abs().over(order_by="id")This error explicitly mentions abs
narwhals/narwhals/_expression_parsing.py
Lines 357 to 362 in 9bd10ad
2, 3, 4
These are all raising the same as (1), but the issue seems to be that horizontal functions aren't being treated as elementwise
import narwhals as nw
a = nw.col("a")
nw.sum_horizontal(a, a.cum_sum()).over(order_by="a")
nw.sum_horizontal(a.diff(), a.cum_sum()).over(order_by="i")
nw.sum_horizontal(a.diff().abs(), a.cum_sum()).over(order_by="i")In polars, they all seem to be elementwise but with an additional flag
I've done the same in this PR, but I don't think that flag would factor into this?
narwhals/narwhals/_plan/functions.py
Lines 291 to 299 in 9bd10ad
`lit(Series)` was really bugging me with how it should be showing the backend
commit 9d015e2031acd5e10404da72a7d698c632446c49
Author: Marco Edward Gorelli <33491632+MarcoGorelli@users.noreply.github.com>
Date: Sat May 16 09:51:39 2026 +0100
release: Bump version to 2.21.2 (#3630)
commit 8674e44f657110b45aedffb8c1343d5b0505bb01
Author: Marco Edward Gorelli <33491632+MarcoGorelli@users.noreply.github.com>
Date: Sat May 16 09:37:56 2026 +0100
release: Bump version to 2.21.1 (#3628)
commit 739dc579adfd38a75764d6d2c09ed42b0f2e887a
Author: Marco Edward Gorelli <33491632+MarcoGorelli@users.noreply.github.com>
Date: Sat May 16 09:35:33 2026 +0100
ci: remove `downstream_tests_slow` (#3629)
commit 1f36279bb07203628ddd18a691cf98d4c78780d3
Author: Marco Edward Gorelli <33491632+MarcoGorelli@users.noreply.github.com>
Date: Sat May 16 09:17:10 2026 +0100
Revert "fix: Allow `float('nan')` as value in join for duckdb (#3555)" (#3627)
This reverts commit 0d7f352.
commit 1e50d020e3ca5e5c5dfe3486e53190e87c60b66c
Author: Pedro <pedro.villanueva@booking.com>
Date: Fri May 15 13:05:01 2026 +0200
[Enh]: Add the negation unary operator for expressions and series (#3625)
negation unary operator for expressions and series
commit 37ea7953f6b455736bf6ae353ca257f468b3b083
Author: Francesco Bruzzesi <42817048+FBruzzesi@users.noreply.github.com>
Date: Wed May 13 10:38:42 2026 +0200
ci: Unpin "temporary" CI pins (#3618)
* ci: Unpin 'temporary' CI pins
* rollback formulaic and pointblank
commit 4ff3a1f545924e5e38e76c4d8a71de2010a2f4d6
Author: Francesco Bruzzesi <42817048+FBruzzesi@users.noreply.github.com>
Date: Tue May 12 19:11:11 2026 +0200
chore: Prepare for future pandas inplace deprecation (#3616)
This reverts commit 5dee22d.
Accidentally untracked all of these lol
That move destroyed the history
for more information, see https://pre-commit.ci
Will close #2571
What type of PR is this? (check all applicable)
Related issues
Exprinternal representation #2571(sort:updated-desc "(expr-ir)/" in:title)Checklist
If you have comments or can explain your changes, please do so below
Important
See (#2571) for detail!!!!!!!
Very open to feedback
Tasks
Show 2025 May-July
pl.Expr.metapl.Expr.meta)ExprIRmetamethods_typing_compatmodule #2578Merge another PR with (perf: Avoid module-levelimportlib.util.find_spec#2391 (comment)) firstTypeVardefaults moreTypeVar("T", bound=Thing, default=Thing), instead of an opaqueExprIRSelector(s)narwhals/narwhals/_plan/expr.py
Lines 336 to 337 in 0bada48
BinaryExprthat describes the restricted set of operators that are allowedpolars, since they wrappl.colinternallyIntoExprin more places (including and beyond whatnarwhalsallows now)demo.py*_horizontalconcat_str)dummy.pyover,sort_by)FunctionOptions+ friends see commentWhere does the{flags: ...}->{collect_groups: ..., flags: ...}expansion happen?polars>=1.3.0fixed the issue (see comment)Ternarywhen-then-otherwise🥳)Metais_*,has_*,output_namemeta methodsroot_namesundo_aliases,popNamename.py(Expr::KeepName,Expr::RenameAlias)polarswill help with themetamethodsCat,Struct,List(a3e29d1)String(72c33ce)DateTime(aee0a7e)_expression_parsing.pyrulesrustversion worksExpansionFlagsexpand_function_inputsrewrite_projectionsreplace_selectorexpand_selectorreplace_selector_innerreplace_and_add_to_resultsreplace_nthprepare_excludedexpand_columnsexpand_dtypesreplace_dtype_or_index_with_columndtypes_match(probably can solve w/ existingnarwhals)expand_indicesreplace_index_with_columnreplace_wildcardrewrite_special_aliasesreplace_wildcard_with_columnreplace_regexexpand_regexExprIR.map_irExprIR #2572 (comment))ExprIR.map_irfor most nodesWindowExpr.map_irFunctionExpr.map_irRollingExpr,AnonymousExprinheritselectorsExprIR(main) #3066ExprIR(main) #3066 (comment))_planpackage #3122group_by, utilizepyarrow.acero#3143{Expr,Series}.{first,last}#2528)protocols.py#3166order_by,hashjoin,DataFrame.{filter,join},Expr.is_{first,last}_distinct#3173__dict__appearing onImmutablesubclasses (thread)__slots__, and not__dict__too #3201ExpansionFlags.from_ir#3206ColumnNameOrSelectorcan be addedExpr.meta.serialize,Expr.deserializepc.Expressionover(*partition_by)#3224rank,with_row_index_by,over(*partition_by, order_by=...)#3295Selectoroverhaul #3233date_range, support{date,int}_range(eager=...)#3294ArrowExpr#3325dt.*,str.to_date(time)dt.*str.to_date(time)ExprIRwith in sync withnarwhalslist.*aggregate methods #3353list.sort#3359{Expr,Series}.any_value#3315overwith currentpolars#3352ArrowExprboilerplatearrow.functionsa package #3384ArrowListNamespace-> (ArrowExprListNamespace,ArrowScalarListNamespace)len,getListScalarEager{DataFrame,Namespace}when thenarwhals-level depends on themconcatcommentSelectors that do more than select #3376LogicalPlan🚀over(*partition_by)windows between expressions in the same contextmaincan do this for multi-output (by not expanding)LogicalPland3fc48a1a0b9291ae6bac21af27e1039e63ba3cf/python/cudf_polars/cudf_polars/dsl/ir.py)
DataTypepropagation and supertyping rulesconcat(..., how={"vertical_relaxed", "diagonal_relaxed"})#3386LogicalPlansnarwhalsbehavior, but likely more of an issue whenSelectors can be used in more placesoh-nodestopolarsinternalsExpr) features forLogicalPlanconcat#3371read_{csv,parquet}#3370DataFrame.write_{csv,parquet}#3369BaseFrameDataFrame.join_asof#3378BaseFrame.unpivot#3368BaseFrame.unnest#3414DataFrameDataFrame.pivot#3373DataFrame.unique#3364select#3647