Setup
I am trying to run a query that computes an average. The graph and the operators are generated from a different language by a compiler, but in SQL it would look something like this:
SELECT sum(x) / count(*)
FROM Tab
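To make the shape of the computation concrete, here is a rough Rust sketch of the same average expressed as two independent aggregates whose results are combined afterwards. The function names (`sum_x`, `count_star`, `average`) are made up for illustration; this is not the generated operator code, and it ignores everything Noria does incrementally.

```rust
/// Hypothetical stand-in for the sum(x) aggregate.
fn sum_x(rows: &[i64]) -> i64 {
    rows.iter().sum()
}

/// Hypothetical stand-in for the count(*) aggregate.
fn count_star(rows: &[i64]) -> i64 {
    rows.len() as i64
}

/// Hypothetical stand-in for the join plus the final division.
/// Returns None for an empty table to avoid dividing by zero.
fn average(rows: &[i64]) -> Option<f64> {
    let (s, c) = (sum_x(rows), count_star(rows));
    if c == 0 {
        None
    } else {
        Some(s as f64 / c as f64)
    }
}
```

The point of the split is that `sum_x` and `count_star` do not depend on each other, which is why I expected to be able to place them on separate domains.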
Error
The query itself runs fine, but I wanted to test how performance would change if count(*) and sum(x) were computed on different domains, so I hacked into assignment to force each of these operators onto its own domain.
When I do that, however, the join after the two calculations tries to access a nonexistent index in its right ancestor. I expanded the error message (see below), which says that the right ancestor with id 4 was short: the generate_row function tries to access index 2 in the other slice, which only has two elements.
This is the error message for the two-domain case. In the four-domain case it is the same, but the id is different (because more ingress/egress operators are generated).
'right (4) was short', noria-server/dataflow/src/ops/join.rs:181:21
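For readers who have not looked at the join code, the failure has roughly the following shape. This is a simplified sketch and not Noria's actual generate_row: the `Source` enum, the signature, and the error strings are assumptions made for illustration. It returns an error where the real code panics.

```rust
/// Where an output column of the join comes from (hypothetical type for this sketch).
#[derive(Clone, Copy, Debug)]
enum Source {
    Left(usize),
    Right(usize),
}

/// Builds one output row from a left and a right input row.
/// Returns an error if a referenced column index is past the end of a row.
fn generate_row(left: &[i64], right: &[i64], emit: &[Source]) -> Result<Vec<i64>, String> {
    emit.iter()
        .map(|src| match *src {
            Source::Left(i) => left
                .get(i)
                .copied()
                .ok_or_else(|| format!("left was short: index {} but len {}", i, left.len())),
            Source::Right(i) => right
                .get(i)
                .copied()
                .ok_or_else(|| format!("right was short: index {} but len {}", i, right.len())),
        })
        .collect()
}
```

With an emit list that refers to `Source::Right(2)`, a three-column right row works, while a two-column right row produces the "right was short" error. That matches what I observe: the join expects a column at index 2, but the row arriving from the right ancestor only has two elements.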
Questions
Is there something I am missing about domains? Can I not just make any operator into its own domain? Are there any invariants around what can and can't go on a domain?
Runtime graphs
Here are the dot graphs for two domains and four domains and, for good measure, the original (working) single domain.
The relevant operators here are ohua.generated/op_s_acc_0_0 (count(*)) and ohua.generated/op_s_acc_1_0 (sum(x)) and the join afterwards. (The rest is just generated code that does some column renaming.)
How to reproduce
I uploaded a branch (join-after-domain-error-reproduction) to my fork that should contain the complete state of the system necessary (including generated operators) to reproduce the error.
In the udf-benchmarks directory, run cargo run --bin features avg-split-domain/two-domainsf.toml
This will run the two-domain scenario. For one or four domains, use the one-domain.toml and four-domains.toml configs respectively.