
Assertion failure in Join when ancestors in other domains #146

@JustusAdam

Setup

I am trying to run a query that computes an average. The graph and the operators are generated from a different language by a compiler, but in SQL it would look something like this:

SELECT sum(x) / count(*)
FROM Tab

Error

The query itself runs fine, but I wanted to test how it would perform if count(*) and sum(x) were computed in different domains. So I hacked into the assignment to force each of these operators into its own domain.

When I do that, however, the join after the two calculations tries to access a nonexistent index in its right ancestor. I expanded the error message (see below), which now says that the right ancestor with id 4 was short: in the generate_row function, the join tries to access index 2 in the other slice, which only has two elements.

This is the error message for the two-domain case. In the four-domain case it is the same except that the id is different (because more ingress/egress operators are generated).

'right (4) was short', noria-server/dataflow/src/ops/join.rs:181:21
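
To make the failure mode concrete, here is a minimal sketch of the kind of row construction that produces this error. It is not the actual code in join.rs. The Side enum, the generate_row_checked function, and the emit list of (side, column) pairs are hypothetical names for illustration, and rows are simplified to slices of i64. The sketch assumes the join builds each output row by copying columns from the left and right input rows, and it reports a short row as an error instead of panicking.

```rust
/// Which parent a column of the joined output row is copied from.
/// Hypothetical type for illustration; not Noria's actual definition.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Side {
    Left,
    Right,
}

/// Build one output row by copying `(side, column)` pairs from the two
/// input rows. Instead of panicking on a missing column, return an error
/// naming the side whose row has too few columns.
fn generate_row_checked(
    left: &[i64],
    right: &[i64],
    emit: &[(Side, usize)],
) -> Result<Vec<i64>, String> {
    emit.iter()
        .map(|&(side, col)| {
            let (row, name) = match side {
                Side::Left => (left, "left"),
                Side::Right => (right, "right"),
            };
            // `get` returns None when `col` is out of bounds for this row.
            row.get(col).copied().ok_or_else(|| {
                format!(
                    "{} was short: wanted column {}, row has {} columns",
                    name,
                    col,
                    row.len()
                )
            })
        })
        .collect()
}
```

With a right row of two columns and an emit list that asks for column 2 on the right, this returns an error starting with "right was short", which matches the situation described above: the join expects more columns from its right ancestor than the rows it receives actually have.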

Questions

Is there something I am missing about domains? Can I not just put any operator into its own domain? Are there any invariants around what can and cannot go in a domain?

Runtime graphs

Here are the dot graphs for two domains and four domains and, for good measure, the original (working) single domain.

The relevant operators here are ohua.generated/op_s_acc_0_0 (count(*)) and ohua.generated/op_s_acc_1_0 (sum(x)) and the join afterwards. (The rest is just generated code that does some column renaming.)

How to reproduce

I uploaded a branch (join-after-domain-error-reproduction) to my fork that should contain the complete state of the system (including the generated operators) necessary to reproduce the error.

In the udf-benchmarks directory, run cargo run --bin features avg-split-domain/two-domainsf.toml

This will run the two-domain scenario. For one or four domains, use the one-domain.toml and four-domains.toml configs respectively.
