Optimization: Reduction Pushdown #235

@wangandi

Elevator pitch

  1. What does the transformation do? Give a representative SQL query.

Convert a reduce on a join to a join on one or more reduced inputs.
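
A minimal sketch of the rewrite, using Python's built-in `sqlite3` module to check that both query shapes return the same result. The `orders` and `customers` tables and their columns are hypothetical, and the rewrite as written assumes `customers.id` is unique; it is an illustration, not the proposed implementation.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount INTEGER);
    INSERT INTO customers VALUES (1, 'ada'), (2, 'bob');
    INSERT INTO orders VALUES (1, 10), (1, 20), (1, 30), (2, 5), (3, 99);
""")

# Before: a reduce (GROUP BY) on top of a join.
reduce_on_join = """
    SELECT c.id, c.name, SUM(o.amount) AS total
    FROM orders o JOIN customers c ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.id
"""

# After: a join on a reduced input. `orders` is aggregated by the join key
# first, so the join sees at most one row per customer_id.
join_on_reduced = """
    SELECT c.id, c.name, r.total
    FROM (SELECT customer_id, SUM(amount) AS total
          FROM orders GROUP BY customer_id) r
    JOIN customers c ON r.customer_id = c.id
    ORDER BY c.id
"""

before = conn.execute(reduce_on_join).fetchall()
after = conn.execute(join_on_reduced).fetchall()
assert before == after
```

Note that the reduced input also computes a total for `customer_id = 3`, which the join then discards; that wasted work is the cost discussed further down.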

  1. Why should we add it?

This can:

  • reduce the number of rows that the join produces.
  • eliminate skew.
  • reduce the number of diffs that the join operator receives.
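
The effect on row counts and skew can be sketched with plain Python collections. The data below is made up, with one deliberately hot key, and batch row counts stand in for the diffs a dataflow join operator would receive.

```python
from collections import Counter, defaultdict

# Hypothetical inputs: `orders` is heavily skewed toward customer 1.
orders = [(1, amount) for amount in range(1000)] + [(2, 7), (2, 8)]
customers = {1: "ada", 2: "bob"}  # unique join key

# Without pushdown: the join receives and emits one row per matching order,
# so the hot key accounts for almost all of the join's work.
joined = [(cid, customers[cid], amt) for cid, amt in orders if cid in customers]
rows_per_key = Counter(cid for cid, _, _ in joined)

# With pushdown: reduce `orders` by the join key first, so the join
# receives and emits exactly one row per key.
totals = defaultdict(int)
for cid, amt in orders:
    totals[cid] += amt
joined_reduced = [
    (cid, customers[cid], total) for cid, total in totals.items() if cid in customers
]

print(len(joined), len(joined_reduced))  # 1002 2
print(rows_per_key[1])                   # 1000
```

The skew does not disappear entirely: the pushed-down reduce still has to absorb all 1000 rows for the hot key, but the join no longer does.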

  1. When would it be good to have?

When reducing an input of the join significantly decreases the number of rows that the input has.

  1. When would it be ineffectual?

When reducing an input of the join does not significantly decrease the number of rows.

  1. When would it be bad to have?

When the join condition would filter out most of the rows coming from an input of the join.
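
A rough sketch of this bad case, again with made-up data: the pushed-down reduce maintains a group for every key in its input, even though the join keeps only a handful of them.

```python
from collections import defaultdict

# Hypothetical inputs: 10,000 distinct keys in `events` (two rows each),
# but only three of those keys exist on the other side of the join.
events = [(key, 1) for key in range(10_000) for _ in range(2)]
allowed = {0: "a", 1: "b", 2: "c"}

# Without pushdown: the join filters first, so the reduce above it
# only ever sees rows for the three surviving keys.
rows_after_join = sum(1 for key, _ in events if key in allowed)

# With pushdown: the reduce runs before the join and must maintain
# an aggregate for every key in `events`.
counts = defaultdict(int)
for key, n in events:
    counts[key] += n
groups_maintained = len(counts)

surviving = {
    key: (allowed[key], total) for key, total in counts.items() if key in allowed
}
wasted_groups = groups_maintained - len(surviving)

print(rows_after_join, groups_maintained, wasted_groups)  # 6 10000 9997
```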

  1. In the worst case, how would it degrade performance?

There would be unnecessary memory and CPU overhead.

List real-life instances where this transformation would help.

No response

Cost Model

  1. What is the benefit of the transformation?
  2. What is the overhead?
  3. When would the transformation be worthwhile? Intuitively, this should
    be when benefit > overhead, but sometimes a benefit with regard to X
    comes at a cost with regard to Y, so discuss when it makes sense to
    sacrifice Y to gain a benefit in X.

List any knobs that we may need to tune or expose to the user.

No response

Proposed implementation

  1. Describe the implementation.
  2. Which queries will do better with the given implementation?
  3. Which queries will do worse?
  4. Break the implementation down into stages.
