
@cfmcgrady (Contributor)

What changes were proposed in this pull request?

This PR moves the numOutputRows metric update logic from the val rdd initialization block to the doExecute() method in OneRowRelationExec.

Why are the changes needed?

Currently, the numOutputRows metric in OneRowRelationExec is incremented twice on the codegen path, so it displays 2 instead of 1 for a single row.

Before this PR:

[Screenshot]

After this PR:

[Screenshot]
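The following minimal, self-contained Scala sketch makes the double counting concrete. It does not use Spark. `Counter`, `Node`, `BuggyNode`, `FixedNode`, and `OneRowMetricDemo` are hypothetical stand-ins that only mimic the control flow. The sketch rests on two assumptions implied by the description above:

- The codegen path consumes the same `rdd` that `doExecute()` returns.
- The generated code increments `numOutputRows` once per row on its own.

Under those assumptions, counting inside the shared `rdd` makes the codegen path count each row twice. Moving the increment into `doExecute()` counts it once on either path.

```scala
// Hypothetical, simplified model of the OneRowRelationExec double-counting bug.
// None of these classes are Spark APIs; they only mimic the control flow.

// Stand-in for a SQL metric such as numOutputRows.
final class Counter {
  var value: Long = 0L
  def +=(n: Long): Unit = value += n
}

trait Node {
  val numOutputRows = new Counter

  // Non-codegen path (mirrors doExecute()).
  def doExecute(): Iterator[String]

  // Rows handed to the codegen path (assumed to come from the same `rdd`).
  def inputRows: Iterator[String]

  // Simulated generated code: assumed to count every row it consumes, once.
  def codegenExecute(): Iterator[String] =
    inputRows.map { r => numOutputRows += 1; r }
}

// Before the fix: the shared row source itself increments the metric.
final class BuggyNode extends Node {
  private def rdd: Iterator[String] =
    Iterator("row").map { r => numOutputRows += 1; r }
  def doExecute(): Iterator[String] = rdd
  def inputRows: Iterator[String] = rdd
}

// After the fix: the row source is metric-free; only doExecute() counts.
final class FixedNode extends Node {
  private def rdd: Iterator[String] = Iterator("row")
  def doExecute(): Iterator[String] =
    rdd.map { r => numOutputRows += 1; r }
  def inputRows: Iterator[String] = rdd
}

object OneRowMetricDemo {
  // Drains the chosen path (as running a job would) and returns the metric value.
  def run(node: Node, codegen: Boolean): Long = {
    val rows = if (codegen) node.codegenExecute() else node.doExecute()
    rows.foreach(_ => ())
    node.numOutputRows.value
  }

  def main(args: Array[String]): Unit = {
    println(s"buggy, codegen:     ${run(new BuggyNode, codegen = true)}")  // 2
    println(s"fixed, codegen:     ${run(new FixedNode, codegen = true)}")  // 1
    println(s"fixed, non-codegen: ${run(new FixedNode, codegen = false)}") // 1
  }
}
```

The point the sketch illustrates is about shared values. When the interpreted path and the codegen path both consume the same value, a side effect placed inside it runs on both paths. The metric update therefore belongs in the path-specific method.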

Does this PR introduce any user-facing change?

No, this is only a bug fix.

How was this patch tested?

Added a unit test.

Was this patch authored or co-authored using generative AI tooling?

github-actions bot added the SQL label on Dec 18, 2025
cfmcgrady changed the title from "[SPARK-52060] Fix incorrect numOutputRows metric in OneRowRelationExec" to "[SPARK-52060][SQL] Fix incorrect numOutputRows metric in OneRowRelationExec" on Dec 18, 2025
@cfmcgrady (Contributor, Author)

The bug was introduced in #50849.

@cfmcgrady (Contributor, Author)

cc @cloud-fan @richardc-db

@cfmcgrady (Contributor, Author)

Also cc @pan3793, who found this bug while adding Spark 4.1.0 support to Kyuubi in apache/kyuubi#7254.

@pan3793 (Member) left a comment:

LGTM, @cfmcgrady, thanks for fixing it quickly!

cfmcgrady changed the title from "[SPARK-52060][SQL] Fix incorrect numOutputRows metric in OneRowRelationExec" to "[SPARK-54749][SQL] Fix incorrect numOutputRows metric in OneRowRelationExec" on Dec 18, 2025
@cloud-fan (Contributor)

The PySpark failure is unrelated. Thanks, merging to master/4.1!

cloud-fan closed this in a26c209 on Dec 18, 2025
cloud-fan pushed a commit that referenced this pull request on Dec 18, 2025
…onExec

### What changes were proposed in this pull request?

This PR moves the `numOutputRows` metric update logic from the val rdd initialization block to the `doExecute()` method in `OneRowRelationExec`.

### Why are the changes needed?

Currently, the `numOutputRows` metric in `OneRowRelationExec` is incremented twice on the codegen path, so it displays 2 instead of 1 for a single row.

before this PR:

<img width="251" height="318" alt="Screenshot before this PR" src="https://github.com/user-attachments/assets/4a376f45-e89d-4fa6-8ba9-a564663467cf" />

after this PR:

<img width="242" height="314" alt="Screenshot after this PR" src="https://github.com/user-attachments/assets/036de957-9ba9-4b7d-a89e-ed737d23eae2" />

### Does this PR introduce _any_ user-facing change?

No, this is only a bug fix.

### How was this patch tested?

Added a unit test.

### Was this patch authored or co-authored using generative AI tooling?

Closes #53520 from cfmcgrady/fix-onerowrelation-metrics.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit a26c209)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>