Skip to content

Support Spark 3.0/3.1 instrumentation with non-transparent TimedExec wrapper#68

Merged
minskya merged 4 commits into
mainfrom
DATAFLINT-4840
Apr 28, 2026
Merged

Support Spark 3.0/3.1 instrumentation with non-transparent TimedExec wrapper#68
minskya merged 4 commits into
mainfrom
DATAFLINT-4840

Conversation

@minskya
Copy link
Copy Markdown
Contributor

@minskya minskya commented Apr 27, 2026

Summary

  • TimedExec non-transparent wrapper for Spark 3.0/3.1: The transparent wrapper pattern (children = child.children) is incompatible with Spark 3.0/3.1's withNewChildren (uses mapProductIterator + containsChild). On legacy Spark, switches to children = Seq(child) so plan transformations (CollapseCodegenStages, ApplyColumnarRulesAndInsertTransitions, AQE) work correctly.
  • TimedWithCodegenExec: Split codegen support into a separate class so non-CodegenSupport nodes (e.g. FileSourceScanExec on Spark 3.1) are wrapped without codegen, avoiding ClassCastException.
  • Re-enabled SQL node instrumentation on all Spark versions (was disabled for 3.0/3.1).
  • UI node merge: On legacy Spark the non-transparent wrapper creates duplicate nodes in the plan graph. The UI now merges these pairs — keeps the child node (which has rich plan descriptions like filter conditions, window expressions) and adds the wrapper's duration metric.
  • Preserved parsedPlan on repeated polls: The non-paginated /sql API (used on Spark < 3.2) returns all SQLs every poll cycle, causing repeated recalculations that lost plan descriptions when sqlplan offset advanced. Now preserves existing plan data when unavailable.
  • Fix function name extraction for Python nodes: WindowInPandas/ArrowWindowPython plan descriptions weren't parsed (regex required Window [ prefix). MapInPandas, FlatMapGroupsInPandas, and FlatMapCoGroupsInPandas use a bare function format that the bracket-based parser couldn't handle. Both fixed with fallback regex patterns.
  • PySpark test: Added columnar scan + Python UDF + shuffle test case.

Test plan

  • All existing unit tests pass (43/43, 1 pre-existing flaky timing test)
  • New parser tests for MapInPandas, FlatMapGroupsInPandas, FlatMapCoGroupsInPandas
  • Manual test with Spark 3.1.2: verify instrumented nodes show duration metrics
  • Manual test with Spark 3.1.2: verify filter conditions and window expressions visible in UI
  • Manual test with Spark 3.5.x: verify no regression (transparent wrapper still used)
  • Run pyspark-testing/dataflint_pyspark_example.py on Spark 3.1.2

🤖 Generated with Claude Code

…wrapper

TimedExec's transparent wrapper (children = child.children) is incompatible
with Spark 3.0/3.1's withNewChildren which uses mapProductIterator + containsChild.
This caused CollapseCodegenStages, ApplyColumnarRulesAndInsertTransitions, and AQE
to silently fail to update children through the wrapper.

Changes:
- TimedExec: detect Spark < 3.2 via SPARK_VERSION and use non-transparent
  children = Seq(child) on legacy Spark, transparent on 3.2+
- TimedWithCodegenExec: split codegen support into separate class so
  non-codegen nodes (e.g. FileSourceScanExec on 3.1) don't get cast errors
- DataFlintInstrumentationExtension: re-enable SQL node instrumentation on
  all Spark versions (was disabled for 3.0/3.1)
- SqlReducer: merge duplicate wrapper+child node pairs in the UI by keeping
  the child node (rich plan descriptions) and adding wrapper's duration metric
- SqlReducer: preserve parsedPlan when plan data unavailable on repeated polls
  (non-paginated SQL API returns all SQLs every cycle)
- PySpark test: add columnar scan + Python UDF + shuffle test case

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@notion-workspace
Copy link
Copy Markdown

@minskya minskya changed the title Support Spark 3.0/3.1 instrumentation with non-transparent TimedExec … Support Spark 3.0/3.1 instrumentation with non-transparent TimedExec wrapper Apr 27, 2026
minskya and others added 3 commits April 28, 2026 09:59
…tMap nodes

- WindowParser: regex now matches WindowInPandas/ArrowWindowPython plan
  descriptions (was hardcoded to "Window [" prefix)
- batchEvalPythonParser: fallback regex for bare function names used by
  MapInPandas, FlatMapGroupsInPandas, and FlatMapCoGroupsInPandas
  (these don't use the [funcs], [udfs] bracket format)
- SqlReducer: reverted unnecessary WindowInPandas split (parser fix
  makes it redundant)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The bracket regex [funcs], [udfs] was matching FlatMapCoGroupsInPandas's
[group_keys], [group_keys] as function/UDF lists. Now requires the first
bracket to contain parentheses (function calls) or be empty, so CoGroups
falls through to the bare function name regex.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@minskya minskya merged commit a9dca6f into main Apr 28, 2026
6 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant