Releases: dataflint/spark

Version 0.9.8

10 May 08:29
705be40

What's Changed

Bug fixes

  • #74 — Fix NullPointerException in Block.code when from_json (or any other
    CodegenFallback expression) is used under whole-stage codegen with DataFlint
    instrumentation enabled. TimedWithCodegenExec now reports supportCodegen = false
    for wrapped operators that contain a CodegenFallback, mirroring Spark's own
    CollapseCodegenStages check that the transparent wrapper had been hiding.
  • The fix is compatible with Spark 3.0 → 4.x: it uses TreeNode.find instead of
    TreeNode.exists, which is only available from Spark 3.2.
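
The check described for #74 can be illustrated with a small, self-contained sketch. It is not DataFlint's or Spark's code: `Node`, its `codegen_fallback` flag, and `supports_codegen` are hypothetical stand-ins for a Spark tree node, the CodegenFallback trait, and the wrapper's supportCodegen logic. It only shows how a `find`-based search can replace an `exists` call.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    """Toy stand-in for a Spark TreeNode (operator or expression)."""
    name: str
    codegen_fallback: bool = False  # models "is a CodegenFallback expression"
    children: List["Node"] = field(default_factory=list)

    def find(self, predicate: Callable[["Node"], bool]) -> Optional["Node"]:
        """Return the first node (pre-order) matching the predicate, else None."""
        if predicate(self):
            return self
        for child in self.children:
            hit = child.find(predicate)
            if hit is not None:
                return hit
        return None


def supports_codegen(wrapped: Node) -> bool:
    """Opt out of codegen if anything beneath the wrapper falls back."""
    # "find(...) is None" expresses "no such node exists" without needing
    # an exists() method on the tree.
    return wrapped.find(lambda n: n.codegen_fallback) is None


project_with_from_json = Node("Project", children=[
    Node("Alias", children=[Node("from_json", codegen_fallback=True)]),
])
plain_filter = Node("Filter", children=[Node("GreaterThan")])

print(supports_codegen(project_with_from_json))  # False
print(supports_codegen(plain_filter))            # True
```

In this toy model expressions and operators share one tree; in Spark they are separate trees, so a real check would have to look through each operator's expressions as well.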

Hardening

  • TimedExec.postRddId now overwrites the rddId metric instead of summing across
    re-executions of the same plan instance.
  • TimedExec and TimedWithCodegenExec no longer compare equal — fixes a corner
    case in plan canonicalization / AQE plan reuse.
  • executeCollect write-path is now bounds-safe; falls back to the standard path
    on unexpected plan shapes (vendor write commands, future Spark layouts).
  • rddId metric switched from a "size" type to a plain sum, so it is no longer
    rendered as bytes (e.g. "12 B") in the Spark UI.
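
To see why overwriting matters for an identifier-style metric such as rddId, consider the minimal sketch below. The `Metric` class is hypothetical and merely models a value that can be either accumulated or set; it is not Spark's metric API.

```python
class Metric:
    """Hypothetical driver-side metric holding a single integer value."""

    def __init__(self) -> None:
        self.value = 0

    def add(self, v: int) -> None:
        # Accumulating behaviour: each call is added to the previous value.
        self.value += v

    def set(self, v: int) -> None:
        # Overwriting behaviour: the last call wins.
        self.value = v


rdd_id = 12
summed, overwritten = Metric(), Metric()

# The same plan instance is executed twice and posts its RDD id each time.
for _ in range(2):
    summed.add(rdd_id)
    overwritten.set(rdd_id)

print(summed.value)       # 24, which is not a valid RDD id
print(overwritten.value)  # 12
```

Summing is the right aggregation for counters and durations, but an id added to itself stops identifying anything, which is presumably why the metric is now overwritten.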

Tests / CI

  • New DataFlintCodegenFallbackSpec regression test for #74.
  • pluginspark4 now runs version-portable specs against Spark 4.0.1 in CI.
  • CI bumped to Java 17 (required by Spark 4); published jars stay Java 8 compatible.

Full Changelog: v0.9.7...v0.9.8

Version 0.9.7

07 May 17:10
dcd00ba

What's Changed

  • DATAFLINT-5041: dataflint-spark4-databricks shaded artifact for DBR 17.3+ by @minskya in #73

Full Changelog: v0.9.6...v0.9.7

Version 0.9.6

06 May 15:59
ff1afe3

What's Changed

  • merged pytests logic for finding plugin jar by @minskya in #70
  • skip dataflint UI on databricks spark 4 to avoid jakarta.servlet crash by @minskya in #71

Full Changelog: v0.9.5...v0.9.6

Version 0.9.5

28 Apr 15:32
db14cb4

What's Changed

  • Update README version and fix codegen/SQL API issues by @minskya in #69

Full Changelog: v0.9.4...v0.9.5

Version 0.9.4

28 Apr 07:45
a9dca6f

What's Changed

  • Support Spark 3.0/3.1 instrumentation with non-transparent TimedExec wrapper by @minskya in #68

Full Changelog: v0.9.3...v0.9.4

Version 0.9.3

27 Apr 05:43
2739300

What's Changed

  • sqlNodes instrumentation for spark<3.2 is disabled by @minskya in #67

Full Changelog: v0.9.2...v0.9.3

Version 0.9.2

23 Apr 18:19
0f370d7

What's Changed

  • [DATAFLINT-4798] problem with codegen support on spark<3.2 by @minskya in #66

Full Changelog: v0.9.1...v0.9.2

Version 0.9.1

23 Apr 17:20
4990d5d

What's Changed

  • [DATAFLINT-4798] Fix history server support for older Spark apps (pre-3.2) by @minskya in #65

Full Changelog: v0.9.0...v0.9.1

Version 0.9.0

21 Apr 07:24
b161400

What's Changed

  • Fix duration metric accuracy: RDD timing, codegen try/finally, per-partition write timing by @minskya in #63
  • Footer improvements, duration fix, and version bump to 0.9.0 by @minskya in #64
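
One plausible reading of "codegen try/finally" in #63 is that the timing update was moved into a finally block so that time spent in work that throws is still recorded. The sketch below illustrates that general pattern only; the `timed` helper and the `totals` dictionary are hypothetical and do not reflect the plugin's generated code.

```python
import time
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


def timed(totals: Dict[str, int], key: str, work: Callable[[], T]) -> T:
    """Run work() and add its elapsed nanoseconds to totals[key]."""
    start = time.perf_counter_ns()
    try:
        return work()
    finally:
        # Runs on normal return and on exception alike, so the duration
        # of failed work is not lost.
        totals[key] = totals.get(key, 0) + (time.perf_counter_ns() - start)


def failing() -> None:
    time.sleep(0.01)
    raise ValueError("boom")


totals: Dict[str, int] = {}
timed(totals, "ok", lambda: sum(range(1000)))
try:
    timed(totals, "failed", failing)
except ValueError:
    pass  # the exception still propagates; only the timing is preserved

print(sorted(totals))  # ['failed', 'ok']
```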

Full Changelog: v0.8.9...v0.9.0

Version 0.8.9

16 Apr 17:30
c8f3730

What's Changed

  • Add TPC-DS benchmark CI workflow with DataFlint plugin verification by @minskya in #60
  • Claude/fix tpcds ci errors by @minskya in #61
  • [DATAFLINT-4425] - duration instrumentation on all nodes by @minskya in #58

Full Changelog: v0.8.8...v0.8.9