Spark: Async micro batch planner Clean History #15223

Draft
RjLi13 wants to merge 17 commits into apache:main from RjLi13:async-micro-batch-planner-clean

@RjLi13 RjLi13 commented Feb 3, 2026

This PR rewrites the commit history of the Async Micro Batch Planner feature to make review easier. The commits are separated to show the flow of the change:

  1. SparkMicroBatchStream -> SyncSparkMicroBatchPlanner (relocates the planning logic to a new class).
  2. Migrate duplicated code and circular dependencies to MicroBatchUtils and BaseSparkMicroBatchPlanner.
  3. Strip code out of SparkMicroBatchStream so that it uses the planners and MicroBatchUtils; it becomes the entry point for the planners.
  4. Restore all code to the PR state, to surface any unnecessary changes and avoid deviating from what has already been reviewed.
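
The end state of steps 1 to 3 can be pictured with a minimal sketch. Every name below (`PlannerSketch`, `SyncPlannerSketch`, `StreamSketch`) is a hypothetical, simplified stand-in and not the actual Iceberg class or signature; files are modelled as plain strings indexed by offset, purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a planner abstraction; not the real Iceberg API.
interface PlannerSketch {
  // Returns the files to read for offsets in [startOffset, endOffset).
  List<String> planFiles(long startOffset, long endOffset);
}

// Plans on the caller's thread, mirroring the idea of relocating the
// planning logic out of the stream into its own class (step 1).
class SyncPlannerSketch implements PlannerSketch {
  private final List<String> filesByOffset;

  SyncPlannerSketch(List<String> filesByOffset) {
    this.filesByOffset = filesByOffset;
  }

  @Override
  public List<String> planFiles(long startOffset, long endOffset) {
    List<String> planned = new ArrayList<>();
    // Clamp to the available files so an end offset past the tail is harmless.
    for (long i = startOffset; i < endOffset && i < filesByOffset.size(); i++) {
      planned.add(filesByOffset.get((int) i));
    }
    return planned;
  }
}

// The stream holds no planning logic of its own; it is only the entry
// point that delegates to whichever planner it was given (step 3).
class StreamSketch {
  private final PlannerSketch planner;

  StreamSketch(PlannerSketch planner) {
    this.planner = planner;
  }

  List<String> planInputPartitions(long startOffset, long endOffset) {
    return planner.planFiles(startOffset, endOffset);
  }
}
```

With this shape, swapping in a different planner would not require touching the stream, which is presumably what makes the stream a natural entry point.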

Note that I created a new commit history that deviates from the original one. Some of the review comments were therefore folded in to make the history a little cleaner, so it does not show the original review process. To ensure the files are the same as in the original PR, I used git checkout origin/async-micro-batch-planner-spark-3-5 -- <file name>.

AsyncMicroBatchPlanner is the biggest and newest change here: a background thread puts planned files into a queue to be read. It borrows some elements of SparkMicroBatchStream, but its implementation is entirely new.
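
The background-thread-plus-queue pattern described above can be sketched as follows. This is a generic producer/consumer illustration built on `java.util.concurrent`, not the PR's implementation: the class name `AsyncPlannerSketch`, its methods, the bounded capacity, and the use of strings for files are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a worker thread "plans" files ahead of the reader
// and hands them over through a bounded queue.
class AsyncPlannerSketch {
  private final BlockingQueue<String> queue;
  private final Thread worker;

  AsyncPlannerSketch(List<String> filesToPlan, int capacity) {
    BlockingQueue<String> q = new LinkedBlockingQueue<>(capacity);
    this.queue = q;
    this.worker = new Thread(() -> {
      try {
        for (String file : filesToPlan) {
          q.put(file); // blocks while the queue is full (back-pressure)
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // stop quietly when asked to
      }
    }, "async-planner-sketch");
    this.worker.setDaemon(true);
    this.worker.start();
  }

  // Drains up to maxFiles planned files, waiting at most timeoutMillis in total.
  List<String> poll(int maxFiles, long timeoutMillis) {
    List<String> batch = new ArrayList<>();
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    try {
      while (batch.size() < maxFiles) {
        long remaining = deadline - System.nanoTime();
        if (remaining <= 0) {
          break;
        }
        String file = queue.poll(remaining, TimeUnit.NANOSECONDS);
        if (file == null) {
          break; // timed out with nothing more planned
        }
        batch.add(file);
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return batch;
  }

  // Interrupts the worker and waits briefly for it to exit.
  void stop() {
    worker.interrupt();
    try {
      worker.join(1000);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```

The bounded queue is one possible design choice: it keeps the worker from planning arbitrarily far ahead of the reader, at the cost of the worker blocking whenever the reader falls behind.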

As always, credit goes to Drew Goya, who authored the original feature here at Netflix.

@github-actions github-actions bot added the spark label Feb 3, 2026