feat(flink): Implement continuous sorting feature for append write by prashantwason · Pull Request #18083 · apache/hudi

prashantwason · 2026-02-04T21:43:09Z

Describe the issue this Pull Request addresses

This PR implements a continuous sorting feature for the Flink append write sink. The continuous sorting mode maintains sorted order incrementally using a TreeMap, which avoids large pause times from batch sorting and reduces single-partition lag during ingestion by minimizing backpressure.

Summary and Changelog

Summary: Added a continuous sorting mode (AppendWriteFunctionWithContinuousSort) that provides non-blocking O(log n) inserts and incremental draining, offering predictable latency without sort spikes.

Changelog:

Added AppendWriteFunctionWithContinuousSort class that keeps records in a TreeMap keyed by a code-generated normalized key and an insertion sequence
When buffer reaches max capacity, oldest entries are drained and written immediately
Updated AppendWriteFunctions.create to instantiate the continuous sorter when WRITE_BUFFER_SORT_CONTINUOUS_ENABLED is true
Introduced new FlinkOptions:
- write.buffer.sort.continuous.enabled - Whether to use continuous sorting instead of batch sorting
- write.buffer.sort.continuous.drain.size - Number of records to drain each time max capacity is reached
Added ITTestAppendWriteFunctionWithContinuousSort integration tests covering buffer flush triggers, sorted output correctness, drain behaviors, and invalid-parameter error cases

Impact

New Configuration Options:

write.buffer.sort.continuous.enabled (default: false) - Enables continuous sorting mode
write.buffer.sort.continuous.drain.size (default: 1) - Controls drain batch size

No breaking changes to existing functionality. The feature is disabled by default.

Risk Level

low - This is a new optional feature that is disabled by default. Existing behavior is unchanged unless the user explicitly enables continuous sorting.

Documentation Update

The config description is included in the code. Documentation update for the Hudi website may be needed to describe the new continuous sorting feature and its configuration options.

Contributor's checklist

Read through contributor's guide
Enough context is provided in the sections above
Adequate tests were added if applicable

… sorted order incrementally and avoid single-partition lag during ingestion by reducing large pause time from sort and backpressure Summary: - Added AppendWriteFunctionWithContinuousSort which keeps records in a TreeMap keyed by a code-generated normalized key and an insertion sequence, drains oldest entries when a configurable threshold is reached, and writes drained records immediately; snapshot/endInput drain remaining records. - Updated AppendWriteFunctions.create to instantiate the continuous sorter when WRITE_BUFFER_SORT_CONTINUOUS_ENABLED is true. - Introduced three new FlinkOptions: WRITE_BUFFER_SORT_CONTINUOUS_ENABLED, WRITE_BUFFER_SORT_CONTINUOUS_DRAIN_THRESHOLD_PERCENT, and WRITE_BUFFER_SORT_CONTINUOUS_DRAIN_SIZE, and added runtime validation (buffer > 0, 0 < threshold < 100, drainSize > 0, parsed non-empty sort keys). - Added ITTestAppendWriteFunctionWithContinuousSort integration tests covering buffer flush triggers, sorted output correctness (with and without continuous drain), drain threshold/size behaviors, and invalid-parameter error cases. Verified that we can push higher CPU utilization and input rate with continuous sorting enabled (See before and after 11:30AM)

prashantwason · 2026-02-06T02:12:36Z

@hudi-bot run azure

cshuo · 2026-02-06T02:39:04Z

@prashantwason thks for the contribution.

A high-level question before reviewing the details, I noticed this PR doesn't support async write like #13892, and actually the time cost of flushing records to file will be greater than sorting. Do you have any benchmark number about perf comparison of this PR and the previous version.

prashantwason · 2026-02-12T02:21:59Z

@hudi-bot run azure

prashantwason · 2026-02-24T06:49:42Z

Thanks @cshuo for the review!

Regarding async write support:
You're correct that this PR doesn't include async write like #13892. The goal of this PR is more focused - it specifically addresses the sorting bottleneck in append write by switching from batch sorting to continuous (incremental) sorting. This eliminates the large pause times that occur when sorting a full buffer.

The key benefits of continuous sorting:

Non-blocking O(log n) inserts vs O(n log n) batch sort at flush time
Predictable latency - no sort spikes when buffer fills up
Incremental draining - oldest entries are drained and written immediately when buffer reaches max capacity
Reduced backpressure - minimizes single-partition lag during ingestion

Regarding benchmarks:
I don't have formal benchmark numbers comparing to the existing batch sort approach. The primary motivation was to eliminate the unpredictable latency spikes from batch sorting rather than overall throughput improvement. That said, adding async write on top of continuous sorting would be a natural extension that could improve throughput further.

Would you like me to add some basic benchmarking to quantify the latency improvements? Or do you have specific concerns about the approach that I should address first?

@hudi-bot run azure

cshuo · 2026-02-24T07:52:38Z

@prashantwason thks for the detail explanation.

Would you like me to add some basic benchmarking to quantify the latency improvements? Or do you have specific concerns about the approach that I should address first?

Regarding the append write with buffer sorting, there are already two approaches, AppendWriteFunctionWithBIMBufferSort and AppendWriteFunctionWithDisruptorBufferSort in the repository. Both implement sorting and flushing in an asynchronous manner, aiming to mitigate spike issues during the flush process.

This PR adopts a continuous sorting approach, which can indeed alleviate sorting spikes. But the flush process involves not only sorting but also file-writing overhead and continuous sort will spread sorting overhead across each record writing process. Therefore, I’d like to understand how much performance gain this PR will bring compared to the existed functions.

nsivabalan · 2026-03-10T01:11:14Z

hey @prashantwason : can you respond to @cshuo on this asks.

- Move config validation before super.open() to avoid IllegalStateException when Flink runtime context is not initialized (fixes 5 validation tests) - Add RecordComparator for full comparison fallback when normalized keys are equal, ensuring correct multi-field sort order (fixes 2 sort tests) - Store RowData reference in SortKey for RecordComparator access

prashantwason · 2026-03-10T09:39:50Z

@cshuo Thanks for the thoughtful follow-up. You raise a valid point about the existing AppendWriteFunctionWithBIMBufferSort and AppendWriteFunctionWithDisruptorBufferSort.

Here's how continuous sorting differs from the existing async approaches:

Existing approaches (BIM/Disruptor):

Batch sort (O(n log n)) at flush time, but move the sort+write to a background thread
Require double-buffering (2x memory) or a ring buffer
Add threading complexity (synchronization, error propagation, buffer swaps)
Sorting still happens as a single O(n log n) burst, just on a different thread

Continuous sorting (this PR):

O(log n) per insert, no batch sort at all
Single buffer (no double-buffering overhead)
No threading complexity — simpler to reason about and debug
Incremental draining — when buffer fills, oldest sorted records are written immediately
Better for latency-sensitive workloads where predictable per-record cost is preferred over throughput

The key trade-off is:

BIM/Disruptor optimize throughput by overlapping sort+write with ingestion (async), at the cost of memory and complexity
Continuous sort optimizes latency predictability by eliminating sort spikes entirely (no batch sort), at the cost of higher per-record overhead (O(log n) vs O(1) insert)

These approaches are complementary — continuous sorting could potentially be combined with async write in the future. This PR adds it as an opt-in alternative via write.buffer.sort.continuous.enabled=true.

I don't have formal benchmark numbers yet. Would it be helpful if I ran a comparison benchmark against the BIM approach to quantify the latency distribution differences?

prashantwason · 2026-03-10T09:40:22Z

Pushed a fix for the CI test failures:

Fixed 5 validation tests (testInvalidDrainSizeNegative, testInvalidDrainSizeZero, testInvalidBufferSizeZero, testInvalidBufferSizeNegative, testInvalidSortKeysOnlyWhitespace, testInvalidSortKeysOnlyCommas):

Moved config validation before super.open() which requires a Flink runtime context. Tests that directly instantiate the function without a test harness were getting IllegalStateException instead of IllegalArgumentException.

Fixed 2 sort order tests (testSortedResult, testSortedResultWithContinuousDrain):

Added RecordComparator for full comparison fallback when normalized keys are equal. Normalized keys are a prefix optimization and may not fully encode all sort fields, so a full RecordComparator is needed for correct multi-field sort order.

The Azure CI failure was infrastructure-related (Docker process exit code 1), not a test failure.

codecov-commenter · 2026-03-10T09:58:25Z

Codecov Report

❌ Patch coverage is 6.87023% with 122 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.79%. Comparing base (ec04479) to head (1caf9fa).
⚠️ Report is 1124 commits behind head on master.

Files with missing lines	Patch %	Lines
.../append/AppendWriteFunctionWithContinuousSort.java	0.00%	117 Missing ⚠️
.../apache/hudi/sink/append/AppendWriteFunctions.java	28.57%	1 Missing and 4 partials ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##             master   #18083      +/-   ##
============================================
+ Coverage     61.43%   68.79%   +7.35%     
- Complexity    23082    28221    +5139     
============================================
  Files          2108     2461     +353     
  Lines        127636   135388    +7752     
  Branches      14534    16410    +1876     
============================================
+ Hits          78409    93135   +14726     
+ Misses        42873    34875    -7998     
- Partials       6354     7378    +1024

Flag	Coverage Δ
common-and-other-modules	`44.54% <6.87%> (?)`
hadoop-mr-java-client	`44.86% <ø> (?)`
spark-client-hadoop-common	`48.45% <ø> (?)`
spark-java-tests	`48.91% <ø> (?)`
spark-scala-tests	`45.51% <ø> (?)`
utilities	`38.23% <ø> (?)`

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines	Coverage Δ
...va/org/apache/hudi/configuration/FlinkOptions.java	`99.78% <100.00%> (ø)`
...n/java/org/apache/hudi/sink/buffer/BufferType.java	`100.00% <100.00%> (ø)`
.../apache/hudi/sink/append/AppendWriteFunctions.java	`67.74% <28.57%> (ø)`
.../append/AppendWriteFunctionWithContinuousSort.java	`0.00% <0.00%> (ø)`

... and 4521 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

cshuo

Thks for the detail explanation, make sense to me.

Would it be helpful if I ran a comparison benchmark against the BIM approach to quantify the latency distribution differences?

yeah, that will be helpful, a concrete result would provide more valuable reference, e.g., how much can the spikes be reduced and how much does the average throughput decrease.

Left some comments for the implementation.

cshuo · 2026-03-10T13:36:27Z

+    }
+
+    // Parse and validate sort keys
+    String sortKeys = config.get(FlinkOptions.WRITE_BUFFER_SORT_KEYS);


The existing append-sort code resolves missing write.buffer.sort.keys by falling back to the record key via AppendWriteFunctions.resolveSortKeys, can the function also follow the behavior?

Done. Now using AppendWriteFunctions.resolveSortKeys(config) which falls back to record key when write.buffer.sort.keys is not set.

cshuo · 2026-03-10T13:39:00Z

+    normalizedKeyComputer.putKey(data, reusableKeySegment, 0);
+
+    // Create sort key (copies the normalized key from reusable segment)
+    SortKey key = new SortKey(reusableKeySegment, normalizedKeySize, data, insertionSequence++);


The map stores raw RowData references and later compares them. With Flink object reuse enabled, those retained rows can be mutated after insertion, which breaks
TreeMap ordering guarantees and risks incorrect output.

Use getRuntimeContext().isObjectReuseEnabled(); to check whether object reuse is enabled, and copy rowData here if necessary.

Done. Added object reuse detection via getRuntimeContext().isObjectReuseEnabled() and RowDataSerializer.copy() in processElement() when object reuse is enabled.

cshuo · 2026-03-10T13:39:37Z

+  // Sort key computation
+  private transient NormalizedKeyComputer normalizedKeyComputer;
+  private transient RecordComparator recordComparator;
+  private transient byte[] reusableKeyBuffer;


can be local variable.

Done. Made reusableKeyBuffer a local variable in open().

cshuo · 2026-03-10T13:43:49Z

+      }
+
+      // Clear buffer and reset sequence
+      sortedRecords.clear();


Put this line inside the upper if branch?

Done. Moved sortedRecords.clear() and insertionSequence = 0L inside the if (!sortedRecords.isEmpty()) branch.

cshuo · 2026-03-10T13:44:02Z

+      }
+
+      // Reset for next checkpoint interval
+      sortedRecords.clear();


Put this line inside the upper if branch?

Done. Moved sortedRecords.clear() and insertionSequence = 0L inside the if (!sortedRecords.isEmpty()) branch.

cshuo · 2026-03-10T13:46:06Z

+/**
+ * Sink function to write data with continuous sorting for improved compression.
+ *
+ * <p>Unlike {@link AppendWriteFunctionWithBufferSort} which uses batch sorting,


Class name AppendWriteFunctionWithBufferSort is stale and has been renamed.

Done. Fixed to reference AppendWriteFunctionWithBIMBufferSort.

cshuo · 2026-03-10T13:51:43Z

+      .defaultValue(false) // default use batch sorting
+      .withDescription("Whether to use continuous sorting (TreeMap-based) instead of batch sorting. "
+          + "Continuous sorting provides O(log n) inserts and incremental draining, "
+          + "but has higher per-record overhead. Requires write.buffer.sort.enabled=true.");


write.buffer.sort.enabled is deprecated, and append writing uses WRITE_BUFFER_TYPE to select different write functions, can we also use WRITE_BUFFER_TYPE here?

Done. Removed WRITE_BUFFER_SORT_CONTINUOUS_ENABLED config entirely. Now uses WRITE_BUFFER_TYPE=CONTINUOUS_SORT via the BufferType enum. Added CONTINUOUS_SORT to BufferType and updated AppendWriteFunctions.create() to route accordingly. Updated tests as well.

…ject reuse, cleanup - Replace WRITE_BUFFER_SORT_CONTINUOUS_ENABLED with WRITE_BUFFER_TYPE=CONTINUOUS_SORT - Add CONTINUOUS_SORT to BufferType enum - Use AppendWriteFunctions.resolveSortKeys() for sort key resolution with record key fallback - Copy RowData when Flink object reuse is enabled to prevent mutation - Move sortedRecords.clear()/insertionSequence reset inside if branches - Fix stale class name reference in javadoc - Update test to use new WRITE_BUFFER_TYPE config Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

prashantwason · 2026-03-18T18:25:33Z

@cshuo All review comments have been addressed:

Sort key resolution: Now uses AppendWriteFunctions.resolveSortKeys() which falls back to record key when write.buffer.sort.keys is not set
Object reuse safety: Added RowDataSerializer.copy() when Flink object reuse is enabled to prevent TreeMap ordering corruption
Local variable: Made reusableKeyBuffer a local variable in open()
endInput cleanup: Moved sortedRecords.clear() and insertionSequence = 0L inside the if branch
snapshotState cleanup: Same as above
Stale class name: Fixed javadoc reference to AppendWriteFunctionWithBIMBufferSort
Config migration: Removed WRITE_BUFFER_SORT_CONTINUOUS_ENABLED, now uses WRITE_BUFFER_TYPE=CONTINUOUS_SORT via BufferType enum. Updated AppendWriteFunctions.create() routing and tests.

Please take another look when you get a chance. Thanks!

…tions Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…it operations" This reverts commit 847773a.

cshuo

Thks for updating.

cshuo · 2026-03-23T02:38:31Z

+    } catch (IOException e) {
+      throw new HoodieIOException("Failed to drain buffer during snapshot", e);
+    } finally {
+      super.snapshotState();


Let's move super.snapshotState(); out of the finally clause like other append write functions, since write meta event should not be sent to coordinator when there is ingestion exception.

cshuo · 2026-03-23T02:42:50Z

+    this.reusableKeySegment = MemorySegmentFactory.wrap(reusableKeyBuffer);
+
+    // Detect object reuse mode and create serializer for copying if needed
+    this.objectReuseEnabled = getRuntimeContext().isObjectReuseEnabled();


There is compiling error in CI, seems we need to adapt isObjectReuseEnabled() for lower flink versions, like flink 1.17, 1.18.

Besides, can you also add a test with object reuse enabled?

Done. Using RuntimeContextUtils.isObjectReuseEnabled(getRuntimeContext()) which adapts across Flink versions. For Flink 1.17-1.20, it delegates to getExecutionConfig().isObjectReuseEnabled(), and for Flink 2.0+/2.1+, it calls runtimeContext.isObjectReuseEnabled() directly.

Done. Added testObjectReuseEnabled() which uses a single reusable GenericRowData instance and verifies all 3 records are distinct after write.

cshuo · 2026-03-23T02:49:36Z

+    // Initialize TreeMap with comparator that uses normalized keys for fast comparison
+    // and falls back to RecordComparator for full comparison when normalized keys are equal
+    this.sortedRecords = new TreeMap<>((k1, k2) -> {
+      MemorySegment seg1 = MemorySegmentFactory.wrap(k1.keyBytes);


The TreeMap comparator wraps keyBytes into new MemorySegment objects for every comparison. Since comparisons happen frequently on insert/drain, this creates avoidable allocation pressure in the hot path, which increases GC preasure.

We can reuse/precompute comparable representations per SortKey, i.e., store wrapped segments in SortKey.

Done. SortKey now copies the normalized key bytes and wraps them as a MemorySegment once in its constructor, so the TreeMap comparator uses pre-computed k1.keySegment/k2.keySegment without any per-comparison allocation.

cshuo · 2026-03-23T02:53:23Z

+
+    } catch (IOException e) {
+      throw new HoodieIOException("Failed to drain buffer during endInput", e);
+    } finally {


Let's move super.endInput(); out of the finally clause like other append write functions.

Done. super.endInput() is already outside the try/catch block, consistent with the other append write functions.

…e test - Fix isObjectReuseEnabled() compilation on Flink 1.17/1.18 by using getExecutionConfig().isObjectReuseEnabled() instead of RuntimeContext method - Remove unused imports (java.util.Arrays, java.util.stream.Collectors) to fix checkstyle violations - Move super.snapshotState() and super.endInput() out of finally clauses to match other append write functions (avoid sending write meta event on error) - Optimize TreeMap comparator: store pre-wrapped MemorySegment in SortKey to avoid per-comparison allocation and reduce GC pressure - Filter empty strings after trim in resolveSortKeys() to properly validate whitespace-only and comma-only sort key configs - Add testObjectReuseEnabled test that verifies records are correctly copied when Flink object reuse is enabled (prevents TreeMap corruption) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

prashantwason · 2026-03-25T04:35:02Z

@cshuo All review comments from the latest round have been addressed in commit e592166:

super.snapshotState() / super.endInput() out of finally: Moved both out of finally clauses to match other append write functions — write meta event is no longer sent to coordinator when there is an ingestion exception.
isObjectReuseEnabled() compilation on Flink 1.17/1.18: Changed to getRuntimeContext().getExecutionConfig().isObjectReuseEnabled() for compatibility with lower Flink versions.
MemorySegment allocation in TreeMap comparator: Pre-wrapped MemorySegment is now stored in SortKey.keySegment to avoid per-comparison allocation and reduce GC pressure.
Test with object reuse enabled: Added testObjectReuseEnabled test that verifies records are correctly copied when Flink object reuse is enabled (prevents TreeMap corruption from mutated references).

Please take another look when you get a chance. Thanks!

cshuo · 2026-03-25T05:05:19Z

+    this.reusableKeySegment = MemorySegmentFactory.wrap(reusableKeyBuffer);
+
+    // Detect object reuse mode and create serializer for copying if needed
+    this.objectReuseEnabled = getRuntimeContext().getExecutionConfig().isObjectReuseEnabled();


There is some compile error for higher flink versions.

Caused by: org.apache.maven.plugin.compiler.CompilationFailureException: Compilation failure /home/runner/work/hudi/hudi/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/append/AppendWriteFunctionWithContinuousSort.java:[158,50] cannot find symbol symbol: method getExecutionConfig()

You can refer to RuntimeContextUtils to adapt object reuse check for different flink versions.

Done. Using RuntimeContextUtils.isObjectReuseEnabled() which delegates to getExecutionConfig().isObjectReuseEnabled() for Flink 1.17-1.20 and runtimeContext.isObjectReuseEnabled() directly for Flink 2.0+/2.1+.

…useEnabled getExecutionConfig() was removed from RuntimeContext in Flink 2.0+. Add isObjectReuseEnabled() to version-specific RuntimeContextUtils adapters: - Flink 1.17-1.20: uses getExecutionConfig().isObjectReuseEnabled() - Flink 2.0-2.1: uses getGlobalJobParameters() with PipelineOptions.OBJECT_REUSE Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

cshuo · 2026-03-25T09:22:48Z

        AUTO_WATERMARK_INTERVAL.defaultValue().toMillis() + ""));
  }
+
+  public static boolean isObjectReuseEnabled(RuntimeContext runtimeContext) {


use runtimeContext.isObjectReuseEnabled() directly.

Done. Using runtimeContext.isObjectReuseEnabled() directly for Flink 2.0.x.

cshuo · 2026-03-25T09:23:04Z

  }
+
+  public static boolean isObjectReuseEnabled(RuntimeContext runtimeContext) {
+    Map<String, String> jobParameters = runtimeContext.getGlobalJobParameters();


use runtimeContext.isObjectReuseEnabled() directly.

Done. Using runtimeContext.isObjectReuseEnabled() directly for Flink 2.1.x.

nsivabalan · 2026-04-01T22:34:58Z

hey @cshuo: can you re-review the patch.

cshuo · 2026-04-02T01:35:51Z

hey @cshuo: can you re-review the patch.

@nsivabalan Except for the last two comments, which are not resolved yet @prashantwason , the PR LGTM overall.

yihua

🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

Style & Readability Review — This PR introduces a new continuous sorting buffer type for Flink append writes, which is a great feature. The implementation looks solid, with good test coverage for various scenarios, including object reuse and error handling. Overall, the code is clean, well-structured, and adheres to Hudi's coding standards. I have a few minor nits below.

yihua · 2026-04-05T05:31:42Z

+ * <p>Strategy:
+ * <ol>
+ *   <li>Records are inserted in sorted order (TreeMap)</li>
+ *   <li>When buffer reaches max capacity, oldest record(s) are drained synchronously</li>


🤖 nit: Could maxCapacity and drainSize be made final as they are configured once and not changed?

Already final. maxCapacity is private final long and drainSize is private final int, set once in the constructor.

yihua · 2026-04-05T05:31:42Z

+  private transient RecordComparator recordComparator;
+  private transient MemorySegment reusableKeySegment;
+  private transient int normalizedKeySize;
+  private transient boolean objectReuseEnabled;


🤖 nit: It might be clearer to initialize insertionSequence to 0L when sortedRecords is cleared in snapshotState() and endInput() to consistently reflect a fresh state, similar to how it's done in the constructor. Or, is it intended to continue the sequence?

Already done. insertionSequence is reset to 0L alongside sortedRecords.clear() in both snapshotState() and endInput().

yihua · 2026-04-05T05:31:42Z

+          String.format("Buffer capacity must be positive, got: %d", maxCapacity));
+    }
+
+    if (drainSize <= 0) {


🤖 nit: The check if (cmp != 0) is a common pattern, but it's redundant. recordComparator.compare and Long.compare also return 0 if equal, so you could directly return their results if cmp == 0 without an explicit check.

The if (cmp != 0) guards are intentional here - this is a 3-level comparator chain (normalized key -> full record -> insertion order). The early returns avoid the more expensive recordComparator.compare() when the cheaper normalizedKeyComputer.compareKey() already distinguishes the records. Keeping the pattern explicit for readability.

yihua · 2026-04-05T05:31:43Z

Line 97: 🤖 nit: The s -> !s.isEmpty() filter is good. Could we also add a .trim() to this stream pipeline before filtering, to handle cases like "key1, ,key2" that would result in an empty string after trimming?

Already handled. The stream pipeline does .map(String::trim).filter(s -> !s.isEmpty()), so "key1, ,key2" splits to ["key1", " ", "key2"], trims to ["key1", "", "key2"], then filters out the empty string to get ["key1", "key2"].

yihua

🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

LGTM! This is a solid addition for incremental sorting that helps mitigate the latency spikes of batch sorting. I've left a couple of architectural notes for you to consider around memory footprint tracking and minimizing short-lived allocations during draining.

yihua · 2026-04-05T05:39:50Z

+      drainRecords(drainSize);
+
+      // Verify there's space after draining
+      if (sortedRecords.size() >= maxCapacity) {


🤖 Unlike BIMBufferSort, this sorting mode does not bound its memory footprint using write.task.max.size.

Fixed. Added TotalSizeTracer to bound the buffer memory footprint using write.task.max.size. Now processElement() triggers draining when either the record count limit (write.buffer.size) or memory size limit is exceeded. Record size is estimated once via ObjectSizeCalculator and tracked via TotalSizeTracer.trace()/countDown()/reset().

yihua · 2026-04-05T05:39:50Z

+    totalDrainOperations++;
+    totalDrainedRecords += drained;
+  }
+


🤖 Allocating an Iterator for every drained record can create unnecessary garbage when drainSize is small.

Fixed. Replaced the Iterator-based loop with TreeMap.pollFirstEntry() which removes and returns the first entry directly, avoiding Iterator object allocation entirely. This matters especially when drainSize=1 (the default), where each incoming record at capacity would have created a new Iterator.

…rectly for Flink 2.0+/2.1+ Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Add TotalSizeTracer to bound buffer memory footprint using write.task.max.size, triggering drain when memory limit is exceeded in addition to record count limit - Replace Iterator-based drain loop with TreeMap.pollFirstEntry() to avoid iterator allocation per drain call (matters when drainSize=1) - Estimate record size once via ObjectSizeCalculator and track cumulative memory via TotalSizeTracer.trace()/countDown()/reset() Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

hudi-bot · 2026-04-16T02:30:14Z

CI report:

1caf9fa Azure: SUCCESS

Bot commands

@hudi-bot supports the following commands:

@hudi-bot run azure re-run the last Azure build

yihua

🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

Thanks for addressing the feedback — both my prior findings are resolved nicely. The memory footprint is now bounded via TotalSizeTracer with write.task.max.size, and the drain loop uses pollFirstEntry() to avoid iterator allocation. The RuntimeContextUtils simplification for Flink 2.0.x/2.1.x to use runtimeContext.isObjectReuseEnabled() directly is clean. All other reviewer comments appear addressed as well. LGTM.

yihua

🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

CodeRabbit Walkthrough: This PR introduces continuous sorting functionality for Hudi Flink datasource. A new TreeMap-based buffer type (CONTINUOUS_SORT) enables in-memory sorted write operations with configurable incremental draining. The implementation includes new configuration options, a complete write function with lifecycle management, comprehensive integration tests, and utility enhancements for object reuse support across Flink versions.

Sequence Diagram (CodeRabbit):

sequenceDiagram
    participant Client as Client
    participant Processor as AppendWriteFunctionWithContinuousSort
    participant TreeMap as TreeMap Buffer
    participant Writer as Underlying Writer
    participant State as State Backend

    Client->>Processor: processElement(RowData)
    Note over Processor: Compute normalized key
    Processor->>TreeMap: insert(SortKey, record)
    Note over Processor: Check buffer capacity/memory
    alt Buffer exceeds limit
        Processor->>TreeMap: pollFirstEntry() x drain_size
        TreeMap-->>Processor: oldest records
        Processor->>Writer: flush records
    end
    
    Client->>Processor: snapshotState() / endInput()
    Processor->>TreeMap: pollFirstEntry() until empty
    TreeMap-->>Processor: all buffered records
    Processor->>Writer: flush remaining
    Processor->>State: save checkpoint state
    
    Client->>Processor: close()
    Processor->>Writer: delegate to super.close()

CodeRabbit: yihua#46 (review)

yihua · 2026-04-16T18:32:50Z

+  private RowType rowType;
+
+  @TempDir
+  protected File tempFile;


⚠️ Potential issue | 🟡 Minor

Shadowed @TempDir tempFile field.

This class declares @TempDir protected File tempFile (line 61-62), but TestWriteBase already declares @TempDir protected File tempFile (inherited). This shadowing may cause confusion and could lead to unexpected behavior if both are initialized differently by JUnit.

Consider removing the local declaration since the inherited field should suffice.

🔧 Proposed fix

public class ITTestAppendWriteFunctionWithContinuousSort extends TestWriteBase { private Configuration conf; private RowType rowType; - - `@TempDir` - protected File tempFile;

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/append/ITTestAppendWriteFunctionWithContinuousSort.java` around lines 61 - 62, The class ITTestAppendWriteFunctionWithContinuousSort currently redeclares the JUnit TempDir field as "protected File tempFile", shadowing the same "@TempDir protected File tempFile" declared in the superclass TestWriteBase; remove the local declaration of tempFile from ITTestAppendWriteFunctionWithContinuousSort so the test uses the inherited TestWriteBase.tempFile, and update any local references in that class to rely on the inherited field name (no other changes required).

— CodeRabbit (original) (source:comment#3095501107)

cshuo · 2026-04-17T03:13:56Z

+1
cc @nsivabalan @danny0405

github-actions Bot added the size:L PR with lines of changes in (300, 1000] label Feb 4, 2026

cshuo reviewed Mar 10, 2026

View reviewed changes

prashantwason and others added 2 commits March 19, 2026 16:48

fix(concurrency): detect rollback conflicts with ongoing commit opera…

847773a

…tions Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Revert "fix(concurrency): detect rollback conflicts with ongoing comm…

d1d2d4f

…it operations" This reverts commit 847773a.

cshuo reviewed Mar 23, 2026

View reviewed changes

cshuo reviewed Mar 25, 2026

View reviewed changes

yihua reviewed Apr 5, 2026

View reviewed changes

prashantwason and others added 2 commits April 6, 2026 14:57

Address review comments: use RuntimeContext.isObjectReuseEnabled() di…

1010cab

…rectly for Flink 2.0+/2.1+ Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

yihua reviewed Apr 16, 2026

View reviewed changes

yihua mentioned this pull request Apr 16, 2026

[OSS PR #18083] feat(flink): Implement continuous sorting feature for append write yihua/hudi#46

Open

yihua reviewed Apr 16, 2026

View reviewed changes

nsivabalan approved these changes Apr 20, 2026

View reviewed changes

nsivabalan merged commit 3a387da into apache:master Apr 20, 2026
55 of 56 checks passed

Conversation

prashantwason commented Feb 4, 2026

Describe the issue this Pull Request addresses

Summary and Changelog

Impact

Risk Level

Documentation Update

Contributor's checklist

Uh oh!

prashantwason commented Feb 6, 2026

Uh oh!

cshuo commented Feb 6, 2026

Uh oh!

prashantwason commented Feb 12, 2026

Uh oh!

prashantwason commented Feb 24, 2026

Uh oh!

cshuo commented Feb 24, 2026

Uh oh!

nsivabalan commented Mar 10, 2026

Uh oh!

prashantwason commented Mar 10, 2026

Uh oh!

prashantwason commented Mar 10, 2026

Uh oh!

codecov-commenter commented Mar 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

cshuo left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

prashantwason commented Mar 18, 2026

Uh oh!

cshuo left a comment

Choose a reason for hiding this comment

Uh oh!

cshuo Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

codecov-commenter commented Mar 10, 2026 •

edited

Loading

cshuo left a comment •

edited

Loading

cshuo Mar 23, 2026 •

edited

Loading

cshuo Mar 23, 2026 •

edited

Loading