Skip to content

perf(facade): batch V2 control-path replies in ProcessControlMessages#7479

Merged
glevkovich merged 1 commit into
mainfrom
glevkovich/glevkovich/batch_v2_control_replies
Jun 1, 2026
Merged

perf(facade): batch V2 control-path replies in ProcessControlMessages#7479
glevkovich merged 1 commit into
mainfrom
glevkovich/glevkovich/batch_v2_control_replies

Conversation

@glevkovich
Copy link
Copy Markdown
Contributor

@glevkovich glevkovich commented Jun 1, 2026

When ProcessControlMessages wakes with multiple messages queued, each std::visit call triggers FinishScope() -> Flush(), producing one sendmsg syscall per message (PubSub, Monitor, Invalidation).

Enable batch mode when dispatch_q_ holds more than one message, so all replies in the drain pass accumulate in the send buffer and flush together at the main loop's idle-await point.

Single-message wakeups are unaffected: SetBatchMode(false) leaves FinishScope() flushing immediately, preserving p=1 latency.

Cross-machine benchmark (10 subscribers, 4 threads):

Pipeline RPS Δ Syscalls Δ
1 +3% -24%
10 +12% -68%
100 -1.4% -69%
500 -2% -70%

The syscall reduction is the real gain here.

When ProcessControlMessages wakes with multiple messages queued,
each std::visit call triggers FinishScope() -> Flush(), producing
one sendmsg syscall per message (PubSub, Monitor, Invalidation).

Enable batch mode when dispatch_q_ holds more than one message,
so all replies in the drain pass accumulate in the send buffer
and flush together at the main loop's idle-await point.

Single-message wakeups are unaffected: SetBatchMode(false) leaves
FinishScope() flushing immediately, preserving p=1 latency.

Cross-machine benchmark (10 subscribers, 4 threads):

  PIPELINE  RPS Δ    SYSCALLS Δ
  1         +3%      -24%
  10        +12%     -68%
  100       -1.4%    -69%
  500       -2%      -70%

RPS is publisher-bound at p>=100; the syscall reduction is the
real gain — freeing kernel and NIC resources under fan-out load.

Signed-off-by: Gil Levkovich <69595609+glevkovich@users.noreply.github.com>
@glevkovich glevkovich requested a review from Copilot June 1, 2026 15:39
@qodo-free-for-open-source-projects
Copy link
Copy Markdown

Review Summary by Qodo

Batch V2 control-path replies in ProcessControlMessages

✨ Enhancement

Grey Divider

Walkthroughs

Description
• Batch control-path replies when multiple messages queued
• Reduces syscalls per message via deferred flush
• Preserves single-message latency with conditional batching
• Improves throughput under fan-out load (12% RPS, -68% syscalls at p=10)

Grey Divider

Qodo Logo

@qodo-free-for-open-source-projects
Copy link
Copy Markdown

qodo-free-for-open-source-projects Bot commented Jun 1, 2026

Code Review by Qodo

🐞 Bugs (2) 📘 Rule violations (0)

Grey Divider


Remediation recommended

1. Wrong batching heuristic 🐞 Bug ➹ Performance
Description
ProcessControlMessages enables batch mode based on dispatch_q_.size(), but dispatch_q_ can include
non-replying messages (e.g., CheckpointMessage, MigrationRequestMessage). This can unintentionally
batch a single reply-producing message and delay its flush, defeating the stated low-latency
behavior for single-message wakeups.
Code

src/facade/dragonfly_connection.cc[R1578-1581]

Evidence
The batching decision is based on total queue size, but checkpoints and migration markers are
enqueued into the same queue and do not produce replies, so they can inflate the size and
incorrectly trigger batching.

src/facade/dragonfly_connection.cc[1572-1582]
src/facade/dragonfly_connection.cc[1598-1604]
src/facade/dragonfly_connection.cc[647-649]
src/facade/dragonfly_connection.cc[2258-2268]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`ProcessControlMessages()` enables batching when `dispatch_q_.size() > 1`, but `dispatch_q_` can contain non-replying messages (notably `CheckpointMessage` and `MigrationRequestMessage`). This can turn batching on even when there is only a single reply-producing message, increasing latency for that reply.

### Issue Context
- Checkpoints are inserted at the front of `dispatch_q_`, so a common queue shape can be `[CheckpointMessage, <one PubSub/Monitor reply>]`.
- `MigrationRequestMessage` is explicitly treated as a no-op in `AsyncOperations`.

### Fix Focus Areas
- src/facade/dragonfly_connection.cc[1572-1607]

### Suggested fix
Change the batching predicate to consider *reply-producing* control messages only (and stop early once you find 2), e.g.:
- Scan `dispatch_q_` until 2 items that will write to `reply_builder_` are found.
- Treat `PubMessagePtr`, `MonitorMessage`, and `InvalidationMessage` as reply-producing.
- Then set `SetBatchMode(replying_count > 1)`.

If you choose to reuse `MessageHandle::IsReplying()`, ensure it includes `InvalidationMessage` (it currently does not, despite writing a reply in `AsyncOperations::operator()(const InvalidationMessage&)`).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


2. Quota path delays flush 🐞 Bug ☼ Reliability
Description
When batching is enabled and ProcessControlMessages returns early due to quota reached, control
replies written during the drain can remain buffered because FinishScope does not Flush() in batch
mode. IoLoopV2 only flushes in the idle-await block when the fiber is fully idle, so under sustained
data-path work those control replies can be delayed significantly.
Code

src/facade/dragonfly_connection.cc[R1580-1585]

Evidence
The new code enables batch mode and can return early on quota; the caller only flushes at the
idle-await point when fully idle, and explicitly does not continue to that flush point when quota is
reached.

src/facade/dragonfly_connection.cc[1572-1591]
src/facade/dragonfly_connection.cc[3024-3058]
src/facade/dragonfly_connection.cc[3070-3090]
src/facade/reply_builder.cc[224-233]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
With `SetBatchMode(true)`, per-message `ReplyScope` completion will not flush. If `ProcessControlMessages()` hits its quota and returns `true`, `IoLoopV2` deliberately falls through to the data path (to avoid starvation) and may not reach the idle-await flush point soon, delaying delivery of PubSub/Monitor/Invalidation pushes.

### Issue Context
`IoLoopV2` flushes replies in the idle-await block only when `!should_wake()`. If we keep executing/parsing (e.g., sustained pipeline traffic), the loop can continue without hitting that flush point promptly.

### Fix Focus Areas
- src/facade/dragonfly_connection.cc[1572-1607]
- src/facade/dragonfly_connection.cc[3024-3090]

### Suggested fix
When `ProcessControlMessages()` returns early due to quota reached (or when batching was enabled and we are about to fall through), explicitly flush once before returning to bound latency, e.g.:
- Track whether batch mode was enabled for this pass.
- On the quota-reached return path, call `reply_builder_->Flush()` (and handle error) before returning `true`.

This preserves the syscall reduction vs per-message flush while preventing unbounded delay when the loop does not immediately return to the idle-await flush path.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

Qodo Logo

@augmentcode
Copy link
Copy Markdown

augmentcode Bot commented Jun 1, 2026

🤖 Augment PR Summary

Summary: Batches RESP V2 control-path replies in ProcessControlMessages() when multiple dispatch-queue items are pending.
Impact: Reduces per-message FinishScope()->Flush() sendmsg syscalls under PubSub/monitor/invalidation fan-out while keeping single-message wakeups unbatched to preserve p=1 latency.

🤖 Was this summary useful? React with 👍 or 👎

Copy link
Copy Markdown

@augmentcode augmentcode Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. 1 suggestion posted.

Fix All in Augment

Comment augment review to trigger a new review at any time.

Comment thread src/facade/dragonfly_connection.cc
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR reduces per-message network flushes in the IoLoopV2 control path by batching replies produced while draining dispatch_q_ in Connection::ProcessControlMessages, aiming to cut sendmsg syscall volume under PubSub/Monitor/Invalidation fan-out.

Changes:

  • Enable SinkReplyBuilder batch mode when dispatch_q_ contains more than one queued control message.
  • Add an absl::Cleanup guard to reliably restore non-batched mode on all exit paths from ProcessControlMessages.

@glevkovich glevkovich requested a review from romange June 1, 2026 15:45
@glevkovich glevkovich enabled auto-merge (squash) June 1, 2026 15:59
@glevkovich glevkovich merged commit 78d3fcf into main Jun 1, 2026
14 checks passed
@glevkovich glevkovich deleted the glevkovich/glevkovich/batch_v2_control_replies branch June 1, 2026 16:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants