perf(facade): batch V2 control-path replies in ProcessControlMessages by glevkovich · Pull Request #7479 · dragonflydb/dragonfly

glevkovich · 2026-06-01T15:39:55Z

When ProcessControlMessages wakes with multiple messages queued, each std::visit call triggers FinishScope() -> Flush(), producing one sendmsg syscall per message (PubSub, Monitor, Invalidation).

Enable batch mode when dispatch_q_ holds more than one message, so all replies in the drain pass accumulate in the send buffer and flush together at the main loop's idle-await point.

Single-message wakeups are unaffected: SetBatchMode(false) leaves FinishScope() flushing immediately, preserving p=1 latency.

Cross-machine benchmark (10 subscribers, 4 threads):

Pipeline	RPS Δ	Syscalls Δ
1	+3%	-24%
10	+12%	-68%
100	-1.4%	-69%
500	-2%	-70%

The syscall reduction is the real gain here.

When ProcessControlMessages wakes with multiple messages queued, each std::visit call triggers FinishScope() -> Flush(), producing one sendmsg syscall per message (PubSub, Monitor, Invalidation). Enable batch mode when dispatch_q_ holds more than one message, so all replies in the drain pass accumulate in the send buffer and flush together at the main loop's idle-await point. Single-message wakeups are unaffected: SetBatchMode(false) leaves FinishScope() flushing immediately, preserving p=1 latency. Cross-machine benchmark (10 subscribers, 4 threads): PIPELINE RPS Δ SYSCALLS Δ 1 +3% -24% 10 +12% -68% 100 -1.4% -69% 500 -2% -70% RPS is publisher-bound at p>=100; the syscall reduction is the real gain — freeing kernel and NIC resources under fan-out load. Signed-off-by: Gil Levkovich <69595609+glevkovich@users.noreply.github.com>

qodo-free-for-open-source-projects · 2026-06-01T15:40:07Z

Review Summary by Qodo

Batch V2 control-path replies in ProcessControlMessages

✨ Enhancement

Walkthroughs

Description

• Batch control-path replies when multiple messages queued
• Reduces syscalls per message via deferred flush
• Preserves single-message latency with conditional batching
• Improves throughput under fan-out load (12% RPS, -68% syscalls at p=10)

qodo-free-for-open-source-projects · 2026-06-01T15:40:08Z

Code Review by Qodo

🐞 Bugs (2) 📘 Rule violations (0)

1. Wrong batching heuristic 🐞 Bug ➹ Performance

Description

ProcessControlMessages enables batch mode based on dispatch_q_.size(), but dispatch_q_ can include
non-replying messages (e.g., CheckpointMessage, MigrationRequestMessage). This can unintentionally
batch a single reply-producing message and delay its flush, defeating the stated low-latency
behavior for single-message wakeups.

Code

src/facade/dragonfly_connection.cc[R1578-1581]

Evidence
The batching decision is based on total queue size, but checkpoints and migration markers are
enqueued into the same queue and do not produce replies, so they can inflate the size and
incorrectly trigger batching.
src/facade/dragonfly_connection.cc[1572-1582]
src/facade/dragonfly_connection.cc[1598-1604]
src/facade/dragonfly_connection.cc[647-649]
src/facade/dragonfly_connection.cc[2258-2268]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`ProcessControlMessages()` enables batching when `dispatch_q_.size() > 1`, but `dispatch_q_` can contain non-replying messages (notably `CheckpointMessage` and `MigrationRequestMessage`). This can turn batching on even when there is only a single reply-producing message, increasing latency for that reply.

### Issue Context
- Checkpoints are inserted at the front of `dispatch_q_`, so a common queue shape can be `[CheckpointMessage, <one PubSub/Monitor reply>]`.
- `MigrationRequestMessage` is explicitly treated as a no-op in `AsyncOperations`.

### Fix Focus Areas
- src/facade/dragonfly_connection.cc[1572-1607]

### Suggested fix
Change the batching predicate to consider *reply-producing* control messages only (and stop early once you find 2), e.g.:
- Scan `dispatch_q_` until 2 items that will write to `reply_builder_` are found.
- Treat `PubMessagePtr`, `MonitorMessage`, and `InvalidationMessage` as reply-producing.
- Then set `SetBatchMode(replying_count > 1)`.

If you choose to reuse `MessageHandle::IsReplying()`, ensure it includes `InvalidationMessage` (it currently does not, despite writing a reply in `AsyncOperations::operator()(const InvalidationMessage&)`).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

2. Quota path delays flush 🐞 Bug ☼ Reliability

Description

When batching is enabled and ProcessControlMessages returns early due to quota reached, control
replies written during the drain can remain buffered because FinishScope does not Flush() in batch
mode. IoLoopV2 only flushes in the idle-await block when the fiber is fully idle, so under sustained
data-path work those control replies can be delayed significantly.

Code

src/facade/dragonfly_connection.cc[R1580-1585]

Evidence
The new code enables batch mode and can return early on quota; the caller only flushes at the
idle-await point when fully idle, and explicitly does not continue to that flush point when quota is
reached.
src/facade/dragonfly_connection.cc[1572-1591]
src/facade/dragonfly_connection.cc[3024-3058]
src/facade/dragonfly_connection.cc[3070-3090]
src/facade/reply_builder.cc[224-233]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
With `SetBatchMode(true)`, per-message `ReplyScope` completion will not flush. If `ProcessControlMessages()` hits its quota and returns `true`, `IoLoopV2` deliberately falls through to the data path (to avoid starvation) and may not reach the idle-await flush point soon, delaying delivery of PubSub/Monitor/Invalidation pushes.

### Issue Context
`IoLoopV2` flushes replies in the idle-await block only when `!should_wake()`. If we keep executing/parsing (e.g., sustained pipeline traffic), the loop can continue without hitting that flush point promptly.

### Fix Focus Areas
- src/facade/dragonfly_connection.cc[1572-1607]
- src/facade/dragonfly_connection.cc[3024-3090]

### Suggested fix
When `ProcessControlMessages()` returns early due to quota reached (or when batching was enabled and we are about to fall through), explicitly flush once before returning to bound latency, e.g.:
- Track whether batch mode was enabled for this pass.
- On the quota-reached return path, call `reply_builder_->Flush()` (and handle error) before returning `true`.

This preserves the syscall reduction vs per-message flush while preventing unbounded delay when the loop does not immediately return to the idle-await flush path.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

augmentcode · 2026-06-01T15:43:31Z

🤖 Augment PR Summary

Summary: Batches RESP V2 control-path replies in ProcessControlMessages() when multiple dispatch-queue items are pending.
Impact: Reduces per-message FinishScope()->Flush() sendmsg syscalls under PubSub/monitor/invalidation fan-out while keeping single-message wakeups unbatched to preserve p=1 latency.

_{🤖 Was this summary useful? React with 👍 or 👎}

augmentcode

Review completed. 1 suggestion posted.

Comment augment review to trigger a new review at any time.

Copilot

Pull request overview

This PR reduces per-message network flushes in the IoLoopV2 control path by batching replies produced while draining dispatch_q_ in Connection::ProcessControlMessages, aiming to cut sendmsg syscall volume under PubSub/Monitor/Invalidation fan-out.

Changes:

Enable SinkReplyBuilder batch mode when dispatch_q_ contains more than one queued control message.
Add an absl::Cleanup guard to reliably restore non-batched mode on all exit paths from ProcessControlMessages.

glevkovich requested a review from Copilot June 1, 2026 15:39

Copilot started reviewing on behalf of glevkovich June 1, 2026 15:40 View session

glevkovich requested a review from dranikpg June 1, 2026 15:43

augmentcode Bot reviewed Jun 1, 2026

View reviewed changes

Comment thread src/facade/dragonfly_connection.cc

Copilot AI reviewed Jun 1, 2026

View reviewed changes

glevkovich requested a review from romange June 1, 2026 15:45

dranikpg approved these changes Jun 1, 2026

View reviewed changes

glevkovich enabled auto-merge (squash) June 1, 2026 15:59

glevkovich merged commit 78d3fcf into main Jun 1, 2026
14 checks passed

glevkovich deleted the glevkovich/glevkovich/batch_v2_control_replies branch June 1, 2026 16:53

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(facade): batch V2 control-path replies in ProcessControlMessages#7479

perf(facade): batch V2 control-path replies in ProcessControlMessages#7479
glevkovich merged 1 commit into
mainfrom
glevkovich/glevkovich/batch_v2_control_replies

glevkovich commented Jun 1, 2026 •

edited

Loading

Uh oh!

qodo-free-for-open-source-projects Bot commented Jun 1, 2026

Uh oh!

qodo-free-for-open-source-projects Bot commented Jun 1, 2026 •

edited

Loading

Uh oh!

augmentcode Bot commented Jun 1, 2026

Uh oh!

augmentcode Bot left a comment

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

glevkovich commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

qodo-free-for-open-source-projects Bot commented Jun 1, 2026

Review Summary by Qodo

Walkthroughs

Uh oh!

qodo-free-for-open-source-projects Bot commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review by Qodo

Uh oh!

augmentcode Bot commented Jun 1, 2026

Uh oh!

augmentcode Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

glevkovich commented Jun 1, 2026 •

edited

Loading

qodo-free-for-open-source-projects Bot commented Jun 1, 2026 •

edited

Loading