Skip to content

fix(server): Exception based command cancelling#7477

Draft
dranikpg wants to merge 4 commits into
dragonflydb:mainfrom
dranikpg:exception-cancel
Draft

fix(server): Exception based command cancelling#7477
dranikpg wants to merge 4 commits into
dragonflydb:mainfrom
dranikpg:exception-cancel

Conversation

@dranikpg
Copy link
Copy Markdown
Contributor

@dranikpg dranikpg commented Jun 1, 2026

With the V1 loop, we don't have an as nice mechanism as with V2 and we can only "kick" out the stack by cancelling the transaction and letting it throw a runtime exception. InvokeCmd catches it, it is also handed in asynchronous commands running in synchronous mode

Works quite well in tests and manual tests, need to check corner cases

@dranikpg dranikpg marked this pull request as draft June 1, 2026 11:28
@qodo-free-for-open-source-projects
Copy link
Copy Markdown

Review Summary by Qodo

Implement exception-based command cancellation mechanism

🐞 Bug fix ✨ Enhancement

Grey Divider

Walkthroughs

Description
• Implement exception-based command cancellation mechanism with CancellationException
• Add unhandled_exception() handler in coroutine to catch and process cancellation
• Enhance socket error handling to cancel scheduled transactions via dispatcher
• Set OpStatus::CANCELLED when transaction is successfully cancelled
• Catch CancellationException in command invocation to send error reply

Grey Divider

Qodo Logo

@qodo-free-for-open-source-projects
Copy link
Copy Markdown

qodo-free-for-open-source-projects Bot commented Jun 1, 2026

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0)

Grey Divider


Action required

1. Double reply on cancel ✓ Resolved 🐞 Bug ≡ Correctness
Description
Service::InvokeCmd sends a "Cancelled" error but returns DispatchResult::ERROR, which triggers the
caller to send a second "Internal Error" reply and mark the connection for close. This can corrupt
the protocol stream (two replies for one command) and obscures the actual cancellation reason.
Code

src/server/main_service.cc[R1609-1611]

Evidence
InvokeCmd explicitly sends "Cancelled" and returns ERROR, while the outer dispatch code sends
"Internal Error" and closes the connection for any non-OK/OOM result—so cancellation will produce
two error replies.

src/server/main_service.cc[1606-1615]
src/server/main_service.cc[1524-1534]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`Service::InvokeCmd` catches `facade::CancellationException`, sends an error reply, and then returns `DispatchResult::ERROR`. The outer dispatch logic treats any non-OK/OOM result as an internal failure and sends an additional "Internal Error" reply + closes the connection, causing double replies.
### Issue Context
- `InvokeCmd` already sends the cancellation reply.
- The outer layer (the function calling `InvokeCmd`) unconditionally sends a generic internal error for `DispatchResult::ERROR`.
### Fix Focus Areas
- src/server/main_service.cc[1606-1615]
- src/server/main_service.cc[1524-1534]
### Suggested fix
- Option A (minimal): In the `CancellationException` catch, **do not return `DispatchResult::ERROR`**. Return `DispatchResult::OK` (or introduce a dedicated `DispatchResult::CANCELLED` that the outer layer treats like OK) since the reply has already been sent.
- If you still want to ensure the connection closes, do it explicitly in the catch (e.g., mark the connection for close) rather than relying on the outer generic error path.
- Also change `catch (const facade::CancellationException& ce)` to `catch (const facade::CancellationException&)` to avoid an unused catch parameter warning in stricter builds.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Remediation recommended

2. Async exception no reply ✓ Resolved 🐞 Bug ☼ Reliability
Description
CmdR::Coro::unhandled_exception logs std::exception but does not send any error reply, so a deferred
async command can resume, throw, and leave the client without a response. Because
ParsedCommand::SendReply clears the coroutine handle after resuming, there is no later fallback path
to emit an error.
Code

src/server/cmd_support.cc[R43-50]

Evidence
The coroutine promise's unhandled_exception logs but does not send a reply for std::exception,
while ParsedCommand::SendReply resumes and then clears the coroutine handle—so there's no later
mechanism to generate a reply.

src/server/cmd_support.cc[43-51]
src/facade/parsed_command.cc[109-118]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`CmdR::Coro::unhandled_exception()` now handles cancellation by replying "Cancelled", but for other exceptions it only logs and does not send any reply. For deferred replies, this can result in the client waiting indefinitely/timeing out because no payload was produced.
> Note: the previous implementation was an empty `unhandled_exception()`; this PR improves observability but still leaves the missing-reply behavior for non-cancellation exceptions.
### Issue Context
- Deferred async commands store a coroutine handle in `ParsedCommand::Resolve(...)`.
- When the blocker completes, `ParsedCommand::SendReply()` resumes the coroutine and then clears the handle.
- If the resumed coroutine throws and `unhandled_exception()` doesn't send an error, nothing else will.
### Fix Focus Areas
- src/server/cmd_support.cc[43-51]
- src/facade/parsed_command.cc[109-118]
### Suggested fix
- In `CmdR::Coro::unhandled_exception()`, for the `std::exception` branch, also send an error reply (e.g. `cmd_cntx->SendError("Internal Error")`) so the client gets a response.
- Optionally, after sending the error, consider marking the connection for close to avoid continuing after an unexpected internal exception (if feasible via the available context).
- Keep the logging, but ensure it is paired with a reply for deferred commands.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

Qodo Logo

@augmentcode
Copy link
Copy Markdown

augmentcode Bot commented Jun 1, 2026

🤖 Augment PR Summary

Summary: This PR introduces exception-based command cancellation to better abort in-flight work when a transaction is cancelled (e.g., due to disconnects).

Changes:

  • Adds facade::CancellationException as a dedicated cancellation signal
  • Throws CancellationException from Transaction::ScheduleSingleHop when coordinator cancellation is detected
  • Handles cancellation in the command coroutine promise (CmdR::Coro::unhandled_exception) and in Service::InvokeCmd to return a "Cancelled" error
  • Extends socket-error handling to attempt cancelling already-scheduled transactions via a dispatched callback

Technical Notes: Cancellation can now propagate via exceptions for synchronous hops, while async hops surface cancellation via OpStatus updates.

🤖 Was this summary useful? React with 👍 or 👎

Copy link
Copy Markdown

@augmentcode augmentcode Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. 3 suggestions posted.

Fix All in Augment

Comment augment review to trigger a new review at any time.

Comment thread src/server/conn_context.cc Outdated
Comment thread src/server/cmd_support.cc
throw;
} catch (const facade::CancellationException&) {
cmd_cntx->SendError("Cancelled");
} catch (const std::exception& e) {
Copy link
Copy Markdown

@augmentcode augmentcode Bot Jun 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

src/server/cmd_support.cc:48-50 In CmdR::Coro::unhandled_exception, the non-cancellation path only logs and doesn't send any reply, which can leave the client waiting forever (and likely trips ReplyGuard in debug). Consider ensuring an error reply is always sent when an exception escapes a command coroutine.

Severity: medium

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.

Comment thread src/server/transaction.cc
auto is_active = [this](uint32_t i) { return IsActive(i); };
shard_set->RunBriefInParallel([this](EngineShard* shard) { CancelShardCb(shard); }, is_active);
coordinator_state_ = (coordinator_state_ & ~COORD_SCHED) | COORD_CANCELLED;
local_result_ = OpStatus::CANCELLED;
Copy link
Copy Markdown

@augmentcode augmentcode Bot Jun 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

src/server/transaction.cc:1082 Setting local_result_ = OpStatus::CANCELLED means async SingleHopAsync/SingleHopWaiter::await_resume() will surface cancellation as a status rather than the new CancellationException, so existing coroutine code that assumes OpStatus::OK (e.g. CHECK_EQ(OpStatus::OK, result)) may now abort on disconnect cancellations. Consider making cancellation propagation consistent across sync/async hops so callers don't accidentally crash or continue after a cancelled hop.

Severity: medium

Other Locations
  • src/server/string_family.cc:1460

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.

Comment thread src/server/main_service.cc Outdated
dranikpg added 2 commits June 1, 2026 17:47
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
@dranikpg dranikpg changed the title fix(server): exception based command cancelling fix(server): Exception based command cancelling Jun 1, 2026
Comment on lines +221 to +223
if (tx_alive->IsScheduled() && !tx_alive->Blocker()->IsCompleted()) {
tx_alive->CancelScheduledTx();
}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also would probably verify that this is the first hop, but in CancelScheduledTx, so it doesn't assume that the guarantees are true (I do now)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant