fix(server): Exception based command cancelling #7477
Review Summary by Qodo
Implement exception-based command cancellation mechanism
Description
• Implement exception-based command cancellation mechanism with CancellationException
• Add unhandled_exception() handler in coroutine to catch and process cancellation
• Enhance socket error handling to cancel scheduled transactions via dispatcher
• Set OpStatus::CANCELLED when transaction is successfully cancelled
• Catch CancellationException in command invocation to send error reply
Augment PR Summary
Summary: This PR introduces exception-based command cancellation to better abort in-flight work when a transaction is cancelled (e.g., due to disconnects).
Technical Notes: Cancellation can now propagate via exceptions for synchronous hops, while async hops surface cancellation via …
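For readers skimming the thread, the catch-and-reply pattern described in the summaries can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `ReplySink` and `InvokeGuarded` are hypothetical names, the local `CancellationException` is a stand-in for `facade::CancellationException`, and deriving it from `std::runtime_error` is an assumption.

```cpp
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for facade::CancellationException (base class assumed).
struct CancellationException : std::runtime_error {
  CancellationException() : std::runtime_error("Cancelled") {}
};

// Hypothetical reply sink; records the last error sent to the client.
struct ReplySink {
  std::string last_error;
  void SendError(const std::string& msg) { last_error = msg; }
};

// Runs a command body and converts escaping exceptions into error replies.
// Returns true when the body completed without throwing.
inline bool InvokeGuarded(const std::function<void()>& body, ReplySink& sink) {
  try {
    body();
    return true;
  } catch (const CancellationException&) {
    // Must come before the std::exception handler, since it is the more
    // derived type in this sketch.
    sink.SendError("Cancelled");
  } catch (const std::exception& e) {
    // Assumed behaviour for other exceptions; the PR excerpt does not show it.
    sink.SendError(e.what());
  }
  return false;
}
```

A body that throws `CancellationException{}` leaves `"Cancelled"` in `sink.last_error`, while a body that returns normally leaves it empty.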
    throw;
  } catch (const facade::CancellationException&) {
    cmd_cntx->SendError("Cancelled");
  } catch (const std::exception& e) {
src/server/cmd_support.cc:48-50 In CmdR::Coro::unhandled_exception, the non-cancellation path only logs and doesn't send any reply, which can leave the client waiting forever (and likely trips ReplyGuard in debug). Consider ensuring an error reply is always sent when an exception escapes a command coroutine.
Severity: medium
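One way to address this comment is to make every exception path end in exactly one error reply. The sketch below models only the classification step that an `unhandled_exception()` handler could run on `std::current_exception()`; it deliberately avoids real coroutine machinery. `ReplySink`, `ReplyForException`, the error strings other than "Cancelled", and the base class of `CancellationException` are all assumptions made for illustration.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for facade::CancellationException (base class assumed).
struct CancellationException : std::runtime_error {
  CancellationException() : std::runtime_error("Cancelled") {}
};

// Hypothetical reply sink; tracks whether the client received an answer.
struct ReplySink {
  std::string last_error;
  bool replied = false;
  void SendError(const std::string& msg) {
    last_error = msg;
    replied = true;
  }
};

// Classifies an escaped exception and always sends an error reply,
// so the client is never left waiting.
inline void ReplyForException(std::exception_ptr ep, ReplySink& sink) {
  try {
    std::rethrow_exception(ep);
  } catch (const CancellationException&) {
    sink.SendError("Cancelled");
  } catch (const std::exception& e) {
    // Assumed wording; a real handler would likely also log here.
    sink.SendError(std::string("Internal error: ") + e.what());
  } catch (...) {
    sink.SendError("Internal error");
  }
}
```

Inside a promise type this would be called as `ReplyForException(std::current_exception(), sink)`, assuming the promise has access to the reply sink.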
  auto is_active = [this](uint32_t i) { return IsActive(i); };
  shard_set->RunBriefInParallel([this](EngineShard* shard) { CancelShardCb(shard); }, is_active);
  coordinator_state_ = (coordinator_state_ & ~COORD_SCHED) | COORD_CANCELLED;
  local_result_ = OpStatus::CANCELLED;
src/server/transaction.cc:1082 Setting local_result_ = OpStatus::CANCELLED means async SingleHopAsync/SingleHopWaiter::await_resume() will surface cancellation as a status rather than the new CancellationException, so existing coroutine code that assumes OpStatus::OK (e.g. CHECK_EQ(OpStatus::OK, result)) may now abort on disconnect cancellations. Consider making cancellation propagation consistent across sync/async hops so callers don't accidentally crash or continue after a cancelled hop.
Severity: medium
Other Locations
src/server/string_family.cc:1460
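A possible way to make sync and async hops consistent, as the comment suggests, is to funnel async results through a small helper that converts a cancelled status into the exception. The sketch below is an assumption-laden illustration: `ThrowIfCancelled` is a hypothetical helper, the `OpStatus` enum is a reduced local copy, and `CancellationException` is a stand-in for the PR's type.

```cpp
#include <stdexcept>

// Reduced, hypothetical copy of the status enum; the real one has more values.
enum class OpStatus { OK, KEY_NOTFOUND, CANCELLED };

// Hypothetical stand-in for facade::CancellationException (base class assumed).
struct CancellationException : std::runtime_error {
  CancellationException() : std::runtime_error("Cancelled") {}
};

// Passes through any status except CANCELLED, which becomes an exception.
// Callers that only expect OK or ordinary errors then never observe CANCELLED.
inline OpStatus ThrowIfCancelled(OpStatus status) {
  if (status == OpStatus::CANCELLED) {
    throw CancellationException{};
  }
  return status;
}
```

With such a helper, a check like `CHECK_EQ(OpStatus::OK, ThrowIfCancelled(result))` would unwind on cancellation instead of aborting the process, assuming an outer handler catches the exception.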
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
  if (tx_alive->IsScheduled() && !tx_alive->Blocker()->IsCompleted()) {
    tx_alive->CancelScheduledTx();
  }
I would probably also verify that this is the first hop, but inside CancelScheduledTx, so that it doesn't assume the guarantees hold (I do now)
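One reading of this suggestion is that the preconditions move into the cancel call itself, which then reports whether it did anything. The sketch below is a toy model, not the Transaction class: `TxSketch`, the `hops_run` counter, the flag values, and the boolean return are all hypothetical, and "first hop" is interpreted here as "no hop has run yet".

```cpp
#include <cstdint>

// Toy model of a transaction's coordinator state (all names hypothetical).
struct TxSketch {
  enum : uint32_t { COORD_SCHED = 1, COORD_CANCELLED = 2 };  // assumed values

  uint32_t coordinator_state = 0;
  bool blocker_completed = false;  // models Blocker()->IsCompleted()
  uint32_t hops_run = 0;           // assumed counter of executed hops

  bool IsScheduled() const { return (coordinator_state & COORD_SCHED) != 0; }

  // Cancels only when the transaction is scheduled, its blocker has not
  // completed, and no hop has run yet. Returns whether it cancelled.
  bool CancelScheduledTx() {
    if (!IsScheduled() || blocker_completed || hops_run > 0) {
      return false;  // preconditions not met; leave state untouched
    }
    coordinator_state = (coordinator_state & ~COORD_SCHED) | COORD_CANCELLED;
    return true;
  }
};
```

The caller can then invoke `CancelScheduledTx()` unconditionally and branch on the result, rather than repeating the checks at each call site.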
With the V1 loop we don't have as nice a mechanism as with V2, so we can only "kick" out of the stack by cancelling the transaction and letting it throw a runtime exception. InvokeCmd catches it; it is also handled in asynchronous commands running in synchronous mode.
Works quite well in tests and manual testing; still need to check corner cases.