Type-Erased Dispatch Silently Drops Coroutines, Causing Deadlock
Executive Summary
There is a fundamental API incompatibility between executor_ref::dispatch() and strand::dispatch() that causes deadlocks when using strand with when_all or any code path that dispatches through a type-erased executor_ref while running inside the strand's execution context.
The bug: executor_ref::dispatch() returns void and discards the return value from the underlying executor. When that executor is a strand, the discarded return value is a coroutine handle that was supposed to be resumed via symmetric transfer. The coroutine is never resumed, causing a deadlock.
Impact: Any code running on a strand that uses when_all, or that otherwise dispatches work through executor_ref from inside the strand's execution context, will deadlock.
Recommended fix: Change executor_ref::dispatch() to return coro instead of void, preserving the symmetric transfer return value.
Reproduction
#include <boost/capy.hpp>
#include <boost/capy/ex/strand.hpp>
#include <iostream>
#include <latch>
using namespace boost::capy;
int main()
{
    thread_pool pool;
    strand s{pool.get_executor()};
    std::latch done(1);
    auto on_complete = [&done](auto&&...) { done.count_down(); };
    auto on_error = [&done](std::exception_ptr) { done.count_down(); };
    auto task_a = []() -> task<> {
        std::cout << "Task A running!\n";
        co_return;
    };
    auto task_b = []() -> task<> {
        std::cout << "Task B running!\n";
        co_return;
    };
    auto run_both = [&]() -> task<> {
        std::cout << "Before when_all\n";
        co_await when_all(task_a(), task_b()); // HANGS HERE
        std::cout << "After when_all\n";
    };
    run_async(s, on_complete, on_error)(run_both());
    done.wait(); // Never completes
    return 0;
}
Output:
Before when_all
(Program hangs indefinitely)
Note: A simple task without when_all works correctly:
run_async(s, on_complete, on_error)(task_a()); // Works fine
Background
What is Capy?
Boost.Capy is a C++20 coroutine library providing:
- task<T>: A lazy coroutine type (doesn't start until awaited)
- Executors: Objects that schedule and run coroutines (thread_pool, strand, io_context)
- Concurrency primitives: when_all for parallel execution, async_event for signaling
- Type-erased wrappers: executor_ref and any_executor for runtime polymorphism
Boost.Corosio is a companion library providing I/O primitives (sockets, timers) that integrate with Capy.
Key Concepts
Executors
An executor is an object that can schedule coroutines for execution. In Capy, executors provide two key methods:
void post(coro h); // Queue coroutine for later execution
coro dispatch(coro h); // Execute now if possible, else queue
The dispatch() method is an optimization: if the caller is already running on this executor's thread, it can resume the coroutine immediately instead of queuing it.
Type Erasure with executor_ref
executor_ref is a lightweight, non-owning wrapper that can hold any executor type. It uses a vtable (virtual function table) for runtime polymorphism without inheritance:
// Can wrap any executor type
void schedule_work(executor_ref ex) {
    ex.dispatch(some_coroutine); // Works with any executor
}
thread_pool pool;
strand s{pool.get_executor()};
schedule_work(pool.get_executor()); // Works
schedule_work(s); // Works (but has the bug!)
Symmetric Transfer
Symmetric transfer is a C++20 coroutine optimization that avoids stack growth when switching between coroutines. Instead of one coroutine calling another (which adds a stack frame), coroutines "transfer" control directly via std::coroutine_handle.
// WITHOUT symmetric transfer (stack grows):
coro await_suspend(coro h) {
    next_coroutine.resume(); // Adds stack frame
    return std::noop_coroutine();
}
// WITH symmetric transfer (stack stays flat):
coro await_suspend(coro h) {
    return next_coroutine; // Caller resumes this handle directly
}
The returned handle tells the coroutine machinery which coroutine to resume next. Returning std::noop_coroutine() means "I've handled it, don't resume anything."
Strand
A strand serializes execution: coroutines dispatched through a strand never run concurrently, even on a multi-threaded executor. This is useful for protecting shared state without explicit locking.
thread_pool pool;
strand s{pool.get_executor()};
// These will never run simultaneously, even though pool has multiple threads
run_async(s)(task_a());
run_async(s)(task_b());
Root Cause Analysis
The Two Dispatch APIs
The bug stems from a mismatch between how strand and executor_ref define their dispatch() methods.
strand::dispatch() — Returns coro for Symmetric Transfer
// strand.hpp
coro dispatch(coro h) const
{
    return detail::strand_service::dispatch(*impl_, executor_ref(ex_), h);
}
// strand_service.cpp
coro strand_service::dispatch(strand_impl& impl, executor_ref ex, coro h)
{
    // Optimization: if we're already running in this strand,
    // return the handle for immediate symmetric transfer
    if (running_in_this_thread(impl))
        return h; // ← Caller is expected to resume this!
    // Otherwise, queue the coroutine and start the invoker
    if (strand_service_impl::enqueue(impl, h))
        ex.post(strand_service_impl::make_invoker(impl).h_);
    return std::noop_coroutine(); // Caller does nothing
}
When dispatch() is called from within the strand (i.e., running_in_this_thread() is true), it returns the coroutine handle h directly. The caller is expected to resume this handle via symmetric transfer.
executor_ref::dispatch() — Returns void, Ignores Return Value
// executor_ref.hpp
void dispatch(coro h) const
{
    vt_->dispatch(ex_, h); // Calls strand::dispatch(), IGNORES return value!
}
// The vtable entry for dispatch:
static constexpr executor_vtable vtable_for = {
    // ...
    // dispatch lambda - note it returns void
    [](void const* p, std::coroutine_handle<> h) {
        static_cast<Ex const*>(p)->dispatch(h); // Return value discarded!
    },
    // ...
};
The type-erased executor_ref calls the underlying executor's dispatch() but discards the return value. When wrapping a strand, this means the handle returned for symmetric transfer is lost.
Why thread_pool Works
thread_pool::executor_type::dispatch() always queues work and returns noop_coroutine():
// thread_pool.hpp
coro dispatch(coro h) const
{
    post(h); // Always queue, never inline
    return std::noop_coroutine(); // "I handled it, nothing for caller to do"
}
Since it always returns noop_coroutine(), ignoring the return value is harmless.
Why strand Fails
When strand::dispatch() is called from within the strand's invoker thread:
1. running_in_this_thread() returns true (we're inside the strand)
2. strand::dispatch() returns h directly (expecting the caller to resume it)
3. executor_ref::dispatch() ignores this return value
4. The coroutine handle h is never resumed
5. Deadlock: the coroutine waits forever
Detailed Execution Trace
1. run_async(strand, ...) is called with run_both() task
2. strand::dispatch() is called from main thread (NOT in strand)
└─ running_in_this_thread() == false
└─ Coroutine is enqueued
└─ Strand invoker is posted to thread_pool
└─ Returns noop_coroutine() ✓
3. Thread pool worker picks up strand invoker
4. Invoker sets dispatch_thread_ = current thread ID
5. Invoker dispatches pending coroutines (including run_both)
6. run_both() starts executing
7. run_both() calls: co_await when_all(task_a(), task_b())
8. when_all creates runner coroutines for task_a and task_b
9. when_all calls: executor_ref::dispatch(runner_0)
└─ executor_ref wraps the strand
└─ Calls strand::dispatch(runner_0)
└─ running_in_this_thread() == TRUE (we're in the invoker!)
└─ strand::dispatch() returns runner_0 handle
└─ executor_ref::dispatch() IGNORES this return value ✗
└─ runner_0 is NEVER resumed!
10. Same happens for runner_1
11. Neither runner executes
└─ when_all's completion counter never reaches zero
└─ when_all waits forever
└─ DEADLOCK
Affected Code Paths
Any code that:
- Runs on a strand, AND
- Dispatches work through executor_ref while inside that strand's context
This includes:
- when_all launching child tasks (uses executor_ref::dispatch)
- io_awaitable_support::complete() dispatching continuations
- Any user code calling executor_ref::dispatch() from within a strand
Potential Solutions
Option 1: Change executor_ref::dispatch to Return coro (Recommended)
Change:
// executor_ref.hpp - BEFORE
void dispatch(coro h) const
{
    vt_->dispatch(ex_, h);
}
// executor_ref.hpp - AFTER
coro dispatch(coro h) const
{
    return vt_->dispatch(ex_, h);
}
// vtable - BEFORE
void (*dispatch)(void const*, std::coroutine_handle<>);
// vtable - AFTER
coro (*dispatch)(void const*, std::coroutine_handle<>);
// vtable lambda - AFTER
[](void const* p, std::coroutine_handle<> h) -> coro {
    return static_cast<Ex const*>(p)->dispatch(h);
},
Analysis:
- Correct semantic: dispatch can return a handle for symmetric transfer
- Callers of executor_ref::dispatch() must handle the return value
- Aligns with how concrete executor types (strand, thread_pool) already work
- Preserves symmetric transfer optimization
Option 2: Change strand::dispatch to Never Rely on Symmetric Transfer
Change strand to always enqueue, even when in-thread:
coro strand_service::dispatch(strand_impl& impl, executor_ref ex, coro h)
{
    // Remove the running_in_this_thread optimization entirely
    if (strand_service_impl::enqueue(impl, h))
        ex.post(strand_service_impl::make_invoker(impl).h_);
    return std::noop_coroutine();
}
Analysis:
- Simple fix
- Performance regression: loses inline execution when already in strand
- Every dispatch from within a strand now goes through the queue
- Defeats the purpose of the running_in_this_thread optimization
Option 3: Strand Resumes Inline Without Symmetric Transfer
Change strand to call resume() directly instead of returning the handle:
coro strand_service::dispatch(strand_impl& impl, executor_ref ex, coro h)
{
    if (running_in_this_thread(impl))
    {
        h.resume(); // Resume immediately, don't use symmetric transfer
        return std::noop_coroutine();
    }
    if (strand_service_impl::enqueue(impl, h))
        ex.post(strand_service_impl::make_invoker(impl).h_);
    return std::noop_coroutine();
}
Analysis:
- Preserves inline execution optimization
- Stack depth increases: each nested dispatch adds a stack frame
- Risk of stack overflow with deeply nested coroutine chains
- This is how io_context currently works (see table below)
Option 4: Vtable Dispatch Resumes Returned Handle Internally
Change vtable to handle symmetric transfer transparently:
// vtable dispatch wrapper
[](void const* p, std::coroutine_handle<> h) {
    auto result = static_cast<Ex const*>(p)->dispatch(h);
    // Caveat: handles from separate std::noop_coroutine() calls are not
    // guaranteed to compare equal, so this check is best-effort only.
    // Resuming a noop handle is harmless if the check lets one through.
    if (result && result != std::noop_coroutine())
        result.resume(); // Transparently handle symmetric transfer
},
Analysis:
- No change to executor_ref public API
- Hidden behavior makes debugging harder
- Performance overhead: checks return value on every dispatch
- Stack depth issues (same as Option 3)
Recommendation
Option 1 (change executor_ref::dispatch to return coro) is the most correct solution. The current void return type is fundamentally incompatible with executors that support symmetric transfer.
This aligns executor_ref with how concrete executor types already define their dispatch() methods—both strand and thread_pool return coro.
Option 3 (resume inline) could be considered if there's a strong reason to keep executor_ref::dispatch() returning void, but it sacrifices symmetric transfer's stack efficiency.
Executor Types in Codebase
| Executor | dispatch() Returns | In-Thread Behavior | Works with executor_ref? |
|---|---|---|---|
| thread_pool::executor_type | coro | Returns noop_coroutine() (always queues) | Yes |
| strand<Ex> | coro | Returns h for symmetric transfer | No (BUG) |
| basic_io_context::executor_type | void | Calls h.resume() directly | Yes |
| test::run_blocking::executor_type | void | Calls h.resume() directly | Yes |
| mock_executor (test helper) | void | Calls h.resume() directly | Yes |
| executor_ref | void | Calls wrapped dispatch, ignores return | N/A (is the wrapper) |
| any_executor | void | Calls wrapped dispatch, ignores return | N/A (is a wrapper) |
Observations
- Inconsistent return types: Some executors return coro (for symmetric transfer), others return void (handle inline execution internally by calling resume()).
- strand is unique: It's the only executor that returns a non-noop handle from dispatch() for symmetric transfer optimization.
- io_context avoids the issue: basic_io_context::executor_type::dispatch() returns void and handles inline execution internally via h.resume(). This works with executor_ref but loses symmetric transfer benefits.
- any_executor has the same bug: Like executor_ref, it also uses a vtable that ignores the return value.
Design Question
Should all executors in Capy:
- (A) Return coro from dispatch() to support symmetric transfer? (Requires fixing executor_ref and any_executor)
- (B) Return void and handle inline execution internally via h.resume()? (Requires changing strand and thread_pool)
Option A preserves symmetric transfer's stack efficiency. Option B is simpler but loses that optimization.
Related Files
| File | Description |
|---|---|
| include/boost/capy/ex/executor_ref.hpp | Type-erased non-owning executor wrapper (has the bug) |
| include/boost/capy/ex/any_executor.hpp | Type-erased owning executor wrapper (has the same bug) |
| include/boost/capy/ex/strand.hpp | Strand executor adaptor |
| src/ex/detail/strand_service.cpp | Strand dispatch implementation |
| include/boost/capy/ex/thread_pool.hpp | Thread pool executor |
| include/boost/capy/when_all.hpp | Uses executor_ref::dispatch for child tasks |
| include/boost/capy/ex/io_awaitable_support.hpp | Uses executor_ref::dispatch in complete() |
| include/boost/corosio/basic_io_context.hpp | I/O context executor (returns void, calls resume internally) |
Test Case
After fixing, this should work:
#include <boost/capy.hpp>
#include <boost/capy/ex/strand.hpp>
#include <latch>
using namespace boost::capy;
int main()
{
    thread_pool pool;
    strand s{pool.get_executor()};
    auto outer = [&]() -> task<> {
        co_await when_all(
            []() -> task<> { co_return; }(),
            []() -> task<> { co_return; }()
        );
    };
    std::latch done(1);
    run_async(s,
        [&](auto&&...) { done.count_down(); }, // on_complete
        [&](auto) { done.count_down(); } // on_error
    )(outer());
    done.wait(); // Should complete, not hang
    return 0;
}
Glossary
| Term | Definition |
|---|---|
| coro | Alias for std::coroutine_handle<> — a type-erased handle to any coroutine |
| Symmetric transfer | C++20 optimization where await_suspend returns a coroutine handle for the runtime to resume, avoiding stack growth |
| noop_coroutine | A special coroutine handle that does nothing when resumed; returned to indicate "no transfer needed" |
| Strand | An executor wrapper that serializes execution — work dispatched through it never runs concurrently |
| Type erasure | A technique for runtime polymorphism without inheritance, typically using function pointers or vtables |
| vtable | Virtual function table — a struct of function pointers used for type erasure |
| executor_ref | A non-owning type-erased wrapper for any Capy executor |
| when_all | A primitive that runs multiple tasks concurrently and waits for all to complete |