
Attempting to use a capy strand with when_all causes processes to hang. #131

@MungoG


Type-Erased Dispatch Silently Drops Coroutines, Causing Deadlock

Executive Summary

There is a fundamental API incompatibility between executor_ref::dispatch() and strand::dispatch(). It causes deadlocks when a strand is used with when_all, or on any code path that dispatches through a type-erased executor_ref wrapping a strand while already running inside that strand's execution context.

The bug: executor_ref::dispatch() returns void and discards the return value from the underlying executor. When that executor is a strand, the discarded return value is a coroutine handle that was supposed to be resumed via symmetric transfer. The coroutine is never resumed, causing a deadlock.

Impact: Any code running on a strand that uses when_all, or that dispatches work through an executor_ref wrapping that same strand, will deadlock.

Recommended fix: Change executor_ref::dispatch() to return coro instead of void, preserving the symmetric transfer return value.


Reproduction

#include <boost/capy.hpp>
#include <boost/capy/ex/strand.hpp>
#include <exception>
#include <iostream>
#include <latch>

using namespace boost::capy;

int main()
{
    thread_pool pool;
    strand s{pool.get_executor()};
    std::latch done(1);
    
    auto on_complete = [&done](auto&&...) { done.count_down(); };
    auto on_error = [&done](std::exception_ptr) { done.count_down(); };
    
    auto task_a = []() -> task<> {
        std::cout << "Task A running!\n";
        co_return;
    };
    
    auto task_b = []() -> task<> {
        std::cout << "Task B running!\n";
        co_return;
    };
    
    auto run_both = [&]() -> task<> {
        std::cout << "Before when_all\n";
        co_await when_all(task_a(), task_b());  // HANGS HERE
        std::cout << "After when_all\n";
    };
    
    run_async(s, on_complete, on_error)(run_both());
    
    done.wait();  // Never completes
    return 0;
}

Output:

Before when_all

(Program hangs indefinitely)

Note: A simple task without when_all works correctly:

run_async(s, on_complete, on_error)(task_a());  // Works fine

Background

What is Capy?

Boost.Capy is a C++20 coroutine library providing:

  • task<T>: A lazy coroutine type (doesn't start until awaited)
  • Executors: Objects that schedule and run coroutines (thread_pool, strand, io_context)
  • Concurrency primitives: when_all for running tasks concurrently, async_event for signaling
  • Type-erased wrappers: executor_ref and any_executor for runtime polymorphism

Boost.Corosio is a companion library providing I/O primitives (sockets, timers) that integrate with Capy.

Key Concepts

Executors

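An executor is an object that can schedule coroutines for execution. In Capy, executors provide two key methods. strand and thread_pool declare them as follows; some other executors declare dispatch() as returning void (see Executor Types in Codebase below):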

void post(coro h);      // Queue coroutine for later execution
coro dispatch(coro h);  // Execute now if possible, else queue

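The dispatch() method enables an optimization. If the caller is already running in this executor's context, the executor may run the coroutine immediately instead of queuing it. An executor whose dispatch() returns coro signals this by returning h itself, and the caller is then responsible for resuming h. Returning std::noop_coroutine() means there is nothing left for the caller to do.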

Type Erasure with executor_ref

executor_ref is a lightweight, non-owning wrapper that can hold any executor type. It uses a vtable (virtual function table) for runtime polymorphism without inheritance:

// Can wrap any executor type
void schedule_work(executor_ref ex, coro h) {
    ex.dispatch(h);  // Works with any executor
}

thread_pool pool;
strand s{pool.get_executor()};

schedule_work(pool.get_executor(), h);  // Works
schedule_work(s, h);                    // Compiles, but drops h when called from inside s (the bug)

Symmetric Transfer

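Symmetric transfer is a C++20 coroutine optimization that avoids stack growth when switching between coroutines. Without it, one coroutine calls resume() on another from inside await_suspend, which adds a stack frame. With it, await_suspend returns the std::coroutine_handle<> to run next, and the compiler-generated code resumes that handle in place of the suspending coroutine, effectively as a tail call.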

// WITHOUT symmetric transfer (stack grows):
coro await_suspend(coro h) {
    next_coroutine.resume();  // Adds stack frame
    return std::noop_coroutine();
}

// WITH symmetric transfer (stack stays flat):
coro await_suspend(coro h) {
    return next_coroutine;  // The coroutine machinery resumes this handle next
}

The returned handle tells the coroutine machinery which coroutine to resume next. Returning std::noop_coroutine() means "I've handled it, don't resume anything."

Strand

A strand serializes execution: coroutines dispatched through a strand never run concurrently, even on a multi-threaded executor. This is useful for protecting shared state without explicit locking.

thread_pool pool;
strand s{pool.get_executor()};

// These will never run simultaneously, even though pool has multiple threads
run_async(s)(task_a());
run_async(s)(task_b());

Root Cause Analysis

The Two Dispatch APIs

The bug stems from a mismatch between how strand and executor_ref define their dispatch() methods.

strand::dispatch() — Returns coro for Symmetric Transfer

// strand.hpp
coro dispatch(coro h) const
{
    return detail::strand_service::dispatch(*impl_, executor_ref(ex_), h);
}

// strand_service.cpp
coro strand_service::dispatch(strand_impl& impl, executor_ref ex, coro h)
{
    // Optimization: if we're already running in this strand, 
    // return the handle for immediate symmetric transfer
    if (running_in_this_thread(impl))
        return h;  // ← Caller is expected to resume this!
    
    // Otherwise, queue the coroutine and start the invoker
    if (strand_service_impl::enqueue(impl, h))
        ex.post(strand_service_impl::make_invoker(impl).h_);
    
    return std::noop_coroutine();  // Caller does nothing
}

When dispatch() is called from within the strand (i.e., running_in_this_thread() is true), it returns the coroutine handle h directly. The caller is expected to resume this handle via symmetric transfer.

executor_ref::dispatch() — Returns void, Ignores Return Value

// executor_ref.hpp
void dispatch(coro h) const
{
    vt_->dispatch(ex_, h);  // Calls strand::dispatch(), IGNORES return value!
}

// The vtable entry for dispatch:
static constexpr executor_vtable vtable_for = {
    // ...
    // dispatch lambda - note it returns void
    [](void const* p, std::coroutine_handle<> h) {
        static_cast<Ex const*>(p)->dispatch(h);  // Return value discarded!
    },
    // ...
};

The type-erased executor_ref calls the underlying executor's dispatch() but discards the return value. When wrapping a strand, this means the handle returned for symmetric transfer is lost.

Why thread_pool Works

thread_pool::executor_type::dispatch() always queues work and returns noop_coroutine():

// thread_pool.hpp
coro dispatch(coro h) const
{
    post(h);                        // Always queue, never inline
    return std::noop_coroutine();   // "I handled it, nothing for caller to do"
}

Since it always returns noop_coroutine(), ignoring the return value is harmless.

Why strand Fails

When strand::dispatch() is called from within the strand's invoker thread:

  1. running_in_this_thread() returns true (we're inside the strand)
  2. strand::dispatch() returns h directly (expecting the caller to resume it)
  3. executor_ref::dispatch() ignores this return value
  4. The coroutine handle h is never resumed
  5. Deadlock: h stays suspended forever, so anything waiting on it (for example when_all) never resumes

Detailed Execution Trace

1.  run_async(strand, ...) is called with run_both() task
2.  strand::dispatch() is called from main thread (NOT in strand)
    └─ running_in_this_thread() == false
    └─ Coroutine is enqueued
    └─ Strand invoker is posted to thread_pool
    └─ Returns noop_coroutine() ✓

3.  Thread pool worker picks up strand invoker
4.  Invoker sets dispatch_thread_ = current thread ID
5.  Invoker dispatches pending coroutines (including run_both)

6.  run_both() starts executing
7.  run_both() calls: co_await when_all(task_a(), task_b())

8.  when_all creates runner coroutines for task_a and task_b
9.  when_all calls: executor_ref::dispatch(runner_0)
    └─ executor_ref wraps the strand
    └─ Calls strand::dispatch(runner_0)
    └─ running_in_this_thread() == TRUE (we're in the invoker!)
    └─ strand::dispatch() returns runner_0 handle
    └─ executor_ref::dispatch() IGNORES this return value ✗
    └─ runner_0 is NEVER resumed!

10. Same happens for runner_1

11. Neither runner executes
    └─ when_all's completion counter never reaches zero
    └─ when_all waits forever
    └─ DEADLOCK

Affected Code Paths

Any code that:

  1. Runs on a strand, AND
  2. Dispatches work through executor_ref while inside that strand's context

This includes:

  • when_all launching child tasks (uses executor_ref::dispatch)
  • io_awaitable_support::complete() dispatching continuations
  • Any user code calling executor_ref::dispatch() from within a strand

Potential Solutions

Option 1: Change executor_ref::dispatch to Return coro (Recommended)

Change:

// executor_ref.hpp - BEFORE
void dispatch(coro h) const
{
    vt_->dispatch(ex_, h);
}

// executor_ref.hpp - AFTER  
coro dispatch(coro h) const
{
    return vt_->dispatch(ex_, h);
}

// vtable - BEFORE
void (*dispatch)(void const*, std::coroutine_handle<>);

// vtable - AFTER
coro (*dispatch)(void const*, std::coroutine_handle<>);

// vtable lambda - AFTER
[](void const* p, std::coroutine_handle<> h) -> coro {
    return static_cast<Ex const*>(p)->dispatch(h);
},

Analysis:

  • Correct semantic: dispatch can return a handle for symmetric transfer
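  • Callers of executor_ref::dispatch() must handle the return value, either by returning it from await_suspend or by calling resume() on it. A caller that launches several children (such as when_all) can return at most one handle from await_suspend, so it has to resume the others some other way.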
  • Aligns with how concrete executor types (strand, thread_pool) already work
  • Preserves symmetric transfer optimization

Option 2: Change strand::dispatch to Never Rely on Symmetric Transfer

Change strand to always enqueue, even when in-thread:

coro strand_service::dispatch(strand_impl& impl, executor_ref ex, coro h)
{
    // Remove the running_in_this_thread optimization entirely
    if (strand_service_impl::enqueue(impl, h))
        ex.post(strand_service_impl::make_invoker(impl).h_);
    
    return std::noop_coroutine();
}

Analysis:

  • Simple fix
  • Performance regression: loses inline execution when already in strand
  • Every dispatch from within a strand now goes through the queue
  • Defeats the purpose of the running_in_this_thread optimization

Option 3: Strand Resumes Inline Without Symmetric Transfer

Change strand to call resume() directly instead of returning the handle:

coro strand_service::dispatch(strand_impl& impl, executor_ref ex, coro h)
{
    if (running_in_this_thread(impl))
    {
        h.resume();  // Resume immediately, don't use symmetric transfer
        return std::noop_coroutine();
    }
    
    if (strand_service_impl::enqueue(impl, h))
        ex.post(strand_service_impl::make_invoker(impl).h_);
    
    return std::noop_coroutine();
}

Analysis:

  • Preserves inline execution optimization
  • Stack depth increases: each nested dispatch adds a stack frame
  • Risk of stack overflow with deeply nested coroutine chains
  • This is how io_context currently works (see table below)

Option 4: Vtable Dispatch Resumes Returned Handle Internally

Change vtable to handle symmetric transfer transparently:

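// vtable dispatch wrapper
[](void const* p, std::coroutine_handle<> h) {
    auto result = static_cast<Ex const*>(p)->dispatch(h);
    // Handles from separate std::noop_coroutine() calls are not guaranteed
    // to compare equal, so don't test against one; resuming a noop handle is harmless.
    if (result)
        result.resume();  // Transparently handle symmetric transfer
},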

Analysis:

  • No change to executor_ref public API
  • Hidden behavior makes debugging harder
  • Performance overhead: checks return value on every dispatch
  • Stack depth issues (same as Option 3)

Recommendation

Option 1 (change executor_ref::dispatch to return coro) is the most correct solution. The current void return type is fundamentally incompatible with executors that support symmetric transfer.

This aligns executor_ref with how concrete executor types already define their dispatch() methods—both strand and thread_pool return coro.

Option 3 (resume inline) could be considered if there's a strong reason to keep executor_ref::dispatch() returning void, but it sacrifices symmetric transfer's stack efficiency.


Executor Types in Codebase

| Executor | dispatch() returns | In-thread behavior | Works with executor_ref? |
|---|---|---|---|
| thread_pool::executor_type | coro | Returns noop_coroutine() (always queues) | Yes |
| strand<Ex> | coro | Returns h for symmetric transfer | No (BUG) |
| basic_io_context::executor_type | void | Calls h.resume() directly | Yes |
| test::run_blocking::executor_type | void | Calls h.resume() directly | Yes |
| mock_executor (test helper) | void | Calls h.resume() directly | Yes |
| executor_ref | void | Calls wrapped dispatch, ignores return | N/A (is the wrapper) |
| any_executor | void | Calls wrapped dispatch, ignores return | N/A (is a wrapper) |

Observations

  1. Inconsistent return types: Some executors return coro (for symmetric transfer), others return void (handle inline execution internally by calling resume()).

  2. strand is unique: It's the only executor that returns a non-noop handle from dispatch() for symmetric transfer optimization.

  3. io_context avoids the issue: basic_io_context::executor_type::dispatch() returns void and handles inline execution internally via h.resume(). This works with executor_ref but loses symmetric transfer benefits.

  4. any_executor has the same bug: Like executor_ref, it also uses a vtable that ignores the return value.

Design Question

Should all executors in Capy:

  • (A) Return coro from dispatch() to support symmetric transfer? (Requires fixing executor_ref and any_executor)
  • (B) Return void and handle inline execution internally via h.resume()? (Requires changing strand and thread_pool)

Option A preserves symmetric transfer's stack efficiency. Option B is simpler but loses that optimization.


Related Files

| File | Description |
|---|---|
| include/boost/capy/ex/executor_ref.hpp | Type-erased non-owning executor wrapper (has the bug) |
| include/boost/capy/ex/any_executor.hpp | Type-erased owning executor wrapper (has the same bug) |
| include/boost/capy/ex/strand.hpp | Strand executor adaptor |
| src/ex/detail/strand_service.cpp | Strand dispatch implementation |
| include/boost/capy/ex/thread_pool.hpp | Thread pool executor |
| include/boost/capy/when_all.hpp | Uses executor_ref::dispatch for child tasks |
| include/boost/capy/ex/io_awaitable_support.hpp | Uses executor_ref::dispatch in complete() |
| include/boost/corosio/basic_io_context.hpp | I/O context executor (returns void, calls resume internally) |

Test Case

After fixing, this should work:

#include <boost/capy.hpp>
#include <boost/capy/ex/strand.hpp>
#include <latch>

using namespace boost::capy;

int main()
{
    thread_pool pool;
    strand s{pool.get_executor()};

    auto outer = [&]() -> task<> {
        co_await when_all(
            []() -> task<> { co_return; }(),
            []() -> task<> { co_return; }()
        );
    };

    std::latch done(1);
    run_async(s, 
        [&](auto&&...) { done.count_down(); },  // on_complete
        [&](auto) { done.count_down(); }        // on_error
    )(outer());
    
    done.wait();  // Should complete, not hang
    return 0;
}

Glossary

| Term | Definition |
|---|---|
| coro | Alias for std::coroutine_handle<>, a type-erased handle to any coroutine |
| Symmetric transfer | C++20 optimization where await_suspend returns a coroutine handle for the runtime to resume, avoiding stack growth |
| noop_coroutine | A special coroutine handle that does nothing when resumed; returned to indicate "no transfer needed" |
| Strand | An executor wrapper that serializes execution: work dispatched through it never runs concurrently |
| Type erasure | A technique for runtime polymorphism without inheritance, typically using function pointers or vtables |
| vtable | Virtual function table: a struct of function pointers used for type erasure |
| executor_ref | A non-owning type-erased wrapper for any Capy executor |
| when_all | A primitive that runs multiple tasks concurrently and waits for all to complete |
