test_mcp: 'issue-148: codedb mcp exits when stdin is closed' flakes with EOF under concurrent suite load #620

@justrach

Problem

test "issue-148: codedb mcp exits when stdin is closed" (src/test_mcp.zig) is flaky when the split test binaries run concurrently under load. It has been observed failing with EOF during full-suite runs, and it has been noted as "Flaky-under-load: test_mcp issue-148 (EOF)" in the session todo since at least the 0.2.5825 cycle.

The test spawns a real child process (zig build run -- --mcp), writes an initialize frame, closes stdin, and waits for exit. Under suite-wide load there are several timing hazards:

  • zig build run inside the test competes with the rest of the suite for the build lock / CPU, so spawn-to-ready time is unbounded; the write of the initialize frame can race the child's startup and hit a closed pipe (EOF/EPIPE).
  • The child's exit deadline is wall-clock based, so when the host is saturated by the other test binaries a child that exits correctly can still miss it.
  • The spawn-failure path silently returns (skips), so the flake only manifests on the write/wait side, making it look intermittent.

Failing test

The failing test already exists: it is this test itself. Reproduce by running the suite with all binaries concurrently (the same condition that bd755c0 worked around for the perf-threshold tests with min-of-3 timing):

zig build test --summary all   # repeat under load; test_mcp issue-148 fails with EOF intermittently
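
Since the failure is intermittent, the "repeat under load" step can be scripted. The following is a minimal sketch only: `repeat_until_fail` is a hypothetical helper name, not something that exists in the repo, and the run count is arbitrary. It reruns a command until the first failure and reports which run failed.

```shell
# Hypothetical helper (not part of codedb): run a command up to N times and
# stop at the first failing run.
repeat_until_fail() {
    max_runs=$1
    shift
    i=1
    while [ "$i" -le "$max_runs" ]; do
        if ! "$@"; then
            echo "failed on run $i of $max_runs"
            return 1
        fi
        i=$((i + 1))
    done
    echo "passed $max_runs runs"
    return 0
}

# Usage (not executed here; 20 is an arbitrary count):
#   repeat_until_fail 20 zig build test --summary all
```

The helper does not generate load itself; it assumes the host is already busy, for example because the whole suite is running concurrently.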

Expected

The test passes deterministically regardless of host load.

Fix

Options, in increasing order of invasiveness:

  1. Spawn the already-built zig-out/bin/codedb binary instead of zig build run (removes the build-lock contention entirely), tolerate EPIPE on the initialize write (the point of the test is exit-on-EOF, not the handshake), and scale the wait deadline.
  2. Serialize this one test against the rest of the suite (own step in build.zig).
  3. Replace the process-level integration test with a transport-level test against the read loop's EOF path (the other issue-148 tests already cover the poll/HUP mechanics).
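
The behaviour that option 1 aims for can be sketched outside the test harness as a shell check. This is an illustration under assumptions, not the actual change to src/test_mcp.zig: `exits_on_stdin_eof` and `DEADLINE_SCALE` are hypothetical names, `timeout` is assumed to be the GNU coreutils utility, and the frame passed in is whatever initialize frame the test already writes.

```shell
# Hypothetical helper: write one frame to a command's stdin, close stdin, and
# require the command to exit before a (scalable) deadline in seconds.
exits_on_stdin_eof() {
    # DEADLINE_SCALE is an assumed knob for loaded hosts; it defaults to 1.
    deadline=$(( $1 * ${DEADLINE_SCALE:-1} ))
    frame=$2
    shift 2
    status=0
    # A failed write (EPIPE) on the left side is deliberately ignored: the
    # check is about exit-on-EOF, not about the handshake succeeding.
    printf '%s\n' "$frame" 2>/dev/null | timeout "$deadline" "$@" >/dev/null 2>&1 || status=$?
    # GNU timeout exits with 124 when it had to kill the command.
    if [ "$status" -eq 124 ]; then
        echo "still running after ${deadline}s with stdin closed"
        return 1
    fi
    echo "exited with status $status after stdin EOF"
    return 0
}

# Usage against the already-built binary (not executed here; "$frame" stands
# for the initialize frame the existing test sends):
#   exits_on_stdin_eof 10 "$frame" zig-out/bin/codedb --mcp
```

Running the prebuilt binary directly keeps `zig build` out of the timed path, which is the main point of option 1. The deadline scaling is only a fallback for a saturated host.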

Metadata

Assignees: No one assigned

Labels: bug (Something isn't working), priority:p2 (Medium priority)
