Problem
test "issue-148: codedb mcp exits when stdin is closed" (src/test_mcp.zig) is flaky when the split test binaries run concurrently under load — observed as an EOF failure during full-suite runs and noted as "Flaky-under-load: test_mcp issue-148 (EOF)" in the session todo since at least the 0.2.5825 cycle.
The test spawns a real child process (zig build run -- --mcp), writes an initialize frame, closes stdin, and waits for exit. Under suite-wide load there are several timing hazards:
- zig build run inside the test competes with the rest of the suite for the build lock / CPU, so spawn-to-ready time is unbounded; the write of the initialize frame can race the child's startup and hit a closed pipe (EOF/EPIPE).
- The child's exit deadline is wall-clock based while the host is saturated by the other test binaries.
- The spawn-failure path silently returns (skips), so the flake only manifests on the write/wait side, making it look intermittent.
Failing test
The failing test already exists — it is this test itself. Reproduce by running the suite with all binaries concurrently (the same condition bd755c0 worked around for the perf-threshold tests with min-of-3 timing):
zig build test --summary all # repeat under load; test_mcp issue-148 fails with EOF intermittently
Expected
The test passes deterministically regardless of host load.
Fix
Options, in increasing order of invasiveness:
- Spawn the already-built zig-out/bin/codedb binary instead of zig build run (removes the build-lock contention entirely), tolerate EPIPE on the initialize write (the point of the test is exit-on-EOF, not the handshake), and scale the wait deadline.
- Serialize this one test against the rest of the suite (own step in build.zig).
- Replace the process-level integration test with a transport-level test against the read loop's EOF path (the other issue-148 tests already cover the poll/HUP mechanics).
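A rough sketch of the first option, to make the shape of the change concrete. It is not the existing test body. It assumes a Zig 0.13/0.14-era std.process.Child API, that zig-out/bin/codedb has already been built and resolves relative to the test's working directory, and that a newline-terminated JSON-RPC line is an acceptable stand-in for the initialize frame (the real framing used by the test is not shown here, so the payload is a placeholder). Deadline scaling is left out: child.wait() blocks without a timeout, so the scaled deadline would still need the existing test's own wait mechanism.

```zig
const std = @import("std");

test "sketch: codedb --mcp exits when stdin is closed (prebuilt binary)" {
    // Assumption: the binary was built before the tests run and this relative
    // path is valid from the test's working directory.
    const argv = [_][]const u8{ "zig-out/bin/codedb", "--mcp" };

    var child = std.process.Child.init(&argv, std.testing.allocator);
    child.stdin_behavior = .Pipe;
    child.stdout_behavior = .Ignore;
    child.stderr_behavior = .Ignore;

    // Report a visible skip instead of returning silently. Depending on the
    // Zig version, a missing binary may surface here or only at wait().
    child.spawn() catch return error.SkipZigTest;

    // Placeholder payload (hypothetical): substitute the frame the real test sends.
    const frame = "{\"jsonrpc\":\"2.0\",\"id\":1,\"method\":\"initialize\",\"params\":{}}\n";

    // The handshake is not what is under test, so a pipe the child has
    // already closed is tolerated; any other write error still fails.
    child.stdin.?.writeAll(frame) catch |err| switch (err) {
        error.BrokenPipe => {},
        else => |e| return e,
    };

    // Closing stdin is the actual stimulus. Clear the field so wait() does
    // not try to close the same handle a second time.
    child.stdin.?.close();
    child.stdin = null;

    // Blocks until the child exits; no deadline is applied in this sketch.
    const term = try child.wait();
    try std.testing.expect(term == .Exited);
}
```

Because the child is the prebuilt binary, the test no longer waits on the build lock, which is the main source of unbounded spawn-to-ready time described above. Returning error.SkipZigTest also makes a skipped run show up in the summary, rather than looking like a pass.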