Skip to content

Go child process: recover() for nil-pointer dereference hangs indefinitely under ACP (macOS / darwin) #712

@lurodrigo

Description

@lurodrigo

Found this really intricate bug while debugging a test suite and found a reproducible example:

Summary

A child Go process spawned via the Bash tool under @agentclientprotocol/claude-agent-acp hangs indefinitely on a nil-pointer dereference instead of converting it to a Go panic that recover() can catch. The same Go program, run by the same claude binary outside of ACP (standalone CLI), recovers correctly. This affects any go test package that exercises nil-deref behavior (e.g. via require.Panics), causing the entire package to consume the -timeout duration.

Platform scope

Tested on macOS 26.5 arm64 only. The proposed mechanism (Mach EXC_BAD_ACCESS delivery being intercepted by an intermediate process) is Darwin-specific — Linux delivers SIGSEGV directly via the kernel, so the bug very likely does not reproduce there. If anyone can confirm or deny on Linux/WSL it would help narrow scope.

Environment

  • Platform: macOS 26.5 (Darwin 25.5.0), arm64 (Apple Silicon) ← affected
  • Linux: not tested — likely unaffected based on hypothesis (different signal-delivery mechanism)
  • Host: Zed 1.3.7
  • Node: v22.22.3
  • @agentclientprotocol/claude-agent-acp: 0.37.0
  • @anthropic-ai/claude-agent-sdk: 0.3.146
  • Bundled claude Code binary: 2.1.146
  • Standalone claude Code binary (for comparison): 2.1.150
  • Go: 1.26.1 (also reproduced with 1.25.2, 1.24.4, 1.23.4, 1.23.0)

Reproducer

17 lines, no dependencies.

main.go:

package main

import "fmt"

type S struct{ x int }

func main() {
	defer func() {
		if r := recover(); r != nil {
			fmt.Println("RECOVERED:", r)
			return
		}
		fmt.Println("NO PANIC")
	}()
	var p *S
	fmt.Println("about to deref...")
	_ = p.x
	fmt.Println("post-deref")
}

go.mod:

module niltest

go 1.21

Run via the Bash tool in an ACP-hosted Claude session:

timeout 5 go run main.go

Expected

Observed in a standalone claude CLI session on the same machine, same Go binary:

about to deref...
RECOVERED: runtime error: invalid memory address or nil pointer dereference

Exit 0, sub-second wall time.

Actual

Observed in Zed + ACP, three runs in a row:

about to deref...

Then the process hangs until timeout kills it. Exit 124, ~5s wall time. The RECOVERED: line is never printed.

Variability matrix

Setup Result
Standalone claude CLI (2.1.150) recovers ✓
Zed + ACP 0.10.0 + bundled Claude 2.1.146 hangs
Zed + ACP 0.10.0 + CLAUDE_CODE_EXECUTABLE=2.1.150 override hangs
Zed + ACP 0.37.0 + bundled Claude 2.1.146 hangs

The ACP layer is the consistent variable; the Claude binary version is not.

Diagnostic evidence

GODEBUG=schedtrace=500 taken during the hang:

SCHED 500ms: gomaxprocs=11 idleprocs=11 threads=18 idlethreads=12 runqueue=0
SCHED 1000ms: gomaxprocs=11 idleprocs=11 threads=18 idlethreads=12 runqueue=0
SCHED 1500ms: gomaxprocs=11 idleprocs=11 threads=18 idlethreads=12 runqueue=0
...

All processors idle, runqueue empty — the goroutine has vanished from Go's scheduler entirely while the process is wall-clock-stuck. Goroutine dump at timeout shows the offending goroutine as runnable with the note goroutine running on other thread; stack unavailable.

GODEBUG / runtime knobs that don't help (still hangs):

  • GODEBUG=asyncpreemptoff=1
  • GODEBUG=crashonfault=1
  • GOMAXPROCS=1
  • GOTRACEBACK=crash

Stdin redirection (< /dev/null) and using a precompiled binary instead of go run also do not help — ruling out duplication of anthropics/claude-code#13127 (Go runtime stdin-init named-pipe issue).

Hypothesis

Consistent with the ACP adapter's Node parent process inheriting / forwarding macOS Mach exception ports such that EXC_BAD_ACCESS from a grandchild Go process is intercepted before Go's signal handler can convert it to SIGSEGV (and then to a Go panic). The standalone CLI has no Node parent in the chain, so the exception path works there. Linux uses direct kernel SIGSEGV delivery rather than Mach exceptions, so this specific failure mode is unlikely to apply there — but I have not confirmed.

Impact

Any Go test suite exercising nil-deref behavior burns the full go test -timeout per affected package (10 minutes default) when run from an ACP-hosted Claude session on macOS. The same suite runs in seconds outside ACP. This blocks AI-assisted workflows that drive go test through this agent — agent loops iterating on backend Go codebases lose ~10 minutes per iteration to a hang that has nothing to do with the code being worked on.

Not a duplicate of anthropics/claude-code#13127

That issue (Go binaries hang 30–60 seconds in Bash tool on macOS, stdin named-pipe) is closed and documents a workaround of redirecting stdin to /dev/null. The reproducer here:

  • Still hangs with < /dev/null
  • Still hangs with a precompiled binary (no go run involved, so no Go-toolchain stdin path)
  • Hangs for the full timeout, not 30–60 seconds

So this is a separate failure mode at the signal/exception-delivery layer, not at Go runtime stdin initialization.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions