fix(hermes): runner concurrency + bridge restart in talk#187
Merged
Conversation
Two bundled fixes — they share the same plumbing. #20 — Runner had multiple goroutines calling scanner.Scan on the same bufio.Scanner. Under -race that's an immediate data race; in production it manifested as cross-talk between concurrent prompts and silent loss of the final response when the bridge died mid-call. Replace the ad-hoc reader with a single dispatcher goroutine that owns the scanner. Responses route to the awaiting goroutine via a per-ID inbox map; notifications go to the currently-active prompt's sink. Prompts are serialised by promptMu (the wire protocol doesn't tag notifications with a request ID, so overlapping prompts can't be demuxed). Pending callers unblock with ErrBridgeExited when the dispatcher sees EOF — previously they hung indefinitely. Prompt now drains queued notifications before emitting the final event so deltas never get reordered behind the final when both channels are buffered. #21 — talk's REPL now restarts the bridge transparently when it dies mid-session, capped at 3 restarts per minute. Before, every prompt after the first crash just printed "bridge process exited" forever. A small stderrTail ring buffer captures the last 4 KiB of bridge stderr so the error message includes the actual Python traceback (usually a ModuleNotFoundError) rather than the bare exit string. Tests cover ID routing, ordered delta-then-final delivery, prompt serialisation under concurrent callers, bridge-death unblock, and a flood scenario that exercises the dispatcher's drop-on-full notif sink semantics. All pass under -race. Closes #20 Closes #21
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Bundled fix for #20 (hermes Runner scanner data race + lost notifications) and #21 (
clanker talkdoesn't recover when the bridge crashes). They share the same plumbing — the dispatcher'sdonesignal is exactly where bridge-death detection lives, so splitting them would create surface-area for the seam to drift.#20 — Runner dispatcher
bufio.Scanner. No other goroutine touches it. (Pre-fix:call()andPrompt()both calledScan()on the shared scanner → race + cross-talk.)inboxmap; notifications go to the currently-active prompt's sink.promptMuserialises Prompt() calls — the wire protocol doesn't tag notifications with a request ID, so overlapping prompts can't be demuxed safely.ErrBridgeExited+IsBridgeExitErrorso callers can distinguish "bridge died" from "you asked a bad question."#21 — Bridge restart in
talkcmd/talkREPL retries one prompt after restarting the bridge onErrBridgeExited.ringBuffercaptures last 4 KiB of bridge stderr — when the restart message surfaces it includes the trailing 3 lines (typically the Python traceback'sModuleNotFoundError: ...line) instead of just "bridge process exited."Test plan
go test -race -count=1 ./internal/hermes/...— passesgo vet ./...— cleangofmt -d— cleanErrBridgeExitedIsBridgeExitErrorrecognises wrapped + joined errorsCloses #20
Closes #21