Avoid search-thread deadlocks during surface teardown#55
Conversation
📝 WalkthroughWalkthroughSearch thread callback handling is hardened during Surface teardown by introducing an atomic flag and dynamic timeout values that prevent blocking operations when the app thread synchronously joins. Helpers centralize mailbox push logic, deinit sets the teardown flag before thread shutdown, and all search event handlers now use best-effort timeouts during teardown instead of bounded waits. ChangesSearch thread teardown safety
Estimated Code Review Effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/Surface.zig (1)
1499-1513: ⚡ Quick winLog dropped search messages when not tearing down.
These helpers silently drop timed-out pushes, which makes intermittent stale search UI hard to triage in normal operation. Please emit a debug/warn when
push == 0and teardown is not active.Suggested patch
fn pushSearchRendererMessage( self: *Surface, message: rendererpkg.Message, timeout: rendererpkg.Thread.Mailbox.Timeout, ) void { - _ = self.renderer_thread.mailbox.push(message, timeout); + const pushed = self.renderer_thread.mailbox.push(message, timeout); + if (pushed == 0 and !self.search_tearing_down.load(.acquire)) { + log.warn("dropping search renderer message due to mailbox backpressure", .{}); + } } @@ fn pushSearchSurfaceMessage( self: *Surface, message: Message, timeout: App.Mailbox.Queue.Timeout, ) void { - _ = self.surfaceMailbox().push(message, timeout); + const pushed = self.surfaceMailbox().push(message, timeout); + if (pushed == 0 and !self.search_tearing_down.load(.acquire)) { + log.warn("dropping search surface message due to mailbox backpressure", .{}); + } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/Surface.zig` around lines 1499 - 1513, The pushSearchRendererMessage and pushSearchSurfaceMessage helpers currently drop timed-out pushes silently; update both functions to check the return value of mailbox.push and, if it equals 0 and the surface is not in teardown (e.g., check self.tearing_down / self.is_tearing_down / self.tearingDown flag), emit a debug or warn log including context (which helper, message type/id, and timeout) via the Surface logger (e.g., self.logger.warn/self.logger.debug). Ensure the log is only emitted when not tearing down so normal shutdowns remain quiet.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@src/Surface.zig`:
- Around line 1499-1513: The pushSearchRendererMessage and
pushSearchSurfaceMessage helpers currently drop timed-out pushes silently;
update both functions to check the return value of mailbox.push and, if it
equals 0 and the surface is not in teardown (e.g., check self.tearing_down /
self.is_tearing_down / self.tearingDown flag), emit a debug or warn log
including context (which helper, message type/id, and timeout) via the Surface
logger (e.g., self.logger.warn/self.logger.debug). Ensure the log is only
emitted when not tearing down so normal shutdowns remain quiet.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: db98b99fc3
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| ); | ||
| try self.renderer_thread.wakeup.notify(); | ||
| self.pushSearchRendererMessage(.{ .search_selected_match = null }, renderer_timeout); | ||
| self.pushSearchRendererViewportMatches(.init(self.alloc), &.{}, renderer_timeout); |
There was a problem hiding this comment.
Preserve the final search reset outside teardown
When search is closed normally (not during surface teardown), is_tearing_down is false, so this quit reset is still sent with the 50 ms .ns timeout; if the renderer mailbox remains full longer than that, BlockingQueue.push returns 0 and pushSearchRendererViewportMatches drops the reset. Because .quit is the last search callback, there is no later update to clear renderer.search_matches/search_selected_match, so stale search highlights can remain visible after the user closes search under renderer backpressure. The teardown path can be best-effort, but the non-teardown quit path needs a reliable way to clear this state.
Useful? React with 👍 / 👎.
Greptile SummaryThis PR fixes a deadlock that occurs when a terminal surface is torn down while the search thread is active:
Confidence Score: 4/5Safe to merge for the deadlock fix itself; the introduced wakeup-after-dropped-message and dangling-errdefer patterns are minor issues that won't cause runtime failures today. The core threading fix is well-reasoned: the atomic store happens-before the join, the two specialised push helpers free their arenas on failure, and the bounded 50 ms wait eliminates the infinite-block path. The two concerns flagged — unconditional wakeup.notify() when a push was dropped, and the errdefer remaining live after value-based arena transfer — are both currently harmless but represent structural fragility that could bite on future changes to the callback body. src/Surface.zig — specifically the searchCallback_ function and the four new push-helper wrappers; the arena ownership handoff and the post-push wakeup pattern deserve a second look. Important Files Changed
Sequence DiagramsequenceDiagram
participant AT as App Thread (deinit)
participant ST as Search Thread (callback)
participant RM as Renderer Mailbox
participant RT as Renderer Thread
Note over AT,RT: Normal operation
ST->>ST: searchCallback_() called
ST->>ST: load search_tearing_down false
ST->>ST: "timeout = .{ .ns = 50ms }"
ST->>RM: push(message, 50ms timeout)
RM-->>ST: "pushed >= 1 (success)"
ST->>RT: wakeup.notify()
Note over AT,RT: Teardown path (fix)
AT->>AT: search_tearing_down.store(true, .release)
AT->>ST: s.deinit() signals thread to stop
Note over AT: join() waiting here
ST->>ST: load search_tearing_down true
ST->>ST: "timeout = .{ .instant = {} }"
ST->>RM: push(message, instant)
alt mailbox full
RM-->>ST: "pushed = 0 (fail fast)"
ST->>ST: arena.deinit() (no leak)
else mailbox not full
RM-->>ST: "pushed >= 1"
end
ST->>RT: wakeup.notify() (unconditional)
ST->>ST: callback returns, thread exits
AT->>AT: join() returns
AT->>RT: renderer_thread.stop.notify()
AT->>RT: renderer_thr.join()
Reviews (1): Last reviewed commit: "Avoid search callback deadlocks during s..." | Re-trigger Greptile |
| self.renderer_thread.wakeup.notify() catch |err| | ||
| log.warn("error notifying renderer thread after search viewport update err={}", .{err}); |
There was a problem hiding this comment.
Unconditional wakeup after a potentially-dropped message
wakeup.notify() is called regardless of whether the preceding push succeeded. When the push times out or returns 0 (message dropped), the renderer thread is woken up with no new message in the mailbox. The renderer will drain its queue and go back to sleep harmlessly, but the wakeup signal is semantically misleading — its contract is "a new message is waiting." Under heavy load this path can fire repeatedly without effect. Consider only calling wakeup.notify() when the push returned a non-zero count.
| @@ -1463,14 +1542,9 @@ fn searchCallback_( | |||
| const matches = try alloc.dupe(terminal.highlight.Flattened, matches_unowned); | |||
| for (matches) |*m| m.* = try m.clone(alloc); | |||
|
|
|||
| _ = self.renderer_thread.mailbox.push( | |||
| .{ .search_viewport_matches = .{ | |||
| .arena = arena, | |||
| .matches = matches, | |||
| } }, | |||
| .forever, | |||
| ); | |||
| try self.renderer_thread.wakeup.notify(); | |||
| self.pushSearchRendererViewportMatches(arena, matches, renderer_timeout); | |||
| self.renderer_thread.wakeup.notify() catch |err| | |||
| log.warn("error notifying renderer thread after search viewport update err={}", .{err}); | |||
| }, | |||
There was a problem hiding this comment.
Dangling
errdefer after arena ownership transfer by value
arena is passed by value to pushSearchRendererViewportMatches. Inside that helper, when the push fails, the helper's copy calls arena.deinit(), freeing the underlying memory pool. The caller's arena variable (still on the stack with errdefer arena.deinit() attached) now holds dangling internal pointers. If a try-able call were ever added after the push helper — or if the helper's ownership model changes — the errdefer would trigger a double-free. Today it is safe because no errors propagate after the push, but it is a fragile ownership pattern. The same applies to the selected_match branch. Consider removing or cancelling the errdefer after transferring ownership, or returning the arena back from the helper so the caller retains exclusive ownership.
| fn pushSearchRendererMessage( | ||
| self: *Surface, | ||
| message: rendererpkg.Message, | ||
| timeout: rendererpkg.Thread.Mailbox.Timeout, | ||
| ) void { | ||
| _ = self.renderer_thread.mailbox.push(message, timeout); | ||
| } |
There was a problem hiding this comment.
pushSearchRendererMessage silently discards non-arena message drops
Unlike pushSearchRendererViewportMatches and pushSearchRendererSelectedMatch, this helper ignores the push return value without comment. Currently it is only called with { .search_selected_match = null } — a message with no owned arena — so there is nothing to free on a failed push. However, the asymmetry with the other two helpers (which explicitly handle the pushed == 0 case) makes it easy to accidentally pass an arena-bearing message here in the future. A brief inline comment explaining that this path is intentionally arena-free would prevent misuse.
Summary
This is a proposed fix for a freeze I have been hitting frequently in cmux when a terminal is closed while Ghostty search/find state is active or recently active.
The fix itself is entirely AI-suggested. I am opening this as a report plus a proposed patch, not as a claim that I personally proved every detail of the Ghostty-side threading model.
Observed failure
A sampled hung cmux process showed the macOS app main thread blocked in
ghostty_surface_free, inside GhosttySurface.deinit, waiting inpthread_joinfor the per-surface search thread to exit.The suspected deadlock is:
Surface.deinitasks the search thread to stop and synchronously joins it..foreverqueue pushes into renderer/app mailboxes.join, both sides can wait forever.This matches the user-visible symptom: the cmux window becomes unresponsive and only shows the spinner.
Proposed fix
This patch removes unbounded waits from search-thread callback delivery:
Search UI updates are stale/recoverable state. During teardown the surface is closing, so dropping final reset updates is preferable to deadlocking the app.
Validation done locally
zig fmt --check ghostty/src/Surface.zigzig build -Demit-xcframework=true -Demit-macos-app=false -Dxcframework-target=universal -Doptimize=ReleaseFast./scripts/reload.sh --tag fix-search-shutdown-hangCaveat
Again: the diagnosis and fix are AI-assisted. The sample and observed freeze are real, but this should be reviewed carefully by someone familiar with Ghostty's search/thread/mailbox shutdown behavior.
Summary by cubic
Fixes a hang when closing a surface while search is active by removing unbounded waits in search-thread callbacks. Search updates now use short timeouts and switch to instant/best-effort during teardown.
search_tearing_downatomic flag set inSurface.deinitbefore joining the search thread..forevermailbox pushes with bounded waits (search_callback_push_timeout_ns= 50ms); use instant timeouts during teardown.Written for commit db98b99. Summary will update on new commits.
Summary by CodeRabbit