fix(shim): optimize monitor to avoid intermittent hangs #38
novahe wants to merge 1 commit into kuasar-io:main
Code Review
This pull request effectively addresses potential deadlocks and hangs in the monitor module by switching to std::sync::Mutex and unbounded channels. This is a solid improvement for the shim's stability. The addition of a Drop implementation for Subscription to handle automatic unsubscription is a great feature for resource management. The test suite has also been significantly enhanced with table-driven tests, backpressure tests, and reliability tests, which increases confidence in these critical changes. I have a couple of suggestions: one to fix a resource leak in the synchronous monitor path by adding cleanup for disconnected subscribers, and another to improve the efficiency of this cleanup logic in the asynchronous path.
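The cleanup of disconnected subscribers suggested above could look roughly like the following. This is a minimal sketch using only the standard library, not the code from this PR: `Monitor`, `Inner`, `ExitEvent`, `subscribe`, `notify`, and `subscriber_count` are hypothetical names, and `std::sync::mpsc::channel` stands in for whatever unbounded channel the shim actually uses.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Mutex;

/// Hypothetical event type; the real shim's exit event may carry other fields.
#[derive(Clone, Debug, PartialEq)]
pub struct ExitEvent {
    pub pid: u32,
    pub exit_code: i32,
}

/// Hypothetical monitor state: subscriber id -> unbounded sender.
#[derive(Default)]
struct Inner {
    next_id: u64,
    subscribers: HashMap<u64, Sender<ExitEvent>>,
}

#[derive(Default)]
pub struct Monitor {
    inner: Mutex<Inner>,
}

impl Monitor {
    /// Register a subscriber and hand back its id and receiving end.
    pub fn subscribe(&self) -> (u64, Receiver<ExitEvent>) {
        let (tx, rx) = channel();
        let mut inner = self.inner.lock().unwrap();
        let id = inner.next_id;
        inner.next_id += 1;
        inner.subscribers.insert(id, tx);
        (id, rx)
    }

    /// Deliver the event to every subscriber. `send` on an unbounded channel
    /// only fails when the receiver has been dropped, so a failed send is
    /// used as the signal to prune that entry in the same pass.
    pub fn notify(&self, event: &ExitEvent) {
        let mut inner = self.inner.lock().unwrap();
        inner
            .subscribers
            .retain(|_, tx| tx.send(event.clone()).is_ok());
    }

    pub fn subscriber_count(&self) -> usize {
        self.inner.lock().unwrap().subscribers.len()
    }
}
```

Pruning inside `retain` avoids a second pass over the map and a separate list of dead ids, which is one way to read the efficiency suggestion for the asynchronous path.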
This commit addresses several deadlock and hang scenarios in the monitor module:

- Replaced tokio::sync::Mutex with std::sync::Mutex to allow safe Drop implementation.
- Switched to unbounded channels to prevent Reaper thread blocking on subscriber backpressure.
- Restored in-lock notification to ensure strict FIFO ordering and unsubscription consistency.

Signed-off-by: novahe <heqianfly@gmail.com>
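To illustrate why a synchronous mutex makes a `Drop` implementation straightforward, here is a minimal standard-library sketch. It is not the code from this commit: `Registry`, `new_registry`, `Subscription`, `subscribe`, and `notify` are hypothetical names, and events are reduced to a bare `u32` pid.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::{Arc, Mutex};

/// Hypothetical shared registry: subscriber id -> unbounded sender of pids.
pub type Registry = Arc<Mutex<HashMap<u64, Sender<u32>>>>;

pub fn new_registry() -> Registry {
    Arc::new(Mutex::new(HashMap::new()))
}

/// Hypothetical subscription handle that unsubscribes itself when dropped.
pub struct Subscription {
    id: u64,
    pub rx: Receiver<u32>,
    registry: Registry,
}

impl Drop for Subscription {
    fn drop(&mut self) {
        // `drop` cannot `.await`, but `std::sync::Mutex::lock` is a plain
        // blocking call, so it works from any thread or context.
        // A poisoned lock is ignored here rather than panicking in `drop`.
        if let Ok(mut subs) = self.registry.lock() {
            subs.remove(&self.id);
        }
    }
}

pub fn subscribe(registry: &Registry, id: u64) -> Subscription {
    let (tx, rx) = channel();
    registry.lock().unwrap().insert(id, tx);
    Subscription { id, rx, registry: Arc::clone(registry) }
}

/// In-lock notification: sends happen while the registry lock is held, so a
/// concurrent `drop` either runs before this call (and receives nothing) or
/// waits until every send of this event has completed.
pub fn notify(registry: &Registry, pid: u32) {
    let subs = registry.lock().unwrap();
    for tx in subs.values() {
        // Unbounded send never blocks; an error only means the receiver is gone.
        let _ = tx.send(pid);
    }
}
```

Holding the lock during `send` is only reasonable because the sends cannot block; with a bounded channel the same pattern could stall every other lock user behind one slow subscriber.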
Description
This PR addresses intermittent hangs and potential deadlocks in the containerd-shim monitor module by optimizing the locking mechanism and event distribution logic.
The Problem
The previous implementation suffered from two main issues:
- ... runtime (e.g., during shim shutdown or from a blocking thread), it caused panics or hung the process.
- ... other process exits from being collected, leading to zombie processes and a total shim hang under heavy workloads.
The Solution
- ... execution context without relying on the async runtime.
- ... unbounded channels ensure that the producer (the Reaper thread) is never blocked by subscriber backpressure.
- ... ordering of events and providing a guarantee that no events are delivered after a successful unsubscription.
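The backpressure point can be demonstrated in isolation. The sketch below uses the standard library's `sync_channel` (bounded) and `channel` (unbounded) as stand-ins for the channels used in the shim, which is an assumption; `bounded_accepts` and `unbounded_accepts` are hypothetical helper names introduced only for this illustration.

```rust
use std::sync::mpsc::{channel, sync_channel, TrySendError};

/// Count how many of `n` events a producer can enqueue on a bounded channel
/// of capacity `cap` when the subscriber never reads. `try_send` is used so
/// the example terminates; a blocking `send` would stall at the same point.
pub fn bounded_accepts(cap: usize, n: usize) -> usize {
    let (tx, _rx) = sync_channel::<usize>(cap); // `_rx` stays alive but idle
    let mut accepted = 0;
    for i in 0..n {
        match tx.try_send(i) {
            Ok(()) => accepted += 1,
            // Buffer is full: this is where a producer would block.
            Err(TrySendError::Full(_)) => break,
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    accepted
}

/// Enqueue `n` events on an unbounded channel without any reader running,
/// then drain them to show that delivery order matches send order.
pub fn unbounded_accepts(n: usize) -> Vec<usize> {
    let (tx, rx) = channel::<usize>();
    for i in 0..n {
        // Never blocks; fails only if the receiver has been dropped.
        tx.send(i).expect("receiver is still alive");
    }
    drop(tx); // close the channel so the iterator below terminates
    rx.iter().collect()
}
```

The trade-off is that an unbounded queue grows in memory while a subscriber lags, so this design relies on subscribers eventually draining their queues or being removed.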