fix(shim): resolve intermittent deadlocks and satisfy Rust 1.85 checks#39
fix(shim): resolve intermittent deadlocks and satisfy Rust 1.85 checks#39novahe wants to merge 2 commits intokuasar-io:mainfrom
Conversation
This commit addresses several deadlock and hang scenarios in the monitor module: - Replaced tokio::sync::Mutex with std::sync::Mutex to allow safe Drop implementation. - Switched to unbounded channels to prevent Reaper thread blocking on subscriber backpressure. - Restored in-lock notification to ensure strict FIFO ordering and unsubscription consistency. Signed-off-by: novahe <heqianfly@gmail.com>
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the stability and maintainability of the shim by addressing critical deadlock issues within the monitor module and performing a comprehensive upgrade to Rust 1.85. The changes ensure more robust process monitoring, prevent potential hangs, and align the codebase with the latest Rust best practices and static analysis checks. Highlights
Ignored Files
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces significant improvements to the shim's monitor module to resolve deadlocks and hangs, along with upgrading the Rust toolchain and addressing various static analysis warnings. The switch to std::sync::Mutex and unbounded channels is a solid approach to improve robustness. The introduction of a Drop implementation for Subscription is a key enhancement for automatic resource cleanup. I've added a suggestion to improve the efficiency of removing dead subscribers. Overall, this is a high-quality update that enhances both the stability and maintainability of the codebase.
| if let Some(subs) = self.topic_subs.get(topic) { | ||
| let subscribers = subs.iter().filter_map(|x| self.subscribers.get(x)); | ||
| for sub in subscribers { | ||
| let res = sub | ||
| .tx | ||
| .send(ExitEvent { | ||
| for i in subs { | ||
| if let Some(sub) = self.subscribers.get(i) { | ||
| if let Err(e) = sub.tx.send(ExitEvent { | ||
| subject: subject.clone(), | ||
| exit_code, | ||
| }) | ||
| .await | ||
| .map_err(other_error!(e, "failed to send exit code")); | ||
| results.push(res); | ||
| }) { | ||
| error!("failed to send exit code to subscriber {}: {:?}", i, e); | ||
| dead_subscribers.push(*i); | ||
| } | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
The current implementation for identifying dead subscribers and then removing them in a separate loop is correct, but it can be made more efficient and idiomatic. Iterating over a cloned list of subscribers and removing them directly if the send fails would avoid the second loop and the intermediate dead_subscribers vector. This approach is slightly cleaner and more direct.
if let Some(subs) = self.topic_subs.get(topic).cloned() {
for i in &subs {
if let Some(sub) = self.subscribers.get(i) {
if sub.tx.send(ExitEvent {
subject: subject.clone(),
exit_code,
}).is_err() {
error!("failed to send exit code to subscriber {}: channel closed", i);
// Unsubscribe will be called for the dead subscriber.
let _ = self.unsubscribe(*i);
}
}
}
}| let mut dead_subscribers = Vec::new(); | ||
| if let Some(subs) = self.topic_subs.get(topic) { | ||
| for i in subs { | ||
| self.subscribers.get(i).and_then(|sub| { | ||
| sub.tx | ||
| .send(ExitEvent { | ||
| subject: subject.clone(), | ||
| exit_code, | ||
| }) | ||
| .map_err(|e| warn!("failed to send {}", e)) | ||
| .ok() | ||
| }); | ||
| if let Some(sub) = self.subscribers.get(i) { | ||
| // channel::Sender::send is non-blocking when using unbounded channel. | ||
| // Sending while holding the lock prevents races with unsubscribe. | ||
| if let Err(e) = sub.tx.send(ExitEvent { | ||
| subject: subject.clone(), | ||
| exit_code, | ||
| }) { | ||
| warn!("failed to send exit event to subscriber {}: {}", i, e); | ||
| dead_subscribers.push(*i); | ||
| } | ||
| } | ||
| } | ||
| }) | ||
| } |
There was a problem hiding this comment.
The current implementation for handling dead subscribers can be made more efficient. Instead of collecting dead subscriber IDs and removing them in a second loop, you can iterate over a cloned list of subscribers and remove them directly upon a send failure. This avoids the need for the dead_subscribers vector and a second loop, making the code more direct and slightly more performant.
if let Some(subs) = self.topic_subs.get(topic).cloned() {
for i in &subs {
if let Some(sub) = self.subscribers.get(i) {
// channel::Sender::send is non-blocking when using unbounded channel.
// Sending while holding the lock prevents races with unsubscribe.
if let Err(e) = sub.tx.send(ExitEvent {
subject: subject.clone(),
exit_code,
}) {
warn!("failed to send exit event to subscriber {}: {}", i, e);
let _ = self.unsubscribe(*i);
}
}
}
}661085b to
b8a4259
Compare
This commit includes: - Upgraded rust toolchain to 1.85.0. - Implemented NoopSandboxService and updated Shim trait with proper feature gating. - Fixed workspace-wide clippy warnings and Rust 1.85 compatibility issues. - Pinned critical dependencies (home, time) in Cargo.lock for Rust 1.85. - Fixed broken test_async_output in runc crate. - Updated CI to use arduino/setup-protoc@v2, pinned cargo-deny@0.19.0, and added nightly fmt. - Refactored deny.toml to comply with 0.19.0 structure. - Suppressed rustdoc and clippy warnings for generated code. - Applied cargo fmt formatting across all crates.
b8a4259 to
4a64ac4
Compare
This PR contains two logical updates:
fix(shim): resolve intermittent deadlocks and hangs in monitor module
std::sync::Mutexto allow safeDropin any context.wait_pidto returnResult<i32>to distinguish between process exit and monitor failure.build: upgrade rust version and satisfy static checks
NoopSandboxServiceincontainerd-shimto satisfy trait requirements across the workspace.cargo fmtformatting.