fix(shim): resolve intermittent deadlocks and satisfy Rust 1.85 checks #39

Open
novahe wants to merge 2 commits into kuasar-io:main from novahe:optimize-monitor-and-fix-ci

@novahe (Collaborator) commented Mar 14, 2026

This PR contains two logical updates:

  1. fix(shim): resolve intermittent deadlocks and hangs in monitor module

    • Replaced async mutex with std::sync::Mutex to allow safe Drop in any context.
    • Switched to unbounded channels to prevent Reaper thread blocking on slow consumers.
    • Restored in-lock notification for strict FIFO ordering and unsubscription consistency.
    • Refactored wait_pid to return Result<i32> to distinguish between process exit and monitor failure.
  2. build: upgrade rust version and satisfy static checks

    • Upgraded toolchain to Rust 1.85.0.
    • Implemented NoopSandboxService in containerd-shim to satisfy trait requirements across the workspace.
    • Fixed numerous Clippy warnings (redundant guards, ptr_arg, derivable_impls, etc.).
    • Applied cargo fmt formatting.
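
The first three monitor changes above can be illustrated with a minimal, std-only sketch. This is not the code from this PR: `Monitor`, `Inner`, `Subscription`, `subscriber_count`, and the simplified `ExitEvent` fields are hypothetical names chosen for the example, and `retain` stands in for the PR's `dead_subscribers` bookkeeping. It assumes the goal is (a) cleanup in `Drop`, which cannot `.await` and therefore needs a blocking lock, and (b) a send that never blocks the notifying thread.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::{Arc, Mutex};

/// Simplified, hypothetical event shape for this sketch.
#[derive(Clone, Debug, PartialEq)]
pub struct ExitEvent {
    pub pid: i32,
    pub exit_code: i32,
}

#[derive(Default)]
struct Inner {
    next_id: u64,
    subscribers: HashMap<u64, Sender<ExitEvent>>,
}

/// Hypothetical monitor whose state sits behind a blocking std mutex.
#[derive(Clone, Default)]
pub struct Monitor {
    inner: Arc<Mutex<Inner>>,
}

pub struct Subscription {
    id: u64,
    pub rx: Receiver<ExitEvent>,
    monitor: Monitor,
}

impl Monitor {
    pub fn subscribe(&self) -> Subscription {
        // std::sync::mpsc::channel is unbounded, so Sender::send never blocks.
        let (tx, rx) = channel();
        let mut inner = self.inner.lock().unwrap();
        let id = inner.next_id;
        inner.next_id += 1;
        inner.subscribers.insert(id, tx);
        Subscription { id, rx, monitor: self.clone() }
    }

    pub fn notify(&self, event: ExitEvent) {
        // Sending while the lock is held means events are enqueued in the
        // order callers acquire the lock, and a concurrent unsubscribe
        // cannot interleave with a send.
        let mut inner = self.inner.lock().unwrap();
        // A failed send means the receiver is gone; drop that sender.
        inner.subscribers.retain(|_, tx| tx.send(event.clone()).is_ok());
    }

    pub fn subscriber_count(&self) -> usize {
        self.inner.lock().unwrap().subscribers.len()
    }
}

impl Drop for Subscription {
    fn drop(&mut self) {
        // Drop is synchronous, so only a blocking lock can be taken here.
        if let Ok(mut inner) = self.monitor.inner.lock() {
            inner.subscribers.remove(&self.id);
        }
    }
}
```

In this sketch, dropping a `Subscription` removes its sender immediately, so a later `notify` never sees a stale entry for it. The trade-off of an unbounded channel is that a slow consumer accumulates events in memory instead of applying backpressure.
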

This commit addresses several deadlock and hang scenarios in the monitor module:
- Replaced tokio::sync::Mutex with std::sync::Mutex to allow safe Drop implementation.
- Switched to unbounded channels to prevent Reaper thread blocking on subscriber backpressure.
- Restored in-lock notification to ensure strict FIFO ordering and unsubscription consistency.

Signed-off-by: novahe <heqianfly@gmail.com>
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the stability and maintainability of the shim by addressing critical deadlock issues within the monitor module and performing a comprehensive upgrade to Rust 1.85. The changes ensure more robust process monitoring, prevent potential hangs, and align the codebase with the latest Rust best practices and static analysis checks.

Highlights

  • Monitor Module Stability: Resolved intermittent deadlocks and hangs in the monitor module by replacing tokio::sync::Mutex with std::sync::Mutex for safe Drop and switching to unbounded channels to prevent consumer blocking.
  • Error Handling Improvement: Refactored wait_pid to return Result<i32> to distinguish between process exit and monitor failure, improving error handling and clarity.
  • Rust Toolchain Upgrade: Upgraded the Rust toolchain to version 1.85.0, including the addition of llvm-tools component.
  • Sandbox Service Implementation: Implemented NoopSandboxService in both asynchronous and synchronous shims to satisfy new trait requirements across the workspace.
  • Code Quality and Formatting: Addressed numerous Clippy warnings (e.g., redundant guards, ptr_arg, derivable_impls) and applied cargo fmt for consistent code style.
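
To make the `wait_pid` highlight concrete, here is a small sketch of what returning a `Result<i32>` buys the caller. It is an illustration under assumptions, not the shim's implementation: the `MonitorClosed` error type, the simplified `ExitEvent`, and this `wait_pid` signature are all hypothetical, and the real function presumably uses the crate's own error type.

```rust
use std::fmt;
use std::sync::mpsc::Receiver;

/// Simplified, hypothetical event shape for this sketch.
pub struct ExitEvent {
    pub pid: i32,
    pub exit_code: i32,
}

/// Hypothetical error: the monitor side of the channel went away.
#[derive(Debug, PartialEq)]
pub struct MonitorClosed;

impl fmt::Display for MonitorClosed {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "monitor channel closed before the process exit was observed")
    }
}

impl std::error::Error for MonitorClosed {}

/// Blocks until an exit event for `pid` arrives.
/// Ok(code) means the process exited with `code`;
/// Err(MonitorClosed) means the monitor failed, so no exit code is known.
pub fn wait_pid(pid: i32, rx: &Receiver<ExitEvent>) -> Result<i32, MonitorClosed> {
    loop {
        match rx.recv() {
            Ok(event) if event.pid == pid => return Ok(event.exit_code),
            Ok(_) => continue, // event for another process, keep waiting
            Err(_) => return Err(MonitorClosed),
        }
    }
}
```

With a bare `i32` return, a closed channel would have to be mapped to some sentinel exit code, which a caller could not tell apart from a real exit status.
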
Ignored Files

  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/ci.yml

Activity

  • No specific activity was provided in the context.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces significant improvements to the shim's monitor module to resolve deadlocks and hangs, along with upgrading the Rust toolchain and addressing various static analysis warnings. The switch to std::sync::Mutex and unbounded channels is a solid approach to improve robustness. The introduction of a Drop implementation for Subscription is a key enhancement for automatic resource cleanup. I've added a suggestion to improve the efficiency of removing dead subscribers. Overall, this is a high-quality update that enhances both the stability and maintainability of the codebase.

Comment on lines 121 to 133

          if let Some(subs) = self.topic_subs.get(topic) {
        -     let subscribers = subs.iter().filter_map(|x| self.subscribers.get(x));
        -     for sub in subscribers {
        -         let res = sub
        -             .tx
        -             .send(ExitEvent {
        +     for i in subs {
        +         if let Some(sub) = self.subscribers.get(i) {
        +             if let Err(e) = sub.tx.send(ExitEvent {
                          subject: subject.clone(),
                          exit_code,
        -             })
        -             .await
        -             .map_err(other_error!(e, "failed to send exit code"));
        -         results.push(res);
        +             }) {
        +                 error!("failed to send exit code to subscriber {}: {:?}", i, e);
        +                 dead_subscribers.push(*i);
        +             }
        +         }
              }
          }

medium

The current implementation for identifying dead subscribers and then removing them in a separate loop is correct, but it can be made more efficient and idiomatic. Iterating over a cloned list of subscribers and removing them directly if the send fails would avoid the second loop and the intermediate dead_subscribers vector. This approach is slightly cleaner and more direct.

        if let Some(subs) = self.topic_subs.get(topic).cloned() {
            for i in &subs {
                if let Some(sub) = self.subscribers.get(i) {
                    if sub.tx.send(ExitEvent {
                        subject: subject.clone(),
                        exit_code,
                    }).is_err() {
                        error!("failed to send exit code to subscriber {}: channel closed", i);
                        // Unsubscribe will be called for the dead subscriber.
                        let _ = self.unsubscribe(*i);
                    }
                }
            }
        }

Comment on lines +106 to +121

        + let mut dead_subscribers = Vec::new();
        + if let Some(subs) = self.topic_subs.get(topic) {
              for i in subs {
        -         self.subscribers.get(i).and_then(|sub| {
        -             sub.tx
        -                 .send(ExitEvent {
        -                     subject: subject.clone(),
        -                     exit_code,
        -                 })
        -                 .map_err(|e| warn!("failed to send {}", e))
        -                 .ok()
        -         });
        +         if let Some(sub) = self.subscribers.get(i) {
        +             // channel::Sender::send is non-blocking when using unbounded channel.
        +             // Sending while holding the lock prevents races with unsubscribe.
        +             if let Err(e) = sub.tx.send(ExitEvent {
        +                 subject: subject.clone(),
        +                 exit_code,
        +             }) {
        +                 warn!("failed to send exit event to subscriber {}: {}", i, e);
        +                 dead_subscribers.push(*i);
        +             }
        +         }
              }
        - })
        + }

medium

The current implementation for handling dead subscribers can be made more efficient. Instead of collecting dead subscriber IDs and removing them in a second loop, you can iterate over a cloned list of subscribers and remove them directly upon a send failure. This avoids the need for the dead_subscribers vector and a second loop, making the code more direct and slightly more performant.

        if let Some(subs) = self.topic_subs.get(topic).cloned() {
            for i in &subs {
                if let Some(sub) = self.subscribers.get(i) {
                    // channel::Sender::send is non-blocking when using unbounded channel.
                    // Sending while holding the lock prevents races with unsubscribe.
                    if let Err(e) = sub.tx.send(ExitEvent {
                        subject: subject.clone(),
                        exit_code,
                    }) {
                        warn!("failed to send exit event to subscriber {}: {}", i, e);
                        let _ = self.unsubscribe(*i);
                    }
                }
            }
        }
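
The two cleanup strategies discussed in these review comments can be compared side by side in a reduced form. The sketch below is an assumption-laden model, not the monitor code: `Subscribers`, `add`, `notify_two_pass`, and `notify_cloned` are hypothetical names, the event is reduced to a bare `i32`, and a single `Vec<u64>` stands in for one topic's subscriber list.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical, reduced model of one topic's subscriber bookkeeping.
#[derive(Default)]
pub struct Subscribers {
    senders: HashMap<u64, Sender<i32>>,
    topic_subs: Vec<u64>,
}

impl Subscribers {
    pub fn add(&mut self, id: u64) -> Receiver<i32> {
        let (tx, rx) = channel();
        self.senders.insert(id, tx);
        self.topic_subs.push(id);
        rx
    }

    fn unsubscribe(&mut self, id: u64) {
        self.senders.remove(&id);
        self.topic_subs.retain(|x| *x != id);
    }

    /// Two-pass variant: record failed IDs while iterating, remove afterwards.
    /// Returns the number of subscribers removed.
    pub fn notify_two_pass(&mut self, exit_code: i32) -> usize {
        let mut dead = Vec::new();
        for id in &self.topic_subs {
            if let Some(tx) = self.senders.get(id) {
                if tx.send(exit_code).is_err() {
                    dead.push(*id);
                }
            }
        }
        let removed = dead.len();
        for id in dead {
            self.unsubscribe(id);
        }
        removed
    }

    /// Cloned-list variant: iterate over a copy of the IDs so the original
    /// list can be mutated inside the loop.
    pub fn notify_cloned(&mut self, exit_code: i32) -> usize {
        let mut removed = 0;
        for id in self.topic_subs.clone() {
            let failed = self
                .senders
                .get(&id)
                .map_or(false, |tx| tx.send(exit_code).is_err());
            if failed {
                self.unsubscribe(id);
                removed += 1;
            }
        }
        removed
    }
}
```

In this sketch both variants leave the same state behind. The cloned variant copies the whole ID list on every call, while the two-pass variant allocates only when at least one send fails, so which one is cheaper depends on how often subscribers die. Both need `&mut self` to remove entries.
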

@novahe force-pushed the optimize-monitor-and-fix-ci branch 9 times, most recently from 661085b to b8a4259, on March 14, 2026 09:41
This commit includes:
- Upgraded rust toolchain to 1.85.0.
- Implemented NoopSandboxService and updated Shim trait with proper feature gating.
- Fixed workspace-wide clippy warnings and Rust 1.85 compatibility issues.
- Pinned critical dependencies (home, time) in Cargo.lock for Rust 1.85.
- Fixed broken test_async_output in runc crate.
- Updated CI to use arduino/setup-protoc@v2, pinned cargo-deny@0.19.0, and added nightly fmt.
- Refactored deny.toml to comply with 0.19.0 structure.
- Suppressed rustdoc and clippy warnings for generated code.
- Applied cargo fmt formatting across all crates.
@novahe force-pushed the optimize-monitor-and-fix-ci branch from b8a4259 to 4a64ac4 on March 14, 2026 09:41