@MkDev11 MkDev11 commented Jan 9, 2026

Circuit Breaker for Challenge Container Connections

Problem

The evaluator makes HTTP requests to challenge containers without circuit breaker protection. If a container becomes unhealthy, every request to it can block until the request timeout expires (3600s by default), wasting resources.

Solution

Implement the circuit breaker pattern with three states:

  • Closed: Normal operation, requests pass through
  • Open: Fast-fail all requests (container known unhealthy)
  • HalfOpen: Allow test requests to check recovery

Changes

  • circuit_breaker.rs - New module with CircuitBreakerManager
  • evaluator.rs - Integrate circuit breaker into evaluate_generic() and proxy_request()
  • lib.rs - Export circuit breaker types

Configuration (Defaults)

| Setting | Value | Description |
| --- | --- | --- |
| failure_threshold | 5 | Consecutive failures to open circuit |
| reset_timeout | 30s | Time before testing recovery |
| success_threshold | 2 | Successes in HalfOpen to close |
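For readers who want to see the defaults in code form, here is a minimal sketch of what a config type with these values could look like. The field names come from the table and the usage snippet in this description; the integer types (`u32`) and the derive list are assumptions, not taken from the PR's source.

```rust
use std::time::Duration;

/// Sketch of the circuit breaker configuration.
/// Field names follow the PR description; the integer types are assumed.
#[derive(Debug, Clone, PartialEq)]
pub struct CircuitBreakerConfig {
    /// Consecutive failures before the circuit opens.
    pub failure_threshold: u32,
    /// How long the circuit stays open before a recovery test is allowed.
    pub reset_timeout: Duration,
    /// Successes required in HalfOpen before the circuit closes again.
    pub success_threshold: u32,
}

impl Default for CircuitBreakerConfig {
    fn default() -> Self {
        Self {
            failure_threshold: 5,
            reset_timeout: Duration::from_secs(30),
            success_threshold: 2,
        }
    }
}
```

With a `Default` impl like this, a caller can override a single field using struct update syntax, for example `CircuitBreakerConfig { failure_threshold: 3, ..Default::default() }`.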

Usage

// Without circuit breaker (existing behavior)
let evaluator = ChallengeEvaluator::new(challenges);

// With circuit breaker (new)
let evaluator = ChallengeEvaluator::with_circuit_breaker(challenges);

// With custom config
let config = CircuitBreakerConfig {
    failure_threshold: 3,
    reset_timeout: Duration::from_secs(60),
    success_threshold: 1,
};
let evaluator = ChallengeEvaluator::with_circuit_breaker_config(challenges, config);

Testing

  • Unit tests (12 new tests for circuit breaker)
  • All existing tests pass (85 total)
  • Clippy clean

Backward Compatible

The existing new() constructor is unchanged; the circuit breaker is opt-in.

Summary by CodeRabbit

Release Notes

  • New Features
    • Circuit breaker system added to improve resilience for challenge connections with automatic failure detection and recovery
    • Configurable thresholds for triggering circuit opens and timeout-based recovery mechanisms
    • Enhanced error reporting indicating when challenges are unavailable and estimated retry timing

coderabbitai bot commented Jan 9, 2026

Walkthrough

A new circuit breaker module is introduced to track per-challenge connection health with automatic state transitions between Closed, Open, and HalfOpen states. The evaluator integrates the circuit breaker to pre-check availability and record success or failure on each request. Circuit breaker types are re-exported from the crate root.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| New Circuit Breaker Module: crates/challenge-orchestrator/src/circuit_breaker.rs | Introduces complete circuit breaker implementation with CircuitBreakerConfig, CircuitState enum (Closed/Open/HalfOpen), CircuitBreakerManager for per-challenge tracking, CircuitOpenError, and CircuitStats. Implements state machine with automatic transitions based on failure/success thresholds and reset timeouts. Includes comprehensive tests. |
| Evaluator Integration: crates/challenge-orchestrator/src/evaluator.rs | Adds optional circuit_breaker field to ChallengeEvaluator and constructors (with_circuit_breaker, with_circuit_breaker_config). Integrates circuit breaker checks before requests and records outcomes. Adds CircuitOpen variant to EvaluatorError enum. |
| Module Exports: crates/challenge-orchestrator/src/lib.rs | Declares new circuit_breaker module and re-exports key types (CircuitBreakerConfig, CircuitBreakerManager, CircuitOpenError, CircuitState, CircuitStats) at crate root. |

Sequence Diagram

sequenceDiagram
    participant Client
    participant Evaluator
    participant CircuitBreakerManager
    participant CircuitData

    Client->>Evaluator: evaluate_generic(challenge_id)
    Evaluator->>CircuitBreakerManager: check(challenge_id)
    alt Circuit Open
        CircuitBreakerManager-->>Evaluator: CircuitOpenError
        Evaluator-->>Client: Error (Circuit Open)
    else Circuit Closed/HalfOpen
        CircuitBreakerManager-->>Evaluator: Ok(())
        Evaluator->>CircuitBreakerManager: (Proceed with evaluation)
        Evaluator->>CircuitData: proxy_request()
        alt Request Success
            CircuitData-->>Evaluator: Ok(response)
            Evaluator->>CircuitBreakerManager: record_success(challenge_id)
            CircuitBreakerManager->>CircuitData: Update state (transitions if needed)
            Evaluator-->>Client: Success response
        else Request Failure
            CircuitData-->>Evaluator: Error
            Evaluator->>CircuitBreakerManager: record_failure(challenge_id)
            CircuitBreakerManager->>CircuitData: Increment failure_count (may Open)
            Evaluator-->>Client: Error response
        end
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A circuit breaker hops into place,
Guarding challenges with steady grace,
Open, closed, or halfway through,
Protecting connections, tried and true!
Per-challenge health now monitored with care, 🔌✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: introducing a circuit breaker pattern for challenge container connections to improve resilience. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
crates/challenge-orchestrator/src/evaluator.rs (1)

37-74: Consider extracting HTTP client creation to reduce duplication.

The reqwest::Client creation logic (timeout, .expect()) is duplicated between new() (lines 40-43) and with_circuit_breaker_config() (lines 64-67). A small helper or having new() delegate to with_circuit_breaker_config() would eliminate duplication.

♻️ Suggested refactor
+    fn build_client() -> reqwest::Client {
+        reqwest::Client::builder()
+            .timeout(Duration::from_secs(3600))
+            .build()
+            .expect("Failed to create HTTP client")
+    }
+
     pub fn new(challenges: Arc<RwLock<HashMap<ChallengeId, ChallengeInstance>>>) -> Self {
-        let client = reqwest::Client::builder()
-            .timeout(Duration::from_secs(3600))
-            .build()
-            .expect("Failed to create HTTP client");
-
         Self {
             challenges,
-            client,
+            client: Self::build_client(),
             circuit_breaker: None,
         }
     }
     
     // ... with_circuit_breaker unchanged ...
     
     pub fn with_circuit_breaker_config(
         challenges: Arc<RwLock<HashMap<ChallengeId, ChallengeInstance>>>,
         config: CircuitBreakerConfig,
     ) -> Self {
-        let client = reqwest::Client::builder()
-            .timeout(Duration::from_secs(3600))
-            .build()
-            .expect("Failed to create HTTP client");
-
         Self {
             challenges,
-            client,
+            client: Self::build_client(),
             circuit_breaker: Some(Arc::new(CircuitBreakerManager::with_config(config))),
         }
     }
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7678293 and 0bb82f7.

📒 Files selected for processing (3)
  • crates/challenge-orchestrator/src/circuit_breaker.rs
  • crates/challenge-orchestrator/src/evaluator.rs
  • crates/challenge-orchestrator/src/lib.rs
🧰 Additional context used
🧬 Code graph analysis (3)
crates/challenge-orchestrator/src/lib.rs (1)
crates/challenge-orchestrator/src/evaluator.rs (1)
  • circuit_breaker (77-79)
crates/challenge-orchestrator/src/evaluator.rs (1)
crates/challenge-orchestrator/src/circuit_breaker.rs (5)
  • new (68-76)
  • new (87-89)
  • default (37-43)
  • default (272-274)
  • with_config (92-97)
crates/challenge-orchestrator/src/circuit_breaker.rs (1)
crates/challenge-orchestrator/src/evaluator.rs (1)
  • new (39-50)
🔇 Additional comments (13)
crates/challenge-orchestrator/src/lib.rs (1)

29-29: LGTM!

The new circuit_breaker module is correctly declared and its public types are properly re-exported at the crate root, enabling external consumers to access CircuitBreakerConfig, CircuitBreakerManager, CircuitOpenError, CircuitState, and CircuitStats directly from the crate.

Also applies to: 40-40

crates/challenge-orchestrator/src/evaluator.rs (5)

31-34: LGTM!

The optional circuit_breaker field with Option<Arc<CircuitBreakerManager>> correctly enables shared ownership across async contexts while maintaining backward compatibility with the existing new() constructor.


129-142: LGTM!

The circuit breaker integration correctly:

  1. Fast-fails requests when the circuit is open
  2. Records success only for 2xx responses
  3. Records failure for 4xx/5xx responses and network errors

The pattern of borrowing &result to record outcomes before consuming it is correct.
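The outcome rule listed above can be sketched without any HTTP library. In this minimal example, `HttpResult`, `Outcome`, and `classify` are hypothetical names introduced for illustration; the PR's evaluator works on real response values. The review does not say how 3xx responses are treated, so counting every non-2xx status as a failure is an assumption of this sketch.

```rust
/// Hypothetical stand-in for an HTTP call result:
/// Ok(status code) or Err(transport error message).
pub type HttpResult = Result<u16, String>;

/// What would be reported to the circuit breaker.
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    Success,
    Failure,
}

/// Classify by reference so the caller can still consume `result` afterwards.
/// Only 2xx counts as success; other statuses and transport errors are
/// failures (treating 3xx as failure is an assumption of this sketch).
pub fn classify(result: &HttpResult) -> Outcome {
    match result {
        Ok(status) if (200..300).contains(status) => Outcome::Success,
        _ => Outcome::Failure,
    }
}
```

Because `classify` only borrows the result, the evaluator can record the outcome first and then return or unwrap the same value, which is the borrowing pattern the comment refers to.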


178-186: LGTM!

The circuit breaker integration in proxy_request mirrors evaluate_generic correctly, with proper pre-check and outcome recording.

Also applies to: 216-232


250-308: Verify: get_info and check_health are intentionally excluded from circuit breaker protection.

Unlike evaluate_generic and proxy_request, these methods don't check or record with the circuit breaker. This may be intentional—health checks could be probing recovery, and info requests are metadata—but the inconsistency could surprise callers expecting uniform protection.

Consider documenting this design choice in the method doc comments if intentional, or integrating circuit breaker logic if these should also be protected.


370-376: LGTM!

The new CircuitOpen error variant properly surfaces both the affected challenge_id and the optional retry duration, giving callers actionable information to handle the fast-fail scenario.
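As an illustration of how such a variant can carry actionable information, here is a minimal sketch. Only the `CircuitOpen` variant name and the `challenge_id` field are taken from the review; the `retry_after` field name, the use of `String` in place of the crate's `ChallengeId` type, the message wording, and the `retry_delay` helper are assumptions made for this example.

```rust
use std::fmt;
use std::time::Duration;

/// Sketch of the error enum; the real enum has other variants, omitted here.
#[derive(Debug)]
pub enum EvaluatorError {
    CircuitOpen {
        /// String stands in for the crate's ChallengeId type (assumption).
        challenge_id: String,
        /// Hypothetical field name for the optional retry duration.
        retry_after: Option<Duration>,
    },
}

impl fmt::Display for EvaluatorError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            EvaluatorError::CircuitOpen { challenge_id, retry_after: Some(d) } => {
                write!(f, "circuit open for challenge {}; retry in {}s", challenge_id, d.as_secs())
            }
            EvaluatorError::CircuitOpen { challenge_id, retry_after: None } => {
                write!(f, "circuit open for challenge {}", challenge_id)
            }
        }
    }
}

impl std::error::Error for EvaluatorError {}

/// Hypothetical caller-side helper: how long to wait before retrying, if known.
pub fn retry_delay(err: &EvaluatorError) -> Option<Duration> {
    match err {
        EvaluatorError::CircuitOpen { retry_after, .. } => *retry_after,
    }
}
```

A caller that receives this error can back off for the reported duration instead of immediately retrying against a container that is known to be unhealthy.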

crates/challenge-orchestrator/src/circuit_breaker.rs (7)

25-77: LGTM!

The configuration and state definitions are well-structured:

  • Sensible defaults (5 failures, 30s timeout, 2 successes for recovery)
  • Clear state enum with proper documentation
  • Internal CircuitData correctly encapsulates tracking fields

103-137: LGTM!

The check method correctly implements the state machine:

  • Closed: allows requests
  • Open: transitions to HalfOpen after reset_timeout elapsed, otherwise fast-fails with remaining retry time
  • HalfOpen: allows test requests

Using a write lock is appropriate since the method may mutate state (Open→HalfOpen transition).
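The behaviour described for check can be sketched as follows. This is not the PR's code: `BreakerSketch`, `force_open`, and `state` are hypothetical names, `std::sync::RwLock` and `String` keys are assumed in place of whatever lock and ID types the crate uses, and the error is reduced to the remaining wait time.

```rust
use std::collections::HashMap;
use std::sync::RwLock;
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CircuitState {
    Closed,
    Open,
    HalfOpen,
}

struct CircuitData {
    state: CircuitState,
    last_failure_time: Option<Instant>,
}

/// Hypothetical per-challenge breaker holding only what check() needs.
pub struct BreakerSketch {
    reset_timeout: Duration,
    circuits: RwLock<HashMap<String, CircuitData>>,
}

impl BreakerSketch {
    pub fn new(reset_timeout: Duration) -> Self {
        Self { reset_timeout, circuits: RwLock::new(HashMap::new()) }
    }

    /// Example-only helper: mark a circuit as Open as of now.
    pub fn force_open(&self, id: &str) {
        let mut circuits = self.circuits.write().unwrap();
        circuits.insert(
            id.to_string(),
            CircuitData { state: CircuitState::Open, last_failure_time: Some(Instant::now()) },
        );
    }

    /// Unknown challenges are reported as Closed.
    pub fn state(&self, id: &str) -> CircuitState {
        let circuits = self.circuits.read().unwrap();
        circuits.get(id).map(|d| d.state).unwrap_or(CircuitState::Closed)
    }

    /// Ok(()) if the request may proceed, Err(remaining wait) if fast-failed.
    /// Takes the write lock because an expired Open circuit becomes HalfOpen.
    pub fn check(&self, id: &str) -> Result<(), Duration> {
        let mut circuits = self.circuits.write().unwrap();
        let data = match circuits.get_mut(id) {
            Some(data) => data,
            None => return Ok(()), // never seen: treated as Closed
        };
        match data.state {
            CircuitState::Closed | CircuitState::HalfOpen => Ok(()),
            CircuitState::Open => {
                let elapsed = data
                    .last_failure_time
                    .map(|t| t.elapsed())
                    .unwrap_or(self.reset_timeout);
                if elapsed >= self.reset_timeout {
                    data.state = CircuitState::HalfOpen;
                    Ok(())
                } else {
                    Err(self.reset_timeout - elapsed)
                }
            }
        }
    }
}
```

The subtraction in the `Err` branch cannot underflow because it is only reached when `elapsed` is strictly less than `reset_timeout`.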


139-177: LGTM!

The record_success method correctly handles all states:

  • Closed: resets failure count (breaks failure chain)
  • HalfOpen: accumulates successes toward recovery threshold
  • Open: defensive reset (handles unexpected but possible scenario)

179-222: LGTM with one consideration.

The record_failure implementation is correct:

  • Closed: opens circuit after threshold failures
  • HalfOpen: immediately reopens on any failure (strict recovery)
  • Open: updates last_failure_time to extend the cooldown

Note on concurrent HalfOpen requests: In HalfOpen state, multiple concurrent check() calls can all pass before any record_success/record_failure updates the state. If the first returning request fails, it reopens the circuit even if others would have succeeded. This is a known trade-off in simple circuit breaker designs and is acceptable for challenge container use cases where strict recovery is preferred.
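The record_success and record_failure transitions described in the two comments above can be sketched as a small single-circuit state machine. This is an illustration under stated assumptions, not the PR's code: `Circuit` and `begin_probe` are hypothetical, timestamps are omitted (so the Open-state cooldown extension is only a comment), and the "defensive reset" on success in Open is interpreted here as clearing counters only.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CircuitState {
    Closed,
    Open,
    HalfOpen,
}

/// Hypothetical single-circuit model; locking and time are left out.
#[derive(Debug)]
pub struct Circuit {
    pub state: CircuitState,
    failure_count: u32,
    success_count: u32,
    failure_threshold: u32,
    success_threshold: u32,
}

impl Circuit {
    pub fn new(failure_threshold: u32, success_threshold: u32) -> Self {
        Self {
            state: CircuitState::Closed,
            failure_count: 0,
            success_count: 0,
            failure_threshold,
            success_threshold,
        }
    }

    /// Stand-in for check() observing that reset_timeout has elapsed.
    pub fn begin_probe(&mut self) {
        if self.state == CircuitState::Open {
            self.state = CircuitState::HalfOpen;
            self.success_count = 0;
        }
    }

    pub fn record_success(&mut self) {
        match self.state {
            // A success breaks the chain of consecutive failures.
            CircuitState::Closed => self.failure_count = 0,
            CircuitState::HalfOpen => {
                self.success_count += 1;
                if self.success_count >= self.success_threshold {
                    self.state = CircuitState::Closed;
                    self.failure_count = 0;
                    self.success_count = 0;
                }
            }
            // Assumed meaning of "defensive reset": clear counters, stay Open.
            CircuitState::Open => {
                self.failure_count = 0;
                self.success_count = 0;
            }
        }
    }

    pub fn record_failure(&mut self) {
        match self.state {
            CircuitState::Closed => {
                self.failure_count += 1;
                if self.failure_count >= self.failure_threshold {
                    self.state = CircuitState::Open;
                }
            }
            // Any failure during recovery reopens the circuit immediately.
            CircuitState::HalfOpen => {
                self.state = CircuitState::Open;
                self.success_count = 0;
            }
            // The PR updates last_failure_time here; not modeled in this sketch.
            CircuitState::Open => {}
        }
    }
}
```

Note that in this model the first failing probe in HalfOpen reopens the circuit regardless of other in-flight probes, which matches the strict-recovery trade-off the note describes.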


224-268: LGTM!

Utility methods are well-implemented:

  • get_state correctly defaults to Closed for unknown challenges
  • reset fully clears state for manual intervention scenarios
  • remove provides cleanup when challenges are removed
  • get_stats enables observability for monitoring dashboards

277-309: LGTM!

CircuitOpenError and CircuitStats are well-designed:

  • Error provides actionable retry information
  • Display formatting is user-friendly
  • Stats capture all essential debugging fields for observability

311-487: LGTM!

Comprehensive test coverage for the circuit breaker state machine:

  • All state transitions (Closed→Open→HalfOpen→Closed/Open)
  • Counter behavior (failure count reset on success)
  • Manual reset and removal
  • Error formatting

The tests use std::thread::sleep for timing, which is acceptable for this use case.

@MkDev11 MkDev11 closed this Jan 20, 2026
@MkDev11 MkDev11 reopened this Jan 20, 2026
MkDev11 commented Jan 20, 2026

@echobt Could you please have a look at the PR and let me know your feedback?

@MkDev11 MkDev11 closed this Jan 21, 2026
@MkDev11 MkDev11 reopened this Jan 21, 2026
@MkDev11 MkDev11 closed this Jan 21, 2026
@MkDev11 MkDev11 reopened this Jan 21, 2026