feat: add circuit breaker for challenge container connections #25

MkDev11 · 2026-01-09T12:47:57Z

Circuit Breaker for Challenge Container Connections

Problem

The evaluator makes HTTP requests to challenge containers without circuit breaker protection. If a container becomes unhealthy, every request to it waits for the full client timeout (3600s by default) before failing, which wastes resources.

Solution

Implement the circuit breaker pattern with three states:

Closed: Normal operation, requests pass through
Open: Fast-fail all requests (container known unhealthy)
HalfOpen: Allow test requests to check recovery

Changes

circuit_breaker.rs - New module with CircuitBreakerManager
evaluator.rs - Integrate circuit breaker into evaluate_generic() and proxy_request()
lib.rs - Export circuit breaker types

Configuration (Defaults)

| Setting | Value | Description |
|---|---|---|
| `failure_threshold` | 5 | Consecutive failures to open circuit |
| `reset_timeout` | 30s | Time before testing recovery |
| `success_threshold` | 2 | Successes in HalfOpen to close |
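
As a rough illustration of how these defaults could be expressed, here is a minimal sketch of a config struct with a `Default` impl. The field names and default values come from the table; the integer field types and the `strict_config` helper are assumptions made for the example, not taken from the PR's code.

```rust
use std::time::Duration;

/// Sketch of the configuration described in the table (field types assumed).
#[derive(Debug, Clone, PartialEq)]
pub struct CircuitBreakerConfig {
    /// Consecutive failures before the circuit opens.
    pub failure_threshold: u32,
    /// How long the circuit stays Open before recovery is tested.
    pub reset_timeout: Duration,
    /// Successes required in HalfOpen before the circuit closes.
    pub success_threshold: u32,
}

impl Default for CircuitBreakerConfig {
    fn default() -> Self {
        Self {
            failure_threshold: 5,
            reset_timeout: Duration::from_secs(30),
            success_threshold: 2,
        }
    }
}

/// Hypothetical helper: override one field and keep the other defaults.
pub fn strict_config() -> CircuitBreakerConfig {
    CircuitBreakerConfig {
        failure_threshold: 3,
        ..CircuitBreakerConfig::default()
    }
}
```

Struct update syntax (`..Default::default()`) keeps call sites short when only one threshold needs tuning.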

Usage

// Without circuit breaker (existing behavior)
let evaluator = ChallengeEvaluator::new(challenges);

// With circuit breaker (new)
let evaluator = ChallengeEvaluator::with_circuit_breaker(challenges);

// With custom config
let config = CircuitBreakerConfig {
    failure_threshold: 3,
    reset_timeout: Duration::from_secs(60),
    success_threshold: 1,
};
let evaluator = ChallengeEvaluator::with_circuit_breaker_config(challenges, config);

Testing

Unit tests (12 new tests for circuit breaker)
All existing tests pass (85 total)
Clippy clean

Backward Compatible

Existing new() constructor unchanged - circuit breaker is opt-in.

Summary by CodeRabbit

Release Notes

New Features
- Circuit breaker system added to improve resilience for challenge connections with automatic failure detection and recovery
- Configurable thresholds for triggering circuit opens and timeout-based recovery mechanisms
- Enhanced error reporting indicating when challenges are unavailable and estimated retry timing

coderabbitai · 2026-01-09T12:48:07Z

📝 Walkthrough

A new circuit breaker module is introduced to track per-challenge connection health with automatic state transitions between Closed, Open, and HalfOpen states. The evaluator integrates the circuit breaker to pre-check availability and record success or failure on each request. Circuit breaker types are re-exported from the crate root.

Changes

| Cohort / File(s) | Summary |
|---|---|
| New Circuit Breaker Module `crates/challenge-orchestrator/src/circuit_breaker.rs` | Introduces complete circuit breaker implementation with `CircuitBreakerConfig`, `CircuitState` enum (Closed/Open/HalfOpen), `CircuitBreakerManager` for per-challenge tracking, `CircuitOpenError`, and `CircuitStats`. Implements state machine with automatic transitions based on failure/success thresholds and reset timeouts. Includes comprehensive tests. |
| Evaluator Integration `crates/challenge-orchestrator/src/evaluator.rs` | Adds optional `circuit_breaker` field to `ChallengeEvaluator` and constructors (`with_circuit_breaker`, `with_circuit_breaker_config`). Integrates circuit breaker checks before requests and records outcomes. Adds `CircuitOpen` variant to `EvaluatorError` enum. |
| Module Exports `crates/challenge-orchestrator/src/lib.rs` | Declares new `circuit_breaker` module and re-exports key types (`CircuitBreakerConfig`, `CircuitBreakerManager`, `CircuitOpenError`, `CircuitState`, `CircuitStats`) at crate root. |

Sequence Diagram

sequenceDiagram
    participant Client
    participant Evaluator
    participant CircuitBreakerManager
    participant CircuitData
    participant ChallengeContainer

    Client->>Evaluator: evaluate_generic(challenge_id)
    Evaluator->>CircuitBreakerManager: check(challenge_id)
    alt Circuit Open
        CircuitBreakerManager-->>Evaluator: CircuitOpenError
        Evaluator-->>Client: Error (Circuit Open)
    else Circuit Closed/HalfOpen
        CircuitBreakerManager-->>Evaluator: Ok(())
        Evaluator->>ChallengeContainer: HTTP request
        alt Request Success
            ChallengeContainer-->>Evaluator: Ok(response)
            Evaluator->>CircuitBreakerManager: record_success(challenge_id)
            CircuitBreakerManager->>CircuitData: Update state (transitions if needed)
            Evaluator-->>Client: Success response
        else Request Failure
            ChallengeContainer-->>Evaluator: Error
            Evaluator->>CircuitBreakerManager: record_failure(challenge_id)
            CircuitBreakerManager->>CircuitData: Increment failure_count (may Open)
            Evaluator-->>Client: Error response
        end
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A circuit breaker hops into place,
Guarding challenges with steady grace,
Open, closed, or halfway through,
Protecting connections, tried and true!
Per-challenge health now monitored with care, 🔌✨

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically describes the main change: introducing a circuit breaker pattern for challenge container connections to improve resilience. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

crates/challenge-orchestrator/src/evaluator.rs (1)

37-74: Consider extracting HTTP client creation to reduce duplication.

The reqwest::Client creation logic (timeout, .expect()) is duplicated between new() (lines 40-43) and with_circuit_breaker_config() (lines 64-67). A small helper or having new() delegate to with_circuit_breaker_config() would eliminate duplication.

♻️ Suggested refactor

+    fn build_client() -> reqwest::Client {
+        reqwest::Client::builder()
+            .timeout(Duration::from_secs(3600))
+            .build()
+            .expect("Failed to create HTTP client")
+    }
+
     pub fn new(challenges: Arc<RwLock<HashMap<ChallengeId, ChallengeInstance>>>) -> Self {
-        let client = reqwest::Client::builder()
-            .timeout(Duration::from_secs(3600))
-            .build()
-            .expect("Failed to create HTTP client");
-
         Self {
             challenges,
-            client,
+            client: Self::build_client(),
             circuit_breaker: None,
         }
     }
     
     // ... with_circuit_breaker unchanged ...
     
     pub fn with_circuit_breaker_config(
         challenges: Arc<RwLock<HashMap<ChallengeId, ChallengeInstance>>>,
         config: CircuitBreakerConfig,
     ) -> Self {
-        let client = reqwest::Client::builder()
-            .timeout(Duration::from_secs(3600))
-            .build()
-            .expect("Failed to create HTTP client");
-
         Self {
             challenges,
-            client,
+            client: Self::build_client(),
             circuit_breaker: Some(Arc::new(CircuitBreakerManager::with_config(config))),
         }
     }

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7678293 and 0bb82f7.

📒 Files selected for processing (3)

crates/challenge-orchestrator/src/circuit_breaker.rs
crates/challenge-orchestrator/src/evaluator.rs
crates/challenge-orchestrator/src/lib.rs

🧰 Additional context used

🧬 Code graph analysis (3)

crates/challenge-orchestrator/src/lib.rs (1)

crates/challenge-orchestrator/src/evaluator.rs (1)

circuit_breaker (77-79)

crates/challenge-orchestrator/src/evaluator.rs (1)

crates/challenge-orchestrator/src/circuit_breaker.rs (5)

new (68-76)

new (87-89)

default (37-43)

default (272-274)

with_config (92-97)

crates/challenge-orchestrator/src/circuit_breaker.rs (1)

crates/challenge-orchestrator/src/evaluator.rs (1)

new (39-50)

🔇 Additional comments (13)

crates/challenge-orchestrator/src/lib.rs (1)

29-29: LGTM!

The new circuit_breaker module is correctly declared and its public types are properly re-exported at the crate root, enabling external consumers to access CircuitBreakerConfig, CircuitBreakerManager, CircuitOpenError, CircuitState, and CircuitStats directly from the crate.

Also applies to: 40-40

crates/challenge-orchestrator/src/evaluator.rs (5)

31-34: LGTM!

The optional circuit_breaker field with Option<Arc<CircuitBreakerManager>> correctly enables shared ownership across async contexts while maintaining backward compatibility with the existing new() constructor.

129-142: LGTM!

The circuit breaker integration correctly:

Fast-fails requests when the circuit is open

Records success only for 2xx responses

Records failure for 4xx/5xx responses and network errors

The pattern of borrowing &result to record outcomes before consuming it is correct.
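
The borrow-then-return pattern mentioned above might look roughly like the sketch below. `Recorder` and `record_and_return` are hypothetical stand-ins for the circuit breaker manager and the evaluator's internal logic; they are not names from the PR.

```rust
/// Hypothetical recorder standing in for the circuit breaker manager.
#[derive(Debug, Default)]
pub struct Recorder {
    pub successes: u32,
    pub failures: u32,
}

/// Inspect the result by reference, update the recorder if one is
/// configured, then hand the result back to the caller unchanged.
pub fn record_and_return<T, E>(
    recorder: Option<&mut Recorder>,
    result: Result<T, E>,
) -> Result<T, E> {
    if let Some(rec) = recorder {
        // Matching on `&result` only borrows it, so it can still be
        // returned by value afterwards.
        match &result {
            Ok(_) => rec.successes += 1,
            Err(_) => rec.failures += 1,
        }
    }
    result
}
```

Passing `None` models an evaluator built with `new()`, where no circuit breaker is attached and the result passes through untouched.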

178-186: LGTM!

The circuit breaker integration in proxy_request mirrors evaluate_generic correctly, with proper pre-check and outcome recording.

Also applies to: 216-232

250-308: Verify: get_info and check_health are intentionally excluded from circuit breaker protection.

Unlike evaluate_generic and proxy_request, these methods don't check or record with the circuit breaker. This may be intentional—health checks could be probing recovery, and info requests are metadata—but the inconsistency could surprise callers expecting uniform protection.

Consider documenting this design choice in the method doc comments if intentional, or integrating circuit breaker logic if these should also be protected.

370-376: LGTM!

The new CircuitOpen error variant properly surfaces both the affected challenge_id and the optional retry duration, giving callers actionable information to handle the fast-fail scenario.
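
For illustration, an error variant carrying these two pieces of information could be sketched as follows. Only the variant name `CircuitOpen` and the `challenge_id` field are named in the review; the `retry_after` field name, the `String` id type, the message wording, and the `backoff_for` helper are assumptions.

```rust
use std::fmt;
use std::time::Duration;

#[derive(Debug, PartialEq)]
pub enum EvaluatorError {
    /// The circuit for this challenge is open; requests fail fast.
    CircuitOpen {
        challenge_id: String,
        /// Time until a test request is allowed, if known (assumed name).
        retry_after: Option<Duration>,
    },
}

impl fmt::Display for EvaluatorError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            EvaluatorError::CircuitOpen { challenge_id, retry_after: Some(d) } => write!(
                f,
                "circuit open for challenge {}, retry in {}s",
                challenge_id,
                d.as_secs()
            ),
            EvaluatorError::CircuitOpen { challenge_id, retry_after: None } => {
                write!(f, "circuit open for challenge {}", challenge_id)
            }
        }
    }
}

impl std::error::Error for EvaluatorError {}

/// Hypothetical caller-side helper: derive a back-off delay from the error.
pub fn backoff_for(err: &EvaluatorError, fallback: Duration) -> Duration {
    match err {
        EvaluatorError::CircuitOpen { retry_after, .. } => (*retry_after).unwrap_or(fallback),
    }
}
```

A caller can then sleep for `backoff_for(&err, fallback)` before retrying instead of hammering a container that is known to be down.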

crates/challenge-orchestrator/src/circuit_breaker.rs (7)

25-77: LGTM!

The configuration and state definitions are well-structured:

Sensible defaults (5 failures, 30s timeout, 2 successes for recovery)

Clear state enum with proper documentation

Internal CircuitData correctly encapsulates tracking fields

103-137: LGTM!

The check method correctly implements the state machine:

Closed: allows requests

Open: transitions to HalfOpen after reset_timeout elapsed, otherwise fast-fails with remaining retry time

HalfOpen: allows test requests

Using a write lock is appropriate since the method may mutate state (Open→HalfOpen transition).
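
The `check` behaviour described above can be sketched for a single circuit as shown below. This is a simplified model, not the PR's code: the `Circuit` struct is hypothetical, the current time is passed in as a parameter so the logic can be exercised without sleeping, and the per-challenge map and `RwLock` are omitted. The method takes `&mut self` for the same reason the real one takes a write lock: the Open to HalfOpen transition mutates state.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CircuitState {
    Closed,
    Open,
    HalfOpen,
}

/// Hypothetical single-circuit model.
pub struct Circuit {
    pub state: CircuitState,
    pub last_failure_time: Option<Instant>,
    pub reset_timeout: Duration,
}

impl Circuit {
    /// Returns Ok(()) if a request may proceed, or Err(remaining) with the
    /// time left until a test request is allowed.
    pub fn check(&mut self, now: Instant) -> Result<(), Duration> {
        match self.state {
            CircuitState::Closed | CircuitState::HalfOpen => Ok(()),
            CircuitState::Open => {
                // If no failure time was recorded, treat the circuit as
                // freshly opened (an assumption of this sketch).
                let opened = self.last_failure_time.unwrap_or(now);
                let elapsed = now.saturating_duration_since(opened);
                if elapsed >= self.reset_timeout {
                    self.state = CircuitState::HalfOpen;
                    Ok(())
                } else {
                    Err(self.reset_timeout - elapsed)
                }
            }
        }
    }
}
```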

139-177: LGTM!

The record_success method correctly handles all states:

Closed: resets failure count (breaks failure chain)

HalfOpen: accumulates successes toward recovery threshold

Open: defensive reset (handles unexpected but possible scenario)

179-222: LGTM with one consideration.

The record_failure implementation is correct:

Closed: opens circuit after threshold failures

HalfOpen: immediately reopens on any failure (strict recovery)

Open: updates last_failure_time to extend the cooldown

Note on concurrent HalfOpen requests: In HalfOpen state, multiple concurrent check() calls can all pass before any record_success/record_failure updates the state. If the first returning request fails, it reopens the circuit even if others would have succeeded. This is a known trade-off in simple circuit breaker designs and is acceptable for challenge container use cases where strict recovery is preferred.
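
The transitions described for `record_success` and `record_failure` can be summarised in a small sketch. It models one circuit with plain counters and leaves out timestamps and locking. The `Circuit` struct and its helper methods are hypothetical, and treating a success in the Open state as a full reset to Closed is an assumption about what "defensive reset" means.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CircuitState {
    Closed,
    Open,
    HalfOpen,
}

/// Hypothetical single-circuit model (no timing, no locking).
pub struct Circuit {
    pub state: CircuitState,
    pub failure_count: u32,
    pub success_count: u32,
    pub failure_threshold: u32,
    pub success_threshold: u32,
}

impl Circuit {
    pub fn new(failure_threshold: u32, success_threshold: u32) -> Self {
        Self {
            state: CircuitState::Closed,
            failure_count: 0,
            success_count: 0,
            failure_threshold,
            success_threshold,
        }
    }

    pub fn record_success(&mut self) {
        match self.state {
            // A success breaks the chain of consecutive failures.
            CircuitState::Closed => self.failure_count = 0,
            CircuitState::HalfOpen => {
                self.success_count += 1;
                if self.success_count >= self.success_threshold {
                    self.close();
                }
            }
            // Assumed meaning of "defensive reset": go back to Closed.
            CircuitState::Open => self.close(),
        }
    }

    pub fn record_failure(&mut self) {
        match self.state {
            CircuitState::Closed => {
                self.failure_count += 1;
                if self.failure_count >= self.failure_threshold {
                    self.open();
                }
            }
            // Strict recovery: any failure while probing reopens the circuit.
            CircuitState::HalfOpen => self.open(),
            // The reviewed code refreshes last_failure_time here; omitted.
            CircuitState::Open => {}
        }
    }

    fn close(&mut self) {
        self.state = CircuitState::Closed;
        self.failure_count = 0;
        self.success_count = 0;
    }

    fn open(&mut self) {
        self.state = CircuitState::Open;
        self.success_count = 0;
    }
}
```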

224-268: LGTM!

Utility methods are well-implemented:

get_state correctly defaults to Closed for unknown challenges

reset fully clears state for manual intervention scenarios

remove provides cleanup when challenges are removed

get_stats enables observability for monitoring dashboards
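
As a small illustration of the per-challenge lookup, the sketch below shows how an unknown challenge can default to Closed. The `Registry` type, its `set_state` method, and the string keys are hypothetical; the reviewed manager stores richer per-challenge data behind a lock.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CircuitState {
    Closed,
    Open,
    HalfOpen,
}

/// Hypothetical per-challenge registry keyed by challenge id.
#[derive(Default)]
pub struct Registry {
    circuits: HashMap<String, CircuitState>,
}

impl Registry {
    pub fn set_state(&mut self, id: &str, state: CircuitState) {
        self.circuits.insert(id.to_string(), state);
    }

    /// Challenges that were never tracked are reported as Closed.
    pub fn get_state(&self, id: &str) -> CircuitState {
        self.circuits.get(id).copied().unwrap_or(CircuitState::Closed)
    }

    /// Drop all tracking for a challenge, e.g. when it is removed.
    pub fn remove(&mut self, id: &str) {
        self.circuits.remove(id);
    }
}
```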

277-309: LGTM!

CircuitOpenError and CircuitStats are well-designed:

Error provides actionable retry information

Display formatting is user-friendly

Stats capture all essential debugging fields for observability

311-487: LGTM!

Comprehensive test coverage for the circuit breaker state machine:

All state transitions (Closed→Open→HalfOpen→Closed/Open)

Counter behavior (failure count reset on success)

Manual reset and removal

Error formatting

The tests use std::thread::sleep for timing which is acceptable for this use case.

MkDev11 · 2026-01-20T14:28:04Z

@echobt Could you please have a look at the PR and let me know your feedback?

feat: add circuit breaker for challenge container connections

0bb82f7

coderabbitai bot reviewed Jan 9, 2026

MkDev11 closed this Jan 20, 2026

MkDev11 reopened this Jan 20, 2026

MkDev11 closed this Jan 21, 2026

MkDev11 reopened this Jan 21, 2026

MkDev11 closed this Jan 21, 2026

MkDev11 reopened this Jan 21, 2026
