-
Notifications
You must be signed in to change notification settings - Fork 7
feat: add circuit breaker for challenge container connections #25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: add circuit breaker for challenge container connections #25
Conversation
📝 WalkthroughWalkthroughA new circuit breaker module is introduced to track per-challenge connection health with automatic state transitions between Closed, Open, and HalfOpen states. The evaluator integrates the circuit breaker to pre-check availability and record success or failure on each request. Circuit breaker types are re-exported from the crate root. Changes
Sequence DiagramsequenceDiagram
participant Client
participant Evaluator
participant CircuitBreakerManager
participant CircuitData
Client->>Evaluator: evaluate_generic(challenge_id)
Evaluator->>CircuitBreakerManager: check(challenge_id)
alt Circuit Open
CircuitBreakerManager-->>Evaluator: CircuitOpenError
Evaluator-->>Client: Error (Circuit Open)
else Circuit Closed/HalfOpen
CircuitBreakerManager-->>Evaluator: Ok(())
Evaluator->>CircuitBreakerManager: (Proceed with evaluation)
Evaluator->>CircuitData: proxy_request()
alt Request Success
CircuitData-->>Evaluator: Ok(response)
Evaluator->>CircuitBreakerManager: record_success(challenge_id)
CircuitBreakerManager->>CircuitData: Update state (transitions if needed)
Evaluator-->>Client: Success response
else Request Failure
CircuitData-->>Evaluator: Error
Evaluator->>CircuitBreakerManager: record_failure(challenge_id)
CircuitBreakerManager->>CircuitData: Increment failure_count (may Open)
Evaluator-->>Client: Error response
end
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
crates/challenge-orchestrator/src/evaluator.rs (1)
37-74: Consider extracting HTTP client creation to reduce duplication.The
reqwest::Clientcreation logic (timeout,.expect()) is duplicated betweennew()(lines 40-43) andwith_circuit_breaker_config()(lines 64-67). A small helper or havingnew()delegate towith_circuit_breaker_config()would eliminate duplication.♻️ Suggested refactor
+ fn build_client() -> reqwest::Client { + reqwest::Client::builder() + .timeout(Duration::from_secs(3600)) + .build() + .expect("Failed to create HTTP client") + } + pub fn new(challenges: Arc<RwLock<HashMap<ChallengeId, ChallengeInstance>>>) -> Self { - let client = reqwest::Client::builder() - .timeout(Duration::from_secs(3600)) - .build() - .expect("Failed to create HTTP client"); - Self { challenges, - client, + client: Self::build_client(), circuit_breaker: None, } } // ... with_circuit_breaker unchanged ... pub fn with_circuit_breaker_config( challenges: Arc<RwLock<HashMap<ChallengeId, ChallengeInstance>>>, config: CircuitBreakerConfig, ) -> Self { - let client = reqwest::Client::builder() - .timeout(Duration::from_secs(3600)) - .build() - .expect("Failed to create HTTP client"); - Self { challenges, - client, + client: Self::build_client(), circuit_breaker: Some(Arc::new(CircuitBreakerManager::with_config(config))), } }
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
crates/challenge-orchestrator/src/circuit_breaker.rscrates/challenge-orchestrator/src/evaluator.rscrates/challenge-orchestrator/src/lib.rs
🧰 Additional context used
🧬 Code graph analysis (3)
crates/challenge-orchestrator/src/lib.rs (1)
crates/challenge-orchestrator/src/evaluator.rs (1)
circuit_breaker(77-79)
crates/challenge-orchestrator/src/evaluator.rs (1)
crates/challenge-orchestrator/src/circuit_breaker.rs (5)
new(68-76)new(87-89)default(37-43)default(272-274)with_config(92-97)
crates/challenge-orchestrator/src/circuit_breaker.rs (1)
crates/challenge-orchestrator/src/evaluator.rs (1)
new(39-50)
🔇 Additional comments (13)
crates/challenge-orchestrator/src/lib.rs (1)
29-29: LGTM!The new
circuit_breakermodule is correctly declared and its public types are properly re-exported at the crate root, enabling external consumers to accessCircuitBreakerConfig,CircuitBreakerManager,CircuitOpenError,CircuitState, andCircuitStatsdirectly from the crate.Also applies to: 40-40
crates/challenge-orchestrator/src/evaluator.rs (5)
31-34: LGTM!The optional
circuit_breakerfield withOption<Arc<CircuitBreakerManager>>correctly enables shared ownership across async contexts while maintaining backward compatibility with the existingnew()constructor.
129-142: LGTM!The circuit breaker integration correctly:
- Fast-fails requests when the circuit is open
- Records success only for 2xx responses
- Records failure for 4xx/5xx responses and network errors
The pattern of borrowing
&resultto record outcomes before consuming it is correct.
178-186: LGTM!The circuit breaker integration in
proxy_requestmirrorsevaluate_genericcorrectly, with proper pre-check and outcome recording.Also applies to: 216-232
250-308: Verify:get_infoandcheck_healthare intentionally excluded from circuit breaker protection.Unlike
evaluate_genericandproxy_request, these methods don't check or record with the circuit breaker. This may be intentional—health checks could be probing recovery, and info requests are metadata—but the inconsistency could surprise callers expecting uniform protection.Consider documenting this design choice in the method doc comments if intentional, or integrating circuit breaker logic if these should also be protected.
370-376: LGTM!The new
CircuitOpenerror variant properly surfaces both the affectedchallenge_idand the optional retry duration, giving callers actionable information to handle the fast-fail scenario.crates/challenge-orchestrator/src/circuit_breaker.rs (7)
25-77: LGTM!The configuration and state definitions are well-structured:
- Sensible defaults (5 failures, 30s timeout, 2 successes for recovery)
- Clear state enum with proper documentation
- Internal
CircuitDatacorrectly encapsulates tracking fields
103-137: LGTM!The
checkmethod correctly implements the state machine:
- Closed: allows requests
- Open: transitions to HalfOpen after
reset_timeoutelapsed, otherwise fast-fails with remaining retry time- HalfOpen: allows test requests
Using a write lock is appropriate since the method may mutate state (Open→HalfOpen transition).
139-177: LGTM!The
record_successmethod correctly handles all states:
- Closed: resets failure count (breaks failure chain)
- HalfOpen: accumulates successes toward recovery threshold
- Open: defensive reset (handles unexpected but possible scenario)
179-222: LGTM with one consideration.The
record_failureimplementation is correct:
- Closed: opens circuit after threshold failures
- HalfOpen: immediately reopens on any failure (strict recovery)
- Open: updates
last_failure_timeto extend the cooldownNote on concurrent HalfOpen requests: In HalfOpen state, multiple concurrent
check()calls can all pass before anyrecord_success/record_failureupdates the state. If the first returning request fails, it reopens the circuit even if others would have succeeded. This is a known trade-off in simple circuit breaker designs and is acceptable for challenge container use cases where strict recovery is preferred.
224-268: LGTM!Utility methods are well-implemented:
get_statecorrectly defaults toClosedfor unknown challengesresetfully clears state for manual intervention scenariosremoveprovides cleanup when challenges are removedget_statsenables observability for monitoring dashboards
277-309: LGTM!
CircuitOpenErrorandCircuitStatsare well-designed:
- Error provides actionable retry information
- Display formatting is user-friendly
- Stats capture all essential debugging fields for observability
311-487: LGTM!Comprehensive test coverage for the circuit breaker state machine:
- All state transitions (Closed→Open→HalfOpen→Closed/Open)
- Counter behavior (failure count reset on success)
- Manual reset and removal
- Error formatting
The tests use
std::thread::sleepfor timing which is acceptable for this use case.
|
@echobt Could you please have a look at the PR and let me know your feedback? |
Circuit Breaker for Challenge Container Connections
Problem
The evaluator makes HTTP requests to challenge containers without circuit breaker protection. If a container becomes unhealthy, every request waits for timeout (up to 3600s default), wasting resources.
Solution
Implement circuit breaker pattern with three states:
Changes
circuit_breaker.rs- New module withCircuitBreakerManagerevaluator.rs- Integrate circuit breaker intoevaluate_generic()andproxy_request()lib.rs- Export circuit breaker typesConfiguration (Defaults)
failure_thresholdreset_timeoutsuccess_thresholdUsage
Testing
Backward Compatible
Existing
new()constructor unchanged - circuit breaker is opt-in.Summary by CodeRabbit
Release Notes
✏️ Tip: You can customize this high-level summary in your review settings.