fix: Improve state locks #93
Conversation
Walkthrough

Reworked runtime concurrency: the worker map now stores per-worker `Mutex<LoadedWorker>` values behind a shared `RwLock`, replacing the single map-wide `Mutex`.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Actionable comments posted: 0
🧹 Nitpick comments (1)
balius-runtime/src/lib.rs (1)
668-673: Consider releasing the read lock earlier (optional optimization).

The implementation is correct. The read lock on the map is held until the end of the function. Since the map is only needed to look up the worker, you could potentially release the read lock after line 673 to improve concurrency.
However, this is a minor optimization and the current code is clear and correct. The lock is held for a brief duration (single request handling), so the impact is minimal.
Example refactor if desired:
```diff
 pub async fn handle_request(
     &self,
     worker_id: &str,
     method: &str,
     params: Vec<u8>,
 ) -> Result<wit::Response, Error> {
-    let workers = self.loaded.read().await;
-    let mut worker = workers
-        .get(worker_id)
-        .ok_or(Error::WorkerNotFound(worker_id.to_string()))?
-        .lock()
-        .await;
+    let worker_mutex = {
+        let workers = self.loaded.read().await;
+        workers
+            .get(worker_id)
+            .ok_or(Error::WorkerNotFound(worker_id.to_string()))?
+            .clone()
+    }; // read lock released here
+    let mut worker = worker_mutex.lock().await;
     let channel = worker
         .wasm_store
         .data()
         .router
         .find_request_target(method)?;
     let evt = wit::Event::Request(params);
     let result = worker.dispatch_event(channel, &evt).await;
     self.metrics.request(worker_id, method, result.is_ok());
     result
 }
```

Note: This requires `Arc::clone()` on the `Mutex`, so the `WorkerMap` values would need to be `Arc<Mutex<LoadedWorker>>` instead of just `Mutex<LoadedWorker>` for this optimization.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
balius-runtime/src/lib.rs (7 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: lint
- GitHub Check: build
🔇 Additional comments (6)
balius-runtime/src/lib.rs (6)
1-2: LGTM: Imports support the new concurrency model.

The new imports correctly support parallel worker updates (`join_all`), collection utilities (`Itertools`), and the read-write lock pattern (`RwLock`).

Also applies to: 10-10
490-490: LGTM: Two-level locking improves concurrency.

The change from `Arc<Mutex<WorkerMap>>` to `Arc<RwLock<HashMap<String, Mutex<LoadedWorker>>>>` is a well-established pattern that enables:
- Multiple concurrent readers of the worker map
- Fine-grained per-worker locking for mutations
- Better parallelism during chain processing
Also applies to: 496-496
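For illustration, here is a minimal, hedged sketch of the two-level pattern. The field and method names (`cursor`, `touch`, `register`) are assumptions for the example, not the crate's actual definitions:

```rust
use std::collections::HashMap;
use std::sync::Arc;

use tokio::sync::{Mutex, RwLock};

// Hypothetical stand-in; the real LoadedWorker holds a wasm store, router, etc.
struct LoadedWorker {
    cursor: Option<u64>,
}

type WorkerMap = HashMap<String, Mutex<LoadedWorker>>;

struct Runtime {
    loaded: Arc<RwLock<WorkerMap>>,
}

impl Runtime {
    // Mutating one worker only needs the shared read lock on the map plus
    // that worker's own Mutex, so unrelated workers don't contend.
    async fn touch(&self, id: &str) {
        let map = self.loaded.read().await;
        if let Some(cell) = map.get(id) {
            let mut worker = cell.lock().await;
            worker.cursor = Some(worker.cursor.unwrap_or(0) + 1);
        }
    }

    // Structural changes (registration/removal) take the exclusive write lock.
    async fn register(&self, id: String, worker: LoadedWorker) {
        self.loaded.write().await.insert(id, Mutex::new(worker));
    }
}
```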
515-523: LGTM: Cursor computation is correct.

The read lock ensures map stability during iteration, while per-worker locks safely access individual cursors. The logic correctly computes the minimum cursor across all workers.
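As a hedged illustration of that min-cursor computation, reusing the hypothetical types from the sketch above (the real code operates on chain points rather than `u64`):

```rust
// Compute the minimum cursor across all workers. The read lock keeps the
// map stable while iterating; each worker is locked only briefly.
async fn lowest_cursor(loaded: &RwLock<WorkerMap>) -> Option<u64> {
    let map = loaded.read().await;
    let mut min: Option<u64> = None;
    for cell in map.values() {
        let worker = cell.lock().await;
        if let Some(cursor) = worker.cursor {
            min = Some(min.map_or(cursor, |m| m.min(cursor)));
        }
    }
    min
}
```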
576-584: LGTM: Worker registration correctly uses write lock.

The write lock appropriately guards the structural modification of the map, and the new worker is correctly wrapped in a `Mutex`, consistent with the updated `WorkerMap` type.
617-617: LGTM: Worker removal correctly uses write lock.

The write lock appropriately guards the removal operation.
638-656: Verify read lock duration during chain processing.

The implementation correctly uses parallel worker updates via `join_all`, which is excellent for performance. However, the read lock on the worker map is held for the entire chain processing duration (including block application and cursor updates). This will block worker registration and removal operations during chain processing.

This is likely intentional to ensure atomicity: you probably want a stable worker set while applying a chain update. However, if chain processing takes significant time and you need to dynamically manage workers during processing, this could become a bottleneck. A hedged sketch of the pattern follows the list below.
Consider whether:
- This blocking behavior is acceptable for your use case
- If worker management during chain processing is needed, you might need to restructure the locking (e.g., snapshot worker IDs with read lock, then release before processing)
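For context, a sketch of the behavior described above: the map's read lock is held across the whole update while `join_all` drives every per-worker lock concurrently. This reuses the hypothetical `LoadedWorker` from the earlier sketch, and `apply_block` is a stand-in, not the runtime's real dispatch:

```rust
use futures::future::join_all;

impl LoadedWorker {
    async fn apply_block(&mut self, _block: &[u8]) {
        self.cursor = Some(self.cursor.unwrap_or(0) + 1); // stand-in for real work
    }
}

// Registration and removal need the write lock, so they stay blocked
// until the read guard drops at the end of this function.
async fn apply_to_all(loaded: &RwLock<WorkerMap>, block: &[u8]) {
    let map = loaded.read().await; // worker set is frozen for the whole update
    let updates = map.values().map(|cell| async move {
        cell.lock().await.apply_block(block).await;
    });
    join_all(updates).await; // all workers advance in parallel
}
```

If worker management during processing ever becomes necessary, the snapshot-then-release restructuring suggested above would trade this atomicity for availability of the write lock.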
Summary by CodeRabbit
Refactor
Chores