Review/all modules#41
Conversation
- ext/redis: RedisClient::new/get_conn now return Result instead of panicking on pool exhaustion or URL parse failure. Update all callers (plugins/ext/redis/*, plugins/limit.rs, shell/server.rs) to propagate errors. - kernel/static_file_service: replace Unix-only MetadataExt::size() with cross-platform Metadata::len() so the kernel compiles on Windows. - shell/server: stop logging raw redis URL (may contain credentials); propagate client creation errors instead of eating them via From<&str>::from. Review skill adapted from BIOS to spacegate conventions.
- admin-server/auth: harden cookie (Secure+SameSite=Strict), remove SystemTime/HeaderValue unwraps - kernel/match_hostname: remove UB pointer cast in get_mut_by_iter, use safe recursive probe - kernel/reload: fix 'posisoned' typo, recover from poisoned RwLock instead of panic - kernel/lib: add explicit lifetime placeholders on iterator returns - shell/server: recover from poisoned Mutex instead of panic - config/k8s/listen: replace 4 silent try_next().unwrap_or_default() with explicit error logging+retry - plugin: delete dead retry/decompression/status/status_prev modules (depended on non-existent tardis crate), drop matching features from plugin & shell Cargo.toml - add REVIEW_REPORT.md at repo root summarizing all findings and fixes
There was a problem hiding this comment.
Pull request overview
This PR applies a workspace-wide “review fixes” sweep across Spacegate modules: hardening error handling (especially Redis pool usage), improving logging consistency, addressing some portability/lifetime issues, and removing dead/unreachable plugin modules. It also adds review documentation under the repo.
Changes:
- Replace panic-prone Redis pool usage with
Result-based APIs and propagate/handle connection acquisition failures across plugins and shell startup. - Improve operational safety and observability (remove credential-bearing logs, replace
println!/eprintln!withtracing, handle watcher stream errors explicitly). - Remove unused/dead plugin modules/features (retry/status/decompression) and add review reports/guidelines.
Reviewed changes
Copilot reviewed 31 out of 31 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
crates/shell/src/server.rs |
Recover from poisoned locks; avoid logging Redis URLs; initialize Redis client with error propagation. |
crates/shell/Cargo.toml |
Remove obsolete plugin feature flags. |
crates/plugin/Cargo.toml |
Remove unused features (retry, status, decompression) and optional hyper-util dependency. |
crates/plugin/src/plugins.rs |
Remove stale commented module declarations. |
crates/plugin/src/plugins/limit.rs |
Propagate Redis connection acquisition failures instead of panicking. |
crates/plugin/src/ext/redis/plugins/redis_count.rs |
Propagate get_conn() errors; degrade gracefully on error-path Redis updates. |
crates/plugin/src/ext/redis/plugins/redis_limit.rs |
Propagate get_conn() errors. |
crates/plugin/src/ext/redis/plugins/redis_dynamic_route.rs |
Propagate get_conn() errors. |
crates/plugin/src/ext/redis/plugins/redis_time_range.rs |
Propagate get_conn() errors; update tests to new client creation/conn API. |
crates/plugin/src/plugins/retry.rs |
Deleted dead/unreachable plugin implementation. |
crates/plugin/src/plugins/decompression.rs |
Deleted dead/unreachable plugin implementation. |
crates/plugin/src/plugins/status_prev.rs |
Deleted dead/unreachable legacy status plugin implementation. |
crates/plugin/src/plugins/status.rs |
Deleted dead/unreachable status plugin implementation. |
crates/plugin/src/plugins/status/* |
Deleted dead/unreachable status implementation files (server/sliding window/html). |
crates/extension/redis/src/lib.rs |
Change Redis client/pool APIs to return Result (PoolError); tighten repo add() signature. |
crates/kernel/src/backend_service/static_file_service.rs |
Use cross-platform Metadata::len() instead of Unix-only MetadataExt::size(). |
crates/kernel/src/helper_layers/reload.rs |
Recover from poisoned locks when reading/writing reloadable service. |
crates/kernel/src/helper_layers/map_request/add_extension.rs |
Simplify return type signature (no behavior change). |
crates/kernel/src/lib.rs |
Add explicit lifetimes for iter-returning trait methods. |
crates/kernel/src/service/http_route/match_hostname.rs |
Make iterator accessors explicit lifetimes; replace unsafe mutable traversal with safe logic. |
crates/model/src/plugin.rs |
Replace eprintln! with tracing::warn! during plugin map deserialization. |
crates/model/Cargo.toml |
Add tracing dependency for new logging usage. |
crates/config/src/service/k8s/listen.rs |
Replace unwrap_or_default() stream handling with explicit error/close handling for watchers. |
binary/spacegate/src/main.rs |
Use tracing instead of stdout/stderr for dylib loading messages. |
binary/admin-server/src/service/auth.rs |
Avoid panics; strengthen cookie attributes (Secure/SameSite) when setting JWT cookie. |
REVIEW_REPORT.md |
Add workspace code review report. |
PERF_SECURITY_REPORT.md |
Add performance & security deep-dive report. |
.github/skills/review/SKILL.md |
Add repository review guidelines (“skill”). |
Comments suppressed due to low confidence (1)
crates/shell/src/server.rs:222
- Lock poisoning is being silently ignored via
unwrap_or_else(|e| e.into_inner()). Consider logging a warning when recovering from poisoning so operational tooling can surface that a panic occurred while holding this lock.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| Err(e) => { | ||
| tracing::error!("[SG.Config] http_route watcher error: {e}; retrying"); | ||
| continue; | ||
| } |
There was a problem hiding this comment.
On watcher errors this loops immediately with continue, which can become a tight retry loop (high CPU + log spam) if the watcher keeps erroring. Consider adding a small backoff and/or recreating the watcher stream on error to avoid spinning.
| let mut g_store = store.lock().unwrap_or_else(|e| e.into_inner()); | ||
| for (_, s) in g_store.drain() { | ||
| task.spawn(s.shutdown()); | ||
| } |
There was a problem hiding this comment.
Lock poisoning is being silently ignored via unwrap_or_else(|e| e.into_inner()), which hides the fact that a panic occurred while holding the lock (and the protected state may be inconsistent). Consider logging a tracing::warn! when recovering from poisoning, or centralizing this into a helper to keep behavior consistent.
| pub fn global_remove(gateway_name: impl AsRef<str>) -> Option<RunningSgGateway> { | ||
| let global_store = Self::global_store(); | ||
| let mut global_store = global_store.lock().expect("poisoned lock"); | ||
| let mut global_store = global_store.lock().unwrap_or_else(|e| e.into_inner()); | ||
| global_store.remove(gateway_name.as_ref()) | ||
| } |
There was a problem hiding this comment.
Lock poisoning is being silently ignored via unwrap_or_else(|e| e.into_inner()). Consider logging a warning when recovering from poisoning so this doesn't mask an earlier panic and possible inconsistent state.
| let service = self.service.clone(); | ||
| let rg = service.read().expect("should never be posisoned"); | ||
| let rg = service.read().unwrap_or_else(|e| e.into_inner()); | ||
| let fut = rg.call(req); | ||
| drop(rg); |
There was a problem hiding this comment.
Lock poisoning is being silently ignored via unwrap_or_else(|e| e.into_inner()). Consider logging a warning when recovering from poisoning so operators can correlate unexpected behavior with the original panic.
| let cookie = format!("jwt={jwt}; Path=/; HttpOnly; Secure; SameSite=Strict; Max-Age={EXPIRE}"); | ||
| let cookie_val = HeaderValue::from_str(&cookie).map_err(InternalError::boxed)?; | ||
| response.headers_mut().insert(SET_COOKIE, cookie_val); |
There was a problem hiding this comment.
Set-Cookie is always marked Secure, but admin-server currently binds plain HTTP (see main.rs using TcpListener::bind). Browsers will ignore or never send a Secure cookie over HTTP, which can break browser-based auth flows (no JWT cookie on subsequent requests). Consider making Secure conditional on TLS / X-Forwarded-Proto, or returning the JWT in the response body so non-cookie clients can authenticate reliably.
| Err(e) => { | ||
| tracing::error!("[SG.Config] sgfilter watcher error: {e}; retrying"); | ||
| continue; | ||
| } |
There was a problem hiding this comment.
On watcher errors this loops immediately with continue, which can become a tight retry loop (high CPU + log spam) if the watcher keeps erroring. Consider adding a small backoff and/or recreating the watcher stream on error to avoid spinning.
| let global_store = Self::global_store(); | ||
| let mut global_store = global_store.lock().expect("poisoned lock"); | ||
| let mut global_store = global_store.lock().unwrap_or_else(|e| e.into_inner()); | ||
| global_store.insert(gateway_name.into(), gateway); | ||
| } |
There was a problem hiding this comment.
Lock poisoning is being silently ignored via unwrap_or_else(|e| e.into_inner()), which hides the fact that a panic occurred while holding the lock. Consider logging a warning when recovering from poisoning (or using a small helper) so operators can correlate later inconsistencies with the original panic.
| // casts are needed. Splitting the children / else branches is done via | ||
| // `take`/owned traversal because NLL still can't see through the | ||
| // conditional borrow. |
There was a problem hiding this comment.
The comment mentions using take/owned traversal, but the implementation now uses a two-step probe (has_child_match/descend_has) without take. Consider updating the comment to match the current approach so future maintainers understand why the extra lookup is needed.
| // casts are needed. Splitting the children / else branches is done via | |
| // `take`/owned traversal because NLL still can't see through the | |
| // conditional borrow. | |
| // casts are needed. We first probe with immutable lookups to see which | |
| // branch can match, then take a mutable borrow only for that branch, | |
| // because NLL still can't see through the conditional borrow. |
| if let Some(wg) = self.service.get() { | ||
| let mut wg = wg.write().expect("should never be poisoned"); | ||
| let mut wg = wg.write().unwrap_or_else(|e| e.into_inner()); | ||
| *wg = service; |
There was a problem hiding this comment.
Lock poisoning is being silently ignored via unwrap_or_else(|e| e.into_inner()). Consider logging a warning when recovering from poisoning so it’s visible that a panic occurred while holding the lock (and the state may be inconsistent).
| break; | ||
| } | ||
| Err(e) => { | ||
| tracing::error!("[SG.Config] gateway watcher error: {e}; retrying"); |
There was a problem hiding this comment.
On watcher errors this loops immediately with continue, which can become a tight retry loop (high CPU + log spam) if the watcher keeps erroring. Consider adding a small backoff (e.g. tokio::time::sleep) and/or recreating the watcher stream on error to avoid spinning.
| tracing::error!("[SG.Config] gateway watcher error: {e}; retrying"); | |
| tracing::error!("[SG.Config] gateway watcher error: {e}; retrying after backoff"); | |
| tokio::time::sleep(std::time::Duration::from_secs(1)).await; |
No description provided.