
Commit 7fd5680

Durak and claude committed
oraclemcp P2-1: admission control / backpressure (§5.6)
Bead oracle-qmwz.3.1.

oraclemcp-core admission.rs: AdmissionController bounds concurrency before the pool is touched, using a global cap (= pool max_size) plus a per-agent cap, both via tokio Semaphore. try_admit takes the per-agent permit first, then the global one; over budget -> structured Busy{retry_after_ms}. AdmissionPermit is an RAII guard that returns both permits on drop.

4 tests; clippy -D warnings + fmt clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent c08fa70 commit 7fd5680
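The commit message says the per-agent permit is taken first and that both permits are returned on drop. One consequence is easy to miss: when the global acquire fails, the per-agent slot that was just taken has to be handed back. The sketch below models that behaviour with the standard library only, so it can be run without tokio or the `oraclemcp_error` crate. `Slots`, `SlotGuard`, `Model`, and the local `Busy` struct are hypothetical stand-ins invented for illustration, not types from this commit.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical std-only counting semaphore (a stand-in for tokio's).
#[derive(Debug)]
pub struct Slots {
    free: AtomicUsize,
}

impl Slots {
    pub fn new(cap: usize) -> Arc<Slots> {
        Arc::new(Slots { free: AtomicUsize::new(cap) })
    }

    /// Take one slot without waiting; `None` when no slot is free.
    pub fn try_take(slots: &Arc<Slots>) -> Option<SlotGuard> {
        slots
            .free
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| n.checked_sub(1))
            .ok()
            .map(|_| SlotGuard(Arc::clone(slots)))
    }

    pub fn free(&self) -> usize {
        self.free.load(Ordering::Acquire)
    }
}

/// RAII guard: dropping it hands the slot back.
#[derive(Debug)]
pub struct SlotGuard(Arc<Slots>);

impl Drop for SlotGuard {
    fn drop(&mut self) {
        self.0.free.fetch_add(1, Ordering::AcqRel);
    }
}

/// Hypothetical mirror of the `Busy { retry_after_ms }` error.
#[derive(Debug, PartialEq, Eq)]
pub struct Busy {
    pub retry_after_ms: u64,
}

/// Hypothetical model of the two-level admission check.
pub struct Model {
    global: Arc<Slots>,
    per_agent_cap: usize,
    agents: Mutex<HashMap<String, Arc<Slots>>>,
}

impl Model {
    pub fn new(global_cap: usize, per_agent_cap: usize) -> Model {
        Model {
            global: Slots::new(global_cap),
            per_agent_cap,
            agents: Mutex::new(HashMap::new()),
        }
    }

    fn agent_slots(&self, agent: &str) -> Arc<Slots> {
        let mut agents = self.agents.lock().expect("mutex poisoned");
        let cap = self.per_agent_cap;
        Arc::clone(agents.entry(agent.to_owned()).or_insert_with(|| Slots::new(cap)))
    }

    pub fn agent_free(&self, agent: &str) -> usize {
        self.agent_slots(agent).free()
    }

    pub fn global_free(&self) -> usize {
        self.global.free()
    }

    /// Per-agent slot first, then the global slot. Returns (global, agent) guards.
    pub fn try_admit(&self, agent: &str) -> Result<(SlotGuard, SlotGuard), Busy> {
        let busy = || Busy { retry_after_ms: 250 };
        let agent_guard = Slots::try_take(&self.agent_slots(agent)).ok_or_else(busy)?;
        // If the next `?` returns early, `agent_guard` is dropped here,
        // which gives the per-agent slot back.
        let global_guard = Slots::try_take(&self.global).ok_or_else(busy)?;
        Ok((global_guard, agent_guard))
    }
}
```

With `Model::new(1, 2)`, a first call for agent `"a"` is admitted; a call for agent `"b"` is then refused because the global cap is exhausted, and `agent_free("b")` still reports 2 afterwards. The four tests listed in the commit do not appear to cover this case directly, so a similar assertion against the real controller might be worth adding.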

3 files changed

Lines changed: 152 additions & 1 deletion


.beads/issues.jsonl

Lines changed: 1 addition & 1 deletion
@@ -410,7 +410,7 @@
{"id":"oracle-qmwz.2.9.4","title":"P1-9d — Origin checks / DNS-rebinding guard / reject non-loopback http","description":"Subtask. Approach: validate Origin; DNS-rebinding guard; reject non-loopback http://. These are known rmcp HTTP failure modes. Success: rebinding/bad-origin requests rejected. Test: integration with crafted Origin/host. Considerations: part of front-loaded HTTP hardening (R12).","status":"open","priority":1,"issue_type":"task","created_at":"2026-06-01T13:34:19.939816116Z","created_by":"durakovic","updated_at":"2026-06-01T13:34:19.939816116Z","source_repo":"plsql-intelligence","compaction_level":0,"original_size":0,"labels":["oraclemcp"],"dependencies":[{"issue_id":"oracle-qmwz.2.9.4","depends_on_id":"oracle-qmwz.2.9","type":"parent-child","created_at":"2026-06-01T13:34:19.939816116Z","created_by":"durakovic","metadata":"{}","thread_id":""}]}
{"id":"oracle-qmwz.2.9.5","title":"P1-9e — Progressive scope -> operating-level ceiling mapping (scope can only LOWER)","description":"Subtask. Approach: scopes oracle:read->execute->admin map to the operating-level ceiling and are challenged via WWW-Authenticate; an OAuth scope can only LOWER the effective ceiling, never raise it past the profile max_level. Success: an oracle:read token cannot invoke a write tool. Test: integration scope enforcement. Considerations: composes with RBAC (P2-4) and the per-target max_level (P0-7).","status":"open","priority":1,"issue_type":"task","created_at":"2026-06-01T13:34:20.108875616Z","created_by":"durakovic","updated_at":"2026-06-01T13:34:20.766921264Z","source_repo":"plsql-intelligence","compaction_level":0,"original_size":0,"labels":["oraclemcp"],"dependencies":[{"issue_id":"oracle-qmwz.2.9.5","depends_on_id":"oracle-qmwz.2.9","type":"parent-child","created_at":"2026-06-01T13:34:20.108875616Z","created_by":"durakovic","metadata":"{}","thread_id":""},{"issue_id":"oracle-qmwz.2.9.5","depends_on_id":"oracle-qmwz.2.9.2","type":"blocks","created_at":"2026-06-01T13:34:20.766325908Z","created_by":"durakovic","metadata":"{}","thread_id":""}]}
{"id":"oracle-qmwz.3","title":"OMCP Phase 2 — Production hardening (the gate to fully safe-for-production)","description":"PHASE 2 EPIC — production hardening. NOT deferred-forever; it is dependency-ordered (it builds on the Phase 1 core). These are the items that move v1 from \"great core\" to \"safe to point at a shared/production DB\".\n\n## Scope (tasks)\nP2-1 admission control / backpressure (semaphore + per-agent caps + fair queue + structured BUSY); P2-2 cancellation + graceful shutdown + crash rollback; P2-3 execute-in-savepoint preview + transaction/savepoint/DBMS_OUTPUT tools; P2-4 RBAC per tool (scope->max_level) + session-elevation windows + replay-hardening; P2-5 Vault secrets backend; P2-6 OTel metrics/traces; P2-7 Tier-2 PL/Scope intelligence + recompile_with_plscope; P2-9 privilege-degradation matrix + capability reporting; P2-10 Oracle Unified Auditing policy as system-of-record. (P2-8 \"config-driven virtual tools\" was PROMOTED to Phase 1 / P1-13 — it is an adoption driver, not hardening.)\n\n## Success criteria\nThe server survives multi-agent load on a shared DB without causing incidents (no pool starvation / ORA-12519), cancels cleanly without double-executing DML, previews real blast radius, and produces an authoritative audit trail.","status":"open","priority":2,"issue_type":"epic","created_at":"2026-06-01T13:24:54.034744121Z","created_by":"durakovic","updated_at":"2026-06-01T13:24:54.034744121Z","source_repo":"plsql-intelligence","compaction_level":0,"original_size":0,"labels":["oraclemcp"],"dependencies":[{"issue_id":"oracle-qmwz.3","depends_on_id":"oracle-qmwz","type":"parent-child","created_at":"2026-06-01T13:24:54.034744121Z","created_by":"durakovic","metadata":"{}","thread_id":""}]}
-
{"id":"oracle-qmwz.3.1","title":"P2-1 — Admission control / backpressure (semaphore + per-agent caps + fair queue + BUSY)","description":"## Background\nFixed pool + N agents x M concurrent calls = pool starvation + ORA-12519/ORA-00018 against a shared prod DB — the most likely way the server itself causes an incident.\n## Technical approach\nGlobal concurrency cap = pool max_size, enforced by tokio::sync::Semaphore; per-agent caps on top; a bounded FAIR queue; over budget -> structured {error:BUSY, retry_after_ms:N} BEFORE touching the pool. HTTP path adds tower-governor (GCRA) keyed by agent identity. Never let the 512-thread blocking pool be the limiter — the semaphore is.\n## Success criteria\nUnder load, excess calls get BUSY+retry-after without exhausting the pool; per-agent fairness holds.\n## Test plan\n- Chaos: pool exhaustion returns BUSY, not ORA-12519; per-agent cap enforced.\n## Considerations\nThis is the backbone of safe multi-agent HTTP sharing.","status":"open","priority":2,"issue_type":"task","created_at":"2026-06-01T13:30:40.232696237Z","created_by":"durakovic","updated_at":"2026-06-01T13:30:41.409914216Z","source_repo":"plsql-intelligence","compaction_level":0,"original_size":0,"labels":["oraclemcp"],"dependencies":[{"issue_id":"oracle-qmwz.3.1","depends_on_id":"oracle-qmwz.1.4","type":"blocks","created_at":"2026-06-01T13:30:41.409300310Z","created_by":"durakovic","metadata":"{}","thread_id":""},{"issue_id":"oracle-qmwz.3.1","depends_on_id":"oracle-qmwz.3","type":"parent-child","created_at":"2026-06-01T13:30:40.232696237Z","created_by":"durakovic","metadata":"{}","thread_id":""}]}
+
{"id":"oracle-qmwz.3.1","title":"P2-1 — Admission control / backpressure (semaphore + per-agent caps + fair queue + BUSY)","description":"## Background\nFixed pool + N agents x M concurrent calls = pool starvation + ORA-12519/ORA-00018 against a shared prod DB — the most likely way the server itself causes an incident.\n## Technical approach\nGlobal concurrency cap = pool max_size, enforced by tokio::sync::Semaphore; per-agent caps on top; a bounded FAIR queue; over budget -> structured {error:BUSY, retry_after_ms:N} BEFORE touching the pool. HTTP path adds tower-governor (GCRA) keyed by agent identity. Never let the 512-thread blocking pool be the limiter — the semaphore is.\n## Success criteria\nUnder load, excess calls get BUSY+retry-after without exhausting the pool; per-agent fairness holds.\n## Test plan\n- Chaos: pool exhaustion returns BUSY, not ORA-12519; per-agent cap enforced.\n## Considerations\nThis is the backbone of safe multi-agent HTTP sharing.","status":"closed","priority":2,"issue_type":"task","created_at":"2026-06-01T13:30:40.232696237Z","created_by":"durakovic","updated_at":"2026-06-01T18:37:26.207906502Z","closed_at":"2026-06-01T18:37:26.207613975Z","close_reason":"P2-1 admission control / backpressure (oraclemcp-core admission.rs, §5.6). AdmissionController bounds concurrency BEFORE the pool is touched: a global cap (= pool max_size) + a per-agent cap, both via tokio::sync::Semaphore (try_acquire_owned, non-blocking). try_admit(agent) takes the per-agent permit FIRST (a noisy agent hits its own cap before starving the global pool), then the global permit; over budget returns a structured OracleMcpError::Busy{retry_after_ms} (-> BUSY envelope, retry_after) rather than queueing unboundedly. AdmissionPermit (RAII) returns both permits on drop. The semaphore, never the 512-thread blocking pool, is the limiter. 
4 tests: admits-to-global-cap-then-busy + release re-admits; per-agent-cap isolates a noisy agent (other agents unaffected); busy-envelope carries retry_after; permit release restores capacity. clippy -D warnings + fmt clean.","source_repo":"plsql-intelligence","compaction_level":0,"original_size":0,"labels":["oraclemcp"],"dependencies":[{"issue_id":"oracle-qmwz.3.1","depends_on_id":"oracle-qmwz.1.4","type":"blocks","created_at":"2026-06-01T13:30:41.409300310Z","created_by":"durakovic","metadata":"{}","thread_id":""},{"issue_id":"oracle-qmwz.3.1","depends_on_id":"oracle-qmwz.3","type":"parent-child","created_at":"2026-06-01T13:30:40.232696237Z","created_by":"durakovic","metadata":"{}","thread_id":""}]}
{"id":"oracle-qmwz.3.10","title":"P2-RES — MCP Resources + Prompts (oracle:// scheme + expert prompt playbooks)","description":"## Background\nTools alone are a gap (the completeness critic's \"tools-only\" finding). Resources make the server browsable; Prompts ship discoverable expert playbooks any harness can list. Resources land in P2 (v1 discovery uses oracle_capabilities).\n## Technical approach\n- Resources with a coherent scheme: oracle://schema/{owner} (object listing), oracle://object/{owner}/{type}/{name} (DDL/source), oracle://session/{lease_id} (live session state), oracle://capabilities, oracle://tools (the virtual-tool catalog — P1-13). Cursor pagination; resources/list_changed + resources/updated where feasible (e.g. DDL change via DBMS_CHANGE_NOTIFICATION).\n- Prompts (parameterized recipes): investigate_slow_query, safe_column_rename, explain_this_package, find_callers_of, generate_migration.\n## Success criteria\nA client lists resources + prompts; oracle://object returns DDL; a prompt produces a usable recipe.\n## Test plan\n- e2e: resources/list + read; prompts/list + get; pagination cursor.\n## Considerations\nBonus where the client supports them; tools never depend on them. 
Depends on rmcp (P0-6) + Tier-1 intelligence (P1-5) for content.","status":"open","priority":2,"issue_type":"task","created_at":"2026-06-01T13:56:11.181654587Z","created_by":"durakovic","updated_at":"2026-06-01T13:56:11.463544945Z","source_repo":"plsql-intelligence","compaction_level":0,"original_size":0,"labels":["oraclemcp"],"dependencies":[{"issue_id":"oracle-qmwz.3.10","depends_on_id":"oracle-qmwz.1.7","type":"blocks","created_at":"2026-06-01T13:56:11.341133819Z","created_by":"durakovic","metadata":"{}","thread_id":""},{"issue_id":"oracle-qmwz.3.10","depends_on_id":"oracle-qmwz.2.5","type":"blocks","created_at":"2026-06-01T13:56:11.462848431Z","created_by":"durakovic","metadata":"{}","thread_id":""},{"issue_id":"oracle-qmwz.3.10","depends_on_id":"oracle-qmwz.3","type":"parent-child","created_at":"2026-06-01T13:56:11.181654587Z","created_by":"durakovic","metadata":"{}","thread_id":""}]}
{"id":"oracle-qmwz.3.11","title":"P2-RESIL — Resilience: circuit breaker + transient-only retry + resource limits + call timeouts","description":"## Background\n§10 production-hardening details with no dedicated bead. These keep the long-lived server stable under partial failure.\n## Technical approach\n- Circuit breaker (failsafe-rs): open after 5 consecutive ORA- errors.\n- backon retries ONLY transient codes (ORA-03113/03114/12170/12541), NEVER ORA-00942/01403; NEVER retry DML (double-execute risk).\n- Resource limits enforced in the handler via a LimitedStream over the row iterator: max_rows 10k hard cap, max_result_bytes 10MB, max_execution_time 30s.\n- Per-round-trip conn.set_call_timeout(30s); race query future against tokio::time::timeout; on timeout conn.break_execution() then DISCARD the connection.\n- Crash safety: panic=abort, panic hook -> tracing::error, no unwrap() on DB ops, PID file, systemd Restart=on-failure, flush exporters on shutdown.\n## Success criteria\nRepeated ORA- errors open the breaker; only transient codes retry; a runaway query is timed out and its connection discarded; caps enforced.\n## Test plan\n- Chaos: breaker opens after N errors; timeout discards conn; cap truncation.\n- Unit: retry policy classifies transient vs permanent; DML never retried.\n## Considerations\nComposes with admission control (P2-1) and cancellation (P2-2). 
Depends on connectivity (P0-3).","status":"open","priority":2,"issue_type":"task","created_at":"2026-06-01T13:56:11.564317524Z","created_by":"durakovic","updated_at":"2026-06-01T13:56:11.745034247Z","source_repo":"plsql-intelligence","compaction_level":0,"original_size":0,"labels":["oraclemcp"],"dependencies":[{"issue_id":"oracle-qmwz.3.11","depends_on_id":"oracle-qmwz.1.4","type":"blocks","created_at":"2026-06-01T13:56:11.744222823Z","created_by":"durakovic","metadata":"{}","thread_id":""},{"issue_id":"oracle-qmwz.3.11","depends_on_id":"oracle-qmwz.3","type":"parent-child","created_at":"2026-06-01T13:56:11.564317524Z","created_by":"durakovic","metadata":"{}","thread_id":""}]}
{"id":"oracle-qmwz.3.2","title":"P2-2 — Cancellation + graceful shutdown + crash rollback","description":"## Background\nAgents abort/retry constantly. Without clean cancel-and-cleanup you leak cursors/sessions/row-locks and risk DOUBLE-EXECUTING DML on retry.\n## Technical approach\n- MCP cancel (notifications/cancelled / tasks/cancel) -> conn.break_execution() (OCI break) -> rollback any open txn on the leased session -> close cursors -> deterministic {can_retry:bool}. DML is NEVER auto-retried (only transient connection errors are: ORA-03113/03114/12170/12541; never ORA-00942/01403).\n- Graceful shutdown: SIGTERM sets a CancellationToken, fails /readyz, stops new work, rolls back in-flight txns, revokes leases + Vault leases, drains pool with deadline, flushes audit + OTel exporters, exits.\n- Crash: panic=abort, panic hook logs via tracing first, systemd Restart=on-failure; Oracle rolls back killed sessions; audit records the gap.\n## Success criteria\nCancel mid-DML rolls back and never double-executes; SIGTERM drains cleanly; crash leaves no stranded locks.\n## Test plan\n- Chaos: cancel mid-DML asserts no double-execute; SIGTERM drain; kill -9 + restart leaves no locks.\n## Considerations\nDepends on the lease primitive (P0-4).","status":"open","priority":2,"issue_type":"task","created_at":"2026-06-01T13:30:40.355966489Z","created_by":"durakovic","updated_at":"2026-06-01T13:30:41.574919704Z","source_repo":"plsql-intelligence","compaction_level":0,"original_size":0,"labels":["oraclemcp"],"dependencies":[{"issue_id":"oracle-qmwz.3.2","depends_on_id":"oracle-qmwz.1.5","type":"blocks","created_at":"2026-06-01T13:30:41.574182789Z","created_by":"durakovic","metadata":"{}","thread_id":""},{"issue_id":"oracle-qmwz.3.2","depends_on_id":"oracle-qmwz.3","type":"parent-child","created_at":"2026-06-01T13:30:40.355966489Z","created_by":"durakovic","metadata":"{}","thread_id":""}]}
Lines changed: 149 additions & 0 deletions
@@ -0,0 +1,149 @@
+//! Admission control & backpressure (plan §5.6; bead P2-1).
+//!
+//! A fixed pool + N agents × M concurrent calls = pool starvation and
+//! `ORA-12519`. The admission controller bounds concurrency *before* the pool
+//! is touched: a global cap (= pool `max_size`) plus a per-agent cap, both
+//! enforced with `tokio::sync::Semaphore`. Over budget returns a structured
+//! `BUSY { retry_after_ms }` rather than queueing unboundedly: the semaphore,
+//! never the 512-thread blocking pool, is the limiter.
+
+use std::collections::HashMap;
+use std::sync::{Arc, Mutex};
+
+use oraclemcp_error::{ErrorEnvelope, OracleMcpError};
+use tokio::sync::{OwnedSemaphorePermit, Semaphore};
+
+/// Default `retry_after_ms` returned with a `BUSY`.
+pub const DEFAULT_RETRY_AFTER_MS: u64 = 250;
+
+/// A held admission permit. Dropping it returns capacity to both the global and
+/// per-agent semaphores.
+#[derive(Debug)]
+pub struct AdmissionPermit {
+    _global: OwnedSemaphorePermit,
+    _agent: OwnedSemaphorePermit,
+}
+
+/// Bounds concurrency globally and per-agent.
+pub struct AdmissionController {
+    global: Arc<Semaphore>,
+    per_agent_cap: usize,
+    agents: Mutex<HashMap<String, Arc<Semaphore>>>,
+    retry_after_ms: u64,
+}
+
+impl AdmissionController {
+    /// A controller with a global cap (sized to the pool's `max_size`) and a per-agent cap.
+    #[must_use]
+    pub fn new(global_cap: usize, per_agent_cap: usize) -> Self {
+        AdmissionController {
+            global: Arc::new(Semaphore::new(global_cap.max(1))),
+            per_agent_cap: per_agent_cap.max(1),
+            agents: Mutex::new(HashMap::new()),
+            retry_after_ms: DEFAULT_RETRY_AFTER_MS,
+        }
+    }
+
+    fn agent_semaphore(&self, agent: &str) -> Arc<Semaphore> {
+        let mut agents = self.agents.lock().expect("admission mutex poisoned");
+        Arc::clone(
+            agents
+                .entry(agent.to_owned())
+                .or_insert_with(|| Arc::new(Semaphore::new(self.per_agent_cap))),
+        )
+    }
+
+    /// Try to admit a call for `agent` without waiting. Returns a permit, or a
+    /// `BUSY` envelope when over the global or per-agent budget. The per-agent
+    /// permit is taken first (a single noisy agent hits its own cap before
+    /// starving the global pool).
+    ///
+    /// # Errors
+    /// Returns [`OracleMcpError::Busy`] when no capacity is available.
+    pub fn try_admit(&self, agent: &str) -> Result<AdmissionPermit, OracleMcpError> {
+        let agent_sem = self.agent_semaphore(agent);
+        let agent_permit = agent_sem
+            .try_acquire_owned()
+            .map_err(|_| OracleMcpError::Busy {
+                retry_after_ms: self.retry_after_ms,
+            })?;
+        let global_permit =
+            Arc::clone(&self.global)
+                .try_acquire_owned()
+                .map_err(|_| OracleMcpError::Busy {
+                    retry_after_ms: self.retry_after_ms,
+                })?;
+        // If the global acquire failed, `?` returned early and dropped `agent_permit`.
+        Ok(AdmissionPermit {
+            _global: global_permit,
+            _agent: agent_permit,
+        })
+    }
+
+    /// Convenience: the agent-facing `BUSY` envelope.
+    #[must_use]
+    pub fn busy_envelope(&self) -> ErrorEnvelope {
+        OracleMcpError::Busy {
+            retry_after_ms: self.retry_after_ms,
+        }
+        .into_envelope()
+    }
+
+    /// Current available global permits (for `/readyz` / metrics).
+    #[must_use]
+    pub fn available_global(&self) -> usize {
+        self.global.available_permits()
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn admits_up_to_global_cap_then_busy() {
+        let ctrl = AdmissionController::new(2, 10);
+        let p1 = ctrl.try_admit("a").expect("1");
+        let p2 = ctrl.try_admit("b").expect("2");
+        // Global cap (2) reached -> BUSY.
+        assert!(matches!(
+            ctrl.try_admit("c"),
+            Err(OracleMcpError::Busy { .. })
+        ));
+        drop(p1);
+        // Capacity returned -> admits again.
+        let _p3 = ctrl.try_admit("c").expect("3 after release");
+        drop(p2);
+    }
+
+    #[test]
+    fn per_agent_cap_isolates_a_noisy_agent() {
+        let ctrl = AdmissionController::new(100, 2);
+        let _a1 = ctrl.try_admit("noisy").expect("a1");
+        let _a2 = ctrl.try_admit("noisy").expect("a2");
+        // The noisy agent hits its own cap (2) while the global pool is free.
+        assert!(matches!(
+            ctrl.try_admit("noisy"),
+            Err(OracleMcpError::Busy { .. })
+        ));
+        // A different agent is unaffected.
+        let _b1 = ctrl.try_admit("quiet").expect("other agent admitted");
+    }
+
+    #[test]
+    fn busy_envelope_carries_retry_after() {
+        let ctrl = AdmissionController::new(1, 1);
+        let env = ctrl.busy_envelope();
+        assert_eq!(env.retry_after_ms, Some(DEFAULT_RETRY_AFTER_MS));
+    }
+
+    #[test]
+    fn permit_release_restores_global_capacity() {
+        let ctrl = AdmissionController::new(1, 5);
+        assert_eq!(ctrl.available_global(), 1);
+        let p = ctrl.try_admit("a").expect("admit");
+        assert_eq!(ctrl.available_global(), 0);
+        drop(p);
+        assert_eq!(ctrl.available_global(), 1);
+    }
+}
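The controller above refuses immediately with `Busy { retry_after_ms }` instead of queueing, which leaves the waiting to the caller. A minimal caller-side sketch, assuming a blocking client and a hypothetical local `Busy` struct that mirrors the error variant, could look like the following. `retry_on_busy` and its parameters are illustrative names, not part of this commit.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical mirror of the `Busy { retry_after_ms }` error in the diff.
#[derive(Debug, PartialEq, Eq)]
pub struct Busy {
    pub retry_after_ms: u64,
}

/// Calls `attempt` at least once and at most `max_attempts` times,
/// sleeping for the suggested delay after each `Busy`.
/// Returns the final result together with the number of attempts made.
pub fn retry_on_busy<T>(
    max_attempts: u32,
    mut attempt: impl FnMut() -> Result<T, Busy>,
) -> (Result<T, Busy>, u32) {
    let mut tries = 0;
    loop {
        tries += 1;
        match attempt() {
            Ok(value) => return (Ok(value), tries),
            // Attempt budget used up: surface the last `Busy` to the caller.
            Err(busy) if tries >= max_attempts => return (Err(busy), tries),
            // Otherwise wait for the server-suggested delay, then try again.
            Err(busy) => sleep(Duration::from_millis(busy.retry_after_ms)),
        }
    }
}
```

Bounding the attempts keeps a saturated server from being hammered indefinitely. Since the bead description states that BUSY is returned before the pool is touched, a refused call should not have executed anything, so retrying it ought not to risk the double-execution that the DML retry rules elsewhere in the tracker guard against; that reading is an inference from the description, not something this diff tests.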

crates/oraclemcp-core/src/lib.rs

Lines changed: 2 additions & 0 deletions
@@ -15,6 +15,7 @@
 //! the registry's `Tool` contract; the core never reaches into engine
 //! internals (the one-way boundary, §0 hard rule 1).

+pub mod admission;
 pub mod capabilities;
 pub mod connect;
 pub mod init_token;
@@ -23,6 +24,7 @@ pub mod tools;

 pub use server::{CAPABILITIES_TOOL, OracleMcpServer, ToolDispatch};

+pub use admission::{AdmissionController, AdmissionPermit};
 pub use capabilities::{
     CapabilitiesReport, ConnectionStatus, FeatureTiers, OperatingLevelReport, PROTOCOL_VERSION,
 };
