Resilient multi-LLM CLI orchestration for OpenCode
Execute Claude, Gemini & Codex with circuit breakers, smart retry, automatic fallback, and role-based routing.
Running AI CLIs in production is fragile. Processes timeout, rate limits hit, binaries disappear. Calling three different CLIs means three different failure modes, arg formats, and platform quirks.
opencode-cli-enforcer wraps all of that into a single, resilient plugin:
|
Without this plugin
|
With this plugin
|
cli_exec(prompt)
|
v
+----------------------------------------------------------+
| Resilience Engine |
| |
| Global Time Budget (shared across ALL attempts) |
| +---------+ +---------+ +---------+ |
| | Claude | --> | Gemini | --> | Codex | fallback |
| +---------+ +---------+ +---------+ chain |
| | | | |
| v v v |
| [Circuit Breaker] -----> [Retry w/ Backoff] --> [execa] |
| 3 failures = open max 2 retries 10MB |
| 5 timeouts = open 1s-10s + jitter buffer |
| 60s cooldown abort-aware sleep |
+----------------------------------------------------------+
Add to your OpenCode configuration:
{
"plugin": ["opencode-cli-enforcer@latest"]
}bun add opencode-cli-enforcer
# or
npm install opencode-cli-enforcerYou need at least one CLI installed and authenticated:
| CLI | Install | Auth |
|---|---|---|
| Claude Code | npm i -g @anthropic-ai/claude-code |
claude login |
| Gemini CLI | npm i -g @anthropic-ai/gemini-cli |
gcloud auth login |
| Codex CLI | npm i -g @openai/codex |
codex auth |
The primary tool. Sends a prompt to a CLI with automatic retry, circuit breaker protection, and fallback.
cli_exec({
cli: "claude",
prompt: "Explain the observer pattern with a TypeScript example",
mode: "generate", // "generate" | "analyze"
timeout_seconds: 300, // Global budget: 10-1800s
allow_fallback: true // Try gemini/codex on failure
})| Parameter | Type | Default | Description |
|---|---|---|---|
cli |
"claude" | "gemini" | "codex" |
required | Primary CLI provider |
prompt |
string |
required | Prompt to send (max 100KB) |
mode |
"generate" | "analyze" |
"generate" |
analyze enables file reads (Claude only) |
timeout_seconds |
number |
720 |
Global timeout budget in seconds |
allow_fallback |
boolean |
true |
Auto-fallback to alternative providers |
Response:
Returns real-time health for all providers: installation status, circuit breaker state, and usage statistics.
cli_status({})Response:
{
"platform": "windows",
"detection_complete": true,
"retry_config": { "max_retries": 2, "base_delay_ms": 1000, "max_delay_ms": 10000 },
"breaker_config": { "failure_threshold": 3, "timeout_threshold": 5, "cooldown_seconds": 60 },
"providers": [
{
"name": "claude",
"installed": true,
"path": "/usr/local/bin/claude",
"version": "1.0.16",
"circuit_breaker": {
"state": "closed",
"consecutive_failures": 0,
"consecutive_timeouts": 0,
"total_executions": 12,
"total_failures": 1,
"total_timeouts": 0
},
"usage": {
"total_calls": 12,
"success_rate": "92%",
"avg_duration_ms": 3400
},
"fallback_order": ["gemini", "codex"]
}
// ... gemini, codex
]
}Quick check of which CLIs are available on the system.
cli_list({})Response:
{
"installed_count": 2,
"providers": [
{ "provider": "claude", "path": "/usr/local/bin/claude", "version": "1.0.16", "strengths": ["reasoning", "code-analysis", "debugging", "architecture", "planning"] },
{ "provider": "gemini", "path": "/usr/local/bin/gemini", "version": "0.1.8", "strengths": ["research", "trends", "knowledge", "large-context", "web-search"] }
]
}Recommends the best CLI for a task based on agent role. Considers both provider strengths and real-time availability.
cli_route({
role: "developer",
task_description: "Refactor the auth module to use JWT"
})| Parameter | Type | Description |
|---|---|---|
role |
"manager" | "coordinator" | "developer" | "researcher" | "reviewer" | "architect" |
Agent role |
task_description |
string? |
Optional context |
Routing table:
| Role | Primary CLI | Reasoning |
|---|---|---|
| Manager | Gemini | Research, trends, large-context analysis |
| Coordinator | Claude | Reasoning, planning, decision-making |
| Developer | Codex | Code generation, refactoring, full-auto |
| Researcher | Gemini | Knowledge synthesis, web search |
| Reviewer | Claude | Code analysis, debugging, quality |
| Architect | Claude | System design, architecture planning |
Response:
{
"role": "developer",
"task_description": "Refactor the auth module to use JWT",
"recommended_cli": "codex",
"reasoning": "Role \"developer\" maps to codex (code-generation, edits, refactoring, full-auto).",
"fallback_chain": ["codex", "claude", "gemini"],
"availability": { "codex": true, "claude": true, "gemini": false }
}Unlike per-attempt timeouts, the global time budget is shared across ALL retries and ALL fallback providers. This prevents timeout multiplication:
Traditional: 3 providers x 3 attempts x 300s timeout = 2700s worst case
This plugin: 300s total budget across everything = 300s worst case
Each attempt receives the remaining seconds, not the full budget. When the budget runs out, execution stops immediately.
Per-CLI failure isolation with separate thresholds for failures and timeouts (because slow ≠ broken):
| State | Behavior | Transition |
|---|---|---|
| Closed | Normal operation, requests pass through | 3 failures OR 5 timeouts → Open |
| Open | All requests blocked, provider is skipped | After 60s cooldown → Half-Open |
| Half-Open | One probe request allowed | Success → Closed / Failure → Open |
Attempt 0: immediate
Attempt 1: ~1s + jitter (+-30%)
Attempt 2: ~2s + jitter (+-30%)
capped at 10s max
- Transient errors (network, socket): standard retry
- Rate limits (429, quota): retry with 3x longer delay
- Process timeouts: skip retries entirely, move to next provider
- Permanent errors (auth, 401/403): skip retries, move to fallback
- Crash (SIGKILL, ENOENT): skip retries, move to fallback
Error arrives
|
+-- exitCode 137 / SIGKILL / ENOENT ---------> CRASH (no retry)
+-- 429 / "rate limit" / "quota" -------------> RATE_LIMIT (retry, 3x delay)
+-- 401 / 403 / "auth" / "not found" --------> PERMANENT (no retry)
+-- everything else --------------------------> TRANSIENT (retry)
When a provider fails, the next one in the chain takes over automatically:
Claude ---[fail]---> Gemini ---[fail]---> Codex
Gemini ---[fail]---> Claude ---[fail]---> Codex
Codex ---[fail]---> Claude ---[fail]---> Gemini
| Feature | Windows | macOS / Linux |
|---|---|---|
| Binary detection | where |
which |
.cmd/.bat shims |
Auto-wrapped with cmd /c |
N/A |
| PATH augmentation | npm, scoop, cargo, pnpm dirs | Standard PATH |
| Large prompts (>30KB) | Delivered via stdin to avoid OS arg-length limits |
|
| Environment | Allowlisted vars only (no secrets leak to subprocesses) | |
CLI availability is cached for 5 minutes to avoid repeated filesystem lookups. The cache covers:
- Binary path resolution
- Version detection
- Both positive and negative results
| Protection | Description |
|---|---|
| Secret redaction | API keys (sk-*, key-*, AIza*, ant-api*) and Bearer tokens stripped from all output |
| Environment filtering | Only system essentials + proxy vars passed to subprocesses. No API keys — CLIs handle their own auth. |
| Input isolation | Large prompts (>30KB) delivered via stdin, not shell args |
| No shell interpolation | All CLI execution via execa (no shell: true) |
const result = await cli_exec({
cli: "claude",
prompt: "Review this function for bugs:\n\nfunction add(a, b) { return a - b }",
mode: "analyze",
timeout_seconds: 120
})
// result.stdout → "Bug found: the function is named `add` but performs subtraction..."// Claude's circuit breaker is open (too many recent failures)
const result = await cli_exec({
cli: "claude",
prompt: "Generate a REST API for user management",
allow_fallback: true
})
// result.cli → "gemini" (automatic fallback)
// result.used_fallback → true// For a developer task, route to Codex (best at code generation)
const recommendation = await cli_route({
role: "developer",
task_description: "Implement pagination for the /users endpoint"
})
// recommendation.recommended_cli → "codex"
// Then execute with the recommended CLI
const result = await cli_exec({
cli: recommendation.recommended_cli,
prompt: "Implement pagination for the /users endpoint using cursor-based pagination"
})const status = await cli_status({})
for (const provider of status.providers) {
console.log(`${provider.name}: ${provider.circuit_breaker.state} | ${provider.usage.success_rate}`)
}
// claude: closed | 95%
// gemini: closed | 88%
// codex: open | 60% <-- circuit breaker trippedconst largeCodebase = readFileSync("src/index.ts", "utf-8") // 45KB file
const result = await cli_exec({
cli: "claude",
prompt: `Analyze this codebase for security vulnerabilities:\n\n${largeCodebase}`,
mode: "analyze",
timeout_seconds: 600
})
// Prompt >30KB → automatically delivered via stdin (no OS arg-length issues)The plugin injects two hooks into the OpenCode lifecycle:
Automatically injects CLI availability into the system prompt of every agent (except orchestrator and task_decomposer), so the LLM knows which tools are available and their current health.
Tracks when agents invoke CLIs directly via bash instead of using cli_exec, incrementing usage counters for observability.
| Provider | Binary | Strengths |
|---|---|---|
| Claude | claude |
Reasoning, code analysis, debugging, architecture, planning |
| Gemini | gemini |
Research, trends, knowledge, large-context, web search |
| Codex | codex |
Code generation, edits, refactoring, full-auto mode |
| Parameter | Value | Description |
|---|---|---|
failureThreshold |
3 |
Consecutive failures before opening |
timeoutThreshold |
5 |
Consecutive timeouts before opening (slow ≠ broken) |
cooldownMs |
60000 |
Milliseconds before open → half-open |
halfOpenSuccessThreshold |
1 |
Successes in half-open to close |
| Parameter | Value | Description |
|---|---|---|
maxRetries |
2 |
Maximum retry attempts per provider |
baseDelayMs |
1000 |
Base delay for exponential backoff |
maxDelayMs |
10000 |
Maximum delay cap |
jitterFactor |
0.3 |
Random jitter range (+-30%) |
| Parameter | Value | Description |
|---|---|---|
STDIN_THRESHOLD |
30000 |
Characters before switching to stdin delivery |
MAX_BUFFER |
10MB |
Maximum stdout/stderr buffer |
bun install # Install dependencies
bun test # Run all tests (85 tests)
bun test --watch # Watch mode
bun test tests/retry.test.ts # Run a single test file
bun run typecheck # Type-check without emitting
bun run build # Buildsrc/
index.ts Plugin entry, 4 tools, 2 hooks
resilience.ts Global time budget, retry + breaker + fallback
circuit-breaker.ts Per-CLI state machine (failures + timeouts)
executor.ts execa wrapper, Windows handling, PATH augmentation
cli-defs.ts Provider configs, arg builders, role routing
detection.ts CLI auto-detection with 5-min cache
retry.ts Exponential backoff, abort-aware sleep
error-classifier.ts Error categorization for retry decisions
safe-env.ts Environment variable allowlist
redact.ts Secret redaction
platform.ts OS detection
tests/
8 test files, 85 tests covering all modules
- Fork the repo
- Create a feature branch from
develop:git checkout -b feat/my-feature develop - Make your changes and add tests
- Run
bun test(all 85 must pass) - Open a PR to
develop
{ "success": true, "cli": "claude", "stdout": "The Observer pattern is a behavioral design pattern...", "stderr": "", "duration_ms": 4523, "timed_out": false, "used_fallback": false, "fallback_chain": ["claude"], "error": null, "error_class": null, // "transient" | "rate_limit" | "permanent" | "crash" "circuit_state": "closed", // "closed" | "open" | "half-open" "attempt": 1, "max_attempts": 3 }