Skip to content

Latest commit

 

History

History
127 lines (102 loc) · 4.14 KB

File metadata and controls

127 lines (102 loc) · 4.14 KB

llmfail

Local OpenAI-compatible LLM failover proxy for IDEs and agent CLIs.

Quickstart

go run ./cmd/llmfail init
go run ./cmd/llmfail doctor --config config.yaml
go run ./cmd/llmfail start --config config.yaml --watch

Point OpenAI-compatible clients to http://127.0.0.1:8080/v1.

For SDK and IDE setup, see docs/compatibility.md.

Example Config

listen: "127.0.0.1:8080"
admin_listen: "127.0.0.1:8081"
routes:
  - name: coding
    endpoints: ["/v1/chat/completions"]
    strategy: priority
    targets: ["anthropic-primary", "openai-fallback"]
targets:
  - name: anthropic-primary
    provider: anthropic-main
    model: claude-sonnet-4-20250514
    timeout: 30s
    max_retries: 1
    max_context: 200000
    required_capabilities: [chat, stream, tools]
  - name: openai-fallback
    provider: openai-main
    model: gpt-5-mini
    timeout: 30s
    max_retries: 1
    max_context: 400000
    required_capabilities: [chat, stream, tools]
providers:
  - name: anthropic-main
    type: anthropic
    base_url: "https://api.anthropic.com"
    api_key_env: ANTHROPIC_API_KEY
  - name: openai-main
    type: openai
    base_url: "https://api.openai.com/v1"
    api_key_env: OPENAI_API_KEY
observability:
  response_headers: true
  log_prompts: false
  log_responses: false

Endpoints

  • POST /v1/chat/completions
  • GET /v1/models
  • GET /healthz
  • GET /readyz
  • GET /debug/routes on loopback only
  • POST /v1/responses returns unsupported_endpoint in P0

CLI

  • llmfail init
  • llmfail start --config config.yaml
  • llmfail config validate --config config.yaml
  • llmfail doctor --config config.yaml
  • llmfail route explain --config config.yaml --file request.json
  • llmfail providers check --config config.yaml
  • llmfail reload --admin-url http://127.0.0.1:8081
  • llmfail version

Compatibility Matrix

Client P0 status Setup
OpenAI Python SDK Supported Set base_url="http://127.0.0.1:8080/v1" and any local client API key if enabled
OpenAI JS SDK Supported Set baseURL: "http://127.0.0.1:8080/v1"
Cursor Supported Use OpenAI-compatible provider with base URL http://127.0.0.1:8080/v1
Cline Supported Use OpenAI-compatible custom provider and local base URL
Continue.dev Supported Configure OpenAI-compatible model and override API base
Codex CLI Supported Point OpenAI-compatible base URL to llmfail
RooCode Best effort Use OpenAI-compatible custom provider
Claude Code Experimental P0, P1 target Native /v1/messages ingress is deferred
Gemini CLI Deferred P1 Requires Gemini-compatible ingress, not OpenAI Chat Completions

Troubleshooting

Symptom Likely Reason Command
401 from provider Missing or wrong env var llmfail doctor --config config.yaml
Tool calls disappear Fallback target lacks tools capability llmfail route explain --file request.json
Streaming hangs Client/upstream SSE issue llmfail doctor --config config.yaml
Always uses fallback Primary circuit is open GET /debug/status or llmfail start --tui
400 after fallback Unsupported parameter translation llmfail route explain --file request.json

Security Defaults

  • Default listen address is loopback-only.
  • Non-loopback listen fails validation unless client auth is configured.
  • Client auth supports Authorization: Bearer ... and x-api-key.
  • Provider auth is independent from client auth.
  • Prompts, responses, tool arguments and secrets are not logged by default.
  • Admin reload is exposed on admin_listen and rejects non-loopback clients.

Benchmark Methodology

Use hey or wrk against an in-process or LAN mock OpenAI-compatible server. Measure proxy overhead with two configured targets, both circuit-closed, at 100 concurrent connections after a warmup window.

Release

Release builds are described by goreleaser.yaml for Linux, macOS and Windows on amd64 and arm64. Docker builds use the included multi-stage Dockerfile.

Local release readiness check:

sh scripts/check_coverage.sh
sh scripts/release_check.sh

License

MIT License. Copyright (c) 2026 Evgeny Balyakin.