A lightweight HTTP proxy that accepts OpenAI-format API requests and routes them to multiple backends: OpenAI, GCP Vertex AI (Gemini and Claude), and OpenResponses-compatible servers.
Model routing is glob-based (first match wins), so a single proxy instance
can fan out gpt-* to OpenAI, claude-* to Vertex AI, and everything else
to a local OpenResponses server.
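As an illustration of first-match-wins glob routing, here is a minimal sketch in plain Rust. It is not llmproxy's actual implementation: the `Backend` struct, the `route` and `glob_match` functions, and the backend names are hypothetical, and the matcher handles only the `*` wildcard.

```rust
/// Hypothetical backend entry: a name plus the model globs it serves.
struct Backend {
    name: &'static str,
    models: &'static [&'static str],
}

/// Match `text` against `pattern`, where `*` matches any run of characters
/// (including none). Only `*` is handled; a full glob syntax has more.
fn glob_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    let (mut pi, mut ti) = (0, 0);
    // Pattern index of the last `*` seen, and the text index it was tried at.
    let mut backtrack: Option<(usize, usize)> = None;
    while ti < t.len() {
        if pi < p.len() && p[pi] == '*' {
            backtrack = Some((pi, ti));
            pi += 1;
        } else if pi < p.len() && p[pi] == t[ti] {
            pi += 1;
            ti += 1;
        } else if let Some((star_pi, star_ti)) = backtrack {
            // Let the last `*` absorb one more character and retry.
            pi = star_pi + 1;
            ti = star_ti + 1;
            backtrack = Some((star_pi, star_ti + 1));
        } else {
            return false;
        }
    }
    // Any trailing `*` in the pattern can match the empty string.
    p[pi..].iter().all(|&c| c == '*')
}

/// Return the first backend (in list order) with a pattern matching `model`.
fn route<'a>(backends: &'a [Backend], model: &str) -> Option<&'a Backend> {
    backends
        .iter()
        .find(|b| b.models.iter().any(|pat| glob_match(pat, model)))
}

/// Example table mirroring the fan-out described above (names are made up).
fn example_backends() -> Vec<Backend> {
    vec![
        Backend { name: "openai", models: &["gpt-*"] },
        Backend { name: "vertex", models: &["claude-*"] },
        Backend { name: "local", models: &["*"] },
    ]
}
```

Because the catch-all `*` entry is listed last, it only receives models that no earlier backend claimed; moving it to the front would shadow every other entry.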
llmproxy aims to be a much lighter alternative to LiteLLM. It only maps the OpenAI API (Chat Completions) and OpenResponses API to proprietary backends — there is no cost management, rate limiting, key vaulting, or other platform features. The scope is deliberately narrow: translate request/response formats and route by model glob.
The primary intended use case is agentic sandboxing systems; by running an llmproxy instance in a separate container, we can avoid leaking credentials into the agent container.
llmproxy is written in Rust for correctness and efficiency. It is also
designed to be usable as a Rust crate (use llmproxy::...), so projects
like Goose can embed the proxy in-process
rather than running it as a sidecar.
This crate is largely generated by Opus 4.6, but much of the code has been reviewed.
The llm crate is a Rust client SDK that provides a unified API for calling multiple LLM providers. It covers more backends (12+) and exposes custom Rust types for messages, tool calls, etc.
llmproxy takes a different approach: instead of a bespoke Rust API, it uses the OpenAI Responses API as its canonical format — both over the network and as an in-process crate interface. A project like Goose could embed llmproxy as a library and talk Responses API internally, getting multi-provider routing without adopting a non-standard type system. The trade-off is that llmproxy doesn't abstract away the wire format — you're always working with (typed, serde-backed) API types rather than a higher-level SDK.
Both projects handle Anthropic format translation and support streaming.
llmproxy's #[serde(flatten)] passthrough pattern preserves unknown API
fields, which matters for a proxy but not for a client SDK.
Lightly tested with one use case: an OpenAI-format client (OpenCode) talking to GCP Vertex AI.
For a single backend (the common case), no config file is needed — just set the backend via CLI flags or environment variables.
podman run --rm -p 8080:8080 \
  -v $HOME/.config/gcloud:/gcp-creds:ro \
  -e GOOGLE_APPLICATION_CREDENTIALS=/gcp-creds/application_default_credentials.json \
  -e LLMPROXY_BACKEND=gcp-vertex \
  -e GOOGLE_CLOUD_PROJECT=my-project \
  -e VERTEX_LOCATION=us-east5 \
  ghcr.io/cgwalters/llmproxy
llmproxy --backend gcp-vertex --project my-project --location us-east5
For multi-backend setups, use a config file.
The proxy listens on 0.0.0.0:8080 by default (override with -l /
--listen or in the config file).
git clone https://github.com/cgwalters/llmproxy
cd llmproxy
cargo build --release
./target/release/llmproxy --backend gcp-vertex --project my-project --location us-east5
See llmproxy.example.toml for the full format.
The key ideas:
- [server] sets the listen address.
- [[backends]] defines one or more backends, each with a kind (openai, gcp-vertex, or openresponses), credentials, and a models list of glob patterns. The first backend whose pattern matches the requested model wins.
The gcp-vertex backend authenticates via Application Default Credentials
(ADC), checked in this order:
1. GOOGLE_APPLICATION_CREDENTIALS env var (service account JSON key)
2. ~/.config/gcloud/application_default_credentials.json (from gcloud auth application-default login)
3. GCE metadata server
Tokens are cached and refreshed automatically.
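The lookup order above can be sketched as a small pure function. This is only an illustration, not the crate's actual auth module: `CredentialSource` and `resolve_adc_source` are hypothetical names, and treating an empty environment value as unset is an assumption.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical description of where credentials would be loaded from.
#[derive(Debug, PartialEq)]
enum CredentialSource {
    /// Path taken from GOOGLE_APPLICATION_CREDENTIALS.
    EnvKeyFile(PathBuf),
    /// The file written by `gcloud auth application-default login`.
    GcloudUserFile(PathBuf),
    /// Fall back to the GCE metadata server.
    MetadataServer,
}

/// Pick a credential source in the documented order.
/// `env_value` is the value of GOOGLE_APPLICATION_CREDENTIALS (if set),
/// `home` is the user's home directory, and `file_exists` is injected so
/// the decision logic can be exercised without touching the filesystem.
fn resolve_adc_source(
    env_value: Option<&str>,
    home: Option<&Path>,
    file_exists: impl Fn(&Path) -> bool,
) -> CredentialSource {
    // 1. Explicit environment variable (assumption: empty means unset).
    if let Some(v) = env_value.filter(|v| !v.is_empty()) {
        return CredentialSource::EnvKeyFile(PathBuf::from(v));
    }
    // 2. Well-known gcloud user credentials file, if it is present.
    if let Some(home) = home {
        let p = home.join(".config/gcloud/application_default_credentials.json");
        if file_exists(&p) {
            return CredentialSource::GcloudUserFile(p);
        }
    }
    // 3. Nothing local was found.
    CredentialSource::MetadataServer
}
```

In real use the arguments would come from `std::env::var("GOOGLE_APPLICATION_CREDENTIALS").ok().as_deref()`, the home directory, and `|p| p.exists()`.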
Set the LLMPROXY_API_KEY environment variable to require clients to
authenticate with Authorization: Bearer <key>. When the variable is
unset, all requests are accepted without auth.
The /healthz endpoint is always unauthenticated.
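The auth rules above amount to a three-way decision, sketched here as a standalone function. The names `is_authorized` and `constant_time_eq` are hypothetical, the request path in the usage note is illustrative, and the constant-time comparison is a design choice of this sketch rather than something the project documents.

```rust
/// Decide whether a request may proceed.
/// `authorization` is the raw Authorization header value, if any;
/// `required_key` is the value of LLMPROXY_API_KEY, if set.
fn is_authorized(
    path: &str,
    authorization: Option<&str>,
    required_key: Option<&str>,
) -> bool {
    // The health check is always open.
    if path == "/healthz" {
        return true;
    }
    // No key configured: accept everything.
    let Some(key) = required_key else {
        return true;
    };
    // Otherwise require "Bearer <key>" with an exact key match.
    match authorization.and_then(|h| h.strip_prefix("Bearer ")) {
        Some(token) => constant_time_eq(token.as_bytes(), key.as_bytes()),
        None => false,
    }
}

/// Compare two byte strings without exiting early on the first mismatch
/// (assumption: chosen here to avoid leaking match position via timing).
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}
```

For example, `is_authorized("/v1/chat/completions", Some("Bearer s3cret"), Some("s3cret"))` passes, while the same call with a missing or different token is rejected.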
Point OpenCode at the proxy by adding a provider to your config
(~/.config/opencode/config.json):
{
  "provider": {
    "llmproxy": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "llmproxy",
      "options": { "baseURL": "http://127.0.0.1:8080/v1" },
      "models": {
        "claude-opus-4-6": { "name": "Claude Opus 4" }
      }
    }
  },
  "model": "llmproxy/claude-opus-4-6"
}
And set the API key in ~/.local/share/opencode/auth.json (must match
LLMPROXY_API_KEY, or any non-empty string if auth is disabled):
{ "llmproxy": { "apiKey": "your-llmproxy-api-key" } }
Pre-built images are published to ghcr.io/lobstertrap/llmproxy.
See the Quick start above for a typical podman run invocation.
To build the image locally instead:
podman build -t llmproxy .
The Containerfile uses BuildKit cache mounts for fast incremental rebuilds.
There is also a Justfile with shortcuts: just container-build,
just container-run, just ci, etc. Run just --list for the full set.
cargo test
cargo clippy # lint
just ci # fmt-check + clippy + test
Apache-2.0. GCP auth module adapted from Goose (also Apache-2.0).