llmproxy

A lightweight HTTP proxy that accepts OpenAI-format API requests and routes them to multiple backends: OpenAI, GCP Vertex AI (Gemini and Claude), and OpenResponses-compatible servers.

Model routing is glob-based (first match wins), so a single proxy instance can fan out gpt-* to OpenAI, claude-* to Vertex AI, and everything else to a local OpenResponses server.

Goals

llmproxy aims to be a much lighter alternative to LiteLLM. It only maps the OpenAI API (Chat Completions) and OpenResponses API to proprietary backends — there is no cost management, rate limiting, key vaulting, or other platform features. The scope is deliberately narrow: translate request/response formats and route by model glob.

The primary intended use case is agentic sandboxing systems; by running an llmproxy instance in a separate container, we can avoid leaking credentials into the agent container.

llmproxy is written in Rust for correctness and efficiency. It is also designed to be usable as a Rust crate (use llmproxy::...), so projects like Goose can embed the proxy in-process rather than running it as a sidecar.

AI usage

This crate is largely generated by Opus 4.6, but much of the code has been reviewed.

Related: the `llm` crate

The llm crate is a Rust client SDK that provides a unified API for calling multiple LLM providers. It covers more backends (12+) and exposes custom Rust types for messages, tool calls, etc.

llmproxy takes a different approach: instead of a bespoke Rust API, it uses the OpenAI Responses API as its canonical format — both over the network and as an in-process crate interface. A project like Goose could embed llmproxy as a library and talk Responses API internally, getting multi-provider routing without adopting a non-standard type system. The trade-off is that llmproxy doesn't abstract away the wire format — you're always working with (typed, serde-backed) API types rather than a higher-level SDK.

Both projects handle Anthropic format translation and support streaming. llmproxy's #[serde(flatten)] passthrough pattern preserves unknown API fields, which matters for a proxy but not for a client SDK.

Status

Lightly tested with the use case of OpenAI (via OpenCode) talking to GCP vertex.

Quick start

For a single backend (the common case), no config file is needed — just set the backend via CLI flags or environment variables.

Container (GCP Vertex AI)

podman run --rm -p 8080:8080 \
  -v $HOME/.config/gcloud:/gcp-creds:ro \
  -e GOOGLE_APPLICATION_CREDENTIALS=/gcp-creds/application_default_credentials.json \
  -e LLMPROXY_BACKEND=gcp-vertex \
  -e GOOGLE_CLOUD_PROJECT=my-project \
  -e VERTEX_LOCATION=us-east5 \
  ghcr.io/cgwalters/llmproxy

Local

llmproxy --backend gcp-vertex --project my-project --location us-east5

For multi-backend setups, use a config file.

The proxy listens on 0.0.0.0:8080 by default (override with -l / --listen or in the config file).

Building from source

git clone https://github.com/cgwalters/llmproxy
cd llmproxy
cargo build --release
./target/release/llmproxy --backend gcp-vertex --project my-project --location us-east5

Configuration

See llmproxy.example.toml for the full format. The key ideas:

[server] sets the listen address.
[[backends]] defines one or more backends, each with a kind (openai, gcp-vertex, or openresponses), credentials, and a models list of glob patterns. The first backend whose pattern matches the requested model wins.

GCP Vertex AI authentication

The gcp-vertex backend authenticates via Application Default Credentials (ADC), checked in this order:

GOOGLE_APPLICATION_CREDENTIALS env var (service account JSON key)
~/.config/gcloud/application_default_credentials.json (from gcloud auth application-default login)
GCE metadata server

Tokens are cached and refreshed automatically.

Client authentication

Set the LLMPROXY_API_KEY environment variable to require clients to authenticate with Authorization: Bearer <key>. When the variable is unset, all requests are accepted without auth.

The /healthz endpoint is always unauthenticated.

Using with OpenCode

Point OpenCode at the proxy by adding a provider to your config (~/.config/opencode/config.json):

{
  "provider": {
    "llmproxy": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "llmproxy",
      "options": { "baseURL": "http://127.0.0.1:8080/v1" },
      "models": {
        "claude-opus-4-6": { "name": "Claude Opus 4" }
      }
    }
  },
  "model": "llmproxy/claude-opus-4-6"
}

And set the API key in ~/.local/share/opencode/auth.json (must match LLMPROXY_API_KEY, or any non-empty string if auth is disabled):

{ "llmproxy": { "apiKey": "your-llmproxy-api-key" } }

Container image

Pre-built images are published to ghcr.io/lobstertrap/llmproxy. See the Quick start above for a typical podman run invocation.

To build the image locally instead:

podman build -t llmproxy .

The Containerfile uses BuildKit cache mounts for fast incremental rebuilds.

There is also a Justfile with shortcuts: just container-build, just container-run, just ci, etc. Run just --list for the full set.

Development

cargo test
cargo clippy      # lint
just ci           # fmt-check + clippy + test

License

Apache-2.0. GCP auth module adapted from Goose (also Apache-2.0).

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
.github/workflows		.github/workflows
crates		crates
.dockerignore		.dockerignore
.gitignore		.gitignore
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
Containerfile		Containerfile
Justfile		Justfile
README.md		README.md
llmproxy.example.toml		llmproxy.example.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

llmproxy

Goals

AI usage

Related: the `llm` crate

Status

Quick start

Container (GCP Vertex AI)

Local

Building from source

Configuration

GCP Vertex AI authentication

Client authentication

Using with OpenCode

Container image

Development

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

llmproxy

Goals

AI usage

Related: the llm crate

Status

Quick start

Container (GCP Vertex AI)

Local

Building from source

Configuration

GCP Vertex AI authentication

Client authentication

Using with OpenCode

Container image

Development

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Related: the `llm` crate

Packages