llmproxy

A lightweight HTTP proxy that accepts OpenAI-format API requests and routes them to multiple backends: OpenAI, GCP Vertex AI (Gemini and Claude), and OpenResponses-compatible servers.

Model routing is glob-based (first match wins), so a single proxy instance can fan out gpt-* to OpenAI, claude-* to Vertex AI, and everything else to a local OpenResponses server.
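As a rough illustration of first-match-wins glob routing, here is a minimal std-only sketch. It is not llmproxy's actual implementation: the `Backend` struct, the `route` and `glob_match` functions, and the backend names are all hypothetical, and only the `*` wildcard is handled.

```rust
/// Hypothetical backend entry: a name plus the model globs it serves.
struct Backend {
    name: &'static str,
    models: Vec<&'static str>,
}

/// Match `text` against `pattern`, where `*` matches any run of
/// characters (including none) and every other character is literal.
fn glob_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    let (mut pi, mut ti) = (0, 0);
    // Index of the last `*` seen and the text index it currently covers up to.
    let mut star: Option<(usize, usize)> = None;
    while ti < t.len() {
        if pi < p.len() && p[pi] == '*' {
            star = Some((pi, ti));
            pi += 1;
        } else if pi < p.len() && p[pi] == t[ti] {
            pi += 1;
            ti += 1;
        } else if let Some((sp, st)) = star {
            // Backtrack: let the last `*` absorb one more character.
            pi = sp + 1;
            ti = st + 1;
            star = Some((sp, st + 1));
        } else {
            return false;
        }
    }
    // Any pattern characters left over must all be `*`.
    p[pi..].iter().all(|&c| c == '*')
}

/// Return the first backend, in declaration order, with a matching glob.
fn route<'a>(backends: &'a [Backend], model: &str) -> Option<&'a Backend> {
    backends
        .iter()
        .find(|b| b.models.iter().any(|pat| glob_match(pat, model)))
}

/// Example table mirroring the fan-out described above (names are made up).
fn example_backends() -> Vec<Backend> {
    vec![
        Backend { name: "openai", models: vec!["gpt-*"] },
        Backend { name: "vertex", models: vec!["claude-*"] },
        Backend { name: "local", models: vec!["*"] },
    ]
}
```

Because the first match wins, a catch-all `*` pattern only makes sense on the last backend; placed first, it would shadow every entry after it.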

Goals

llmproxy aims to be a much lighter alternative to LiteLLM. It only maps the OpenAI API (Chat Completions) and OpenResponses API to proprietary backends — there is no cost management, rate limiting, key vaulting, or other platform features. The scope is deliberately narrow: translate request/response formats and route by model glob.

The primary intended use case is agentic sandboxing systems; by running an llmproxy instance in a separate container, we can avoid leaking credentials into the agent container.

llmproxy is written in Rust for correctness and efficiency. It is also designed to be usable as a Rust crate (use llmproxy::...), so projects like Goose can embed the proxy in-process rather than running it as a sidecar.

AI usage

This crate was largely generated by Claude Opus 4.6, but much of the code has been reviewed.

Related: the llm crate

The llm crate is a Rust client SDK that provides a unified API for calling multiple LLM providers. It covers more backends (12+) and exposes custom Rust types for messages, tool calls, etc.

llmproxy takes a different approach: instead of a bespoke Rust API, it uses the OpenAI Responses API as its canonical format — both over the network and as an in-process crate interface. A project like Goose could embed llmproxy as a library and talk Responses API internally, getting multi-provider routing without adopting a non-standard type system. The trade-off is that llmproxy doesn't abstract away the wire format — you're always working with (typed, serde-backed) API types rather than a higher-level SDK.

Both projects handle Anthropic format translation and support streaming. llmproxy's #[serde(flatten)] passthrough pattern preserves unknown API fields, which matters for a proxy but not for a client SDK.

Status

Lightly tested, mainly with OpenCode sending OpenAI-format requests that are routed to GCP Vertex AI.

Quick start

For a single backend (the common case), no config file is needed — just set the backend via CLI flags or environment variables.

Container (GCP Vertex AI)

podman run --rm -p 8080:8080 \
  -v $HOME/.config/gcloud:/gcp-creds:ro \
  -e GOOGLE_APPLICATION_CREDENTIALS=/gcp-creds/application_default_credentials.json \
  -e LLMPROXY_BACKEND=gcp-vertex \
  -e GOOGLE_CLOUD_PROJECT=my-project \
  -e VERTEX_LOCATION=us-east5 \
  ghcr.io/lobstertrap/llmproxy

Local

llmproxy --backend gcp-vertex --project my-project --location us-east5

For multi-backend setups, use a config file.

The proxy listens on 0.0.0.0:8080 by default (override with -l / --listen or in the config file).

Building from source

git clone https://github.com/LobsterTrap/llmproxy
cd llmproxy
cargo build --release
./target/release/llmproxy --backend gcp-vertex --project my-project --location us-east5

Configuration

See llmproxy.example.toml for the full format. The key ideas:

  • [server] sets the listen address.
  • [[backends]] defines one or more backends, each with a kind (openai, gcp-vertex, or openresponses), credentials, and a models list of glob patterns. The first backend whose pattern matches the requested model wins.

GCP Vertex AI authentication

The gcp-vertex backend authenticates via Application Default Credentials (ADC), checked in this order:

  1. GOOGLE_APPLICATION_CREDENTIALS env var (service account JSON key)
  2. ~/.config/gcloud/application_default_credentials.json (from gcloud auth application-default login)
  3. GCE metadata server

Tokens are cached and refreshed automatically.
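The lookup order above can be sketched as a small pure function. This is an illustration only, not the crate's auth module: `CredentialSource` and `resolve_adc_source` are hypothetical names, the file-existence check is injected so the logic can be exercised without touching the filesystem, and token fetching, caching, and refresh are left out.

```rust
use std::path::{Path, PathBuf};

/// Where credentials would be loaded from (hypothetical type).
#[derive(Debug, PartialEq)]
enum CredentialSource {
    /// Path taken from GOOGLE_APPLICATION_CREDENTIALS.
    EnvKeyFile(PathBuf),
    /// File written by `gcloud auth application-default login`.
    GcloudUserFile(PathBuf),
    /// Fall back to the GCE metadata server.
    MetadataServer,
}

/// Pick a credential source in the documented order.
/// `env_value` is the value of GOOGLE_APPLICATION_CREDENTIALS, if set;
/// `home` is the user's home directory, if known.
fn resolve_adc_source(
    env_value: Option<&str>,
    home: Option<&Path>,
    file_exists: impl Fn(&Path) -> bool,
) -> CredentialSource {
    // 1. An explicit, non-empty env var always wins.
    //    (Assumption: this sketch does not verify that the file exists.)
    if let Some(v) = env_value.filter(|v| !v.is_empty()) {
        return CredentialSource::EnvKeyFile(PathBuf::from(v));
    }
    // 2. The gcloud well-known file under the home directory.
    if let Some(home) = home {
        let p = home.join(".config/gcloud/application_default_credentials.json");
        if file_exists(&p) {
            return CredentialSource::GcloudUserFile(p);
        }
    }
    // 3. Otherwise assume we are on GCE and ask the metadata server.
    CredentialSource::MetadataServer
}
```

A caller would typically pass `std::env::var("GOOGLE_APPLICATION_CREDENTIALS").ok().as_deref()` for the first argument and `|p| p.exists()` for the last.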

Client authentication

Set the LLMPROXY_API_KEY environment variable to require clients to authenticate with Authorization: Bearer <key>. When the variable is unset, all requests are accepted without auth.

The /healthz endpoint is always unauthenticated.
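The rules above amount to a small decision function. The sketch below is an assumption about the shape of that check, not the proxy's real middleware: `is_authorized` is a hypothetical name, and the header value is passed in as a plain string rather than taken from any particular HTTP framework.

```rust
/// Decide whether a request may proceed.
/// `path` is the request path, `authorization` the raw Authorization
/// header value (if present), and `required_key` the value of
/// LLMPROXY_API_KEY (None when the variable is unset).
fn is_authorized(
    path: &str,
    authorization: Option<&str>,
    required_key: Option<&str>,
) -> bool {
    // The health check is always open.
    if path == "/healthz" {
        return true;
    }
    // With no key configured, every request is accepted.
    let key = match required_key {
        Some(k) => k,
        None => return true,
    };
    // Otherwise the header must be exactly "Bearer <key>".
    match authorization.and_then(|h| h.strip_prefix("Bearer ")) {
        Some(token) => token == key,
        None => false,
    }
}
```

The plain `==` comparison keeps the sketch short; a production check might prefer a constant-time comparison to avoid leaking key contents through timing.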

Using with OpenCode

Point OpenCode at the proxy by adding a provider to your config (~/.config/opencode/config.json):

{
  "provider": {
    "llmproxy": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "llmproxy",
      "options": { "baseURL": "http://127.0.0.1:8080/v1" },
      "models": {
        "claude-opus-4-6": { "name": "Claude Opus 4.6" }
      }
    }
  },
  "model": "llmproxy/claude-opus-4-6"
}

Then set the API key in ~/.local/share/opencode/auth.json. It must match LLMPROXY_API_KEY; if auth is disabled, any non-empty string works:

{ "llmproxy": { "apiKey": "your-llmproxy-api-key" } }

Container image

Pre-built images are published to ghcr.io/lobstertrap/llmproxy. See the Quick start above for a typical podman run invocation.

To build the image locally instead:

podman build -t llmproxy .

The Containerfile uses BuildKit cache mounts for fast incremental rebuilds.

There is also a Justfile with shortcuts: just container-build, just container-run, just ci, etc. Run just --list for the full set.

Development

cargo test
cargo clippy      # lint
just ci           # fmt-check + clippy + test

License

Apache-2.0. GCP auth module adapted from Goose (also Apache-2.0).
