An OpenAI- and Anthropic-compatible proxy server that routes your requests through multiple free-tier AI providers. If one provider is down or rate-limited, the proxy automatically tries the next one -- so your application keeps working without any code changes.
You configure a list of AI providers (Google Gemini, Qwen, OpenRouter, xAI Grok, Anthropic, or any OpenAI-compatible service), assign priorities, and the proxy handles the rest. Your app talks to one local endpoint, and ai-free-swap finds a working provider behind the scenes. Both OpenAI SDK and Anthropic SDK clients work out of the box.
Deploy your own instance directly from this repo -- no fork needed -- using the one-click deploy buttons for Render and Railway. Both platforms auto-generate a `SERVER_API_KEY` to protect your instance. Add at least one provider API key (e.g., `GEMINI_API_KEY`) and you're live.

See the Cloud Hosting Guide for more options (Fly.io, VPS, Oracle Cloud free tier).
- [How It Works](#how-it-works)
- [Quick Start](#quick-start)
- [Installation](#installation)
- [Configuration](#configuration)
- [Supported Providers](#supported-providers)
- [Using with Applications](#using-with-applications)
- [Command-Line Options](#command-line-options)
- [Securing the Proxy](#securing-the-proxy)
- [Troubleshooting](#troubleshooting)
- [Running Tests](#running-tests)
## How It Works

```
Your App                ai-free-swap                      Providers
    |                         |                               |
    |-- POST /v1/chat/... --->|                               |
    |-- POST /v1/messages --->|                               |
    |                         |-- try Gemini (priority 1) --->|
    |                         |<-- error / rate limited ------|
    |                         |                               |
    |                         |-- try Qwen (priority 2) ----->|
    |                         |<-- success -------------------|
    |                         |                               |
    |<-- response ------------|                               |
```
- Your application sends a request to the proxy -- either OpenAI format (`/v1/chat/completions`) or Anthropic format (`/v1/messages`).
- The proxy tries providers in priority order (lowest number = highest priority).
- Within the same priority level, providers are tried in random order.
- If a provider fails, the proxy automatically tries the next one (see the sketch after this list).
- The response is returned in the same format the client used -- your app doesn't need to know which provider actually handled the request.
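
The failover logic amounts to a nested loop: cycle up to `keep_cycles` times over the priority groups in ascending order, shuffle the backends within each group, and return the first successful response. A minimal sketch, not the actual implementation (`call_backend` is a hypothetical stand-in for whatever forwards the request upstream):

```python
import random
from typing import Any, Callable

def route_request(priority_groups: list[tuple[int, list[Any]]],
                  call_backend: Callable[[Any], Any],
                  keep_cycles: int = 1) -> Any:
    """Try backends in ascending priority order, shuffled within each group."""
    for _ in range(keep_cycles):
        for _, backends in sorted(priority_groups, key=lambda g: g[0]):
            candidates = list(backends)
            random.shuffle(candidates)           # spread load across accounts/keys
            for backend in candidates:
                try:
                    return call_backend(backend)  # first success wins
                except Exception:
                    continue                      # failed -> try next backend
    raise RuntimeError("All configured providers failed")  # surfaced as HTTP 503
```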
## Quick Start

### 1. Install

```bash
pip install .
```

### 2. Get API keys

Sign up for free API keys from one or more providers:
| Provider | Sign Up |
|---|---|
| Google Gemini | https://aistudio.google.com/apikey |
| Alibaba Qwen | https://modelstudio.console.alibabacloud.com (International)<br>https://dashscope.console.aliyun.com/ (Chinese) |
| OpenRouter | https://openrouter.ai/keys |
| xAI Grok | https://console.x.ai/ |
| Anthropic | https://platform.claude.com/settings/workspaces/default/keys/ |
| Groq | https://console.groq.com/keys |
| Mistral | https://admin.mistral.ai/organization/api-keys |
| Cloudflare AI | https://dash.cloudflare.com/?to=/:account/ai/workers-ai |
| Hugging Face | https://huggingface.co/settings/tokens |
| Cohere | https://dashboard.cohere.com/welcome/ |
| Deepseek | https://platform.deepseek.com/api_keys |
| Z.ai | https://z.ai/manage-apikey/apikey-list |
| Moonshot (Kimi) | https://platform.kimi.ai/console/api-keys |
| SiliconFlow | https://cloud.siliconflow.com/account/ak |
| MiniMax | https://platform.minimax.io/user-center/payment/token-plan |
These are just examples -- any service with an OpenAI-compatible API works (DeepSeek, GLM, Groq, Together, local Ollama, etc.). See Custom OpenAI-Compatible Providers.
### 3. Configure

```bash
cp config.yaml.example config.yaml
```

Open `config.yaml` and add your API keys. At minimum, you need one provider:

```yaml
keep_cycles: 1
model_name: "aifree"

server:
  host: "0.0.0.0"
  port: 8000

providers:
  - priority: 1
    backends:
      - provider: gemini
        api_key: "your-gemini-api-key-here"
        model: "gemini-2.5-flash"
```

### 4. Run

```bash
ai-free-swap --config config.yaml
```

The server starts at http://localhost:8000.

### 5. Test
```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "aifree",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

## Installation

Requires Python 3.11 or later.
### From source

```bash
pip install .
# Or in development mode
pip install -e .
```

After installation, the `ai-free-swap` command is available system-wide.
### Docker

```bash
docker build -t ai-free-swap .
docker run -p 8000:8000 \
  -v /path/to/your/config.yaml:/app/config.yaml \
  -e GEMINI_API_KEY="your-key" \
  ai-free-swap
```

### Without installing

```bash
pip install -r requirements.txt
python -m ai_free_swap --config config.yaml
```

## Configuration

The proxy is configured with a YAML file. Copy the example to get started:
```bash
cp config.yaml.example config.yaml
```

### Global settings

| Setting | Default | Description |
|---|---|---|
| `keep_cycles` | `1` | How many times to cycle through all providers before giving up. Set to 2 or 3 if providers have intermittent failures. |
| `model_name` | `"aifree"` | The model name shown in `/v1/models`. Clients can use this name or any backend model name directly. |
| `show_provider` | `true` | When `true`, responses include a `provider_name` field showing which provider handled the request. Set to `false` to hide this. |
| `model_routing` | `"any"` | How to handle the model name from client requests. `"any"` (default) ignores the client model and uses all providers in priority order -- best for proxy use cases where clients send arbitrary model names. `"match"` routes to backends whose configured model matches the request, falling back to all providers if no match is found -- useful when you configure multiple distinct models and want clients to choose. |
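
The `model_routing` distinction is easiest to see in code. A simplified sketch of the two modes, assuming matching compares the request's model name against each backend's configured `model` (this is not the proxy's actual source):

```python
def select_backends(requested_model: str, backends: list[dict],
                    model_routing: str = "any") -> list[dict]:
    """Pick candidate backends for a request based on the model_routing setting."""
    if model_routing == "any":
        return backends                  # ignore the client's model name entirely
    # "match" mode: prefer backends configured with the requested model...
    matching = [b for b in backends if b.get("model") == requested_model]
    # ...falling back to all providers if nothing matches
    return matching or backends
```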
### Server settings

```yaml
server:
  host: "0.0.0.0"   # Listen address ("0.0.0.0" = all interfaces, "127.0.0.1" = local only)
  port: 8000        # Listen port
  api_key: ""       # Optional: require this key from clients (see "Securing the Proxy")
```

### Providers

Providers are organized into priority groups. Each group has a priority number and a list of backends:
```yaml
providers:
  - priority: 1                     # Tried first
    backends:
      - provider: gemini            # Provider type (or any label if base_url is set)
        api_key: "..."              # API key for this provider
        model: "gemini-2.5-flash"   # Model to use
  - priority: 2                     # Tried if all priority 1 backends fail
    backends:
      - provider: qwen
        api_key: "..."
        model: "qwen-flash"
```

Each backend supports these fields:
| Field | Required | Description |
|---|---|---|
| `provider` | Yes | Provider type (see Supported Providers) or any name you choose if `base_url` is set. |
| `api_key` | Yes | API key. Supports `${ENV_VAR}` syntax. |
| `model` | Yes | Model identifier to use with this provider. |
| `name` | No | Friendly name for this backend. Shown in logs and in the `provider_name` response field. |
| `base_url` | No | Override the provider's API URL. Required for custom/self-hosted providers. |
| `extra` | No | Provider-specific options (e.g., `timeout`, `default_max_tokens`). |
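
For instance, a backend with per-provider options might look like the following. The `timeout` and `default_max_tokens` keys are the examples named in the table above; treat the exact option names and units as provider-specific (the seconds unit here is an assumption):

```yaml
backends:
  - provider: gemini
    api_key: "${GEMINI_API_KEY}"
    model: "gemini-2.5-flash"
    extra:
      timeout: 30               # request timeout (assumed: seconds)
      default_max_tokens: 1024  # applied when the client omits max_tokens
```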
### Environment variables

Instead of putting API keys directly in the config file, you can reference environment variables with `${VAR_NAME}` syntax:
```yaml
backends:
  - provider: gemini
    api_key: "${GEMINI_API_KEY}"
    model: "gemini-2.5-flash"
```

The variable names are up to you -- use whatever makes sense:
```yaml
backends:
  - provider: deepseek
    api_key: "${MY_DEEPSEEK_KEY}"
    model: "deepseek-chat"
    base_url: "https://api.deepseek.com/v1"
```

Then set the variables before starting:
```bash
export GEMINI_API_KEY="your-actual-key"
export MY_DEEPSEEK_KEY="your-deepseek-key"
ai-free-swap --config config.yaml
```
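
The substitution itself is simple; conceptually it is one regex pass over each string in the loaded config. A sketch of the idea (assuming, as one plausible choice, that unset variables are left untouched):

```python
import os
import re

def expand_env_vars(value: str) -> str:
    """Replace ${VAR_NAME} references with values from the environment."""
    return re.sub(
        r"\$\{(\w+)\}",
        lambda m: os.environ.get(m.group(1), m.group(0)),  # keep literal if unset
        value,
    )

# expand_env_vars("${GEMINI_API_KEY}") -> the key exported above
```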
### Priorities

The priority number determines the order providers are tried:

- Lower numbers are tried first (priority 1 before priority 2).
- Within the same priority, backends are tried in random order -- this distributes load across multiple accounts or keys.
- If all backends in a priority group fail, the proxy moves to the next group.
- After all groups are exhausted, the cycle repeats up to `keep_cycles` times.
Example: distribute load across three Gemini keys, fall back to Qwen:
```yaml
keep_cycles: 2   # Try everything twice before giving up

providers:
  - priority: 1
    backends:
      - provider: gemini
        name: "gemini-key-1"
        api_key: "${GEMINI_KEY_1}"
        model: "gemini-2.5-flash"
      - provider: gemini
        name: "gemini-key-2"
        api_key: "${GEMINI_KEY_2}"
        model: "gemini-2.5-flash"
      - provider: gemini
        name: "gemini-key-3"
        api_key: "${GEMINI_KEY_3}"
        model: "gemini-2.5-flash"
  - priority: 2
    backends:
      - provider: qwen
        api_key: "${QWEN_KEY}"
        model: "qwen-flash"
```

### Custom OpenAI-Compatible Providers

Any service with an OpenAI-compatible API works. Set `base_url` and use whatever name you want for `provider` -- the name is just a label for logs:
```yaml
providers:
  - priority: 1
    backends:
      # DeepSeek
      - provider: deepseek
        api_key: "${DEEPSEEK_KEY}"
        model: "deepseek-chat"
        base_url: "https://api.deepseek.com/v1"

      # GLM (Zhipu AI)
      - provider: glm
        api_key: "${GLM_KEY}"
        model: "glm-4-flash"
        base_url: "https://open.bigmodel.cn/api/paas/v4"

      # Groq
      - provider: groq
        api_key: "${GROQ_KEY}"
        model: "llama-3.3-70b-versatile"
        base_url: "https://api.groq.com/openai/v1"

      # Together AI
      - provider: together
        api_key: "${TOGETHER_KEY}"
        model: "meta-llama/Llama-3.3-70B-Instruct-Turbo"
        base_url: "https://api.together.xyz/v1"

      # Local Ollama
      - provider: ollama
        api_key: "unused"
        model: "llama3.2"
        base_url: "http://localhost:11434/v1"

      # LM Studio
      - provider: lmstudio
        api_key: "unused"
        model: "local-model"
        base_url: "http://localhost:1234/v1"
```

### Full example

A complete configuration combining several providers and fallback tiers:

```yaml
keep_cycles: 1
model_name: "aifree"
show_provider: true
model_routing: "any"   # "any" = ignore client model, "match" = route by model name

server:
  host: "0.0.0.0"
  port: 8000
  api_key: ""

providers:
  - priority: 1
    backends:
      - provider: gemini
        name: "gemini-flash-1"
        api_key: "${GEMINI_API_KEY_1}"
        model: "gemini-2.5-flash"
      - provider: gemini
        name: "gemini-flash-2"
        api_key: "${GEMINI_API_KEY_2}"
        model: "gemini-2.5-flash"
  - priority: 2
    backends:
      - provider: qwen
        api_key: "${DASHSCOPE_API_KEY}"
        model: "qwen-flash"
  - priority: 3
    backends:
      - provider: openrouter
        api_key: "${OPENROUTER_API_KEY}"
        model: "google/gemini-2.5-flash:free"
      - provider: openrouter
        api_key: "${OPENROUTER_API_KEY}"
        model: "meta-llama/llama-4-scout:free"
  - priority: 4
    backends:
      - provider: grok
        api_key: "${XAI_API_KEY}"
        model: "grok-3-mini"
  - priority: 5
    backends:
      - provider: anthropic
        api_key: "${ANTHROPIC_API_KEY}"
        model: "claude-sonnet-4-6"
```

## Supported Providers

These providers have built-in base URLs and work with just an API key:
| Provider | `provider` value | Models (examples) |
|---|---|---|
| Google Gemini | `gemini` | `gemini-2.5-flash`, `gemini-2.5-flash-lite` |
| Alibaba Qwen | `qwen` | `qwen-flash` |
| OpenRouter | `openrouter` | `google/gemini-2.5-flash:free`, `meta-llama/llama-4-scout:free` |
| xAI Grok | `grok` | `grok-3-mini` |
| OpenAI | `openai` | `gpt-4o-mini`, `gpt-4o` |
| Anthropic | `anthropic` | `claude-sonnet-4-6`, `claude-haiku-4-5` |
Any other OpenAI-compatible service works too -- just set `base_url`.
See Custom OpenAI-Compatible Providers
for examples with DeepSeek, GLM, Groq, Together, Ollama, LM Studio, and more.
## Using with Applications

### OpenAI SDK (Python)

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="unused",  # or your proxy api_key if set
)

response = client.chat.completions.create(
    model="aifree",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```

With streaming:
```python
stream = client.chat.completions.create(
    model="aifree",
    messages=[{"role": "user", "content": "Tell me a story."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```

### Anthropic SDK (Python)

```python
from anthropic import Anthropic
client = Anthropic(
    base_url="http://localhost:8000",
    api_key="unused",  # or your proxy api_key if set
)

response = client.messages.create(
    model="aifree",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.content[0].text)
```

With streaming:
```python
with client.messages.stream(
    model="aifree",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me a story."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="")
```

**Using the `ANTHROPIC_BASE_URL` environment variable:**
Many tools that use the Anthropic SDK (Claude Code, aider, etc.) read the `ANTHROPIC_BASE_URL` environment variable. Set it to point at your proxy:

```bash
export ANTHROPIC_BASE_URL="http://localhost:8000"
export ANTHROPIC_API_KEY="your-proxy-key"  # or any non-empty string if proxy has no api_key

# Now any tool using the Anthropic SDK will go through your proxy
claude  # Claude Code
aider   # aider with --model claude-sonnet-4-6
```

### curl

```bash
# Non-streaming (OpenAI format)
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "aifree", "messages": [{"role": "user", "content": "Hi"}]}'

# Streaming (OpenAI format)
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "aifree", "messages": [{"role": "user", "content": "Hi"}], "stream": true}'

# Non-streaming (Anthropic format)
curl http://localhost:8000/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: your-proxy-key" \
  -H "anthropic-version: 2023-06-01" \
  -d '{"model": "aifree", "max_tokens": 1024, "messages": [{"role": "user", "content": "Hi"}]}'

# With proxy authentication (OpenAI style)
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-proxy-key" \
  -d '{"model": "aifree", "messages": [{"role": "user", "content": "Hi"}]}'
```

### OpenAI-compatible tools

ai-free-swap works as a drop-in replacement for the OpenAI API. In most tools, you just need to change two settings:
- **API Base URL:** `http://localhost:8000/v1`
- **Model:** `aifree` (or any backend model name you configured)
This works with tools like aider, cline, open-hands, continue, LangChain, LlamaIndex, Open Interpreter, and any other application that supports custom OpenAI endpoints.
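
As a concrete example, here is how the two settings map onto LangChain's OpenAI chat client (a sketch; assumes the `langchain-openai` package and a proxy running locally with no `api_key` set):

```python
from langchain_openai import ChatOpenAI

# Point LangChain at the proxy instead of api.openai.com
llm = ChatOpenAI(
    model="aifree",                       # the proxy's model name
    base_url="http://localhost:8000/v1",  # the proxy endpoint
    api_key="unused",                     # or your proxy api_key if set
)

print(llm.invoke("Hello!").content)
```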
### Anthropic-compatible tools

ai-free-swap also works as a drop-in replacement for the Anthropic Messages API. Set the base URL to your proxy:
- **`ANTHROPIC_BASE_URL`:** `http://localhost:8000`
- **`ANTHROPIC_API_KEY`:** your proxy `api_key` (or any non-empty string)
This works with Claude Code, aider (Anthropic mode), and any other tool that uses the Anthropic SDK with a configurable base URL.
## Command-Line Options

```
ai-free-swap [options]

Options:
  --config, -c PATH    Path to config file (default: config.yaml)
  --host HOST          Override the host from config
  --port PORT          Override the port from config
  --log-level LEVEL    Set log verbosity: debug, info, warning, error
                       (default: info)
```

Examples:

```bash
# Use a custom config and port
ai-free-swap -c my-config.yaml --port 9000

# Enable debug logging to see provider routing decisions
ai-free-swap --log-level debug

# Run as a Python module
python -m ai_free_swap --config config.yaml
```

## Securing the Proxy

If the proxy is accessible on a network (not just localhost), set an API key:
```yaml
server:
  api_key: "your-secret-proxy-key"
```

Clients must then include this key in requests. Both authentication methods are supported:
```bash
# OpenAI-style: Authorization header
curl http://your-server:8000/v1/chat/completions \
  -H "Authorization: Bearer your-secret-proxy-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "aifree", "messages": [{"role": "user", "content": "Hi"}]}'

# Anthropic-style: x-api-key header
curl http://your-server:8000/v1/messages \
  -H "x-api-key: your-secret-proxy-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "aifree", "max_tokens": 1024, "messages": [{"role": "user", "content": "Hi"}]}'
```

The `/health` endpoint is always public (no key required).
"Error loading config" -- Check that config.yaml exists and is valid
YAML. If using ${ENV_VAR} syntax, make sure the environment variables are set.
"All configured providers failed" (503) -- Check that your API keys are
valid. Run with --log-level debug to see which providers were tried and why
each failed. Increase keep_cycles to retry more times.
"Model 'xyz' is not configured" (400) -- The model name in your request
doesn't match any configured backend. Use "aifree" to use any available
provider, or check your config for the exact model names.
Server not reachable -- Check the port isn't already in use. In Docker,
make sure you used -p 8000:8000. If host is 127.0.0.1, the server only
accepts local connections -- change to 0.0.0.0.
More documentation:

- API Reference -- full endpoint documentation, request/response formats, streaming protocol, error codes
- Cloud Hosting Guide -- deploy to Render, Railway, Fly.io, or any VPS with step-by-step instructions
## Running Tests

```bash
# Install test dependencies
pip install -e ".[test]"

# Run all tests
python -m pytest tests/ -v

# Run a specific test file
python -m pytest tests/test_router.py -v
```