An OpenAI-compatible API adapter for Google Vertex AI Gemini models, deployed on Cloudflare Workers. Provides a drop-in replacement endpoint so any tool or app built for the OpenAI API can use Gemini models seamlessly.
- OpenAI-Compatible Endpoints: Standard `/v1/chat/completions` and `/v1/models` endpoints
- Streaming & Non-Streaming: Full SSE streaming support with `TransformStream`
- Multiple Auth Methods: Vertex AI Express API Key and/or Service Account JSON
- Official Vertex Routes: Express keys use `publishers/google/models/...:generateContent`; service accounts use Vertex AI's OpenAI-compatible endpoint
- Multi-Key Rotation: Round-robin rotation when multiple API keys are configured
- Thinking/Reasoning: Extracts and surfaces `reasoning_content` from Gemini 2.5+ / 3.x models
- Tool/Function Calling: Full support for OpenAI-style function calling
- Image Generation: Support for `-2k` and `-4k` image generation model variants (see Known Limitations)
- Grounded Search: Use the `-search` suffix for Google Search grounding
- Zero Dependencies: Uses only Web APIs (`fetch`, `TransformStream`, `Web Crypto`)
- Edge Deployment: Runs on Cloudflare's global edge network for low latency
- One-Click Deploy: Deploy button for instant setup
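
As an illustration of the round-robin rotation mentioned in the feature list, here is a minimal sketch. The helper name `createKeyRotator` is hypothetical and the Worker's actual rotation code may differ; the sketch only assumes that keys arrive as a comma-separated string, as the configuration tables describe.

```typescript
// Hypothetical helper: not taken from the project's source.
// Splits a comma-separated key list and hands out keys in round-robin order.
function createKeyRotator(commaSeparatedKeys: string): () => string {
  const keys = commaSeparatedKeys
    .split(",")
    .map((k) => k.trim())
    .filter((k) => k.length > 0);
  if (keys.length === 0) {
    throw new Error("No API keys configured");
  }
  let index = 0;
  return () => {
    const key = keys[index];
    // Wrap around to the first key after the last one has been used.
    index = (index + 1) % keys.length;
    return key;
  };
}

const nextKey = createKeyRotator("key-a, key-b, key-c");
console.log(nextKey(), nextKey(), nextKey(), nextKey());
```

In a Worker, a counter like this would live in per-isolate memory, so rotation across the whole edge network would only be approximately even.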
Click the Deploy to Cloudflare button above, or deploy manually:
git clone https://github.com/workHMZ/vertex2openai-cf.git
cd vertex2openai-cf
npm install
# Required: API key to protect your adapter
npx wrangler secret put API_KEY
# Required (choose one):
# Option A: Vertex AI Express API Key (recommended, simplest)
npx wrangler secret put VERTEX_EXPRESS_API_KEY
# Alias also supported: VERTEX_API_KEY
# Option B: Service Account JSON (for GCP project-based auth)
npx wrangler secret put GOOGLE_CREDENTIALS_JSON
npx wrangler deploy

To use a custom domain, open the Cloudflare Dashboard and go to Workers & Pages → your worker → Settings → Domains & Routes → Add Custom Domain.
| Name | Required | Description |
|---|---|---|
| `API_KEY` | ✅ | API key to protect this adapter |
| `VERTEX_EXPRESS_API_KEY` / `VERTEX_API_KEY` | One of the two | Vertex AI Express API Key(s), comma-separated |
| `GOOGLE_CREDENTIALS_JSON` | One of the two | Service Account JSON content(s) |
| Name | Default | Description |
|---|---|---|
| `GCP_LOCATION` | `global` | GCP region/location |
| `GCP_PROJECT_ID` | auto-detect | Explicit GCP project ID |
| `MODELS_CONFIG` | built-in | Custom model list JSON |
All requests require a Bearer token matching your configured API_KEY:
Authorization: Bearer YOUR_API_KEY
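
The Bearer check described above could look roughly like the following sketch. The function name `isAuthorized` is hypothetical, and the constant-time style comparison is a design choice of this example, not something the README confirms about the Worker.

```typescript
// Hypothetical helper: illustrates validating "Authorization: Bearer <token>"
// against the configured API_KEY. Not taken from the project's source.
function isAuthorized(authorizationHeader: string | null, apiKey: string): boolean {
  if (!authorizationHeader) return false;
  const match = /^Bearer\s+(.+)$/i.exec(authorizationHeader.trim());
  if (!match) return false;
  const token = match[1];
  if (token.length !== apiKey.length) return false;
  // Compare every character so the loop does not exit on the first mismatch.
  let diff = 0;
  for (let i = 0; i < token.length; i++) {
    diff |= token.charCodeAt(i) ^ apiKey.charCodeAt(i);
  }
  return diff === 0;
}

console.log(isAuthorized("Bearer test123", "test123"));
```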
| Method | Path | Description |
|---|---|---|
| `GET` | `/` | Health check |
| `GET` | `/v1/models` | List available models |
| `POST` | `/v1/chat/completions` | Chat completions |
curl -X POST https://your-worker.workers.dev/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"model": "gemini-3.1-pro-preview",
"messages": [
{"role": "user", "content": "Hello, what is 2+2?"}
]
}'

curl -X POST https://your-worker.workers.dev/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"model": "gemini-3.1-pro-preview",
"messages": [
{"role": "user", "content": "Write a short poem about coding."}
],
"stream": true
}'

| Suffix | Description |
|---|---|
| (none) | Standard model call |
| `-openai` | Explicit OpenAI-compatible endpoint (Service Account / [PAY] models only) |
| `-openaisearch` | OpenAI-compatible endpoint with web search (Service Account / [PAY] models only) |
| `-search` | Google Search grounding |
| `-nothinking` | Lower thinking budget/level where supported |
| `-max` | Highest thinking budget/level where supported |
| `-2k` | Image generation at 2K resolution |
| `-4k` | Image generation at 4K resolution |
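
To make the suffix convention concrete, here is a minimal sketch of splitting a requested model name into a base model and one suffix. `splitModelSuffix` is a hypothetical name, and the sketch assumes at most one suffix per model name; whether the Worker allows combinations is not stated here.

```typescript
// Hypothetical parser: the suffix list comes from the table above, but the
// matching logic is an assumption, not the project's actual implementation.
const KNOWN_SUFFIXES = [
  "-openaisearch",
  "-openai",
  "-search",
  "-nothinking",
  "-max",
  "-2k",
  "-4k",
] as const;

type Suffix = (typeof KNOWN_SUFFIXES)[number];

function splitModelSuffix(model: string): { base: string; suffix: Suffix | null } {
  for (const suffix of KNOWN_SUFFIXES) {
    // Require a non-empty base so a bare suffix is not treated as a model.
    if (model.endsWith(suffix) && model.length > suffix.length) {
      return { base: model.slice(0, -suffix.length), suffix };
    }
  }
  return { base: model, suffix: null };
}

console.log(splitModelSuffix("gemini-3.1-pro-preview-search"));
```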
Models are prefixed with [EXPRESS] or [PAY] based on auth method. If you call an unprefixed model and both auth methods are configured, the Worker prefers Express mode. Use [PAY] to force the Service Account path.
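
The selection rule in the previous paragraph can be sketched as follows. `resolveAuthPath` and its return shape are hypothetical, and the error behavior when a forced path is not configured is an assumption of this example.

```typescript
// Hypothetical resolver: mirrors the rule described above (explicit prefix
// wins; otherwise Express is preferred when both methods are configured).
type AuthPath = "express" | "service-account";

interface ConfiguredAuth {
  express: boolean;
  serviceAccount: boolean;
}

function resolveAuthPath(model: string, configured: ConfiguredAuth): { path: AuthPath; model: string } {
  const trimmed = model.trim();
  if (trimmed.startsWith("[EXPRESS]")) {
    if (!configured.express) throw new Error("Express API key is not configured");
    return { path: "express", model: trimmed.slice("[EXPRESS]".length).trim() };
  }
  if (trimmed.startsWith("[PAY]")) {
    if (!configured.serviceAccount) throw new Error("Service Account is not configured");
    return { path: "service-account", model: trimmed.slice("[PAY]".length).trim() };
  }
  // Unprefixed model: prefer Express, fall back to the Service Account path.
  if (configured.express) return { path: "express", model: trimmed };
  if (configured.serviceAccount) return { path: "service-account", model: trimmed };
  throw new Error("No Vertex AI credentials configured");
}

console.log(resolveAuthPath("[PAY] gemini-3.1-pro-preview", { express: true, serviceAccount: true }));
```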
Image generation models (e.g., gemini-3.1-flash-image-preview, -2k, -4k variants) are very likely to fail on the Cloudflare Workers Free plan due to the 10ms CPU time limit.
Image model responses contain large base64-encoded image data. Processing (JSON parse + format conversion) of such payloads typically consumes 50–130ms of CPU time, far exceeding the free tier's 10ms cap. When the limit is hit, Cloudflare terminates the Worker, resulting in a RangeError or exceededCpu error.
Solutions:
| Option | Details |
|---|---|
| Upgrade to Workers Paid Plan | $5/month with the Unbound usage model gives up to 30s CPU time per invocation. This fully resolves the issue. |
| Self-host / Local deployment | Run the adapter locally or on a VM (Docker, Node.js) where there are no CPU time constraints. See the original vertex2openai project for a Docker-based alternative. |
Note: Standard text-only models (e.g., `gemini-3.1-flash`, `gemini-3.1-pro-preview`) work fine on the free plan because their responses are small enough to stay within the CPU limit.
Client (OpenAI SDK/App)
│
▼
┌──────────────────────┐
│ Cloudflare Worker │
│ ┌────────────────┐ │
│ │ Auth Middleware │ │
│ └───────┬────────┘ │
│ ┌───────▼────────┐ │
│ │ Request Convert│ │
│ └───────┬────────┘ │
│ ┌───────▼────────┐ │
│ │ Vertex AI Call │──┼──► Vertex AI Express or OpenAI-compat endpoint
│ └───────┬────────┘ │
│ ┌───────▼────────┐ │
│ │Response Convert│ │
│ └───────┬────────┘ │
│ │ │
└──────────┼───────────┘
▼
Client receives
OpenAI-format response
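
As an illustration of the "Request Convert" step in the diagram, the sketch below maps OpenAI-style chat messages to the `contents` / `systemInstruction` shape used by Gemini's `generateContent` API. The function and interface names are hypothetical, only plain-text messages are handled, and the Worker's real conversion (tools, images, thinking options) is certainly more involved. On the Service Account path, which uses the OpenAI-compatible endpoint, such a conversion would presumably not be needed.

```typescript
// Hypothetical converter: a simplified sketch, not the project's actual code.
interface OpenAIMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface GeminiContent {
  role: "user" | "model";
  parts: { text: string }[];
}

interface GeminiRequest {
  systemInstruction?: { parts: { text: string }[] };
  contents: GeminiContent[];
}

function toGeminiRequest(messages: OpenAIMessage[]): GeminiRequest {
  const systemTexts: string[] = [];
  const contents: GeminiContent[] = [];
  for (const message of messages) {
    if (message.role === "system") {
      // System prompts are collected separately instead of joining the turns.
      systemTexts.push(message.content);
    } else {
      contents.push({
        role: message.role === "assistant" ? "model" : "user",
        parts: [{ text: message.content }],
      });
    }
  }
  const request: GeminiRequest = { contents };
  if (systemTexts.length > 0) {
    request.systemInstruction = { parts: [{ text: systemTexts.join("\n") }] };
  }
  return request;
}

console.log(JSON.stringify(toGeminiRequest([
  { role: "system", content: "Be brief." },
  { role: "user", content: "Hi!" },
])));
```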
# Create .dev.vars for local secrets
cat > .dev.vars << 'EOF'
API_KEY=test123
VERTEX_EXPRESS_API_KEY=your_express_key_here
EOF
# Start dev server
npm run dev
# Test health check
curl http://localhost:8787/
# Test models
curl -H "Authorization: Bearer test123" http://localhost:8787/v1/models
# Test chat
curl -X POST http://localhost:8787/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer test123" \
-d '{"model":"gemini-3.1-pro-preview","messages":[{"role":"user","content":"Hi!"}]}'

This project is inspired by and references vertex2openai by gzzhongqi, a Python/Docker-based OpenAI-to-Gemini adapter. This TypeScript/Cloudflare Workers version is a ground-up rewrite optimized for edge deployment with zero runtime dependencies.
MIT — see LICENSE for details.