Z.ai to OpenAI-Compatible API Proxy with Tool Calling Support.
Convert Z.ai's free web chat into a fully OpenAI-compatible API endpoint, enabling coding agents like Cline, Roo-Code, and OpenCode to use GLM models with function/tool calling.

Features:

- OpenAI-Compatible API - Drop-in replacement for OpenAI's `/v1/chat/completions`
- Tool Calling Support - Emulated function calling via XML injection + multi-format parser
- CAPTCHA Bypass - Playwright-based browser automation bypasses Aliyun invisible CAPTCHA
- Streaming Support - SSE streaming responses (chunked)
- Thinking Models - Supports reasoning_content for GLM thinking models
- Web Dashboard - Monitor status and test API from browser
- Multiple Models - GLM-4-Flash, GLM-4.5, GLM-5, GLM-5.1, and more

Requirements:

- Node.js 18+
- A Z.ai account (free at chat.z.ai)

Setup:

- Clone and install:

  ```bash
  git clone https://github.com/toan9tranphu/GLM-API.git
  cd GLM-API
  npm install
  npx playwright install chromium
  ```

- Get your Z.ai JWT token:
  - Log in to chat.z.ai
  - Open browser DevTools → Application → Cookies → `token`
  - Copy the JWT value

- Start the server:

  ```bash
  ZAI_TOKEN=your_jwt_token_here API_KEY=sk-your-key node server.js
  ```

In Cline settings:
- API Provider: OpenAI Compatible
- Base URL: `http://localhost:8000/v1`
- API Key: `sk-your-key` (same as the `API_KEY` env var)
- Model: `glm-4-flash` (or any supported model)

API endpoints:

| Method | Endpoint | Description |
|---|---|---|
| GET | `/health` | Health check (no auth) |
| GET | `/v1/models` | List available models |
| POST | `/v1/chat/completions` | Chat completion (OpenAI format) |
| GET | `/` | Web dashboard |
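
A minimal client sketch for the two GET endpoints above, assuming the server runs on the default `http://localhost:8000` and that `/v1/models` returns the usual OpenAI list shape (`{ data: [{ id }] }`). The helper names `checkHealth` and `listModels` are illustrative and not part of GLM-API; the `fetchImpl` parameter exists only so the helpers can be tried without a running server.

```javascript
// Hypothetical helpers for the endpoints listed above (Node.js 18+, global fetch).
const BASE_URL = "http://localhost:8000";
const API_KEY = "sk-your-key"; // must match the API_KEY env var given to the server

async function checkHealth(fetchImpl = fetch) {
  // /health is documented as "no auth", so no Authorization header is sent.
  const res = await fetchImpl(`${BASE_URL}/health`);
  return res.ok;
}

async function listModels(fetchImpl = fetch) {
  const res = await fetchImpl(`${BASE_URL}/v1/models`, {
    headers: { Authorization: `Bearer ${API_KEY}` },
  });
  if (!res.ok) throw new Error(`GET /v1/models failed: HTTP ${res.status}`);
  const body = await res.json();
  // Assumption: OpenAI-style list response, { object: "list", data: [{ id: "..." }] }.
  return body.data.map((model) => model.id);
}
```

Calling `checkHealth()` before pointing an agent at the proxy is a cheap way to confirm the server is up before debugging the agent's own configuration.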

Supported models:

- `glm-4-flash` (fastest, free)
- `glm-4.5`, `glm-4.5-air`
- `glm-4.6`, `glm-4.6-v`
- `glm-4.7`
- `glm-5`, `glm-5-turbo`
- `glm-5.1`

Add the `-thinking` suffix to a model name to select its reasoning variant (e.g., `glm-5.1-thinking`).
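
As a sketch of how a client might use a `-thinking` model, the helper below sends a non-streaming request and separates the reasoning from the final answer. It assumes `reasoning_content` appears on the assistant message next to `content`; this README does not say exactly where the field is placed, so check a real response first. `askThinkingModel` and its options are hypothetical names.

```javascript
// Hypothetical helper: query a "-thinking" model and split reasoning from answer.
async function askThinkingModel(prompt, options = {}) {
  const {
    baseUrl = "http://localhost:8000/v1",
    apiKey = "sk-your-key",
    model = "glm-5.1-thinking",
    fetchImpl = fetch, // injectable so the helper can be tried without a server
  } = options;

  const res = await fetchImpl(`${baseUrl}/chat/completions`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model,
      messages: [{ role: "user", content: prompt }],
      stream: false,
    }),
  });
  if (!res.ok) throw new Error(`chat completion failed: HTTP ${res.status}`);

  const { choices } = await res.json();
  const message = choices[0].message;
  return {
    // Assumption: the reasoning text sits on the message as reasoning_content.
    reasoning: message.reasoning_content ?? null,
    answer: message.content,
  };
}
```

With a non-thinking model, `reasoning` would simply come back as `null` under this assumption.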
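
Tool calling:

GLM-API uses emulated tool calling: tool definitions are injected into the system prompt as XML, and the model's XML response is parsed into the OpenAI `tool_calls` format. GLM models generally follow this format, but because the calls are emulated rather than native, a call is only recognized when the model's reply matches a format the parser understands.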
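
The emulation described above can be sketched roughly as follows. This is not GLM-API's actual code: the XML tag names (`<tools>`, `<tool_call>`, `<name>`, `<arguments>`) and the function names are assumptions chosen for illustration, and only one XML shape is handled, whereas the real parser is described as multi-format.

```javascript
// Illustrative sketch of emulated tool calling. Tag names are hypothetical.

// Step 1: render OpenAI-style tool definitions as XML for the system prompt.
// Values are not XML-escaped here; a real implementation would need to do that.
function buildToolPrompt(tools) {
  const defs = tools.map(({ function: fn }) =>
    [
      "<tool>",
      `  <name>${fn.name}</name>`,
      `  <description>${fn.description ?? ""}</description>`,
      `  <parameters>${JSON.stringify(fn.parameters ?? {})}</parameters>`,
      "</tool>",
    ].join("\n")
  );
  return [
    "You can call these tools:",
    "<tools>",
    ...defs,
    "</tools>",
    "To call a tool, reply only with:",
    '<tool_call><name>TOOL_NAME</name><arguments>{"arg": "value"}</arguments></tool_call>',
  ].join("\n");
}

// Step 2: extract tool calls from the model's reply.
function parseToolCalls(text) {
  const pattern =
    /<tool_call>\s*<name>([\s\S]*?)<\/name>\s*<arguments>([\s\S]*?)<\/arguments>\s*<\/tool_call>/g;
  const calls = [];
  for (const match of text.matchAll(pattern)) {
    let args;
    try {
      args = JSON.parse(match[2].trim());
    } catch {
      continue; // skip calls whose arguments are not valid JSON
    }
    calls.push({
      id: `call_${calls.length + 1}`,
      type: "function",
      // OpenAI expects arguments as a JSON-encoded string, not an object.
      function: { name: match[1].trim(), arguments: JSON.stringify(args) },
    });
  }
  return calls;
}

// Step 3: shape the result like an OpenAI chat completion choice.
function toAssistantChoice(text) {
  const toolCalls = parseToolCalls(text);
  if (toolCalls.length === 0) {
    return { message: { role: "assistant", content: text }, finish_reason: "stop" };
  }
  return {
    message: { role: "assistant", content: null, tool_calls: toolCalls },
    finish_reason: "tool_calls",
  };
}
```

Falling back to a plain `content` reply when nothing parses means a model that ignores the XML instructions still produces a valid, if tool-less, response.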

Example request:

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer sk-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-4-flash",
    "messages": [{"role": "user", "content": "What is the weather in Tokyo?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get weather for a location",
        "parameters": {
          "type": "object",
          "properties": {"location": {"type": "string"}},
          "required": ["location"]
        }
      }
    }],
    "stream": false
  }'
```

Example response:

```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "choices": [{
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {
          "name": "get_weather",
          "arguments": "{\"location\":\"Tokyo\"}"
        }
      }]
    },
    "finish_reason": "tool_calls"
  }]
}
```

Environment variables:

| Variable | Default | Description |
|---|---|---|
| `ZAI_TOKEN` | (required) | Z.ai JWT token cookie |
| `API_KEY` | `sk-glm-api` | API key for authentication |
| `PORT` | `8000` | Server port |
| `HEADLESS` | `true` | Run browser headless |
| `TIMEOUT` | `120000` | Request timeout (ms) |
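
To make the defaults above concrete, here is one way such variables could be read in Node.js. It is a sketch, not the code in `server.js`: the `loadConfig` name, the returned field names, and the rule that any `HEADLESS` value other than `false` counts as headless are all assumptions.

```javascript
// Hypothetical config loader mirroring the documented variables and defaults.
function loadConfig(env = process.env) {
  if (!env.ZAI_TOKEN) {
    throw new Error("ZAI_TOKEN is required (copy it from the chat.z.ai token cookie)");
  }
  return {
    zaiToken: env.ZAI_TOKEN,
    apiKey: env.API_KEY ?? "sk-glm-api",
    port: Number(env.PORT ?? 8000),
    // Assumption: only the literal string "false" disables headless mode.
    headless: (env.HEADLESS ?? "true") !== "false",
    timeoutMs: Number(env.TIMEOUT ?? 120000),
  };
}
```

Running with `HEADLESS=false` can help when diagnosing login or CAPTCHA problems, since the browser window is then visible.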

Architecture:

```text
Cline/Roo-Code → GLM-API (Express) → Playwright → chat.z.ai
                         ↓
                 Parse SSE Response
                         ↓
                 Extract Tool Calls
                         ↓
             OpenAI-Compatible Response
```

Components:

- Browser: Playwright controls a real Chromium browser that bypasses Z.ai's CAPTCHA
- SSE Capture: Intercepts `chat.z.ai/api/v2/chat/completions` responses
- Parser: Multi-format tool call parser (XML, JSON, inline, natural language)
- Adapter: Converts Z.ai's custom SSE format to OpenAI format
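
On the client side, the OpenAI-format stream produced by the adapter can be consumed with a small incremental parser. The sketch below assumes the standard OpenAI streaming shape (`data: {json}` lines carrying `choices[0].delta`, terminated by `data: [DONE]`); Z.ai's own upstream SSE format is not shown in this README and is not modelled here. `createSseParser` and `streamChat` are hypothetical names.

```javascript
// Incremental parser for OpenAI-style SSE chunks (client side).
function createSseParser(onDelta) {
  let buffer = "";
  let done = false;
  return {
    push(chunk) {
      buffer += chunk;
      const lines = buffer.split("\n");
      buffer = lines.pop(); // keep a trailing partial line for the next chunk
      for (const line of lines) {
        const trimmed = line.trim();
        if (!trimmed.startsWith("data:")) continue; // ignore blanks and comments
        const payload = trimmed.slice("data:".length).trim();
        if (payload === "[DONE]") {
          done = true;
          continue;
        }
        const delta = JSON.parse(payload).choices?.[0]?.delta ?? {};
        onDelta(delta);
      }
    },
    get done() {
      return done;
    },
  };
}

// Usage sketch against the local proxy (Node.js 18+).
async function streamChat(messages, onDelta, fetchImpl = fetch) {
  const res = await fetchImpl("http://localhost:8000/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: "Bearer sk-your-key",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model: "glm-4-flash", messages, stream: true }),
  });
  const parser = createSseParser(onDelta);
  const decoder = new TextDecoder();
  // In Node.js 18+, the response body is an async-iterable stream of bytes.
  for await (const chunk of res.body) {
    parser.push(decoder.decode(chunk, { stream: true }));
  }
  return parser.done;
}
```

Buffering the trailing partial line matters because a network chunk can end in the middle of a JSON payload.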

Limitations:

- Sequential only: One request at a time (the browser is single-session)
- ~10-30s latency: Depends on model and response length
- Token expiry: JWT tokens expire; log in again periodically and update `ZAI_TOKEN`
- Emulated tool calling: Not native; depends on the model following the XML format
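
Because the proxy handles one request at a time, a client that issues several calls may want to serialize them itself rather than fire them in parallel. A minimal sketch of such a queue follows; `createSerialQueue` is a hypothetical helper, and how GLM-API itself treats overlapping requests is not specified in this README.

```javascript
// Runs async tasks strictly one after another, in submission order.
function createSerialQueue() {
  let tail = Promise.resolve();
  return function enqueue(task) {
    // Start this task only after every previously queued task has settled.
    const run = tail.then(() => task());
    // Swallow the error on the chain itself so one failure does not block
    // later tasks; the caller still sees the rejection through `run`.
    tail = run.catch(() => {});
    return run;
  };
}
```

Each chat completion call would then be wrapped as `enqueue(() => fetch(...))`, so a second request only starts once the first has finished or failed.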

License: MIT