Backend REST API reference for Bonsai Desk 0.3.0.
http://127.0.0.1:8000/api
Override the frontend target with VITE_API_BASE_URL.
| Endpoint | Method | Description |
|---|---|---|
/runtime/overview |
GET | Runtime status, config, models, install progress, source metadata, and diagnostics |
/runtime/status |
GET | Runtime status only |
/runtime/config |
GET / PUT | Read or update persisted runtime settings |
/runtime/install |
POST | Download the official Prism runtime and selected Bonsai model |
/runtime/install-progress |
GET | Poll background installation progress |
/runtime/diagnostics |
GET | Structured runtime checks, GPU info, runtime version, and recent logs |
/runtime/browse-binary |
POST | Open a native Windows picker for llama-server.exe |
/runtime/browse-model |
POST | Open a native Windows picker for .gguf models |
/runtime/use-existing-assets |
POST | Persist user-selected local runtime/model paths |
/runtime/start |
POST | Start llama-server |
/runtime/stop |
POST | Stop llama-server |
/runtime/restart |
POST | Restart llama-server |
/runtime/logs |
GET | Return recent runtime logs |
/models |
GET | List Bonsai variants and custom linked model state |
/models/select |
POST | Switch active Bonsai variant |
/models/install |
POST | Install a specific Bonsai variant |
/conversations |
GET / POST | List or create conversations |
/conversations/{id} |
GET / PATCH / DELETE | Read, rename, or delete one conversation |
/chat/stream |
POST | Stream a chat response as SSE |
GET /health is available outside /api for a simple backend health check.
Returns the full bootstrap payload used by the frontend.
{
"status": {
"installed": true,
"running": false,
"ready": false,
"pid": null,
"host": "127.0.0.1",
"port": 8080,
"model_loaded": "Bonsai-8B.gguf",
"message": "Runtime is stopped.",
"install_message": "Runtime and model are available."
},
"config": {
"temperature": 0.6,
"top_k": 40,
"top_p": 0.92,
"min_p": 0.05,
"max_tokens": 1024,
"ctx_size": 65536,
"gpu_layers": 99,
"threads": 8,
"batch_size": 2048,
"system_prompt": "You are a helpful local AI assistant powered by Bonsai.",
"reasoning_budget": -1,
"reasoning_format": "parsed",
"enable_thinking": true,
"model_filename": "Bonsai-8B.gguf",
"model_variant": "8B",
"runtime_binary_path": "",
"model_file_path": ""
},
"models": [
{
"id": "prism-ml/Bonsai-8B-gguf",
"name": "Bonsai 8B",
"filename": "Bonsai-8B.gguf",
"size_hint": "1.16 GB",
"local_path": "E:\\OpenCode-DEV\\Prism Launcher\\.bonsai-desk\\models\\Bonsai-8B.gguf",
"installed": true,
"variant": "8B",
"download_url": "https://huggingface.co/prism-ml/Bonsai-8B-gguf/resolve/main/Bonsai-8B.gguf?download=true",
"requirements_hint": "Recommended for RTX-class GPUs and higher-quality local reasoning.",
"is_active": true,
"is_downloaded": true
}
],
"install_progress": {
"state": "completed",
"current_step": "Already installed",
"detail": "Runtime and model are available.",
"progress": 100,
"runtime_progress": 100,
"model_progress": 100,
"error": null,
"started_at": null,
"updated_at": null
},
"sources": [
{
"kind": "Model",
"name": "Bonsai 8B GGUF",
"license_name": "Apache-2.0",
"source_url": "https://huggingface.co/prism-ml/Bonsai-8B-gguf"
}
],
"diagnostics": {
"platform_label": "Windows",
"gpu_label": "NVIDIA GPU detected",
"cuda_label": "CUDA available",
"runtime_version": "llama-server --version output",
"checks": [
{
"key": "runtime_binary",
"label": "Runtime binary",
"state": "ok",
"detail": "Resolved managed Prism runtime.",
"action": ""
}
],
"recent_logs": []
}
}Returns the persisted RuntimeConfig.
Persists runtime and inference settings.
{
"temperature": 0.6,
"top_k": 40,
"top_p": 0.92,
"min_p": 0.05,
"max_tokens": 1024,
"ctx_size": 65536,
"gpu_layers": 99,
"threads": 8,
"batch_size": 2048,
"system_prompt": "You are a helpful local AI assistant powered by Bonsai.",
"reasoning_budget": -1,
"reasoning_format": "parsed",
"enable_thinking": true,
"model_filename": "Bonsai-8B.gguf",
"model_variant": "8B",
"runtime_binary_path": "",
"model_file_path": ""
}reasoning_budget is normalized to -1 (unrestricted thinking) or 0 (thinking disabled).
Returns all Bonsai variants and marks which one is active and downloaded.
{
"variant": "4B"
}{
"variant": "1.7B"
}{
"runtime_binary_path": "D:\\Tools\\Prism\\llama-server.exe",
"model_file_path": "D:\\Models\\Bonsai-4B.gguf"
}Send null or "" to clear one of these linked paths.
Returns a list of conversation summaries.
{
"title": "New Chat"
}Returns one conversation with its messages.
{
"title": "Renamed Chat"
}{
"deleted": true
}{
"conversation_id": "conversation-uuid",
"prompt": "Explain what Bonsai Desk does."
}Response is streamed as newline-separated SSE data: events containing JSON objects with a type field:
data: {"type":"token","content":"Hello","conversation_id":"conversation-uuid"}
data: {"type":"token","content":" there","conversation_id":"conversation-uuid"}
data: {"type":"done","content":"","conversation_id":"conversation-uuid"}
If the upstream runtime fails after streaming starts, the backend sends a terminal error event and the frontend rejects the stream.
Request-time failures use FastAPI's default JSON shape:
{
"detail": "Conversation not found."
}