The fastest local AI engine for Apple Silicon. 4.2x faster than Ollama, 0.08s cached TTFT, 100% tool calling. 17 tool parsers, prompt cache, reasoning separation, cloud routing. Drop-in OpenAI replacement. Works with Claude Code, Cursor, Aider.
Auto-tuned launcher for GGUF models on llama.cpp / ik_llama.cpp — OpenAI-compatible server with multi-GPU tensor-split, MoE expert placement, measured flag tuning (AI Tune), hardware-matched HuggingFace downloads, and crash recovery. An Ollama alternative for multi-GPU rigs.
A private Claude-Code-style coding agent for Apple Silicon — run chat, code, and local model workflows on-device. MLX-native, Ollama/OpenAI API compatible, zero API keys.
🤖🗜️⚡️ Local LLM server for Apple Silicon. 5.4× faster end-to-end on long contexts vs Ollama, 33% less RAM, INT3 support for Qwen3. OpenAI + Ollama drop-in. Built for repeated long-context workloads on memory-constrained Macs.
Local OpenAI-compatible proxy with real failover, multi-account aliasing, and ChatGPT Plus/Pro as a backend. Single ~11MB binary, no Docker, secrets in OS keyring. Windows/macOS/Linux.
In the style of Claude Chat Pro — fully local on Apple Silicon. oMLX (vision + speed) + Open Interpreter (unrestricted sandbox) + rich Artifacts + attachments (PDF, JSON, Markdown, PNG, JPEG) + paste support.