ollama-alternative

Here are 16 public repositories matching this topic...

raullenchai / Rapid-MLX

The fastest local AI engine for Apple Silicon. 4.2x faster than Ollama, 0.08s cached TTFT, 100% tool calling. 17 tool parsers, prompt cache, reasoning separation, cloud routing. Drop-in OpenAI replacement. Works with Claude Code, Cursor, Aider.

python macos inference cursor hacktoberfest m2 m3 mlx m1 fastapi apple-silicon openai-api llm local-llm qwen deepseek tool-calling claude-code ollama-alternative

Updated Jun 12, 2026
Python

raketenkater / llm-server

Auto-tuned launcher for GGUF models on llama.cpp / ik_llama.cpp — OpenAI-compatible server with multi-GPU tensor-split, MoE expert placement, measured flag tuning (AI Tune), hardware-matched HuggingFace downloads, and crash recovery. An Ollama alternative for multi-GPU rigs.

golang metal vulkan cuda self-hosted moe inference-server multi-gpu rocm openai-api llm llamacpp llama-cpp local-llm gguf speculative-decoding localllama ollama-alternative

Updated Jun 12, 2026
Go

ovoment / ovo-local-llm

A private Claude-Code-style coding agent for Apple Silicon — run chat, code, and local model workflows on-device. MLX-native, Ollama/OpenAI API compatible, zero API keys.

Updated Apr 22, 2026
TypeScript

snapllm / snapllm

🔥 🔥 Alternative to Ollama 🔥 🔥 multi-model <1ms LLM switching

multi-model multimodal llm llm-serving llm-inference llm-tools ollama-alternative

Updated Feb 15, 2026
C++

konjoai / squish

🤖🗜️⚡️ Local LLM server for Apple Silicon. 5.4× faster end-to-end on long contexts vs Ollama, 33% less RAM, INT3 support for Qwen3. OpenAI + Ollama drop-in. Built for repeated long-context workloads on memory-constrained Macs.

Updated Jun 11, 2026
Python

xschahl / vAquila

The Ollama developer experience with the production power of vLLM. Deploy local LLMs via Docker with smart, automatic GPU VRAM management.

python docker cli production inference self-hosted nvidia self-hosting mlops gpu-management huggingface llm vllm ollama-alternative

Updated Mar 12, 2026
Python

Chevron7Locked / lm-chat

Web frontend for LM Studio — browser access, adaptive memory, multi-user auth, MCP tools

Updated Jun 6, 2026
Python

styles01 / flow-llm

Local LLM gateway for Apple Silicon. Works with OpenClaw, Hermes Agent, Claude Code, and Codex (AIRun). No Ollama or LM Studio required.

hermes mlx apple-silicon ai-agent llm llama-cpp local-llm vibe-coding claude-code ollama-alternative openclaw hermes-agent

Updated Jun 2, 2026
Python

AlexC1991 / AI_GUI

A unified local AI workspace — chat with GGUF models (Qwen, Llama, Phi), generate images, and share your AI remotely. Features agentic web search.

python image-generation pyside6 llm agentic local-ai agentic-ai ollama-alternative

Updated Mar 7, 2026
Python

metiu1 / Vortelio

Local AI platform: run LLMs, Stable Diffusion, Whisper, TTS, video & 3D generation offline — open-source Ollama + ComfyUI alternative in one binary.

Updated Jun 10, 2026
Go

aralde / operatorlm

Local OpenAI-compatible proxy with real failover, multi-account aliasing, and ChatGPT Plus/Pro as a backend. Single ~11MB binary, no Docker, secrets in OS keyring. Windows/macOS/Linux.

Updated Jun 2, 2026
Go

fxops-ai / omlx-interpreter

In the style of Claude Chat Pro — fully local on Apple Silicon. oMLX (vision + speed) + Open Interpreter (unrestricted sandbox) + rich Artifacts + attachments (PDF, JSON, Markdown, PNG, JPEG) + paste support.

react streaming typescript vision mlx code-execution multimodal fastapi apple-silicon local-llm local-ai open-interpreter claude-alternative ollama-alternative ai-sandbox open-webui-alternative

Updated May 3, 2026
TypeScript

lawcontinue / hippo

Local LLM inference + embedding & search in one package. Run 30B on consumer hardware, RAG without ChromaDB.

sqlite embedding bm25 rag vector-search pipeline-parallelism apple-silicon llm local-inference ollama-alternative

Updated Jun 13, 2026
Python

ra-yavuz / hydra-llm

Run local LLMs the easy way. CLI + KDE Plasma 6 widget. Wraps llama.cpp in Docker. No telemetry.

linux docker cli privacy kde plasma embeddings rag vector-database llm llama-cpp local-llm retrieval-augmented-generation lancedb gguf local-rag openai-compatible ollama-alternative

Updated Jun 9, 2026
Python

Doublefaced-flavoursomeness591 / operatorlm

Route LLM API requests through a local proxy with multi-account support, failover, and secure key storage.

go cli proxy self-hosted gemini tray developer-tools failover codex openai-api azure-openai llm chatgpt openrouter ollama openai-compatible ollama-alternative

Updated Jun 13, 2026
Go

Harmpleomorphism9956 / ovo-local-llm

Run large language models locally on your machine with this efficient deployment tool.

macos rust llama gemma mistral mlx privacy-first huggingface on-device-ai apple-silicon diffusers local-llm qwen deepseek claude-code ollama-alternative agentic-ide lm-studio-alternative

Updated Jun 13, 2026
