The spirit of jazz is the spirit of openness.
— Herbie Hancock, on software licensing
I’ll play it first and tell you what it is later.
— Miles Davis, on vibe-coding
This repository collects a complete "open AI stack" -- everything you need to run a capable language model and the interfaces that help it complete useful tasks -- deployed on Modal.
The language model is DeepSeek's V4 Pro.
It is run using:
- Nvidia B200 GPUs
- The Modal cloud deployment platform (project sponsor)
- The SGLang inference server
- The OpenAI-compatible API interface (based on /chat/completions)
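Once deployed, the server speaks the same /chat/completions protocol as OpenAI's API. A minimal sketch of building such a request with only the standard library -- the base URL and model name below are placeholders for your own deployment's values, not the real ones:

```python
import json

# Placeholder endpoint: substitute your own Modal deployment's URL.
BASE_URL = "https://your-workspace--sglang-serve.modal.run/v1"


def chat_completions_request(prompt: str, model: str = "your-model-name"):
    """Build the (url, JSON body) pair for an OpenAI-compatible call."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,  # stream tokens as they are generated
    }
    return f"{BASE_URL}/chat/completions", json.dumps(body)


url, payload = chat_completions_request("Say hello in one sentence.")
# Send with e.g. urllib.request.Request(url, payload.encode(),
# headers={"Content-Type": "application/json"}) once the server is live.
```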
To speed up model weight downloads, you'll need a Hugging Face access token stored as a Modal Secret.
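As a sketch, assuming the Modal CLI is installed and authenticated -- the secret name and the HF_TOKEN variable name below are placeholders, so match them to whatever the deployment code reads:

```shell
# Illustrative only: store a Hugging Face token as a Modal Secret.
# "huggingface-secret" and HF_TOKEN are placeholder names.
modal secret create huggingface-secret HF_TOKEN=hf_xxxxxxxxxxxx
```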
For a single user, this achieves >150 tok/s output.
OpenCode is a terminal user interface for connecting human users, language models, and computer terminals, akin to Anthropic's Claude Code but with broader LLM API support.
We provide instructions for integrating the self-hosted LLM with OpenCode and for deploying OpenCode servers on Modal here.
OpenClaw is an agentic assistant system designed for maximum integrability.
We provide instructions for integrating the self-hosted LLM with OpenClaw here.
The Vercel AI SDK offers both Core and UI sub-SDKs for integrating JavaScript applications with LLMs.
We demonstrate a simple integration of this stack with the self-hosted LLM -- both a "hello world"-level integration with a Node.js CLI here and a proper Next.js app here.
It is deployed here.
We like the llm CLI tool from Simon Willison
for running quick LLM queries from the terminal.
It offers integration with OpenAI-compatible API providers, like our self-hosted LLM, via the same interface as OpenAI's models. Docs are here.
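As a sketch of that integration, llm can be pointed at any OpenAI-compatible endpoint via its extra-openai-models.yaml config file; every value below is a placeholder for this deployment:

```yaml
# extra-openai-models.yaml, placed in llm's config directory.
# All values below are placeholders for your own deployment.
- model_id: selfhosted
  model_name: your-model-name
  api_base: "https://your-workspace--sglang-serve.modal.run/v1"
```

After that, `llm -m selfhosted "Hello"` routes the query to the self-hosted endpoint instead of OpenAI.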
We demonstrate a small plugin in llm_show_reasoning
that prints the LLM's reasoning output -- hidden by OpenAI's reasoning models,
but available from open models. Streaming the reasoning as it is produced reduces apparent latency.
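To illustrate, a minimal sketch of how such a plugin could pull the reasoning out of a streamed response. Many OpenAI-compatible servers for open models attach the chain of thought to each streamed delta as a separate reasoning_content field (a DeepSeek-style convention; the field name is an assumption here, and the chunk below is hand-written rather than a real server response):

```python
def split_delta(delta: dict) -> tuple[str, str]:
    """Return the (reasoning, answer) text carried by one streamed delta.

    Servers interleave deltas that carry only reasoning_content with
    deltas that carry only content; printing the former as they arrive
    is what makes the response feel faster.
    """
    return delta.get("reasoning_content") or "", delta.get("content") or ""


# Hand-written stand-in for one chunk of a streamed chat completion.
chunk = {"choices": [{"delta": {"reasoning_content": "The user wants a greeting."}}]}
reasoning, answer = split_delta(chunk["choices"][0]["delta"])
```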