# 🎷 Modal Jazz

> The spirit of jazz is the spirit of openness.
>
> — Herbie Hancock, on software licensing

> I’ll play it first and tell you what it is later.
>
> — Miles Davis, on vibe-coding

This repository collects a complete "open AI stack": everything you need to run a smart language model and the interfaces that help it complete useful tasks. It runs on Modal.

## Open Language Modeling Backend

The language model is DeepSeek's V4 Pro.

It is run using (see the deployment sketch below):

- Nvidia B200 GPUs
- the Modal cloud deployment platform (project sponsor)
- the SGLang inference server
- an OpenAI-compatible API interface (based on `/chat/completions`)
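For orientation, here is a minimal sketch of such a deployment. The app name, Secret name, and Hugging Face model path are assumptions for illustration; the repository's own deployment code is the authoritative version.

```python
# Minimal sketch: serve an LLM with SGLang on a B200 via Modal.
# App name, Secret name, and MODEL path are hypothetical.
import modal

MODEL = "deepseek-ai/DeepSeek-V4-Pro"  # hypothetical Hugging Face path

image = modal.Image.debian_slim(python_version="3.11").pip_install("sglang[all]")

app = modal.App("modal-jazz-llm", image=image)


@app.function(
    gpu="B200",  # Nvidia B200 GPU
    secrets=[modal.Secret.from_name("huggingface-secret")],  # token speeds up weight downloads
    timeout=60 * 60,
)
@modal.web_server(port=8000, startup_timeout=10 * 60)
def serve():
    import subprocess

    # SGLang exposes an OpenAI-compatible API, including /v1/chat/completions.
    subprocess.Popen(
        [
            "python", "-m", "sglang.launch_server",
            "--model-path", MODEL,
            "--host", "0.0.0.0",
            "--port", "8000",
        ]
    )
```

Running `modal deploy` then gives the server a stable `*.modal.run` URL.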

To speed up model weight downloads, you'll need a Hugging Face access token stored as a Modal Secret.
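If you don't have one yet, something like `modal secret create huggingface-secret HF_TOKEN=<your-token>` creates it from the command line; the Secret name `huggingface-secret` is an assumption matching the sketch above.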

For a single user, this stack achieves over 150 tokens per second of output.
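Once the server is up, any OpenAI-compatible client can query it. Here is a minimal sketch with the official `openai` Python package; the base URL and model name below are placeholders for your deployment's values:

```python
# Query the self-hosted, OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-workspace--modal-jazz-llm-serve.modal.run/v1",  # placeholder URL
    api_key="not-needed",  # the sketch above adds no auth layer
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V4-Pro",  # hypothetical model identifier
    messages=[{"role": "user", "content": "Improvise a haiku about modal jazz."}],
)
print(response.choices[0].message.content)
```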

## Open Frontends - `/frontends`

### Agentic Coding TUI + WebUI - OpenCode

OpenCode is a terminal user interface for connecting human users, language models, and computer terminals, akin to Anthropic's Claude Code but with broader LLM API support.

We provide instructions for integrating the self-hosted LLM with OpenCode, and for deploying OpenCode servers on Modal, here.

### Agentic Assistant - OpenClaw

OpenClaw is an agentic assistant system designed for maximum integrability.

We provide instructions for integrating the self-hosted LLM with OpenClaw here.

### Chat Web UI - AI SDK

The Vercel AI SDK offers both Core and UI sub-SDKs for integrating JavaScript applications with LLMs.

We demonstrate a simple integration of this stack with the self-hosted LLM: both a "hello world"-level integration with a Node.js CLI here and a full Next.js app here.

It is deployed here.

### Chat CLI - `llm`

We like the `llm` CLI tool from Simon Willison for running quick LLM queries from the terminal.

It offers integration with OpenAI-compatible API providers, like our self-hosted LLM, via the same interface as OpenAI's models. Docs are here.
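In practice this means registering the endpoint in `llm`'s `extra-openai-models.yaml` configuration file, pointing its `api_base` at the deployment URL; see the linked docs for the exact fields.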

We demonstrate a small plugin in `llm_show_reasoning` that prints the LLM's reasoning output, which OpenAI's reasoning models withhold but open models expose. Streaming the reasoning as it is generated reduces apparent latency.
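To illustrate the idea (a sketch of the technique, not the plugin's actual source): SGLang-style servers can return the chain of thought in a separate `reasoning_content` field on each streamed chunk, and printing it immediately gives the user something to read while the final answer is still being generated. The field name and URL here are assumptions about the server's configuration.

```python
# Stream and print reasoning tokens ahead of the final answer.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-workspace--modal-jazz-llm-serve.modal.run/v1",  # placeholder URL
    api_key="not-needed",
)

stream = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V4-Pro",  # hypothetical model identifier
    messages=[{"role": "user", "content": "What is 17 * 24?"}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta
    # Open models can expose their chain of thought; OpenAI-style clients
    # surface it (if the server sends it) as an extra `reasoning_content` field.
    reasoning = getattr(delta, "reasoning_content", None)
    if reasoning:
        print(reasoning, end="", flush=True)
    if delta.content:
        print(delta.content, end="", flush=True)
print()
```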