Website | Guide | Reference | Paper (SOSP'25)
A programmable serving system for custom inference logic, stateful agents, and serving-side optimization.
Note: Pie is pre-release software under active development. It is best suited for testing and research right now.
Today's LLM serving engines (e.g., vLLM, SGLang, TensorRT-LLM) are black boxes: prompt in, tokens out. But AI agents are a different kind of workload. They branch, call tools, retry, and coordinate long-running workflows, and forcing them through a monolithic token-generation pipeline leads to wasted round trips, KV cache thrashing, and engine patches for every new decoding trick.
Pie is a programmable serving system. It runs small user-supplied WebAssembly programs, called inferlets, directly next to the model. Inferlets have direct access to the KV cache and forward pass, so agent loops, tool calls, custom samplers, and cache policies can be customized and optimized per-application without modifying the engine.
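To make the idea concrete, here is a toy, self-contained Python sketch of what "a program running next to the model" can do: prefill a shared prompt once, fork the cache, and drive each branch with its own sampler. This is not Pie's SDK. `ToyModel`, `run_branches`, `greedy`, and `avoid_c` are hypothetical names invented for illustration, and the list used as a cache only stands in for a real KV cache.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-ins for illustration only; none of these names come from Pie.
Scores = Dict[str, float]
Sampler = Callable[[Scores], str]


@dataclass
class ToyModel:
    """A fake model: the forward pass returns scores over a tiny vocabulary."""
    vocab: List[str]
    forward_calls: int = 0

    def forward(self, cache: List[str], token: str) -> Scores:
        cache.append(token)  # stands in for writing KV entries
        self.forward_calls += 1
        idx, n = self.vocab.index(token), len(self.vocab)
        # Deterministic scores: the cyclically next token scores highest.
        return {t: 1.0 / (1 + (i - idx - 1) % n) for i, t in enumerate(self.vocab)}


def greedy(scores: Scores) -> str:
    """Pick the highest-scoring token."""
    return max(scores, key=scores.get)


def avoid_c(scores: Scores) -> str:
    """A custom sampler: greedy, but never emit the token 'c'."""
    allowed = {t: s for t, s in scores.items() if t != "c"}
    return max(allowed, key=allowed.get)


def run_branches(model: ToyModel, prompt: List[str],
                 samplers: List[Sampler], steps: int) -> List[List[str]]:
    """Prefill the prompt once, then decode one branch per sampler."""
    shared_cache: List[str] = []
    scores: Scores = {}
    for tok in prompt:
        scores = model.forward(shared_cache, tok)

    outputs = []
    for sampler in samplers:
        cache = list(shared_cache)  # fork; a real system would share pages, not copy
        branch_scores, out = scores, []
        for _ in range(steps):
            tok = sampler(branch_scores)
            out.append(tok)
            branch_scores = model.forward(cache, tok)
        outputs.append(out)
    return outputs


model = ToyModel(vocab=["a", "b", "c"])
results = run_branches(model, ["a", "b"], [greedy, avoid_c], steps=2)
print(results, model.forward_calls)
```

In this toy, the two-token prompt is processed once and both branches reuse it, so the model runs 6 forward passes instead of the 8 that two independent requests would need. The per-branch sampler is ordinary user code, which is the kind of control the paragraph above describes.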
Pie is a standalone binary, no Python needed.
For macOS and Linux:
curl -fsSL https://pie-project.org/install.sh | bash

For Windows, follow the installation guide.
Then configure and run:
pie config init
pie run text-completion -- --prompt "The capital of France is"

| Directory | Description |
|---|---|
| `runtime/` | Inferlet runtime |
| `server/` | CLI |
| `inferlets/` | Example inferlets |
| `sdk/` | Inferlet SDKs (Rust · Python · JavaScript) |
| `client/` | Client libraries (Rust · Python · JavaScript) |
| `driver/` | Pie drivers (portable / CUDA / vLLM / SGLang) |
| `website/` | pie-project.org docs site |

Questions and bug reports are welcome on GitHub Issues and GitHub Discussions.