A programmable serving system for custom inference logic, stateful agents, and serving-side optimization.

Note: Pie is pre-release software under active development. For now, it is best suited for testing and research.

What is Pie?

Today's LLM serving engines (e.g., vLLM, SGLang, TensorRT-LLM) are black boxes: prompt in, tokens out. But AI agents are a different kind of workload. They branch, call tools, retry, and coordinate long-running workflows, and forcing them through a monolithic token-generation pipeline leads to wasted round trips, KV cache thrashing, and engine patches for every new decoding trick.

Pie is a programmable serving system. It runs small user-supplied WebAssembly programs, called inferlets, directly next to the model. Inferlets have direct access to the KV cache and the forward pass, so agent loops, tool calls, custom samplers, and cache policies can be customized and optimized per application without modifying the engine.
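To make "custom sampler" concrete, here is a minimal, engine-agnostic sketch in plain Python of the kind of logic an inferlet might carry: a decode loop that calls a forward pass and picks the next token with top-k sampling. It does not use the Pie SDK. The names `top_k_sample`, `generate`, and `toy_forward` are hypothetical and chosen only for illustration, and `forward` is an assumed stand-in for whatever forward-pass access the real SDK provides.

```python
import math
import random
from typing import Callable, List, Sequence


def top_k_sample(logits: Sequence[float], k: int, rng: random.Random) -> int:
    """Sample a token id from the k highest logits, weighted by softmax."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Indices of the k largest logits, highest first.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the survivors, shifted by the max for numerical stability.
    peak = logits[top[0]]
    weights = [math.exp(logits[i] - peak) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


def generate(
    forward: Callable[[List[int]], Sequence[float]],  # hypothetical forward pass
    prompt: List[int],
    max_new_tokens: int,
    stop_token: int,
    k: int = 1,
    seed: int = 0,
) -> List[int]:
    """Decode loop: call the forward pass, sample, append, stop on stop_token."""
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = forward(tokens)
        next_token = top_k_sample(logits, k, rng)
        tokens.append(next_token)
        if next_token == stop_token:
            break
    return tokens


def toy_forward(tokens: List[int]) -> List[float]:
    """Toy stand-in for a model: strongly prefers (last token + 1) mod 5."""
    vocab_size = 5
    target = (tokens[-1] + 1) % vocab_size
    return [10.0 if i == target else 0.0 for i in range(vocab_size)]


if __name__ == "__main__":
    # With k=1 the sampler is greedy, so the toy model counts up to the stop token.
    print(generate(toy_forward, prompt=[0], max_new_tokens=10, stop_token=4))
```

In a conventional serving engine, swapping the sampler in a loop like this typically means changing engine code or paying a client round trip per step. The idea described above is that such a loop runs inside the serving system instead.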

Quick Start

Pie ships as a standalone binary; no Python installation is needed.

For macOS and Linux:

curl -fsSL https://pie-project.org/install.sh | bash

For Windows, follow the installation guide.

Then configure and run:

pie config init
pie run text-completion -- --prompt "The capital of France is"

Project Layout

| Directory | Description |
| --- | --- |
| runtime/ | Inferlet runtime |
| server/ | CLI |
| inferlets/ | Example inferlets |
| sdk/ | Inferlet SDKs (Rust · Python · JavaScript) |
| client/ | Client libraries (Rust · Python · JavaScript) |
| driver/ | Pie drivers (portable / CUDA / vLLM / SGLang) |
| website/ | pie-project.org docs site |

Getting Help

Questions and bug reports are welcome on GitHub Issues and GitHub Discussions.

License

Apache License 2.0