tessera-rs

tessera-rs is a Rust-first LLM inference serving framework inspired by vLLM, with a phased roadmap that starts from a focused single-GPU serving core and grows toward multi-GPU, multi-node, and adaptive serving at scale.

Project goals

Serve real decoder-only workloads with a pragmatic systems-first design.
Reuse high-performance kernels and distributed communication backends rather than rewriting everything from scratch.
Build in public as a proper OSS project with visible milestones, issues, CI, Docker, and Kubernetes support.
Differentiate with TAC (Tessera Adaptive Controller), an adaptive control-loop initiative focused on goodput and latency SLOs.

Current phase

The repository is currently early in milestone M1 (Single GPU Serving Core).

The M0 (OSS Foundation) work is in place, and the codebase now has a first working serving slice:

request lifecycle and scheduler primitives
paged KV metadata plus Tessera-owned backend KV storage for the current Llama path
a concrete single-GPU decoder-only serving loop
OpenAI-compatible completions with streaming
TTFT, ITL, throughput, and request outcome metrics
Docker, CI, and repository governance scaffolding from M0
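
As an illustration of the latency metrics listed above, the sketch below derives TTFT (time to first token), mean ITL (inter-token latency), and output throughput from per-token arrival times. It is a minimal example under stated assumptions, not the project's metrics code: the `LatencySummary` type and `summarize` function are hypothetical names, and token timestamps are assumed to be offsets from request arrival in non-decreasing order.

```rust
use std::time::Duration;

/// Latency summary for one streamed request.
/// Hypothetical type for illustration; not taken from tessera-rs.
#[derive(Debug, PartialEq)]
pub struct LatencySummary {
    /// Time to first token: offset of the first token from request arrival.
    pub ttft: Duration,
    /// Mean gap between consecutive tokens; `None` with fewer than two tokens.
    pub mean_itl: Option<Duration>,
    /// Output tokens per second, measured from request arrival to the last token.
    pub tokens_per_sec: f64,
}

/// `token_times` holds each token's arrival time as an offset from request
/// arrival, in non-decreasing order. Returns `None` if no token was produced.
pub fn summarize(token_times: &[Duration]) -> Option<LatencySummary> {
    let first = *token_times.first()?;
    let last = *token_times.last()?;

    let mean_itl = if token_times.len() >= 2 {
        // N tokens have N - 1 gaps between them.
        let gaps = (token_times.len() - 1) as u32;
        Some(last.saturating_sub(first) / gaps)
    } else {
        None
    };

    let total_secs = last.as_secs_f64();
    let tokens_per_sec = if total_secs > 0.0 {
        token_times.len() as f64 / total_secs
    } else {
        0.0
    };

    Some(LatencySummary { ttft: first, mean_itl, tokens_per_sec })
}
```

Calling `summarize` on a request's token offsets once the stream ends yields one summary per request; aggregating those summaries into percentiles would be a separate step.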

Roadmap highlights

M0 OSS Foundation
M1 Single GPU Serving Core
M2 Latency and Throughput Optimizations
M3 Single-Node Multi-GPU
M4 Static Multi-Node Serving
M5 Elastic Multi-Node Scaling
M6 Disaggregated Prefill and Decode
M7 Speculative Decoding
M8 Hybrid Attention and Cache Layouts
M9 Mixture of Experts
M10 General Model Support
M11 Guided Decoding
TAC Tessera Adaptive Controller

See ROADMAP.md for the detailed milestone view and seeded issues.

Quickstart

Local Rust workflow

cargo fmt --all
cargo clippy --workspace --all-targets -- -D warnings
cargo test --workspace
cargo run -p tessera-server

CUDA builds are opt-in. In a CUDA-capable environment, use:

cargo check -p tessera-engine --features cuda
TESSERA_DEVICE=cuda:0 cargo run -p tessera-server --features cuda

You can also use the root Makefile for the common local flow:

make download-llama
make server
make test-normal-completion
make test-stream-completion
make metrics-json

For CUDA via make:

make server CARGO_FEATURES=cuda DEVICE=cuda:0
make check-cuda

The bootstrap service listens on http://127.0.0.1:8080.

Useful endpoints:

GET /
GET /healthz
GET /readyz
GET /version

Docker

docker build -t tessera-rs:dev .
docker run --rm -p 8080:8080 tessera-rs:dev

Helm

helm install tessera deploy/helm/tessera

Repository structure

Current working layout

crates/tessera-engine-core
- serving control-plane primitives such as request lifecycle, scheduler, sampler, and KV metadata
crates/tessera-engine
- concrete runtime integration layer
- runtime/ owns session orchestration and output emission
- backend/ owns backend traits and selection/loading
- backends/llama/ owns the current Llama-family implementation
crates/tessera-server
- HTTP/OpenAI-compatible serving surface, metrics endpoints, server bootstrap, and tracing setup
docs/architecture
- current implementation notes such as the end-to-end request flow
docs/rfcs
- accepted and proposed design documents
.github
- CI, issue templates, pull request template, and roadmap bootstrap assets
deploy/helm/tessera
- Helm chart skeleton
scripts/
- GitHub bootstrap and workflow automation
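
The KV metadata listed under tessera-engine-core can be pictured with a small block-table sketch in the style popularized by vLLM's paged attention. Everything here is an assumption for illustration: `BlockAllocator`, `SequenceBlocks`, and their methods are hypothetical names and do not reflect the actual tessera-engine-core types. The idea shown is that each sequence holds a list of fixed-size physical blocks, and a new block is taken from the free pool only when the sequence crosses a block boundary.

```rust
/// Pool of fixed-size physical KV blocks (hypothetical, for illustration).
pub struct BlockAllocator {
    free: Vec<usize>,
    block_size: usize,
}

/// Per-sequence block table: logical block index -> physical block id.
#[derive(Default)]
pub struct SequenceBlocks {
    blocks: Vec<usize>,
    len_tokens: usize,
}

impl BlockAllocator {
    pub fn new(num_blocks: usize, block_size: usize) -> Self {
        assert!(block_size > 0, "block_size must be non-zero");
        // Reversed so that `pop` hands out block 0 first.
        Self { free: (0..num_blocks).rev().collect(), block_size }
    }

    pub fn free_blocks(&self) -> usize {
        self.free.len()
    }

    /// Reserves a slot for one more token and returns `(physical_block, offset)`.
    /// Returns `None`, leaving `seq` unchanged, when the pool is exhausted.
    pub fn append_token(&mut self, seq: &mut SequenceBlocks) -> Option<(usize, usize)> {
        // A new block is needed only when every allocated slot is in use.
        if seq.len_tokens == seq.blocks.len() * self.block_size {
            seq.blocks.push(self.free.pop()?);
        }
        let pos = seq.len_tokens;
        seq.len_tokens += 1;
        Some((seq.blocks[pos / self.block_size], pos % self.block_size))
    }

    /// Returns all of a finished sequence's blocks to the free pool.
    pub fn release(&mut self, seq: SequenceBlocks) {
        self.free.extend(seq.blocks);
    }
}
```

A scheduler could use `free_blocks` as an admission signal, declining to start new requests when too few blocks remain; that policy is outside the scope of this sketch.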

Target long-term layout

This is the intended end-state structure for the full project. Not every directory or crate needs to exist immediately; add them when the roadmap actually demands them.

tessera-rs/
├── .github/
│   ├── bootstrap/                     # roadmap + project board source of truth
│   ├── workflows/                     # CI/CD workflows
│   └── ISSUE_TEMPLATE/                # issue forms and templates
├── crates/
│   ├── tessera-server/                # HTTP/OpenAI-compatible server entrypoint
│   ├── tessera-cli/                   # admin, dev, and utility CLI
│   ├── tessera-core/                  # shared config, errors, common types
│   ├── tessera-protocol/              # API and control-plane message types
│   ├── tessera-engine/                # public engine facade used by server/CLI
│   ├── tessera-engine-core/           # request lifecycle, scheduling, token budgets
│   ├── tessera-kv-cache/              # paged KV cache, allocators, prefix cache
│   ├── tessera-executor/              # executor traits and worker abstractions
│   ├── tessera-backend-cuda/          # CUDA/NCCL/FlashAttention backend bindings
│   ├── tessera-sampler/               # sampling, stop conditions, specdec, guided decoding
│   ├── tessera-models/                # model family adapters and capability registry
│   ├── tessera-control-plane/         # multi-node routing, admission, coordination
│   ├── tessera-tac/                   # Tessera Adaptive Controller
│   └── tessera-bench/                 # benchmarking and load-generation tooling
├── configs/
│   ├── local/                         # local and development configs
│   ├── single-node/                   # single-node serving profiles
│   └── multi-node/                    # distributed serving profiles
├── deploy/
│   ├── helm/tessera/                  # Helm chart
│   ├── compose/                       # local multi-service orchestration
│   └── k8s/examples/                  # concrete Kubernetes examples
├── docs/
│   ├── architecture/                  # subsystem-specific architecture docs
│   ├── benchmarks/                    # benchmark methodology and results
│   ├── operations/                    # deployment and maintainer workflows
│   └── rfcs/                          # design proposals and accepted RFCs
├── examples/
│   ├── single-gpu/                    # minimal local serving examples
│   ├── multi-gpu/                     # tensor-parallel examples
│   ├── multi-node/                    # API + worker examples
│   ├── structured-output/             # guided decoding examples
│   └── speculative-decoding/          # speculative decoding examples
├── scripts/                           # GitHub/bootstrap/dev automation
├── tests/
│   ├── integration/                   # crate-level integration tests
│   ├── distributed/                   # multi-process and multi-node tests
│   ├── e2e/                           # HTTP/API end-to-end tests
│   └── fixtures/                      # prompts, configs, and test data
├── ARCHITECTURE.md
├── CONTRIBUTING.md
├── ROADMAP.md
├── Cargo.toml
└── README.md

Structure conventions

Binary entrypoints should stay in focused crates like tessera-server, tessera-cli, and tessera-bench.
Engine internals should be split by responsibility: engine core, KV-cache, executor, sampler, models, and control-plane.
tessera-tac should remain its own crate so the adaptive controller can evolve independently from the serving core.
Deployment, operational docs, benchmark methodology, and GitHub automation should live outside the Rust crates.
Avoid creating empty crates just to match the final tree; the structure is a guide for evolution, not a requirement to scaffold everything immediately.

GitHub roadmap bootstrap

The repository includes a checked-in source of truth for milestones, labels, seeded issues, epic tasklists, and the GitHub Project board:

.github/bootstrap/roadmap.json
.github/bootstrap/project.json
scripts/bootstrap_github.sh
scripts/bootstrap_project.sh
scripts/sync_epic_tasklists.sh
scripts/sync_project_fields.sh

With GitHub CLI authenticated, you can bootstrap the roadmap like this:

./scripts/bootstrap_github.sh
./scripts/bootstrap_project.sh

Contributing

Start with:

CONTRIBUTING.md
ROADMAP.md
ARCHITECTURE.md
docs/operations/issue-workflow.md

License

Apache-2.0. See LICENSE.
