InferTrace

Catch model latency regressions before users do.

InferTrace is a high-performance observability collector for LLM inference services. It ingests spans from your model serving stack, tracks per-model latency, and detects regressions when you deploy a new model version.

Why This Exists

Generic tracing tools (Jaeger, Tempo) were not built for ML inference. They do not know about model versions, batch sizes, or token counts. When your p99 latency doubles after a model swap, you should not have to dig through generic trace data to find out why.

InferTrace is purpose-built for LLM inference pipelines: ML-aware span schema, per-model latency windows, and regression detection out of the box.

Who This Is For

ML engineers running LLM inference in production
Platform teams that own model serving infrastructure
Anyone who has been surprised by a latency regression after a model deploy

Architecture

Inference Service
      |
      | gRPC (SpanBatch)
      v
  [Collector]
      |
  Bounded Queue + Worker Pool
      |
      v
  In-Memory Store (per-model latency windows)
      |
      v
  HTTP Query API --> Regression Alerts

The collector holds no persistent state: everything lives in memory. The bounded queue uses drop-on-full backpressure, so when the queue is full new spans are dropped and callers are never blocked. Workers drain the queue and write into per-model time windows. The query API answers one question: how does current latency compare to baseline?
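The queue-and-workers stage can be sketched in a few lines of Go. This is a minimal illustration of drop-on-full backpressure using a buffered channel and a non-blocking send, not the project's actual code: the Span fields, the Pipeline type, and every function name below are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Span is a hypothetical, trimmed-down span used only in this sketch.
type Span struct {
	ModelName     string
	DurationNanos int64
}

// Pipeline is a bounded queue drained by a fixed pool of workers.
type Pipeline struct {
	queue    chan Span
	wg       sync.WaitGroup
	accepted atomic.Int64
	dropped  atomic.Int64
}

// NewPipeline starts `workers` goroutines that pass each queued span to handle.
func NewPipeline(capacity, workers int, handle func(Span)) *Pipeline {
	p := &Pipeline{queue: make(chan Span, capacity)}
	for i := 0; i < workers; i++ {
		p.wg.Add(1)
		go func() {
			defer p.wg.Done()
			for s := range p.queue {
				handle(s)
			}
		}()
	}
	return p
}

// Enqueue never blocks. It reports false when the queue is full and the
// span was dropped.
func (p *Pipeline) Enqueue(s Span) bool {
	select {
	case p.queue <- s:
		p.accepted.Add(1)
		return true
	default:
		p.dropped.Add(1)
		return false
	}
}

// Close stops intake and waits for the workers to drain what is queued.
// Enqueue must not be called after Close.
func (p *Pipeline) Close() {
	close(p.queue)
	p.wg.Wait()
}

// Stats returns how many spans were queued and how many were dropped.
func (p *Pipeline) Stats() (accepted, dropped int64) {
	return p.accepted.Load(), p.dropped.Load()
}

func main() {
	var handled atomic.Int64
	p := NewPipeline(2, 1, func(Span) { handled.Add(1) })
	for i := 0; i < 1000; i++ {
		p.Enqueue(Span{ModelName: "demo-model", DurationNanos: int64(i)})
	}
	p.Close()
	accepted, dropped := p.Stats()
	// The exact split depends on scheduling; accepted+dropped is always 1000.
	fmt.Printf("accepted=%d dropped=%d handled=%d\n", accepted, dropped, handled.Load())
}
```

The select with a default case is what keeps the caller from blocking: a full channel falls through to the drop counter instead of waiting for a worker. The trade-off is that spans are lost under sustained overload, so the dropped count is worth exposing as a metric.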

Demo

A demo GIF is planned for Phase 10. It will show a new model being deployed, latency regressing, and InferTrace detecting and reporting the regression.

Quickstart

Requirements: Go 1.21+

git clone https://github.com/your-username/infertrace
cd infertrace
go run ./cmd/collector

The collector starts on :50051 (gRPC) and is ready to accept span batches.

Run the full test suite:

go test ./...

What Is Built

Go span model with ingest validation (service_name, model_name, start_time_unix_nano, duration_nanos)
Protobuf + gRPC ingestion contract (SendSpanBatch)
Collector gRPC server with accepted/rejected counts per batch
Bounded in-memory pipeline with worker pool and drop-on-full backpressure
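As a rough illustration of the first and third items, ingest validation over the four listed fields might look like the sketch below. The Go field names, the specific rules (non-empty names, positive start time, non-negative duration), and the error values are assumptions for this example; the real checks live in internal/collector/span.go and may differ.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Span mirrors the four fields named in the list above. The Go field names
// and types are assumptions made for this sketch.
type Span struct {
	ServiceName       string
	ModelName         string
	StartTimeUnixNano int64
	DurationNanos     int64
}

// Hypothetical sentinel errors, one per rule.
var (
	ErrMissingServiceName = errors.New("service_name is required")
	ErrMissingModelName   = errors.New("model_name is required")
	ErrInvalidStartTime   = errors.New("start_time_unix_nano must be positive")
	ErrNegativeDuration   = errors.New("duration_nanos must not be negative")
)

// Validate returns the first rule the span violates, or nil if it is valid.
func (s Span) Validate() error {
	switch {
	case strings.TrimSpace(s.ServiceName) == "":
		return ErrMissingServiceName
	case strings.TrimSpace(s.ModelName) == "":
		return ErrMissingModelName
	case s.StartTimeUnixNano <= 0:
		return ErrInvalidStartTime
	case s.DurationNanos < 0:
		return ErrNegativeDuration
	}
	return nil
}

// ValidateBatch counts how many spans pass and fail validation, in the
// spirit of the accepted/rejected counts returned per batch.
func ValidateBatch(spans []Span) (accepted, rejected int) {
	for _, s := range spans {
		if s.Validate() == nil {
			accepted++
		} else {
			rejected++
		}
	}
	return accepted, rejected
}

func main() {
	batch := []Span{
		{ServiceName: "chat-api", ModelName: "demo-model", StartTimeUnixNano: 1700000000000000000, DurationNanos: 42000000},
		{ServiceName: "chat-api", StartTimeUnixNano: 1700000000000000000, DurationNanos: 42000000}, // model_name missing
	}
	accepted, rejected := ValidateBatch(batch)
	fmt.Printf("accepted=%d rejected=%d\n", accepted, rejected)
}
```

Rejecting bad spans one at a time, rather than failing the whole batch, lets a client with a single malformed span still deliver the rest.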

Roadmap

Phase	Status	What It Delivers
Span model + gRPC ingestion	Done	Core span struct, protobuf contract, gRPC collector
Collector pipeline	Done	Bounded queue, worker pool, backpressure
In-memory store + query API	Next	Per-model latency windows, p50/p95/p99 HTTP endpoint
Latency regression detection	Planned	Compare current vs baseline, surface regressions
Python SDK	Planned	Decorator-based tracing for PyTorch / HuggingFace
Demo + benchmarks	Planned	End-to-end demo with injected regression scenario
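The two upcoming phases (percentile queries and regression detection) are not built yet, so the following is only a sketch of one way they could work, not the project's design. It uses the nearest-rank percentile definition and flags a regression when the current percentile exceeds the baseline by more than a chosen ratio. The function names, the millis helper, and the 1.5 threshold are all assumptions for illustration.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"time"
)

// millis is a small helper for building sample sets in this sketch.
func millis(vals ...int) []time.Duration {
	out := make([]time.Duration, len(vals))
	for i, v := range vals {
		out[i] = time.Duration(v) * time.Millisecond
	}
	return out
}

// Percentile returns the nearest-rank percentile of samples for 0 < p <= 100.
// ok is false when samples is empty or p is out of range. The input slice is
// not modified.
func Percentile(samples []time.Duration, p float64) (value time.Duration, ok bool) {
	if len(samples) == 0 || p <= 0 || p > 100 {
		return 0, false
	}
	sorted := make([]time.Duration, len(samples))
	copy(sorted, samples)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	// Nearest rank: the smallest 1-based rank covering at least p percent.
	rank := int(math.Ceil(p * float64(len(sorted)) / 100))
	return sorted[rank-1], true
}

// Regressed compares the p-th percentile of current against baseline and
// reports the ratio and whether it exceeds maxRatio (1.5 means 50% slower).
func Regressed(baseline, current []time.Duration, p, maxRatio float64) (ratio float64, regressed bool) {
	b, okB := Percentile(baseline, p)
	c, okC := Percentile(current, p)
	if !okB || !okC || b <= 0 {
		return 0, false
	}
	ratio = float64(c) / float64(b)
	return ratio, ratio > maxRatio
}

func main() {
	baseline := millis(100, 110, 120, 130, 140)
	current := millis(200, 220, 240, 260, 280)
	for _, p := range []float64{50, 95, 99} {
		b, _ := Percentile(baseline, p)
		c, _ := Percentile(current, p)
		fmt.Printf("p%.0f baseline=%v current=%v\n", p, b, c)
	}
	ratio, regressed := Regressed(baseline, current, 99, 1.5)
	fmt.Printf("p99 ratio=%.2f regressed=%v\n", ratio, regressed)
}
```

Sorting a copy on every query is fine for small windows. A production store would more likely keep a streaming sketch or histogram per model so that percentile reads stay cheap as window sizes grow.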

Project Structure

cmd/collector/main.go           -- app bootstrap, gRPC server, graceful shutdown
internal/collector/span.go      -- span model and validation
internal/collector/server.go    -- gRPC handler, protobuf-to-domain conversion
internal/collector/collector.go -- bounded queue, workers, counters
proto/span.proto                -- ingestion API contract

Contributing

This project is under active development. Good places to start:

Issues labeled good first issue
Issues labeled help wanted

Feedback, bug reports, and pull requests are welcome. If you run LLM services, feedback on the design is especially valuable.
