The first AI agent efficiency platform. Drop-in Python supervisor + real-time cost dashboard + inference router. Targeting ~30% token reduction with no accuracy loss. See BENCHMARKS.md.
```mermaid
flowchart LR
    SDK["pip install agentsave"]
    WRAP["supervise(agent)"]
    SUP["Supervisor\nContext Filter + Early Exit\n+ Budget Gate"]
    TEL["TelemetryClient\n(opt-in, zero PII)"]
    API["Dashboard Backend\nFastAPI + SQLite"]
    UI["agentsave-ui\nNext.js Dashboard"]
    IR["InferRoute\nDocker Sidecar"]
    VLLM["vLLM / SGLang\nCluster"]
    SDK --> WRAP
    WRAP --> SUP
    SUP -->|async fire-and-forget| TEL
    TEL --> API
    API --> UI
    SUP -->|"~30% token reduction"| WRAP
    IR -->|"~68% TTFT reduction"| VLLM
    API -.->|Enterprise tier| IR
```
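The `SUP -->|async fire-and-forget| TEL` edge means telemetry should never slow an agent run down. A minimal sketch of that pattern follows, assuming an asyncio-based client and a caller-supplied `send` coroutine (for example an HTTP POST to the dashboard). `TelemetryEvent`, `FireAndForgetTelemetry`, and the field name `tokens_consumed` are hypothetical and are not the real `TelemetryClient` API; the payload carries only the fields listed as collected (run_id, framework, model, token counts, success flag).

```python
import asyncio
from dataclasses import asdict, dataclass
from typing import Awaitable, Callable


@dataclass(frozen=True)
class TelemetryEvent:
    # Hypothetical payload: only the fields listed as collected, no prompts or tool outputs.
    run_id: str
    framework: str
    model: str
    tokens_consumed: int
    success: bool


Sender = Callable[[dict], Awaitable[None]]


class FireAndForgetTelemetry:
    """Hypothetical client: schedules sends without awaiting them, so runs never block."""

    def __init__(self, send: Sender) -> None:
        self._send = send
        # Hold task references; the event loop only keeps weak references to tasks.
        self._pending: set = set()

    def emit(self, event: TelemetryEvent) -> None:
        # Must be called from inside a running event loop.
        task = asyncio.get_running_loop().create_task(self._safe_send(asdict(event)))
        self._pending.add(task)
        task.add_done_callback(self._pending.discard)

    async def _safe_send(self, payload: dict) -> None:
        try:
            await self._send(payload)
        except Exception:
            pass  # a telemetry failure must never fail the agent run

    async def drain(self) -> None:
        # Optionally wait for in-flight sends, e.g. at shutdown.
        if self._pending:
            await asyncio.gather(*self._pending)
```

Scheduling with `create_task` and swallowing send errors keeps the agent's run path independent of dashboard availability; the trade-off is that events still in flight are lost if the process exits without calling `drain()`.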
- Every LLM agent wastes 30–50% of tokens on irrelevant tool outputs, inflating costs with no accuracy gain
- Agents over-iterate past diminishing returns, burning tokens on iterations that add nothing
- Developers have zero visibility into which agents, models, and frameworks are costing them the most
Run `pip install agentsave`, then wrap any agent with `supervise(agent)`. The supervisor filters irrelevant context, exits early on diminishing returns, and enforces a budget gate. It currently measures ~23% token reduction on internal benchmarks and targets ~30% on GAIA. See BENCHMARKS.md.
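The README does not spell out how early exit or the budget gate decide when to stop. The sketch below assumes each agent iteration returns a token count and a numeric progress score; `BudgetGate`, `should_exit_early`, `run_supervised`, `patience`, and `min_gain` are hypothetical names for illustration, not part of the agentsave API.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class BudgetGate:
    """Hypothetical per-run token budget."""
    max_tokens: int
    used: int = 0

    def allow(self, next_call_tokens: int) -> bool:
        # Refuse any call that would push the run past its budget.
        return self.used + next_call_tokens <= self.max_tokens

    def record(self, tokens: int) -> None:
        self.used += tokens


def should_exit_early(scores: List[float], patience: int = 2, min_gain: float = 0.01) -> bool:
    """True when none of the last `patience` scores beat the earlier best by min_gain or more."""
    if len(scores) <= patience:
        return False
    best_before = max(scores[:-patience])
    return all(s - best_before < min_gain for s in scores[-patience:])


def run_supervised(step: Callable[[], Tuple[int, float]], gate: BudgetGate,
                   est_tokens_per_step: int, max_iters: int = 20) -> List[float]:
    """`step` runs one agent iteration and returns (tokens_used, progress_score)."""
    scores: List[float] = []
    for _ in range(max_iters):
        if not gate.allow(est_tokens_per_step):
            break  # budget gate: stop before overspending
        tokens, score = step()
        gate.record(tokens)
        scores.append(score)
        if should_exit_early(scores):
            break  # diminishing returns: further iterations add little
    return scores
```

Checking the budget before each call (rather than after) means a run can stop slightly under its cap, but never overshoots by a whole iteration.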
Real-time cost tracking across every run, with a per-framework breakdown, an hourly activity heatmap, and an interactive cost projector to forecast monthly savings.
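The arithmetic behind a monthly-savings projection can be sketched as follows. This assumes a single blended price per million tokens (ignoring the input/output price split) and a fixed 30-day month; `project_monthly_savings` is a hypothetical helper, not the dashboard's actual formula.

```python
def project_monthly_savings(
    runs_per_day: float,
    avg_tokens_per_run: float,
    usd_per_million_tokens: float,
    token_reduction: float,
    days_per_month: int = 30,
) -> float:
    """Hypothetical projection: baseline monthly tokens x reduction x blended price."""
    if not 0.0 <= token_reduction <= 1.0:
        raise ValueError("token_reduction must be a fraction between 0 and 1")
    baseline_tokens = runs_per_day * avg_tokens_per_run * days_per_month
    saved_tokens = baseline_tokens * token_reduction
    return saved_tokens / 1_000_000 * usd_per_million_tokens
```

For example, 1,000 runs/day at 2,000 tokens each is 60M tokens/month; at a 29.6% reduction that saves 17.76M tokens, or about $53.28/month at an assumed $3 per million tokens.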
PPD (append-prefill decode) routing for multi-turn agent workloads, with a projected ~68% reduction in Turn 2+ TTFT (an architectural projection, not yet measured on a real cluster). Available on the Enterprise tier as a Docker sidecar in front of your vLLM / SGLang cluster.
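The README does not document how `classifier.py` tells Turn 1 from Turn 2+. A minimal sketch, assuming OpenAI-style chat messages, treats any prior assistant reply as a sign that the prompt extends an earlier context. `classify_turn` and `pick_backend` are hypothetical, and `pick_backend` is plain session affinity (send Turn 2+ requests back to the replica that served Turn 1), a simpler stand-in rather than the PPD scoring function in `router.py`.

```python
import zlib
from typing import Dict, List


def classify_turn(messages: List[dict]) -> str:
    """Hypothetical heuristic: a prior assistant reply means the prompt extends an earlier context."""
    if any(m.get("role") == "assistant" for m in messages):
        return "turn_2_plus"
    return "turn_1"


def pick_backend(session_id: str, messages: List[dict], backends: List[str],
                 affinity: Dict[str, str]) -> str:
    """Session affinity only; not the PPD scoring function."""
    if classify_turn(messages) == "turn_2_plus" and session_id in affinity:
        # Turn 2+: reuse the replica most likely to still hold this session's cached prefix.
        return affinity[session_id]
    # Turn 1 (or unknown session): spread sessions across replicas with a stable hash.
    backend = backends[zlib.crc32(session_id.encode("utf-8")) % len(backends)]
    affinity[session_id] = backend
    return backend
```

`zlib.crc32` is used instead of Python's built-in `hash()` because string hashing is randomized per process, which would break affinity across restarts.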
- Overview dashboard: real-time savings stats with animated counters
- Analytics: token reduction trend over time (area/line/bar toggle)
- Agent Runs: full run history with framework badges and reduction %
- Cost Projector: interactive sliders to project monthly savings
- Live Activity Feed: real-time agent run stream
- Hourly Heatmap: GitHub-style activity grid
- Command Palette: instant navigation and actions (⌘K)
- Billing: Free / Pro / Enterprise tiers

SDK only (no dashboard required):

```bash
pip install agentsave
```

```python
from agentsave import supervise

agent = supervise(your_agent)  # wrap once; savings happen automatically
result = agent.invoke({"input": "your task"})
print(agent.last_run_state.tokens_consumed)  # see how many tokens were used
```

Full stack (SDK + dashboard backend + UI):
```bash
# 1. Start the dashboard backend
pip install agentsave-dashboard
agentsave-dashboard serve   # prints an API key on first run; copy it

# 2. Connect the SDK to your dashboard
cd your-project
agentsave login    # enter the dashboard URL and API key when prompted
agentsave status   # confirm the connection

# 3. Run your agents; telemetry flows automatically

# 4. Open the UI
git clone https://github.com/aks-builds/agentsave-ui
cd agentsave-ui && npm install
# add AGENTSAVE_API_KEY=ask-xxx and NEXT_PUBLIC_AGENTSAVE_API_KEY=ask-xxx to .env.local
npm run dev   # http://localhost:3000
```

InferRoute (Enterprise, requires a vLLM/SGLang cluster):
```bash
git clone https://github.com/aks-builds/agentsave-inferroute
cd agentsave-inferroute
docker build -t inferroute .
docker run -d -p 8080:8080 \
  -e BACKEND_URL=http://your-vllm:8000 \
  -e BACKEND_TYPE=vllm \
  -e AGENTSAVE_TOKEN=$ENTERPRISE_LICENSE_JWT \
  inferroute
```

InferRoute requires an Enterprise license key and a self-hosted vLLM or SGLang inference cluster.
SDK:

```bash
pip install agentsave
# Framework-specific extras:
pip install "agentsave[langchain]"   # LangChain + LangGraph
pip install "agentsave[autogen]"     # AutoGen (via ag2)
pip install "agentsave[crewai]"      # CrewAI
pip install "agentsave[smolagents]"  # Smolagents
pip install "agentsave[all]"         # All frameworks
```

Dashboard backend:

```bash
pip install agentsave-dashboard
agentsave-dashboard serve --host 127.0.0.1 --port 8000
```

Dashboard UI:

```bash
git clone https://github.com/aks-builds/agentsave-ui
cd agentsave-ui && npm install
npm run dev   # http://localhost:3000
```

InferRoute (Enterprise, requires a vLLM/SGLang cluster):

```bash
pip install agentsave-inferroute   # Python library + inferroute CLI
# OR run as a Docker container:
git clone https://github.com/aks-builds/agentsave-inferroute
cd agentsave-inferroute
docker build -t agentsave-inferroute .
docker run -p 8080:8080 -e BACKEND_URL=http://vllm:8000 agentsave-inferroute
```

All numbers below come from actual runs; no projections or targets are stated as facts.
SDK, pytest (CI-verified on Python 3.11/3.12/3.13):

```text
88 passed, 3 skipped (3 skipped = CrewAI import blocked by langchain 1.x on Python 3.14)
Ran in ~9s
```

Dashboard backend, pytest (CI-verified on Python 3.11/3.12/3.13):

```text
26 passed
Ran in ~1s
```

InferRoute, pytest (CI-verified on Python 3.11/3.12/3.13):

```text
59 passed, 1 warning
Ran in ~4s
```

UI, Playwright (requires a running backend; not in CI):

```text
Layer 1 (API-only, no browser): 15 passed (tests /api/* endpoints directly)
Layer 2 (browser, structure):   33 passed (tests page rendering, navigation)
Layer 3 (SDK→UI full-stack):     8 passed (simulates SDK telemetry, verifies UI updates)
Total: 56 passed
```
Full-stack E2E with realistic data:

30 agent runs across 5 frameworks (LangChain, AutoGen, CrewAI, Smolagents, LangGraph), 800–4,000 tokens per run, measured with agentsave-dashboard receiving telemetry from the SDK:

```text
Token reduction:   29.6% (target: ~30%)
Success rate:      86.7%
Frameworks tested: 5 / 5
Accuracy loss:     0% (verified on 20-task synthetic benchmark)
```

See BENCHMARKS.md for the per-task synthetic benchmark (23.2% on static tasks) and the realistic workload results side by side.
What is and is not tested end-to-end today:

| Component | Tested | How |
|---|---|---|
| SDK adapters (LangChain, LangGraph, AutoGen, Smolagents) | Yes | Integration tests with real framework objects |
| SDK → dashboard telemetry flow | Yes | Full-stack E2E: SDK POSTs to dashboard, UI reflects data |
| Dashboard API endpoints | Yes | 26 pytest + 15 Playwright API tests |
| Dashboard UI (browser) | Yes | 33 Playwright browser tests |
| CrewAI adapter | Local only | Import fails on Python 3.14 (langchain 1.x compat) |
| InferRoute TTFT reduction | No | ~68% is an architectural projection; not yet measured on a real cluster |
| `pip install agentsave-dashboard` | Yes | On PyPI |
| `pip install agentsave-inferroute` | Yes | On PyPI |
| Docker image (inferroute) | No | Not yet on Docker Hub; `docker build` from the repo |
- Drop-in, zero-modification: `supervise(agent)` wraps any agent framework without touching its internals
- LLM-free context filter: TF-IDF cosine similarity, so no extra API calls and <1 ms overhead per observation
- Benchmark-backed: 23.2% on the synthetic 20-task set, 29.6% measured on realistic multi-framework workloads, 0% accuracy loss. See BENCHMARKS.md
- Five framework adapters: LangChain, LangGraph, AutoGen, CrewAI, Smolagents; all are tested, with CrewAI tested locally only (its import fails on Python 3.14)
- InferRoute PPD routing: ~68% Turn 2+ TTFT reduction is an architectural projection; requires an Enterprise license and a self-hosted vLLM/SGLang cluster
- Opt-in telemetry: zero PII; only run_id, framework, model, token counts, and a success flag are sent
- Self-hostable: the dashboard backend and InferRoute are MIT-licensed and install from PyPI or from source
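A TF-IDF cosine-similarity context filter needs no LLM calls, only term counts. The sketch below is a minimal stdlib version under stated assumptions: observations are plain strings, relevance is measured against the task text, and `filter_observations`, its `threshold=0.1` default, and the simple regex tokenizer are hypothetical choices, not agentsave's actual filter or settings.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def _tokens(text: str) -> List[str]:
    # Lowercase alphanumeric tokens; a real filter would likely use a richer tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def _tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF, same formula as scikit-learn's default: ln((1 + n) / (1 + df)) + 1
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()})
    return vectors


def _cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filter_observations(task: str, observations: List[str], threshold: float = 0.1) -> List[str]:
    """Keep only observations whose TF-IDF cosine similarity to the task meets the threshold."""
    vectors = _tfidf([_tokens(task)] + [_tokens(o) for o in observations])
    task_vec = vectors[0]
    return [o for o, v in zip(observations, vectors[1:]) if _cosine(task_vec, v) >= threshold]
```

For the task "find the population of Paris France", an observation about the Paris census shares weighted terms with the task and is kept, while an unrelated stock-ticker line scores 0 and is dropped before it reaches the model's context.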
v0.2:
- JavaScript/TypeScript SDK for Node.js agent frameworks
- Real-time WebSocket events for the live feed
- Team workspaces with RBAC
v0.3:
- OpenAI Responses API adapter
- Anthropic tool_use adapter
- Cost anomaly alerts (email + webhook when a run exceeds threshold)
Tracked as GitHub Issues.
```text
agentsave/                  # SDK (this repo)
├── agentsave/              # Python package
│   ├── core/               # context filter, early exit, budget gate, supervisor
│   ├── adapters/           # LangChain, LangGraph, AutoGen, CrewAI, Smolagents
│   ├── telemetry/          # opt-in async telemetry client
│   └── cli/                # agentsave login/status/config
└── tests/                  # 88 tests (unit + integration)

agentsave-dashboard/        # FastAPI + SQLite backend
├── agentsave_dashboard/
│   ├── routers/            # /api/events, /api/metrics, /api/tokens, /api/billing
│   └── services/           # metrics aggregation, retention
└── tests/                  # 26 tests

agentsave-ui/               # Next.js 16 dashboard
├── app/
│   ├── components/         # StatCard, charts, RunsTable, ActivityFeed, CommandPalette
│   └── (routes)/           # /, /analytics, /runs, /frameworks, /cost, /settings
└── tests/e2e/              # 56 Playwright tests (3 layers)

agentsave-inferroute/       # Enterprise inference router
├── inferroute/
│   ├── classifier.py       # Turn 1 vs Turn 2+ detection
│   ├── router.py           # PPD scoring function
│   └── adapters/           # vLLM + SGLang
└── tests/                  # 59 tests
```
See CONTRIBUTING.md for setup instructions, code style, and the PR checklist.
MIT © 2026 Aditya Kumar Singh