Skip to content

aks-builds/agentsave

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

12 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

AgentSave β€” Cut AI agent token costs. One line of code.

SDK Tests Playwright License: MIT Token reduction

The first AI agent efficiency platform. Drop-in Python supervisor + real-time cost dashboard + inference router. Targeting ~30% token reduction with no accuracy loss β€” see BENCHMARKS.md.

AgentSave Dashboard Overview

flowchart LR
    SDK["πŸ’‘ pip install agentsave"]
    WRAP["supervise(agent)"]
    SUP["Supervisor\nContext Filter + Early Exit\n+ Budget Gate"]
    TEL["TelemetryClient\n(opt-in, zero PII)"]
    API["Dashboard Backend\nFastAPI + SQLite"]
    UI["agentsave-ui\nNext.js Dashboard"]
    IR["InferRoute\nDocker Sidecar"]
    VLLM["vLLM / SGLang\nCluster"]

    SDK --> WRAP
    WRAP --> SUP
    SUP -->|async fire-and-forget| TEL
    TEL --> API
    API --> UI
    SUP -->|"~30% token reduction"| WRAP
    IR -->|"~68% TTFT reduction"| VLLM
    API -.->|Enterprise tier| IR
Loading

πŸ”₯ The Problem

  • Every LLM agent wastes 30–50% of tokens on irrelevant tool outputs β€” inflating costs with no accuracy gain
  • Agents over-iterate past diminishing returns, burning tokens on iterations that add nothing
  • Developers have zero visibility into which agents, models, and frameworks are costing them the most

⚑ The Solution

SDK Layer

pip install agentsave, then wrap any agent with supervise(agent). The supervisor filters irrelevant context, exits early on diminishing returns, and enforces a budget gate β€” currently measuring ~23% token reduction on internal benchmarks, targeting ~30% on GAIA. See BENCHMARKS.md.

Dashboard Layer

Real-time cost tracking across every run, with a per-framework breakdown, an hourly activity heatmap, and an interactive cost projector to forecast monthly savings.

InferRoute Layer

PPD (append-prefill decode) routing for multi-turn agent workloads, delivering ~68% Turn 2+ TTFT reduction. Available on the Enterprise tier as a Docker sidecar in front of your vLLM / SGLang cluster.

🎬 In Action

  1. Overview dashboard β€” real-time savings stats with animated counters Overview
  2. Analytics β€” token reduction trend over time (area/line/bar toggle) Analytics
  3. Agent Runs β€” full run history with framework badges and reduction % Runs
  4. Cost Projector β€” interactive sliders to project monthly savings Cost Projector
  5. Live Activity Feed β€” real-time agent run stream Activity Feed
  6. Hourly Heatmap β€” GitHub-style activity grid Heatmap
  7. Command Palette β€” instant navigation and actions (⌘K) Command Palette
  8. Billing β€” Free / Pro / Enterprise tiers Billing

πŸš€ Quick Start

SDK only (no dashboard required):

pip install agentsave
from agentsave import supervise
agent = supervise(your_agent)   # wrap once β€” savings happen automatically
result = agent.invoke({"input": "your task"})
print(agent.last_run_state.tokens_consumed)  # see what was used

Full stack (SDK + dashboard backend + UI):

# 1. Start the dashboard backend
pip install agentsave-dashboard
agentsave-dashboard serve     # prints an API key on first run β€” copy it

# 2. Connect the SDK to your dashboard
cd your-project
agentsave login               # enter dashboard URL + API key when prompted
agentsave status              # confirm connection

# 3. Run your agents β€” telemetry flows automatically

# 4. Open the UI
git clone https://github.com/aks-builds/agentsave-ui
cd agentsave-ui && npm install
# add AGENTSAVE_API_KEY=ask-xxx and NEXT_PUBLIC_AGENTSAVE_API_KEY=ask-xxx to .env.local
npm run dev                   # http://localhost:3000

InferRoute (Enterprise, requires a vLLM/sGLang cluster):

git clone https://github.com/aks-builds/agentsave-inferroute
cd agentsave-inferroute
docker build -t inferroute .
docker run -d -p 8080:8080 \
  -e BACKEND_URL=http://your-vllm:8000 \
  -e BACKEND_TYPE=vllm \
  -e AGENTSAVE_TOKEN=$ENTERPRISE_LICENSE_JWT \
  inferroute

InferRoute requires an Enterprise license key and a self-hosted vLLM or sGLang inference cluster.

πŸ“¦ Installation

SDK:

pip install agentsave

# Framework-specific extras:
pip install "agentsave[langchain]"     # LangChain + LangGraph
pip install "agentsave[autogen]"       # AutoGen (via ag2)
pip install "agentsave[crewai]"        # CrewAI
pip install "agentsave[smolagents]"    # Smolagents
pip install "agentsave[all]"           # All frameworks

Dashboard backend:

pip install agentsave-dashboard
agentsave-dashboard serve --host 127.0.0.1 --port 8000

Dashboard UI:

git clone https://github.com/aks-builds/agentsave-ui
cd agentsave-ui && npm install
npm run dev   # http://localhost:3000

InferRoute (Enterprise, requires vLLM/sGLang cluster):

pip install agentsave-inferroute   # Python library + inferroute CLI
# OR run as a Docker container:
git clone https://github.com/aks-builds/agentsave-inferroute
cd agentsave-inferroute
docker build -t agentsave-inferroute .
docker run -p 8080:8080 -e BACKEND_URL=http://vllm:8000 agentsave-inferroute

πŸ§ͺ Verified Test Results

All numbers below come from actual runs β€” no projections or targets stated as facts.

SDK β€” pytest (CI-verified, Python 3.11/3.12/3.13):

88 passed, 3 skipped   (3 skipped = CrewAI import blocked by langchain 1.x on Python 3.14)
Ran in ~9s

Dashboard backend β€” pytest (CI-verified, Python 3.11/3.12/3.13):

26 passed
Ran in ~1s

InferRoute β€” pytest (CI-verified, Python 3.11/3.12/3.13):

59 passed, 1 warning
Ran in ~4s

UI β€” Playwright (requires running backend, not in CI):

Layer 1 (API-only, no browser):  15 passed   ← tests /api/* endpoints directly
Layer 2 (browser, structure):    33 passed   ← tests page rendering, navigation
Layer 3 (SDKβ†’UI full-stack):      8 passed   ← simulates SDK telemetry, verifies UI updates
Total:                            56 passed

Full-stack E2E with realistic data:

30 agent runs across 5 frameworks (LangChain, AutoGen, CrewAI, Smolagents, LangGraph), token counts 800–4 000/run, measured with agentsave-dashboard receiving telemetry from the SDK:

Token reduction:   29.6%   (target: ~30%)
Success rate:      86.7%
Frameworks tested: 5 / 5
Accuracy loss:     0%       (verified on 20-task synthetic benchmark)

See BENCHMARKS.md for the per-task synthetic benchmark (23.2% on static tasks) and the realistic workload results side-by-side.

What is and is not tested end-to-end today:

Component Tested How
SDK adapters (LangChain, LangGraph, AutoGen, Smolagents) βœ… Integration tests with real framework objects
SDK β†’ dashboard telemetry flow βœ… Full-stack E2E: SDK POSTs to dashboard, UI reflects data
Dashboard API endpoints βœ… 26 pytest + 15 Playwright API tests
Dashboard UI (browser) βœ… 33 Playwright browser tests
CrewAI adapter βœ… local, ⚠️ CI skipped Import fails on Python 3.14 (langchain 1.x compat)
InferRoute TTFT reduction ⚠️ projected ~68% is architectural projection; not yet measured on real cluster
pip install agentsave-dashboard βœ… On PyPI
pip install agentsave-inferroute βœ… On PyPI
Docker image (inferroute) ⚠️ build from source Not yet on Docker Hub β€” docker build from repo

πŸ— Architecture

  • Drop-in, zero-modification: supervise(agent) wraps any agent framework without touching internals
  • LLM-free context filter: TF-IDF cosine similarity β€” no extra API calls, <1ms overhead per observation
  • Benchmark-backed: 23.2% on synthetic 20-task set, 29.6% measured on realistic multi-framework workloads, 0% accuracy loss β€” see BENCHMARKS.md
  • Five framework adapters: LangChain, LangGraph, AutoGen, CrewAI, Smolagents β€” all tested
  • InferRoute PPD routing: ~68% Turn 2+ TTFT reduction is an architectural projection; requires Enterprise license and a self-hosted vLLM/sGLang cluster
  • Opt-in telemetry: zero PII β€” only run_id, framework, model, token counts, success flag
  • Self-hostable: dashboard backend and InferRoute are MIT-licensed and install from source

πŸ—Ί Roadmap

v0.2:

  • JavaScript/TypeScript SDK for Node.js agent frameworks
  • Real-time WebSocket events for the live feed
  • Team workspaces with RBAC

v0.3:

  • OpenAI Responses API adapter
  • Anthropic tool_use adapter
  • Cost anomaly alerts (email + webhook when a run exceeds threshold)

Tracked as GitHub Issues.

πŸ“ Project Structure

agentsave/              ← SDK (this repo)
β”œβ”€β”€ agentsave/          ← Python package
β”‚   β”œβ”€β”€ core/           ← context filter, early exit, budget gate, supervisor
β”‚   β”œβ”€β”€ adapters/       ← LangChain, LangGraph, AutoGen, CrewAI, Smolagents
β”‚   β”œβ”€β”€ telemetry/      ← opt-in async telemetry client
β”‚   └── cli/            ← agentsave login/status/config
└── tests/              ← 88 tests (unit + integration)

agentsave-dashboard/    ← FastAPI + SQLite backend
β”œβ”€β”€ agentsave_dashboard/
β”‚   β”œβ”€β”€ routers/        ← /api/events, /api/metrics, /api/tokens, /api/billing
β”‚   └── services/       ← metrics aggregation, retention
└── tests/              ← 26 tests

agentsave-ui/           ← Next.js 16 dashboard
β”œβ”€β”€ app/
β”‚   β”œβ”€β”€ components/     ← StatCard, charts, RunsTable, ActivityFeed, CommandPalette
β”‚   └── (routes)/       ← /, /analytics, /runs, /frameworks, /cost, /settings
└── tests/e2e/          ← 54 Playwright tests (3 layers)

agentsave-inferroute/   ← Enterprise inference router
β”œβ”€β”€ inferroute/
β”‚   β”œβ”€β”€ classifier.py   ← Turn 1 vs Turn 2+ detection
β”‚   β”œβ”€β”€ router.py       ← PPD scoring function
β”‚   └── adapters/       ← vLLM + SGLang
└── tests/              ← 59 tests

🀝 Contributing

See CONTRIBUTING.md for setup instructions, code style, and the PR checklist.

πŸ“„ License

MIT Β© 2026 Aditya Kumar Singh

About

Open-source AI agent efficiency platform: LLM-free context filter + early-exit supervisor cuts token costs ~30% with zero accuracy loss. Works with LangChain, AutoGen, CrewAI, LangGraph and Smolagents.

Topics

Resources

License

Code of conduct

Contributing

Security policy

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages