Budget enforcement and cost tracking for LLM applications.
Stop runaway API spend from bugs, prompt injection, or retry loops — before it hits your credit card.
```python
from ai_cost_guard import CostGuard

guard = CostGuard(weekly_budget_usd=5.00)

@guard.protect(model="anthropic/claude-haiku-4-5-20251001")
def call_claude(prompt: str):
    return client.messages.create(...)  # blocked if budget exceeded
```

Part of the AI Agent Infrastructure Stack:
- ai-cost-guard — budget enforcement ← you are here
- ai-injection-guard — prompt injection scanner
- ai-decision-tracer — local agent decision tracer
When you build with LLMs, three things will eventually go wrong:
- A bug creates an infinite retry loop — you wake up to a $300 bill.
- A prompt injection attack causes your app to make thousands of unexpected calls.
- A junior dev accidentally calls GPT-4o instead of GPT-4o-mini in a tight loop.
`ai-cost-guard` is a hard stop. It tracks every LLM call, accumulates cost,
and raises `BudgetExceededError` before the next call goes through.
Zero runtime dependencies. Pure Python stdlib. Works with any LLM provider.
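The hard-stop idea described above is simple enough to sketch in plain stdlib Python. The `MiniGuard` below is a hypothetical toy, not the library's actual implementation, but it shows the core move: check accumulated spend and raise *before* the provider call goes out.

```python
import functools


class BudgetExceededError(RuntimeError):
    pass


class MiniGuard:
    """Toy illustration of the hard-stop pattern: refuse the call before it is made."""

    def __init__(self, weekly_budget_usd: float):
        self.budget = weekly_budget_usd
        self.spent = 0.0

    def protect(self, cost_per_call_usd: float):
        def decorator(fn):
            @functools.wraps(fn)
            def wrapper(*args, **kwargs):
                # Block BEFORE the call, so the budget is never overshot.
                if self.spent + cost_per_call_usd > self.budget:
                    raise BudgetExceededError(
                        f"${self.spent:.2f} spent of ${self.budget:.2f} budget"
                    )
                self.spent += cost_per_call_usd
                return fn(*args, **kwargs)
            return wrapper
        return decorator


guard = MiniGuard(weekly_budget_usd=0.02)

@guard.protect(cost_per_call_usd=0.01)
def fake_llm_call():
    return "ok"

fake_llm_call()  # allowed
fake_llm_call()  # allowed; budget now exhausted, next call raises
```

The real library adds per-model pricing, persistence, and a weekly window on top of this, but the blocking decorator is the essential shape.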
```shell
pip install ai-cost-guard
```

Or from source:

```shell
git clone https://github.com/LuciferForge/ai-cost-guard
cd ai-cost-guard
pip install -e ".[dev]"
```

```python
from ai_cost_guard import CostGuard
import anthropic

client = anthropic.Anthropic()
guard = CostGuard(weekly_budget_usd=5.00)

@guard.protect(model="anthropic/claude-haiku-4-5-20251001", purpose="summarizer")
def summarize(text: str):
    return client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=256,
        messages=[{"role": "user", "content": f"Summarize: {text}"}],
    )
```

For manual tracking without the decorator:

```python
guard.check_budget("openai/gpt-4o", estimated_input=500, estimated_output=200)
response = openai_client.chat.completions.create(...)
guard.record(
    model="openai/gpt-4o",
    input_tokens=response.usage.prompt_tokens,
    output_tokens=response.usage.completion_tokens,
)
```

```python
guard = CostGuard(weekly_budget_usd=5.00, dry_run=True)
# All calls raise BudgetExceededError("DRY RUN") — safe to use in CI
```

```shell
# Show current spend vs budget
ai-cost-guard status

# List all calls this period
ai-cost-guard calls

# List all registered models with pricing
ai-cost-guard models

# Check if a model call would be allowed given a budget
ai-cost-guard check anthropic/claude-sonnet-4-6 5.00

# Reset the tracker
ai-cost-guard reset
```

| Provider | Models |
|---|---|
| Anthropic | claude-haiku-4-5, claude-sonnet-4-6, claude-opus-4-6 |
| OpenAI | gpt-4o, gpt-4o-mini, gpt-3.5-turbo |
| Google | gemini-1.5-flash, gemini-1.5-pro |
| Ollama (local) | qwen2.5:7b, llama3.2:3b, mistral:7b (always $0.00) |
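Per-token pricing makes each call's cost a one-line calculation: `input_tokens * input_rate + output_tokens * output_rate`. A sketch with made-up rates (the `example/*` model names and prices are illustrative, not entries from the real registry):

```python
# Illustrative per-token rates in USD (hypothetical numbers, not real pricing).
PRICING = {
    "example/cheap-model": {"input": 0.25 / 1_000_000, "output": 1.25 / 1_000_000},
    "example/big-model":   {"input": 2.50 / 1_000_000, "output": 10.00 / 1_000_000},
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    # Cost is linear in tokens: each direction has its own per-token rate.
    rates = PRICING[model]
    return input_tokens * rates["input"] + output_tokens * rates["output"]


# 10k input + 2k output tokens on the cheap model:
# 10_000 * 0.25e-6 + 2_000 * 1.25e-6 = 0.0025 + 0.0025 = 0.005 USD
cheap = call_cost("example/cheap-model", 10_000, 2_000)
```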
Adding a new model:
```python
from ai_cost_guard import PROVIDERS

PROVIDERS["myprovider/mymodel"] = {
    "input": 1.00 / 1_000_000,  # per token
    "output": 4.00 / 1_000_000,
}
```

- Hard budget cap — raises `BudgetExceededError` before the call, not after.
- No network calls — all data stored locally in `~/.ai-cost-guard/cost_log.json`.
- Atomic writes — cost log uses temp-file + rename to prevent corruption.
- Zero dependencies — nothing to supply-chain attack.
- Audit trail — every call logged with timestamp, model, tokens, and purpose.
See SECURITY.md for full security policy.
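The temp-file + rename pattern behind the atomic-writes guarantee can be sketched as follows; `atomic_write_json` here is a hypothetical helper, not the library's internal API:

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path: Path, data: dict) -> None:
    # Write to a temp file in the SAME directory, then rename over the target.
    # os.replace is atomic on both POSIX and Windows, so a crash mid-write
    # leaves either the old file or the new one, never a half-written log.
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)  # clean up the temp file on failure
        raise
```

Keeping the temp file in the same directory matters: `os.replace` across filesystems is not atomic, and may fail outright.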
| Tool | Hard budget stop | Multi-provider | Zero deps | Local storage |
|---|---|---|---|---|
| ai-cost-guard | ✅ | ✅ | ✅ | ✅ |
| LangChain callbacks | ❌ (observe only) | ✅ | ❌ | ❌ |
| OpenAI usage limits | ✅ | ❌ | N/A | ❌ |
| Manual tracking | ❌ | depends | ✅ | depends |
```shell
pip install -e ".[dev]"
pytest tests/ -v
```

PRs welcome. Please:
- Keep zero runtime dependencies.
- Add tests for new providers.
- Update pricing when providers change rates.
MIT — free to use, modify, and distribute.