Detailed token-usage tracking — per-model + estimated cost (closes #30)#32
Closes #30. Extends the existing process-wide TokenUsage:

- per-model breakdown: `add(usage, model)` records aggregate + per-model id; both providers pass `self.model`.
- estimated USD cost via new `saage/pricing.py`: a built-in table of rough public list prices (USD per 1M in/out), substring-matched (longest key wins). `cost()` returns None for an unknown model, so a cost is shown only when grounded in a known rate. Overridable via `SAAGE_PRICES` (JSON path), merged over the built-ins.
- surfaced in the run summary (a `cost: ~$X (estimated)` line + per-model breakdown when >1 model) and written to the run dir as `usage.json` (per-model + cost; best-effort, non-fatal).

tests: pricing (substring/longest-wins, math, unknown->None, env override, malformed ignored) + TokenUsage per-model/cost/as_dict. Full suite 375 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
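For readers skimming the thread, the lookup described in this commit can be sketched roughly as below. This is a minimal illustration, not the code in `saage/pricing.py`: the table name `EXAMPLE_PRICES` and the rates in it are made-up placeholders (not real list prices), and the function signatures are assumptions based only on the commit message.

```python
from typing import Optional

# Hypothetical placeholder rates (USD per 1M input / output tokens).
# The numbers are invented for this example and are NOT real list prices.
EXAMPLE_PRICES: dict[str, tuple[float, float]] = {
    "gpt-4o": (2.0, 8.0),
    "gpt-4o-mini": (0.5, 1.0),
}


def rates(model: str) -> Optional[tuple[float, float]]:
    """Return (in_rate, out_rate) for the longest key contained in the model id."""
    model_id = model.lower()
    best_key: Optional[str] = None
    for key in EXAMPLE_PRICES:
        # Substring match; a longer matching key replaces a shorter one.
        if key in model_id and (best_key is None or len(key) > len(best_key)):
            best_key = key
    return EXAMPLE_PRICES[best_key] if best_key is not None else None


def cost(model: str, prompt_tokens: int, completion_tokens: int) -> Optional[float]:
    """Estimated USD cost, or None when no known rate matches the model id."""
    matched = rates(model)
    if matched is None:
        return None  # unknown model: no number is better than an ungrounded one
    in_rate, out_rate = matched
    return (prompt_tokens * in_rate + completion_tokens * out_rate) / 1_000_000
```

With this shape, a model id such as `openai/gpt-4o-mini-2024` picks up the `gpt-4o-mini` rates even though `gpt-4o` also matches, which is the "longest key wins" behavior the commit describes.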
- USAGE singleton is now reset at the start of each `saage run` (main()), so a process that runs more than once (resume, tests, embedding) reports THIS run's usage/cost, not the cumulative-since-process-start total (same class as the #21 FileHandler-leak). usage.json + summary are now per-run.
- usage.json write catches Exception (not just OSError): a non-serializable edge in as_dict() would otherwise crash the process AFTER the flow succeeded.
- pricing: SAAGE_PRICES override now wins a length tie against a built-in (`>=`, overrides merged last), matching the docstring; and a single malformed entry is skipped instead of discarding ALL overrides.
- TokenUsage.add no longer builds a throwaway _ModelUsage on every call.

tests: TokenUsage.reset clears all; override replaces built-in; one malformed entry keeps the rest. Full suite 378 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
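The reason for catching `Exception` instead of `OSError` is easier to see in a small sketch. The helper name `write_usage` and its return value are hypothetical, chosen for illustration; only the idea (a best-effort write that cannot crash a run that already succeeded) comes from the commit above.

```python
import json
from pathlib import Path


def write_usage(run_dir: Path, usage: dict) -> bool:
    """Best-effort write of usage.json. Never raises; returns True on success."""
    try:
        payload = json.dumps(usage, indent=2)  # may raise TypeError on odd values
        (run_dir / "usage.json").write_text(payload)  # may raise OSError
        return True
    except Exception:
        # Deliberately broad: `except OSError` alone would let a
        # non-serializable value in the dict escape as TypeError.
        return False
```

A value that `json.dumps` cannot serialize raises `TypeError`, which an `OSError`-only handler would not catch, so the broad handler is what keeps the write non-fatal.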
Ran /code-review on this PR and fixed the findings (fb4afb3): (1) USAGE was never reset — a process running more than once (resume/tests/embedding) wrote a cumulative-since-start |
Pull request overview
Adds richer, best-effort token-usage accounting to saage runs by extending the existing process-wide TokenUsage to track per-model usage and compute an estimated USD cost when (and only when) a known price entry matches the model id.
Changes:
- Introduces `saage/pricing.py` with substring-based model pricing (`rates`) and cost computation (`cost`), including `SAAGE_PRICES` JSON-file overrides.
- Extends `TokenUsage` to track per-model usage (`by_model`), expose total estimated cost, and serialize a `usage.json` artifact.
- Updates CLI run summary output and adds tests for pricing + per-model token usage/cost behavior.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `saage/pricing.py` | Adds model substring matching for list-price rates and a cost calculator with env-based overrides. |
| `saage/llm.py` | Extends TokenUsage to track per-model usage, compute estimated cost, and serialize a summary dict. |
| `saage/cli.py` | Prints estimated cost + optional per-model breakdown; writes usage.json; resets usage per run. |
| `tests/test_pricing.py` | Adds unit tests for pricing matching, cost math, and override behavior. |
| `tests/test_agent.py` | Adds tests for per-model accumulation, cost grounding/None behavior, and reset semantics. |
| `docs/superpowers/specs/2026-06-24-token-usage-cost-design.md` | Documents the design decisions and surfaced outputs for usage + cost tracking. |
    out: dict[str, tuple[float, float]] = {}
    for k, v in (raw.items() if isinstance(raw, dict) else ()):
        try:  # skip ONE malformed entry rather than dropping ALL overrides
            out[k] = (float(v[0]), float(v[1]))
        except (TypeError, ValueError, IndexError, KeyError):
            continue
    calls: int = 0
    prompt_tokens: int = 0
    completion_tokens: int = 0
    by_model: dict = field(default_factory=dict)  # model id -> _ModelUsage
Two Copilot inline comments:

- pricing.py: SAAGE_PRICES override keys are now lowercased so a mixed-case key (e.g. "DeepSeek") matches; rates() compares against a lowercased model id, so an unnormalized override key could never hit. Non-str keys skipped.
- llm.py: TokenUsage.by_model annotated dict[str, _ModelUsage] (was bare dict).

Test added for the mixed-case override match. Full suite 379 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
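A rough sketch of the override normalization after this fix, for reference. The function name `parse_overrides` is hypothetical and the real loader in `saage/pricing.py` may be structured differently; the loop body follows the snippet Copilot commented on, with the lowercasing and non-str key check added.

```python
def parse_overrides(raw: object) -> dict[str, tuple[float, float]]:
    """Normalize a decoded SAAGE_PRICES payload into {lowercased key: (in, out)}."""
    out: dict[str, tuple[float, float]] = {}
    if not isinstance(raw, dict):
        return out  # e.g. the JSON file held a list or a string
    for key, value in raw.items():
        if not isinstance(key, str):
            continue  # non-str keys are skipped
        try:
            # Lowercase so "DeepSeek" can match a lowercased model id.
            out[key.lower()] = (float(value[0]), float(value[1]))
        except (TypeError, ValueError, IndexError, KeyError):
            continue  # skip this one malformed entry, keep the rest
    return out
```

Because matching compares against a lowercased model id, storing the override key as `deepseek` is what lets a user-supplied `"DeepSeek"` entry ever hit.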
Closes #30. (Implemented autonomously while you were out — design decisions documented in the spec + below; flag anything you'd change.)
Extends the existing `TokenUsage` (which already summed provider-reported in/out tokens) with per-model breakdown and estimated cost.

What's new

- `TokenUsage.add(usage, model)` records aggregate + per-model id (`by_model`). Both providers pass `self.model`.
- `saage/pricing.py`: a built-in table of rough public list prices (USD per 1M input/output), substring-matched against the model id (longest key wins, so `gpt-4o-mini` beats `gpt-4o`). `cost()` returns None for an unknown model, so a cost only shows when it's grounded. Overridable via `SAAGE_PRICES` (JSON `{"<substring>": [in_per_1M, out_per_1M]}`), merged over the built-ins.
- The run summary gets a `cost: ~$X (estimated)` line + a per-model breakdown when >1 model was used; the run dir gets `usage.json` (`{calls, *_tokens, total_tokens, estimated_cost_usd, by_model{...}}`), best-effort and non-fatal.

Design calls I made (yours to override)

- `pricing.py` + `SAAGE_PRICES` override (no live price feed).

Tests

`tests/test_pricing.py` + new `TokenUsage` tests. Full suite 375 passed, 7 skipped.

Spec: `docs/superpowers/specs/2026-06-24-token-usage-cost-design.md`.

🤖 Generated with Claude Code
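To make the per-model accounting in the description concrete, here is a minimal sketch of how `add`, `reset`, and `as_dict` could fit together. It is an assumption-laden illustration, not the code in `saage/llm.py`: the shape of `usage` (a dict with `prompt_tokens` and `completion_tokens` keys) is assumed, and `estimated_cost_usd` is left out so the example stays self-contained.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class _ModelUsage:
    calls: int = 0
    prompt_tokens: int = 0
    completion_tokens: int = 0


@dataclass
class TokenUsage:
    calls: int = 0
    prompt_tokens: int = 0
    completion_tokens: int = 0
    by_model: dict[str, _ModelUsage] = field(default_factory=dict)

    def add(self, usage: dict, model: str) -> None:
        """Record one provider call in the aggregate and under its model id."""
        prompt = int(usage.get("prompt_tokens", 0))        # assumed key name
        completion = int(usage.get("completion_tokens", 0))  # assumed key name
        entry = self.by_model.get(model)
        if entry is None:  # create the per-model entry only on first sight
            entry = self.by_model[model] = _ModelUsage()
        for target in (self, entry):
            target.calls += 1
            target.prompt_tokens += prompt
            target.completion_tokens += completion

    def reset(self) -> None:
        """Clear all counters so the next run reports only its own usage."""
        self.calls = self.prompt_tokens = self.completion_tokens = 0
        self.by_model.clear()

    def as_dict(self) -> dict:
        """JSON-friendly summary (cost omitted in this sketch)."""
        return {
            "calls": self.calls,
            "prompt_tokens": self.prompt_tokens,
            "completion_tokens": self.completion_tokens,
            "total_tokens": self.prompt_tokens + self.completion_tokens,
            "by_model": {m: asdict(u) for m, u in self.by_model.items()},
        }
```

Calling `reset()` at the start of each run is what keeps a long-lived process (resume, tests, embedding) from reporting cumulative totals.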