Detailed token-usage tracking — per-model + estimated cost (closes #30) by cgpadwick · Pull Request #32 · cgpadwick/saage

cgpadwick · 2026-06-24T18:44:00Z

Closes #30. (Implemented autonomously while you were out — design decisions documented in the spec + below; flag anything you'd change.)

Extends the existing TokenUsage (which already summed provider-reported in/out tokens) with per-model breakdown and estimated cost.

What's new

Per-model: TokenUsage.add(usage, model) records each call both in the aggregate totals and under its model id (by_model). Both providers pass self.model.
Cost (grounded, never guessed): new saage/pricing.py — a built-in table of rough public list prices (USD per 1M input/output), substring-matched against the model id (longest key wins, so gpt-4o-mini beats gpt-4o). cost() returns None for an unknown model, so a cost only shows when it's grounded. Overridable via SAAGE_PRICES (JSON {"<substring>": [in_per_1M, out_per_1M]}), merged over the built-ins.
Surfaced: run summary gains a cost: ~$X (estimated) line + a per-model breakdown when >1 model was used; the run dir gets usage.json ({calls, *_tokens, total_tokens, estimated_cost_usd, by_model{...}}) — best-effort, non-fatal.
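A minimal sketch of the lookup described above (longest matching substring wins, None for an unknown model). The table values, the `EXAMPLE_PRICES` name, and the exact signatures of `rates()` and `cost()` are assumptions made for illustration; they are not copied from saage/pricing.py.

```python
from typing import Optional

# Hypothetical rates for illustration only (USD per 1M input/output tokens).
# These are NOT the values shipped in saage/pricing.py.
EXAMPLE_PRICES: dict[str, tuple[float, float]] = {
    "gpt-4o": (2.0, 8.0),
    "gpt-4o-mini": (0.2, 0.8),
}


def rates(
    model: str,
    prices: dict[str, tuple[float, float]] = EXAMPLE_PRICES,
) -> Optional[tuple[float, float]]:
    """Return the rates of the longest key that is a substring of the model id."""
    model_id = model.lower()
    best_key: Optional[str] = None
    for key in prices:
        if key in model_id and (best_key is None or len(key) > len(best_key)):
            best_key = key
    return prices[best_key] if best_key is not None else None


def cost(
    model: str,
    prompt_tokens: int,
    completion_tokens: int,
    prices: dict[str, tuple[float, float]] = EXAMPLE_PRICES,
) -> Optional[float]:
    """Estimated USD cost, or None when no price entry matches the model."""
    matched = rates(model, prices)
    if matched is None:
        return None  # unknown model: no number rather than a guess
    in_rate, out_rate = matched
    return (prompt_tokens * in_rate + completion_tokens * out_rate) / 1_000_000
```

With these made-up rates, a model id containing "gpt-4o-mini" resolves to the mini entry even though "gpt-4o" also matches, and an id that matches no key yields None, so the summary would omit the cost line.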

Design calls I made (yours to override)

Cost is best-effort and None when the model is unknown, rather than guessed, to avoid misleading numbers. Prices live in pricing.py + SAAGE_PRICES override (no live price feed).
Per-model, not per-step/skill (usage is provider-reported per call, not per node) — noted as a future enhancement in the spec.

Tests

tests/test_pricing.py + new TokenUsage tests. Full suite 375 passed, 7 skipped.

Spec: docs/superpowers/specs/2026-06-24-token-usage-cost-design.md.

🤖 Generated with Claude Code

Closes #30. Extends the existing process-wide TokenUsage:

- per-model breakdown: add(usage, model) records aggregate + per-model id; both providers pass self.model.
- estimated USD cost via new saage/pricing.py: a built-in table of rough public list prices (USD per 1M in/out), substring-matched (longest key wins). cost() returns None for an unknown model, so a cost is shown only when grounded in a known rate. Overridable via SAAGE_PRICES (JSON path), merged over the built-ins.
- surfaced in the run summary (a `cost: ~$X (estimated)` line + per-model breakdown when >1 model) and written to the run dir as usage.json (per-model + cost; best-effort, non-fatal).

tests: pricing (substring/longest-wins, math, unknown->None, env override, malformed ignored) + TokenUsage per-model/cost/as_dict. Full suite 375 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

- USAGE singleton is now reset at the start of each `saage run` (main()), so a process that runs more than once (resume, tests, embedding) reports THIS run's usage/cost, not the cumulative-since-process-start total (same class as the #21 FileHandler-leak). usage.json + summary are now per-run.
- usage.json write catches Exception (not just OSError): a non-serializable edge in as_dict() would otherwise crash the process AFTER the flow succeeded.
- pricing: SAAGE_PRICES override now wins a length tie against a built-in (`>=`, overrides merged last), matching the docstring; and a single malformed entry is skipped instead of discarding ALL overrides.
- TokenUsage.add no longer builds a throwaway _ModelUsage on every call.

tests: TokenUsage.reset clears all; override replaces built-in; one malformed entry keeps the rest. Full suite 378 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
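One way the length-tie rule could work, sketched under assumptions: built-ins are scanned first and overrides last, and the comparison is `>=`, so an override of equal length replaces the built-in match. The function name `pick_rates` and the two-table signature are hypothetical; the actual code in saage/pricing.py may merge the tables differently.

```python
from typing import Optional

Rates = tuple[float, float]


def pick_rates(
    model: str,
    builtins: dict[str, Rates],
    overrides: dict[str, Rates],
) -> Optional[Rates]:
    """Longest matching key wins; on a length tie the later (override) entry wins."""
    model_id = model.lower()
    best_key = ""
    best: Optional[Rates] = None
    # Scan built-ins first and overrides last, so that `>=` lets an
    # override replace a built-in match of the same length.
    for table in (builtins, overrides):
        for key, value in table.items():
            if key and key in model_id and len(key) >= len(best_key):
                best_key, best = key, value
    return best
```

With a strict `>` the first match of a given length would be kept, which is how a built-in could shadow an equally long override.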

cgpadwick · 2026-06-24T19:10:02Z

Ran /code-review on this PR and fixed the findings (fb4afb3): (1) USAGE was never reset — a process running more than once (resume/tests/embedding) wrote a cumulative-since-start usage.json; now reset at the start of each saage run so it's per-run (same class as the #21 FileHandler-leak); (2) the usage.json write caught only OSError — broadened to Exception so a serialization edge can't crash the process after the flow already succeeded; (3) pricing: a SAAGE_PRICES override now actually wins a length tie against a built-in (matching the docstring), and one malformed entry is skipped instead of dropping ALL overrides; (4) add() no longer allocates a throwaway _ModelUsage per call. Tests added. Full suite 378 passed.
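Finding (2) can be illustrated with a small sketch. `write_usage_json` is a hypothetical helper name, not the function in saage/cli.py; the point is only that `json.dumps` raises TypeError on a non-serializable value, which an `except OSError` clause would not catch.

```python
import json
from pathlib import Path


def write_usage_json(run_dir: Path, usage: dict) -> bool:
    """Best-effort write of usage.json; returns False instead of raising."""
    try:
        payload = json.dumps(usage, indent=2)  # TypeError if not serializable
        (run_dir / "usage.json").write_text(payload)  # OSError on I/O failure
        return True
    except Exception:
        # Deliberately broad: the flow has already succeeded at this point,
        # so a failed usage report should never crash the process.
        return False
```

A caller can ignore the return value or log it; either way the run's exit status does not depend on the usage artifact.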

Copilot

Pull request overview

Adds richer, best-effort token-usage accounting to saage runs by extending the existing process-wide TokenUsage to track per-model usage and compute an estimated USD cost when (and only when) a known price entry matches the model id.

Changes:

Introduces saage/pricing.py with substring-based model pricing (rates) and cost computation (cost), including SAAGE_PRICES JSON-file overrides.
Extends TokenUsage to track per-model usage (by_model), expose total estimated cost, and serialize a usage.json artifact.
Updates CLI run summary output and adds tests for pricing + per-model token usage/cost behavior.
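A simplified sketch of per-model accumulation. The aggregate fields mirror the diff in this PR; the fields of `_ModelUsage`, the explicit token-count arguments to `add()` (the real method takes a provider `usage` object), and the body of `reset()` are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class _ModelUsage:
    calls: int = 0
    prompt_tokens: int = 0
    completion_tokens: int = 0


@dataclass
class TokenUsage:
    calls: int = 0
    prompt_tokens: int = 0
    completion_tokens: int = 0
    by_model: dict[str, _ModelUsage] = field(default_factory=dict)

    def add(self, prompt_tokens: int, completion_tokens: int,
            model: Optional[str] = None) -> None:
        """Record one call in the aggregate and, if given, under its model id."""
        self.calls += 1
        self.prompt_tokens += prompt_tokens
        self.completion_tokens += completion_tokens
        if model is None:
            return
        entry = self.by_model.get(model)
        if entry is None:  # allocate only on the first call for this model
            entry = self.by_model[model] = _ModelUsage()
        entry.calls += 1
        entry.prompt_tokens += prompt_tokens
        entry.completion_tokens += completion_tokens

    def reset(self) -> None:
        """Clear all counters so the next run starts from zero."""
        self.calls = self.prompt_tokens = self.completion_tokens = 0
        self.by_model.clear()
```

Using `get()` followed by a conditional insert avoids constructing a `_ModelUsage` on every call, which `setdefault(model, _ModelUsage())` would do.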

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

File	Description
`saage/pricing.py`	Adds model substring matching for list-price rates and a cost calculator with env-based overrides.
`saage/llm.py`	Extends `TokenUsage` to track per-model usage, compute estimated cost, and serialize a summary dict.
`saage/cli.py`	Prints estimated cost + optional per-model breakdown; writes `usage.json`; resets usage per run.
`tests/test_pricing.py`	Adds unit tests for pricing matching, cost math, and override behavior.
`tests/test_agent.py`	Adds tests for per-model accumulation, cost grounding/None behavior, and reset semantics.
`docs/superpowers/specs/2026-06-24-token-usage-cost-design.md`	Documents the design decisions and surfaced outputs for usage + cost tracking.

+    out: dict[str, tuple[float, float]] = {}
+    for k, v in (raw.items() if isinstance(raw, dict) else ()):
+        try:                                     # skip ONE malformed entry rather
+            out[k] = (float(v[0]), float(v[1]))  # than dropping ALL overrides
+        except (TypeError, ValueError, IndexError, KeyError):
+            continue


    calls: int = 0
    prompt_tokens: int = 0
    completion_tokens: int = 0
+    by_model: dict = field(default_factory=dict)   # model id -> _ModelUsage


Two Copilot inline comments:

- pricing.py: SAAGE_PRICES override keys are now lowercased so a mixed-case key (e.g. "DeepSeek") matches. rates() compares against a lowercased model id, so an unnormalized override key could never hit. Non-str keys skipped.
- llm.py: TokenUsage.by_model annotated dict[str, _ModelUsage] (was bare dict).

Test added for the mixed-case override match. Full suite 379 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
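The override-parsing behavior described in these commits (skip one malformed entry, lowercase keys, skip non-str keys) might look roughly like the sketch below. `parse_overrides` is a hypothetical name, and the input is assumed to be the already-decoded JSON value; how SAAGE_PRICES is read is left out.

```python
def parse_overrides(raw: object) -> dict[str, tuple[float, float]]:
    """Normalize decoded override JSON into {lowercased substring: (in, out)}."""
    out: dict[str, tuple[float, float]] = {}
    if not isinstance(raw, dict):
        return out  # anything other than a JSON object yields no overrides
    for key, value in raw.items():
        if not isinstance(key, str):
            continue  # non-str keys can never match a model id
        try:
            # Lowercase the key because matching is done against a
            # lowercased model id.
            out[key.lower()] = (float(value[0]), float(value[1]))
        except (TypeError, ValueError, IndexError, KeyError):
            continue  # skip this one malformed entry, keep the rest
    return out
```

For example, `{"DeepSeek": [0.5, 1.5], "bad": "x"}` would yield a single `"deepseek"` entry, since `float("x")` raises ValueError and only that entry is dropped.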

cgpadwick and others added 2 commits June 24, 2026 11:43

cgpadwick requested a review from Copilot June 25, 2026 02:42

Copilot started reviewing on behalf of cgpadwick June 25, 2026 02:43

Copilot AI reviewed Jun 25, 2026

cgpadwick merged commit 34648f4 into master Jun 27, 2026
6 checks passed

cgpadwick deleted the feat/token-usage-cost branch June 27, 2026 16:41

cgpadwick merged 3 commits into master from feat/token-usage-cost
