Detailed token-usage tracking — per-model + estimated cost (closes #30) #32

Merged
cgpadwick merged 3 commits into master from feat/token-usage-cost on Jun 27, 2026

@cgpadwick

Owner

Closes #30. (Implemented autonomously while you were out — design decisions documented in the spec + below; flag anything you'd change.)

Extends the existing TokenUsage (which already summed provider-reported in/out tokens) with per-model breakdown and estimated cost.

What's new

  • Per-model: TokenUsage.add(usage, model) records usage both in aggregate and per model id (by_model). Both providers pass self.model.
  • Cost (grounded, never guessed): new saage/pricing.py — a built-in table of rough public list prices (USD per 1M input/output), substring-matched against the model id (longest key wins, so gpt-4o-mini beats gpt-4o). cost() returns None for an unknown model, so a cost only shows when it's grounded. Overridable via SAAGE_PRICES (JSON {"<substring>": [in_per_1M, out_per_1M]}), merged over the built-ins.
  • Surfaced: run summary gains a cost: ~$X (estimated) line + a per-model breakdown when >1 model was used; the run dir gets usage.json ({calls, *_tokens, total_tokens, estimated_cost_usd, by_model{...}}) — best-effort, non-fatal.
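A minimal sketch of the lookup described above, for readers who want to see the "longest key wins" rule and the None-when-unknown behaviour in one place. The function signatures are assumptions (the PR names rates() and cost() but does not show them), and the numbers in PRICES are illustrative placeholders, not real list prices.

```python
from typing import Optional

# Placeholder rates (USD per 1M input tokens, USD per 1M output tokens).
# These values are made up for the example; they are NOT real prices.
PRICES: dict[str, tuple[float, float]] = {
    "gpt-4o": (2.0, 8.0),
    "gpt-4o-mini": (0.2, 0.8),
}


def rates(model: str, table: dict[str, tuple[float, float]] = PRICES) -> Optional[tuple[float, float]]:
    """Return the rates of the longest key that is a substring of the model id."""
    model_lc = model.lower()
    matches = [key for key in table if key in model_lc]
    if not matches:
        return None  # unknown model: no rate, so no cost is reported
    return table[max(matches, key=len)]


def cost(model: str, prompt_tokens: int, completion_tokens: int,
         table: dict[str, tuple[float, float]] = PRICES) -> Optional[float]:
    """Estimated USD cost, or None when no price entry matches the model."""
    matched = rates(model, table)
    if matched is None:
        return None
    in_rate, out_rate = matched
    return (prompt_tokens * in_rate + completion_tokens * out_rate) / 1_000_000
```

With this table, an id such as "openai/gpt-4o-mini-2024" contains both keys, and the longer "gpt-4o-mini" entry is the one that is used.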

Design calls I made (yours to override)

  • Cost is best-effort: None when the model is unknown rather than a guessed figure, to avoid misleading numbers. Prices live in pricing.py, with a SAAGE_PRICES override (no live price feed).
  • Per-model, not per-step/skill (usage is provider-reported per call, not per node) — noted as a future enhancement in the spec.
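To make the per-model choice concrete, here is a rough sketch of accumulating provider-reported counts per call, keyed by model id. UsageSketch and ModelUsage are hypothetical stand-ins for TokenUsage and _ModelUsage, and add() takes plain integers here rather than a provider usage object, which is an assumption. Cost fields are left out to keep the example small.

```python
from dataclasses import dataclass, field


@dataclass
class ModelUsage:
    """Counters for one model id (hypothetical stand-in for _ModelUsage)."""
    calls: int = 0
    prompt_tokens: int = 0
    completion_tokens: int = 0


@dataclass
class UsageSketch:
    """Aggregate counters plus a per-model breakdown (stand-in for TokenUsage)."""
    calls: int = 0
    prompt_tokens: int = 0
    completion_tokens: int = 0
    by_model: dict[str, ModelUsage] = field(default_factory=dict)

    def add(self, prompt_tokens: int, completion_tokens: int, model: str) -> None:
        # Aggregate totals across all models.
        self.calls += 1
        self.prompt_tokens += prompt_tokens
        self.completion_tokens += completion_tokens
        # Per-model totals; the entry is created only when the model is new.
        per_model = self.by_model.get(model)
        if per_model is None:
            per_model = self.by_model[model] = ModelUsage()
        per_model.calls += 1
        per_model.prompt_tokens += prompt_tokens
        per_model.completion_tokens += completion_tokens

    def as_dict(self) -> dict:
        # JSON-friendly summary; key names follow the usage.json shape in the PR.
        return {
            "calls": self.calls,
            "prompt_tokens": self.prompt_tokens,
            "completion_tokens": self.completion_tokens,
            "total_tokens": self.prompt_tokens + self.completion_tokens,
            "by_model": {
                model: {
                    "calls": u.calls,
                    "prompt_tokens": u.prompt_tokens,
                    "completion_tokens": u.completion_tokens,
                }
                for model, u in self.by_model.items()
            },
        }
```

Because each provider call reports one usage record for one model, the model id is the finest key available without threading step or skill identifiers through every call.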

Tests

tests/test_pricing.py + new TokenUsage tests. Full suite 375 passed, 7 skipped.

Spec: docs/superpowers/specs/2026-06-24-token-usage-cost-design.md.

🤖 Generated with Claude Code

cgpadwick and others added 2 commits June 24, 2026 11:43
Closes #30. Extends the existing process-wide TokenUsage:

- per-model breakdown: add(usage, model) records aggregate + per-model id; both
  providers pass self.model.
- estimated USD cost via new saage/pricing.py: a built-in table of rough public
  list prices (USD per 1M in/out), substring-matched (longest key wins). cost()
  returns None for an unknown model — a cost is shown only when grounded in a
  known rate. Overridable via SAAGE_PRICES (JSON path), merged over the built-ins.
- surfaced in the run summary (a `cost: ~$X (estimated)` line + per-model
  breakdown when >1 model) and written to the run dir as usage.json (per-model +
  cost; best-effort, non-fatal).

tests: pricing (substring/longest-wins, math, unknown->None, env override,
malformed ignored) + TokenUsage per-model/cost/as_dict. Full suite 375 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- USAGE singleton is now reset at the start of each `saage run` (main()), so a
  process that runs more than once (resume, tests, embedding) reports THIS run's
  usage/cost, not the cumulative-since-process-start total (same class as the #21
  FileHandler-leak). usage.json + summary are now per-run.
- usage.json write catches Exception (not just OSError) — a non-serializable
  edge in as_dict() would otherwise crash the process AFTER the flow succeeded.
- pricing: SAAGE_PRICES override now wins a length tie against a built-in (`>=`,
  overrides merged last), matching the docstring; and a single malformed entry is
  skipped instead of discarding ALL overrides.
- TokenUsage.add no longer builds a throwaway _ModelUsage on every call.

tests: TokenUsage.reset clears all; override replaces built-in; one malformed
entry keeps the rest. Full suite 378 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
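The best-effort write mentioned in this commit could look roughly like the sketch below. write_usage_json is a hypothetical helper name, and the logging call is an assumption about how a failure might be reported; the point is only that both I/O errors and serialization errors are swallowed so that a finished run is not turned into a crash.

```python
import json
import logging
from pathlib import Path

log = logging.getLogger(__name__)


def write_usage_json(run_dir: Path, summary: dict) -> bool:
    """Write usage.json into run_dir; return False instead of raising on failure."""
    try:
        # json.dumps runs first, so a non-serializable value fails before any file is created.
        payload = json.dumps(summary, indent=2)
        (run_dir / "usage.json").write_text(payload)
        return True
    except Exception as exc:  # deliberately broad: covers OSError and TypeError alike
        log.warning("could not write usage.json: %s", exc)
        return False
```

Catching Exception is broader than usual practice; here it is a deliberate trade-off because the artifact is optional and the flow has already succeeded.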
@cgpadwick

Owner Author

Ran /code-review on this PR and fixed the findings (fb4afb3): (1) USAGE was never reset — a process running more than once (resume/tests/embedding) wrote a cumulative-since-start usage.json; now reset at the start of each saage run so it's per-run (same class as the #21 FileHandler-leak); (2) the usage.json write caught only OSError — broadened to Exception so a serialization edge can't crash the process after the flow already succeeded; (3) pricing: a SAAGE_PRICES override now actually wins a length tie against a built-in (matching the docstring), and one malformed entry is skipped instead of dropping ALL overrides; (4) add() no longer allocates a throwaway _ModelUsage per call. Tests added. Full suite 378 passed.
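For finding (3), a small sketch of why merging overrides last and comparing with `>=` lets an override win a length tie. merge_prices and the rates() signature are assumptions for illustration, and the rates are placeholder numbers. The sketch relies on Python dicts preserving insertion order, so keys that exist only in the overrides are iterated after the built-ins.

```python
from typing import Optional

Rates = tuple[float, float]


def merge_prices(builtins: dict[str, Rates], overrides: dict[str, Rates]) -> dict[str, Rates]:
    """Built-ins first, overrides last; an identical key is replaced by the override."""
    merged = dict(builtins)
    merged.update(overrides)
    return merged


def rates(model: str, table: dict[str, Rates]) -> Optional[Rates]:
    """Longest matching key wins; on equal length the later (override) key wins."""
    model_lc = model.lower()
    best_key: Optional[str] = None
    for key in table:
        # `>=` rather than `>`: a later key of the same length replaces the earlier one.
        if key in model_lc and (best_key is None or len(key) >= len(best_key)):
            best_key = key
    return None if best_key is None else table[best_key]
```

For example, with a built-in key "gpt-4o" and an override key "o-mini" (both six characters, both substrings of "gpt-4o-mini"), the override's rates are returned; with a strict `>` the built-in would have been kept.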

Copilot AI left a comment

Pull request overview

Adds richer, best-effort token-usage accounting to saage runs by extending the existing process-wide TokenUsage to track per-model usage and compute an estimated USD cost when (and only when) a known price entry matches the model id.

Changes:

  • Introduces saage/pricing.py with substring-based model pricing (rates) and cost computation (cost), including SAAGE_PRICES JSON-file overrides.
  • Extends TokenUsage to track per-model usage (by_model), expose total estimated cost, and serialize a usage.json artifact.
  • Updates CLI run summary output and adds tests for pricing + per-model token usage/cost behavior.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| saage/pricing.py | Adds model substring matching for list-price rates and a cost calculator with env-based overrides. |
| saage/llm.py | Extends TokenUsage to track per-model usage, compute estimated cost, and serialize a summary dict. |
| saage/cli.py | Prints estimated cost + optional per-model breakdown; writes usage.json; resets usage per run. |
| tests/test_pricing.py | Adds unit tests for pricing matching, cost math, and override behavior. |
| tests/test_agent.py | Adds tests for per-model accumulation, cost grounding/None behavior, and reset semantics. |
| docs/superpowers/specs/2026-06-24-token-usage-cost-design.md | Documents the design decisions and surfaced outputs for usage + cost tracking. |

Comment thread saage/pricing.py
Comment on lines +46 to +51
    out: dict[str, tuple[float, float]] = {}
    for k, v in (raw.items() if isinstance(raw, dict) else ()):
        try:                                     # skip ONE malformed entry rather
            out[k] = (float(v[0]), float(v[1]))  # than dropping ALL overrides
        except (TypeError, ValueError, IndexError, KeyError):
            continue
Comment thread saage/llm.py Outdated
Comment on lines +57 to +60
calls: int = 0
prompt_tokens: int = 0
completion_tokens: int = 0
by_model: dict = field(default_factory=dict) # model id -> _ModelUsage
Two Copilot inline comments:
- pricing.py: SAAGE_PRICES override keys are now lowercased so a mixed-case
  key (e.g. "DeepSeek") matches — rates() compares against a lowercased model
  id, so an unnormalized override key could never hit. Non-str keys skipped.
- llm.py: TokenUsage.by_model annotated dict[str, _ModelUsage] (was bare dict).

Test added for the mixed-case override match. Full suite 379 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
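A sketch of override parsing with the key normalization this commit describes. parse_overrides is a hypothetical name, and it takes the already-decoded JSON value so the example does not depend on how SAAGE_PRICES is loaded. Keys are lowercased because the lookup compares against a lowercased model id; non-string keys and malformed values are skipped one at a time.

```python
def parse_overrides(raw: object) -> dict[str, tuple[float, float]]:
    """Turn decoded JSON into {lowercased substring: (in_per_1M, out_per_1M)}."""
    out: dict[str, tuple[float, float]] = {}
    if not isinstance(raw, dict):
        return out  # anything other than a JSON object yields no overrides
    for key, value in raw.items():
        if not isinstance(key, str):
            continue  # skip non-string keys
        try:
            # Lowercase so a key like "DeepSeek" can match a lowercased model id.
            out[key.lower()] = (float(value[0]), float(value[1]))
        except (TypeError, ValueError, IndexError, KeyError):
            continue  # skip this one malformed entry, keep the rest
    return out
```

Given {"DeepSeek": [0.5, 1.5], "bad": "x"}, only the "deepseek" entry survives, and it can now match a model id such as "deepseek-chat".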
@cgpadwick cgpadwick merged commit 34648f4 into master Jun 27, 2026
6 checks passed
@cgpadwick cgpadwick deleted the feat/token-usage-cost branch June 27, 2026 16:41

Development

Successfully merging this pull request may close these issues.

The saage engine should track token usage