refactor(core): source pricing + context limits from tokenlens registry #8
Merged
Replace the hand-maintained RATES/MODELS tables in packages/core with a registry built at module load from @tokenlens/models (anthropic + openai + google subpaths). The 7-entry curated set grows to 42 stable models with no additional manual maintenance, and every model now carries contextWindow + maxOutputTokens + pricingSource metadata sourced from models.dev.

Three Anthropic models (claude-haiku-4-5, claude-opus-4-7, claude-sonnet-4-6) are not yet in tokenlens upstream, so they remain in a small LOCAL_OVERRIDES table that wins on id collision. A weekly .github/workflows/registry-check.yml runs scripts/check-overrides.mjs, which detects when upstream catches up (action: drop the override) or when tokenlens-sourced pricing drifts from the checked-in snapshot at packages/core/src/__snapshots__/registry.json. CI opens a tracking issue on findings.

Public API (getRate, getModel, KNOWN_MODELS, RateEntry, ModelDescriptor) is byte-compatible. Consumers gain the new metadata:

- CLI prints a Limits: block under the cost table showing ctx + max output per unique model
- web Playground groups model checkboxes by provider with a context-window chip suffix (gpt-4o · 128k)
- GitHub Action appends a Limits: line to the sticky PR comment

benchmarks/run.mjs gains a --filter / --models flag (also BENCH_MODELS env) so the regenerate sweep can be scoped, since the matrix grew from 7×N×M to 42×N×M cells. Existing results.json is regenerated to match.

Tests are loosened from exact-dollar assertions to (a) per-model positive-price invariants, (b) canary checks on stable IDs (gpt-4o, gpt-4o-mini), (c) a registry-size floor (>=15) so a broken/empty registry can't pass silently. 48 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
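The loosened test strategy described above (positive-price invariants, canary IDs, and a size floor) might be sketched roughly as follows. This is an illustration only: the `RateEntry` fields, the `checkRegistryInvariants` helper, and the prices in `sampleRegistry` are hypothetical placeholders and are not taken from packages/core.

```typescript
// Hypothetical, simplified shape; the real RateEntry in packages/core may differ.
interface RateEntry {
  id: string;
  inputPerMTok: number; // assumed unit: USD per million input tokens
  outputPerMTok: number; // assumed unit: USD per million output tokens
}

// Returns a list of human-readable problems; an empty list means all invariants hold.
function checkRegistryInvariants(
  registry: RateEntry[],
  canaryIds: string[],
  minSize: number,
): string[] {
  const problems: string[] = [];

  // (c) Size floor: an empty or truncated registry must not pass silently.
  if (registry.length < minSize) {
    problems.push(`registry has ${registry.length} models, expected >= ${minSize}`);
  }

  // (a) Every model must have strictly positive prices (this also rejects NaN).
  for (const entry of registry) {
    if (!(entry.inputPerMTok > 0) || !(entry.outputPerMTok > 0)) {
      problems.push(`${entry.id}: non-positive price`);
    }
  }

  // (b) Canary IDs that are expected to stay stable must be present.
  const ids = registry.map((entry) => entry.id);
  for (const id of canaryIds) {
    if (ids.indexOf(id) === -1) {
      problems.push(`missing canary model ${id}`);
    }
  }

  return problems;
}

// Placeholder prices, NOT real pricing data.
const sampleRegistry: RateEntry[] = [
  { id: "gpt-4o", inputPerMTok: 1, outputPerMTok: 2 },
  { id: "gpt-4o-mini", inputPerMTok: 1, outputPerMTok: 2 },
];

console.log(checkRegistryInvariants(sampleRegistry, ["gpt-4o", "gpt-4o-mini"], 2));
```

Returning a list of problems rather than throwing on the first one is a design choice for the sketch: a single run then reports every broken invariant at once.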
faraa2m added a commit that referenced this pull request on May 9, 2026:

- replace `--format md` with real format names (markdown, text)
- drop stale `gemini-2.5-pro` example; add stdin example and `--help` pointer
- correct empirical-mode description: real provider countTokens, no caching claim
- fix CI snippet to match @v0 action inputs (models/formats/budget)
- methodology table: empirical column reflects shipped countTokens dispatch
- note pricing now comes from tokenlens registry (post #8 refactor)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
faraa2m added two commits that referenced this pull request on May 11, 2026: one titled "…ry (#8)" whose message repeats the pull request description, and one whose message repeats the May 9 commit message.
Summary
- Replace the hand-maintained `RATES`/`MODELS` tables in `packages/core/src/rates.ts` with a registry built at module load from `@tokenlens/models` (anthropic + openai + google subpaths). 7 curated models → 42 stable models, no extra manual maintenance.
- `ModelDescriptor` now carries `contextWindow`, `maxOutputTokens`, and `pricingSource: 'local' | 'tokenlens'` (from `models.dev`).
- The new metadata surfaces in the CLI (`Limits:` block under the cost table), web Playground (provider-grouped selector with `· 128k` context chips), and GitHub Action (`Limits:` line in the sticky PR comment).

Public API (`getRate`, `getModel`, `KNOWN_MODELS`, `RateEntry`, `ModelDescriptor`) is byte-compatible; no consumer code outside the touched files needs to change.

Why a hybrid registry, not a full swap?
`claude-haiku-4-5` / `claude-opus-4-7` / `claude-sonnet-4-6` are not yet in tokenlens upstream (its anthropic catalog tops out at `claude-opus-4-1-20250805`). They stay in a small `LOCAL_OVERRIDES` table that wins on id collision. A weekly cron (`.github/workflows/registry-check.yml`) runs `scripts/check-overrides.mjs`, which:

- detects when upstream catches up with an override → prompt to drop the override;
- compares tokenlens-sourced pricing against the checked-in snapshot at `packages/core/src/__snapshots__/registry.json` → fail on drift, prompt to regenerate the snapshot.

On failure the workflow opens (or comments on) a tracking issue.
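A minimal sketch of the hybrid approach, assuming a simplified `ModelDescriptor` and stand-in data instead of the real `@tokenlens/models` exports. The helper names (`buildRegistry`, `findRedundantOverrides`, `findPricingDrift`), the model ids, and all numbers are hypothetical; the actual `rates.ts` and `check-overrides.mjs` may be structured quite differently.

```typescript
type PricingSource = "local" | "tokenlens";

// Simplified, assumed shape; the real ModelDescriptor carries more fields.
interface ModelDescriptor {
  id: string;
  contextWindow: number;
  maxOutputTokens: number;
  pricingSource: PricingSource;
}

// Merge upstream models with local overrides. Overrides are written last,
// so they win whenever both tables define the same id.
function buildRegistry(
  upstreamModels: ModelDescriptor[],
  localOverrides: ModelDescriptor[],
): Record<string, ModelDescriptor> {
  const merged: Record<string, ModelDescriptor> = {};
  for (const model of upstreamModels) merged[model.id] = model;
  for (const model of localOverrides) merged[model.id] = model;
  return merged;
}

// Overrides whose id now exists upstream are candidates for removal.
function findRedundantOverrides(
  upstreamModels: ModelDescriptor[],
  localOverrides: ModelDescriptor[],
): string[] {
  const upstreamIds = upstreamModels.map((model) => model.id);
  return localOverrides
    .map((model) => model.id)
    .filter((id) => upstreamIds.indexOf(id) !== -1);
}

// Ids whose price differs between the checked-in snapshot and the current
// registry, including ids present on only one side.
function findPricingDrift(
  snapshot: Record<string, number>,
  current: Record<string, number>,
): string[] {
  const addedIds = Object.keys(current).filter((id) => !(id in snapshot));
  const allIds = Object.keys(snapshot).concat(addedIds);
  return allIds.filter((id) => snapshot[id] !== current[id]);
}

// Stand-in data with placeholder ids and limits.
const upstream: ModelDescriptor[] = [
  { id: "shared-model", contextWindow: 1000, maxOutputTokens: 100, pricingSource: "tokenlens" },
  { id: "upstream-only", contextWindow: 2000, maxOutputTokens: 200, pricingSource: "tokenlens" },
];
const overrides: ModelDescriptor[] = [
  { id: "shared-model", contextWindow: 1500, maxOutputTokens: 150, pricingSource: "local" },
  { id: "local-only", contextWindow: 3000, maxOutputTokens: 300, pricingSource: "local" },
];

const registry = buildRegistry(upstream, overrides);
console.log(registry["shared-model"].pricingSource);
console.log(findRedundantOverrides(upstream, overrides));
```

In this sketch `shared-model` resolves to the local entry, and the same id is reported as a redundant override, which corresponds to the "drop the override" prompt.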
Other changes
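- `benchmarks/run.mjs` gains `--filter` / `--models` (also honors `BENCH_MODELS` env) so the regenerate sweep can be scoped; the matrix grew from 7×N×M to 42×N×M cells. `benchmarks/results.json` regenerated against the full sweep.
- Tests loosened from exact-dollar assertions to per-model positive-price invariants, canary checks on stable IDs (`gpt-4o`, `gpt-4o-mini`), and a registry-size floor (`>=15`) so a broken registry can't silently pass.
- `packages/core` adds two npm scripts: `snapshot:registry`, `check:overrides`.

Test plan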
- `npm run build` (all packages compile)
- `npm run lint` (biome clean)
- `npm run typecheck`
- `npx vitest run`: 48 tests pass (was 46; added 2 for `renderModelLimits`)
- `npm run benchmarks`: drift check passes against regenerated `results.json`
- `npm run check:overrides -w @tokenometer/core` → `OK — 42 models, 3 overrides intact, snapshot in sync.`
- `echo "..." | node packages/cli/dist/index.js - --model claude-opus-4-7,gpt-4o` produces the expected table + new `Limits:` block + summary
- `registry-check.yml` (Mondays 12:00 UTC): verify via `workflow_dispatch` after merge

Open follow-ups (out of scope)
- […] `models.dev` provider data (~33KB raw across the 3 providers). For an even smaller browser footprint, `tokenlens` ships an async `fetchModels()` path that could be opted into for `packages/web` only.
- […] `KNOWN_MODELS`) reuses the existing `o200k_base` openai dispatch in `tokenize.ts`. Those older models are technically `cl100k_base`. Pre-existing bug surfaced by the wider catalog; worth a dedicated fix.

🤖 Generated with Claude Code
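For the tokenizer follow-up above, one possible shape of a fix is an explicit encoding lookup for OpenAI model ids. The sketch below is an assumption, not the code in `tokenize.ts`: `encodingForOpenAIModel` is a hypothetical helper, and the legacy-id rule (plain `gpt-4`, ids starting with `gpt-4-`, and the `gpt-3.5-turbo` family map to `cl100k_base`) should be checked against the tokenizer library's own model-to-encoding table before relying on it.

```typescript
type OpenAIEncoding = "o200k_base" | "cl100k_base";

// Hypothetical helper: pick the tokenizer encoding for an OpenAI model id.
// Assumed rule: older GPT-4 / GPT-3.5 ids use cl100k_base, everything else
// falls through to o200k_base.
function encodingForOpenAIModel(modelId: string): OpenAIEncoding {
  const isLegacy =
    modelId === "gpt-4" ||
    modelId.indexOf("gpt-4-") === 0 || // e.g. gpt-4-turbo; does not match gpt-4o
    modelId.indexOf("gpt-3.5-turbo") === 0;
  return isLegacy ? "cl100k_base" : "o200k_base";
}

for (const id of ["gpt-4o", "gpt-4o-mini", "gpt-4-turbo", "gpt-3.5-turbo"]) {
  console.log(id, encodingForOpenAIModel(id));
}
```

Matching on `gpt-4-` with the trailing hyphen, rather than on `gpt-4`, keeps `gpt-4o` and `gpt-4o-mini` on `o200k_base`.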