tokenometer

Empirical token-cost and latency benchmarking for LLM prompts. Tells you what your prompt actually costs, and how fast each provider responds, across Claude, GPT-4o, Gemini, Mistral, and Cohere, in each supported prompt format (JSON, YAML, XML, Markdown, plain text).

See the root README for findings, methodology, and the full project overview.

Live playground: tokenometer.dev · Source · MIT

npx tokenometer ./prompt.md --model claude-opus-4-7,gpt-4o

model            format    tokens  est. cost  tokenizer
---------------  --------  ------  ---------  --------------
claude-opus-4-7  json         ~78  $0.001170  cl100k_base
claude-opus-4-7  yaml         ~84  $0.001260  cl100k_base
gpt-4o           json          77  $0.000192  o200k_base
gpt-4o           yaml          83  $0.000208  o200k_base

Cheapest: gpt-4o as json ($0.000192)
Priciest: claude-opus-4-7 as yaml ($0.001260, 6.56x more)

A leading ~ marks an approximate count (offline mode for Claude / Gemini / Mistral-Tekken / Cohere, since none of those vendors publishes a public production tokenizer that ships in JS).
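
The est. cost column in the sample above is consistent with the token count multiplied by a per-million-token input price. Here is a minimal sketch of that arithmetic, assuming input rates of $15 and $2.50 per million tokens. Those rates are inferred from the sample rows, not read from the pricing registry, and `estimateCost` and the two constants are hypothetical names:

```typescript
// Hypothetical helper: input cost in USD for a token count and a
// per-million-token rate. Not tokenometer's actual implementation.
function estimateCost(tokens: number, usdPerMillionTokens: number): number {
  return (tokens * usdPerMillionTokens) / 1e6;
}

// Assumed rates (USD per 1M input tokens), inferred from the sample table.
const CLAUDE_OPUS_USD_PER_M = 15;
const GPT_4O_USD_PER_M = 2.5;

// The json rows of the sample table.
const claudeJson = estimateCost(78, CLAUDE_OPUS_USD_PER_M);
const gptJson = estimateCost(77, GPT_4O_USD_PER_M);

console.log(`claude-opus-4-7 json: $${claudeJson.toFixed(6)}`);
console.log(`gpt-4o json: $${gptJson.toFixed(7)}`);
```

Under these assumed rates, 77 tokens at $2.50 per million works out to $0.0001925, which the sample table displays as $0.000192.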

Flags

| Flag | Default | Notes |
| --- | --- | --- |
| `--model <id[,id…]>` | `claude-opus-4-7` (or auto-detected) | Any registered model id (63 across 5 providers). |
| `--format <fmt[,fmt…]>` | `json,yaml,xml,markdown,text` | Subset of supported formats. |
| `--output <fmt>` | `table` | `table` \| `json` \| `sarif`. |
| `--by-file` | off | Append a per-file token/USD table (multi-file only). |
| `--image <path>` | none | Add vision-token cost for the image (repeatable). |
| `--config <path>` | none | Load this exact config file (skips walk-up). |
| `--no-config` | off | Skip `.tokenometer.yml` loading entirely. |
| `--empirical` | off | Use provider `countTokens` APIs (free, exact). |
| `--latency` | off | Measure real generation latency (TTFT, total ms, tokens/sec). Implies `--empirical`. |
| `--latency-trials <n>` | `3` | Trials per cell when `--latency` is set (1–10). |
| `--max-spend <usd>` | `0.05` (or `0.25` with `--latency`) | Hard ceiling for empirical / latency mode. |
| `--offline` | off | Force offline path (overrides `--empirical`). |
| `-h`, `--help` | | Print help. |
| `-v`, `--version` | | Print version. |

tokenometer <file> [options]
echo "prompt" | tokenometer - [options]

Models supported

63 models across 5 providers. Run tokenometer --help for the full list at runtime, or browse the Cost Atlas for sortable per-model pages.

| Provider | Examples | Offline tokenizer | Empirical |
| --- | --- | --- | --- |
| Anthropic | `claude-opus-4-7`, `claude-sonnet-4-6`, `claude-haiku-4-5`, Claude 3.x family | `gpt-tokenizer` `cl100k_base` (approximate) | `messages.countTokens` (free, exact) |
| OpenAI | `gpt-4o`, `gpt-4o-mini`, `gpt-4-turbo`, `gpt-3.5-turbo`, `o1` family | `gpt-tokenizer` `o200k_base` (exact) | same `o200k_base` (matches production) |
| Google | `gemini-2.5-pro`, `gemini-2.5-flash`, `gemini-1.5-pro`, `gemini-1.5-flash` | `chars / 4` (approximate) | `model.countTokens` (free, exact) |
| Mistral (19 models) | `open-mistral-7b`, `open-mixtral-8x22b`, `mistral-large-latest`, `codestral-latest`, `mistral-nemo`, `pixtral-large-latest`, `mistral-medium-2505`, `magistral-small`, `ministral-3b-latest`, `devstral-small-2505` | `mistral-tokenizer-js` for SentencePiece V1/V2/V3 (exact); `chars / 4` for Tekken (approximate) | unsupported (no public token-count API) |
| Cohere | `command-r-08-2024`, `command-r-plus-08-2024` | `chars / 4` (approximate) | `POST /v1/tokenize` (free, exact, requires `COHERE_API_KEY`) |

Pricing comes from the tokenlens registry with a small set of local overrides for bleeding-edge models. Cohere pricing lives entirely in LOCAL_OVERRIDES because @tokenlens/models doesn't yet ship a Cohere catalog at v1.3.0.

Empirical mode

For exact, vendor-billed counts on Claude, Gemini, and Cohere, set the provider's API-key environment variable and pass --empirical. The tool calls each provider's free countTokens-equivalent endpoint, so counting incurs no charge.

ANTHROPIC_API_KEY=… GOOGLE_API_KEY=… COHERE_API_KEY=… \
  npx tokenometer ./prompt.md --empirical --model claude-opus-4-7,gemini-2.5-pro,command-r-plus-08-2024

OpenAI's empirical path runs tiktoken's o200k_base encoding locally; that encoding matches OpenAI's production count exactly, so no API call is needed. Mistral has no public token-count endpoint, so the offline path (mistral-tokenizer-js, or chars / 4 for Tekken models) is used regardless.

Auto provider detection

When --model is omitted, tokenometer picks a default based on which provider key is set in your environment:

ANTHROPIC_API_KEY only → claude-opus-4-7
OPENAI_API_KEY only → gpt-4o
GOOGLE_API_KEY / GEMINI_API_KEY only → first known gemini-* model (falls back to gemini-2.5-pro)
MISTRAL_API_KEY only → first known mistral-* model
COHERE_API_KEY only → command-r-plus-08-2024
Multiple keys set → falls back to claude-opus-4-7 and prints a stderr note. Pass --model to disambiguate.
No keys set → existing default (claude-opus-4-7).

This means npx tokenometer prompt.md does the right thing in any of those environments without you having to remember model names.
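
The detection rules above can be sketched as a small pure function. This is an illustration, not tokenometer's source: `detectDefaultModel`, `Detection`, and the `knownModels` parameter are hypothetical names, and the Mistral fallback (used when no `mistral-*` id is registered) is an assumption, since the list above does not specify one.

```typescript
// Sketch of the auto-detection rules; all names here are hypothetical.
type Env = Record<string, string | undefined>;

interface Detection {
  model: string;
  note?: string; // message a CLI would print to stderr
}

const DEFAULT_MODEL = "claude-opus-4-7";

function firstWithPrefix(known: readonly string[], prefix: string): string | undefined {
  return known.find((id) => id.startsWith(prefix));
}

function detectDefaultModel(env: Env, knownModels: readonly string[]): Detection {
  const has = (name: string): boolean => (env[name] ?? "") !== "";

  // Collect one candidate per provider whose key is present.
  const candidates: string[] = [];
  if (has("ANTHROPIC_API_KEY")) candidates.push("claude-opus-4-7");
  if (has("OPENAI_API_KEY")) candidates.push("gpt-4o");
  if (has("GOOGLE_API_KEY") || has("GEMINI_API_KEY")) {
    candidates.push(firstWithPrefix(knownModels, "gemini-") ?? "gemini-2.5-pro");
  }
  if (has("MISTRAL_API_KEY")) {
    // Assumption: no fallback is documented for Mistral, so reuse the default.
    candidates.push(firstWithPrefix(knownModels, "mistral-") ?? DEFAULT_MODEL);
  }
  if (has("COHERE_API_KEY")) candidates.push("command-r-plus-08-2024");

  const [only] = candidates;
  if (candidates.length === 1 && only !== undefined) return { model: only };
  if (candidates.length > 1) {
    return {
      model: DEFAULT_MODEL,
      note: "multiple provider keys set; pass --model to disambiguate",
    };
  }
  return { model: DEFAULT_MODEL }; // no keys set
}
```

Taking the environment as a parameter, rather than reading `process.env` directly, keeps the sketch easy to test.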

`.tokenometer.yml` config

Drop a .tokenometer.yml (or .yaml) at the project root and tokenometer will pick it up automatically (walks up from the cwd, stopping at .git):

models: [claude-opus-4-7, gpt-4o, mistral-large-latest]
formats: [json, yaml, markdown]
paths: [prompts/**/*.md]
budgets:
  total: 0.50
  per-file: 0.10

User-passed CLI flags always win over config defaults. Use --config <path> to load an explicit file (skips the walk-up). Use --no-config to skip config loading entirely.

Output formats

The --output flag picks the display format (separate from --format, which controls how the prompt body is converted before tokenization):

--output table (default) — the human-readable per-cell table you've been seeing.
--output json — emits a TokenometerResult JSON shape: { files: [{ path, results: [...] }] }. One entry per input file. Pipe to jq for filtering.
--output sarif — emits SARIF 2.1.0 with one result per (file, model, format) cell. Drop the file into GitHub Code Scanning or any SARIF viewer.

npx tokenometer ./prompt.md --output sarif > tokenometer.sarif
npx tokenometer ./prompt.md --output json | jq '.files[].results | map(.inputCost) | add'
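
For consumers that prefer code over jq, the same per-file sum can be sketched in TypeScript. Only `files`, `path`, `results`, and `inputCost` come from the shape described above; the interface names, the `costByFile` helper, and the assumption that `inputCost` is a plain number in USD are illustrative.

```typescript
// Minimal view of the JSON output; real results likely carry more fields.
interface CellResult {
  inputCost: number; // assumed to be a number in USD
}

interface FileResult {
  path: string;
  results: CellResult[];
}

interface TokenometerResult {
  files: FileResult[];
}

// Hypothetical helper: sum inputCost over every cell of each file.
function costByFile(result: TokenometerResult): Map<string, number> {
  const totals = new Map<string, number>();
  for (const file of result.files) {
    const sum = file.results.reduce((acc, cell) => acc + cell.inputCost, 0);
    totals.set(file.path, sum);
  }
  return totals;
}

// Example with made-up numbers standing in for real CLI output.
const raw = '{"files":[{"path":"prompts/agent.md","results":[{"inputCost":0.25},{"inputCost":0.5}]}]}';
const parsed = JSON.parse(raw) as TokenometerResult;
for (const [path, usd] of costByFile(parsed)) {
  console.log(`${path}: $${usd.toFixed(4)}`);
}
```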

Latency

--latency measures real generation latency in addition to token cost. For each (model, format) cell, tokenometer streams n real chat completions (default n=3, override with --latency-trials 1..10) capped at max_tokens=200, and reports:

TTFT: time to first streamed token (ms)
Total: wall-clock time from request start to stream end (ms)
tokens/sec: output_tokens / ((total - ttft) / 1000), i.e. output tokens per second of generation after the first token arrives

Numbers are reported as p50 / p95 / mean over the trials. Full per-trial data is included in --output json.
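
As a rough illustration of how per-trial measurements could be reduced to the reported statistics, here is a sketch. The `Trial` shape and function names are hypothetical, and the nearest-rank percentile method is an assumption; the tool may interpolate differently.

```typescript
// Hypothetical per-trial record; times are in milliseconds.
interface Trial {
  ttftMs: number;
  totalMs: number;
  outputTokens: number;
}

// Output tokens per second over the generation window (after the first token).
function tokensPerSec(t: Trial): number {
  const windowMs = t.totalMs - t.ttftMs;
  if (windowMs <= 0) return 0; // assumption: report 0 when there is no window
  return t.outputTokens / (windowMs / 1000);
}

// Nearest-rank percentile (assumed method), p in (0, 100].
function percentile(values: readonly number[], p: number): number {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  const value = sorted[rank - 1];
  if (value === undefined) throw new Error("percentile of empty sample");
  return value;
}

function summarize(values: readonly number[]): { p50: number; p95: number; mean: number } {
  const mean = values.reduce((acc, v) => acc + v, 0) / values.length;
  return { p50: percentile(values, 50), p95: percentile(values, 95), mean };
}

// Three made-up trials, matching the default --latency-trials of 3.
const trials: Trial[] = [
  { ttftMs: 100, totalMs: 1100, outputTokens: 200 },
  { ttftMs: 200, totalMs: 1200, outputTokens: 200 },
  { ttftMs: 300, totalMs: 2300, outputTokens: 200 },
];
console.log(summarize(trials.map((t) => t.ttftMs)));
console.log(summarize(trials.map(tokensPerSec)));
```

With only three trials, a nearest-rank p95 is simply the slowest trial, so p95 carries more information at higher trial counts.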

ANTHROPIC_API_KEY=… OPENAI_API_KEY=… \
  npx tokenometer ./prompt.md --latency --model claude-opus-4-7,gpt-4o

--latency implies --empirical (offline mode can't measure real latency). The default --max-spend ceiling is bumped from $0.05 to $0.25 to cover the n × 200-token generations; pass --max-spend explicitly to override.

Supported providers: Anthropic (messages.stream), OpenAI (/v1/chat/completions SSE), Google (generateContentStream), Cohere (/v1/chat NDJSON), Mistral (/v1/chat/completions SSE). Each trial retries once on transient failures.

Per-file attribution

--by-file appends a per-file token + USD summary table when you pass multiple input files (single-file inputs are a no-op):

By file:
  File              Tokens   USD
  ────────────────  ───────  ───────
  prompts/agent.md  1,243    $0.0186
  prompts/router.md   872    $0.0131

Useful for figuring out which prompt files dominate the cost of a multi-file pipeline. The aggregator that produces this table is also what powers the GitHub Action's per-file Δ comment, and is unit-tested in packages/action.

Vision tokens

Pass --image <path> (repeatable) to factor image-based vision tokens into the cost estimate alongside your prompt text:

npx tokenometer ./prompt.md --image ./screenshot.png --image ./diagram.jpg

Each image's dimensions are read with image-size (no native deps), then dispatched to the provider-specific vision-token estimator:

Claude → Anthropic's (width × height) / 750, capped at 1600 tokens.
GPT-4o → OpenAI's high-detail tiling: 85 + 170 × ceil(w/512) × ceil(h/512) after the 2048/768 resize step.
Gemini → Google's 258 × ceil(w/768) × ceil(h/768) (with a flat 258 for ≤384×384 images).

Mistral and Cohere don't publish vision-token formulas, so images are skipped for those providers (with a stderr note). Vision-token cells are always marked approximate: true since they're formula-derived. Each image also gets its own row in the --by-file table as a virtual file <image-path> [vision].
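
The three formulas above can be written out as follows. This is a sketch under stated assumptions, not the tool's estimator: the function names are hypothetical, the Claude result is rounded up (the rounding rule is not stated above), and the GPT-4o resize step is assumed to only ever shrink an image (fit within 2048×2048, then bring the shortest side down to 768).

```typescript
interface Size {
  width: number;
  height: number;
}

// Claude: (width × height) / 750, capped at 1600. Rounding up is an assumption.
function claudeVisionTokens({ width, height }: Size): number {
  return Math.min(1600, Math.ceil((width * height) / 750));
}

// GPT-4o high detail: resize, then 85 + 170 per 512px tile.
function gpt4oVisionTokens({ width, height }: Size): number {
  // Step 1 (assumed): scale down to fit inside 2048 × 2048.
  const fit = Math.min(1, 2048 / Math.max(width, height));
  let w = width * fit;
  let h = height * fit;
  // Step 2 (assumed): scale down so the shortest side is at most 768.
  const shrink = Math.min(1, 768 / Math.min(w, h));
  w *= shrink;
  h *= shrink;
  return 85 + 170 * Math.ceil(w / 512) * Math.ceil(h / 512);
}

// Gemini: flat 258 for images up to 384 × 384, otherwise 258 per 768px tile.
function geminiVisionTokens({ width, height }: Size): number {
  if (width <= 384 && height <= 384) return 258;
  return 258 * Math.ceil(width / 768) * Math.ceil(height / 768);
}

const screenshot: Size = { width: 1024, height: 1024 };
console.log({
  claude: claudeVisionTokens(screenshot),
  gpt4o: gpt4oVisionTokens(screenshot),
  gemini: geminiVisionTokens(screenshot),
});
```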

Why not just `tiktoken`?

tiktoken's cl100k_base (the encoding most "Claude tokenizer" libraries fall back on) under-counts Opus 4.7 by a median of 62% across a 10-prompt benchmark. Sonnet 4.6 and Haiku 4.5 are closer (~17%). Format choice is a wash; model choice swings cost by 12×. See the root README for the dataset findings.

License

MIT
