From 76d3858ac9366a9eea353f714354ab88ac5f2bd7 Mon Sep 17 00:00:00 2001 From: Faraazuddin Mohammed Date: Sat, 9 May 2026 00:08:08 -0400 Subject: [PATCH] docs(readme): align usage with shipped CLI + Action - replace `--format md` with real format names (markdown, text) - drop stale `gemini-2.5-pro` example; add stdin example and `--help` pointer - correct empirical-mode description: real provider countTokens, no caching claim - fix CI snippet to match @v0 action inputs (models/formats/budget) - methodology table: empirical column reflects shipped countTokens dispatch - note pricing now comes from tokenlens registry (post #8 refactor) Co-Authored-By: Claude Opus 4.7 (1M context) --- README.md | 52 +++++++++++++++++++++++++++++++++++----------------- 1 file changed, 35 insertions(+), 17 deletions(-) diff --git a/README.md b/README.md index 6eb990c..cbe7f6a 100644 --- a/README.md +++ b/README.md @@ -28,17 +28,27 @@ Tokenometer is dev-time, multi-provider, multi-format, optionally empirical, and ## Install +One-shot: + ```bash npx tokenometer ./prompt.md --model claude-opus-4-7 ``` -Or: +Global: ```bash npm i -g tokenometer -tokenometer ./prompt.md --format yaml,json,xml,md --model claude-opus-4-7,gpt-4o,gemini-2.5-pro +tokenometer ./prompt.md --format yaml,json,xml,markdown,text --model claude-opus-4-7,gpt-4o +``` + +Stdin works too: + +```bash +echo "prompt body" | tokenometer - --model claude-sonnet-4-6 ``` +Run `tokenometer --help` for the full flag list and the current set of known model ids. + ## Three-line use ### 1. Compare formats for a single prompt (offline, no API key) @@ -47,38 +57,46 @@ tokenometer ./prompt.md --format yaml,json,xml,md --model claude-opus-4-7,gpt-4o tokenometer ./prompt.md --model claude-opus-4-7 ``` -Prints estimated cost across all formats × the chosen model. +Prints estimated tokens + USD across each format × the chosen model(s). Default model is `claude-opus-4-7`; default formats are all of `json,markdown,text,xml,yaml`. -### 2. Empirical mode (real API calls, cache-aware, with a hard ceiling) +### 2. Empirical mode (real provider `countTokens`, with a hard ceiling) ```bash ANTHROPIC_API_KEY=… tokenometer ./prompt.md --empirical --max-spend 0.05 ``` -Sends one minimal call per (provider × format), records `usage.input_tokens` and `cache_read_input_tokens`, prints empirical $ next to estimated. +For each `(model × format)` cell, calls the provider's exact token-count API: + +- Anthropic → `messages.countTokens` (free) +- Google → `model.countTokens` (free) +- OpenAI → tiktoken `o200k_base` (matches OpenAI's production count exactly, no API call) + +Set `GOOGLE_API_KEY` (or `GEMINI_API_KEY`) for Gemini models. `--offline` forces the offline path even if `--empirical` is also passed. -### 3. CI guardrail +### 3. CI guardrail (GitHub Action) ```yaml -- uses: faraa2m/tokenometer-action@v1 +- uses: faraa2m/tokenometer@v0 with: - paths: prompts/**/*.md - budget: 0.50 + paths: prompts/**/*.md,prompts/**/*.json + models: claude-opus-4-7,claude-sonnet-4-6,gpt-4o + formats: json,yaml,markdown + budget: '0.50' # USD; omit to disable the gate ``` -Posts a sticky PR comment with the cost diff vs the base branch. Fails the check if the diff exceeds `budget`. +Posts a sticky PR comment with the cost diff vs the base branch. Fails the check when the total Δ exceeds `budget`. See [`packages/action/README.md`](packages/action/README.md) for all inputs and outputs. ## Methodology -Tokenometer chooses a tokenizer per provider and tells you when the count is approximate (rendered with a leading `~` and `approximate: true` in the API): +Tokenometer picks a tokenizer per provider and flags the count as approximate (`approximate: true` in the API result) when the offline path is a proxy: -| Provider | Offline tokenizer | Exactness | Notes | -|-----------|------------------------------|-------------|-----------------------------------------------------------------------| -| OpenAI | `gpt-tokenizer` `o200k_base` | exact | Same encoding GPT-4o / 4o-mini use in production. | -| Anthropic | `gpt-tokenizer` `cl100k_base`| approximate | Anthropic does not publish a Claude 3+ tokenizer. cl100k is a close proxy; empirical mode (planned) will call Anthropic for the real number. | -| Google | `chars / 4` heuristic | approximate | Gemini token counting is API-only. Empirical mode (planned) will call `countTokens` for the real number. | +| Provider | Offline tokenizer | Exactness | Empirical (`--empirical`) | +|-----------|------------------------------|-------------|----------------------------------| +| OpenAI | `gpt-tokenizer` `o200k_base` | exact | same `o200k_base` (matches OpenAI production count) | +| Anthropic | `gpt-tokenizer` `cl100k_base`| approximate | `messages.countTokens` (exact, free) | +| Google | `chars / 4` heuristic | approximate | `model.countTokens` (exact, free) | -Cost = `tokens / 1000 × per-1k input rate`. Rate table is versioned (`RATES_VERSION`) and lives in [`packages/core/src/rates.ts`](packages/core/src/rates.ts). +Cost = `tokens / 1000 × per-1k input rate`. Pricing and context windows are sourced from the [`tokenlens`](https://www.npmjs.com/package/tokenlens) registry, with a small set of local overrides for bleeding-edge models the registry hasn't picked up yet — see [`packages/core/src/rates.ts`](packages/core/src/rates.ts) (`RATES_VERSION`). ## Status