Merged
55 changes: 55 additions & 0 deletions packages/cli/README.md
@@ -0,0 +1,55 @@
# tokenometer

> Empirical token-cost benchmarking for LLM prompts. Tells you what your prompt actually costs across Claude, GPT-4o, and Gemini, in every format.

[**Live playground: tokenometer.vercel.app**](https://tokenometer.vercel.app) · [Source](https://github.com/faraa2m/tokenometer) · MIT

```bash
npx tokenometer ./prompt.md --model claude-opus-4-7,gpt-4o
```

```
model           format   tokens est. cost tokenizer
--------------- -------- ------ --------- --------------
claude-opus-4-7 json     ~78    $0.001170 cl100k_base
claude-opus-4-7 yaml     ~84    $0.001260 cl100k_base
gpt-4o          json     77     $0.000192 o200k_base
gpt-4o          yaml     83     $0.000208 o200k_base

Cheapest: gpt-4o as json ($0.000192)
Priciest: claude-opus-4-7 as yaml ($0.001260, 6.56x more)
```

A leading `~` marks an approximate count (offline mode for Claude / Gemini, since neither vendor publishes a public tokenizer).
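
The `Cheapest` / `Priciest` summary can be derived from the table rows alone. The sketch below is illustrative only: the `Row` type and the `summarize` function are hypothetical names, not part of tokenometer's API, and the rows reuse the rounded costs displayed in the sample output.

```typescript
// Hypothetical row shape mirroring the CLI table; not tokenometer's actual types.
interface Row {
  model: string;
  format: string;
  cost: number; // estimated input cost in USD
}

// Pick the cheapest and priciest rows and compute the ratio between their costs.
function summarize(rows: Row[]): { cheapest: Row; priciest: Row; ratio: number } {
  if (rows.length === 0) {
    throw new Error("no rows to summarize");
  }
  const cheapest = rows.reduce((a, b) => (b.cost < a.cost ? b : a));
  const priciest = rows.reduce((a, b) => (b.cost > a.cost ? b : a));
  return { cheapest, priciest, ratio: priciest.cost / cheapest.cost };
}

// Rows copied from the sample output (displayed, rounded costs).
const sampleRows: Row[] = [
  { model: "claude-opus-4-7", format: "json", cost: 0.00117 },
  { model: "claude-opus-4-7", format: "yaml", cost: 0.00126 },
  { model: "gpt-4o", format: "json", cost: 0.000192 },
  { model: "gpt-4o", format: "yaml", cost: 0.000208 },
];

const sampleSummary = summarize(sampleRows);
console.log(`Cheapest: ${sampleSummary.cheapest.model} as ${sampleSummary.cheapest.format}`);
console.log(
  `Priciest: ${sampleSummary.priciest.model} as ${sampleSummary.priciest.format} ` +
    `(${sampleSummary.ratio.toFixed(2)}x more)`,
);
```

Because the ratio here is computed from rounded display values, it can differ slightly from one computed on unrounded costs.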

## Empirical mode

For exact, vendor-billed counts on Claude and Gemini, set the matching API key (`ANTHROPIC_API_KEY` for Claude, `GOOGLE_API_KEY` for Gemini) and pass `--empirical`. The tool calls the providers' free `countTokens` endpoints, so there is no charge.

```bash
ANTHROPIC_API_KEY=… GOOGLE_API_KEY=… \
npx tokenometer ./prompt.md --empirical
```
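
For context, a direct call to Anthropic's token-counting endpoint looks roughly like the sketch below. This is not tokenometer's implementation: `buildCountTokensRequest` and `countClaudeTokens` are hypothetical helper names, and the model identifier the API accepts may differ from the ids tokenometer uses. It assumes a runtime with a global `fetch` (Node 18+).

```typescript
// Hypothetical request description; kept separate from fetch so it can be inspected.
interface CountTokensRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build the HTTP request for Anthropic's Messages token-counting endpoint.
function buildCountTokensRequest(apiKey: string, model: string, prompt: string): CountTokensRequest {
  return {
    url: "https://api.anthropic.com/v1/messages/count_tokens",
    init: {
      method: "POST",
      headers: {
        "x-api-key": apiKey,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
      },
      // The prompt is sent as a single user message.
      body: JSON.stringify({ model, messages: [{ role: "user", content: prompt }] }),
    },
  };
}

// Send the request and return the provider-reported input token count.
async function countClaudeTokens(apiKey: string, model: string, prompt: string): Promise<number> {
  const { url, init } = buildCountTokensRequest(apiKey, model, prompt);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`count_tokens failed: HTTP ${response.status}`);
  }
  const data = (await response.json()) as { input_tokens: number };
  return data.input_tokens;
}
```

Calling `countClaudeTokens(process.env.ANTHROPIC_API_KEY ?? "", modelId, promptText)` would return the count the provider reports for that prompt.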

## Why not just `tiktoken`?

`tiktoken`'s `cl100k_base` (the encoding most "Claude tokenizer" libraries fall back on) **under-counts Opus 4.7 by a median of 62%** across a 10-prompt benchmark. Sonnet 4.6 and Haiku 4.5 are closer (~17%). Format choice makes little difference to cost, while model choice swings it by 12×. See the [README](https://github.com/faraa2m/tokenometer#findings-anthropic-n150-cells-across-10-prompt-shapes) for the dataset findings.
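
A median under-count like the one quoted above can be computed as sketched below. The definition used here, `(actual - estimated) / estimated`, is an assumption about how the benchmark measures the gap, the function names are hypothetical, and the sample counts are made up for illustration rather than taken from the dataset.

```typescript
// One benchmark cell: an offline estimate and the provider-reported count.
interface Sample {
  estimated: number;
  actual: number;
}

// Median of a non-empty list of numbers.
function median(values: number[]): number {
  if (values.length === 0) {
    throw new Error("median of empty list");
  }
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Median relative under-count: how much larger the real count is than the estimate.
function medianUnderCount(samples: Sample[]): number {
  return median(samples.map((s) => (s.actual - s.estimated) / s.estimated));
}

// Made-up counts for illustration only; not the benchmark data.
const illustrativeSamples: Sample[] = [
  { estimated: 100, actual: 150 },
  { estimated: 200, actual: 320 },
  { estimated: 50, actual: 90 },
];

console.log(`${(medianUnderCount(illustrativeSamples) * 100).toFixed(0)}% median under-count`);
```

The median is used rather than the mean so that a single unusual prompt shape does not dominate the figure.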

## Flags

```
tokenometer <file> [options]
echo "prompt" | tokenometer - [options]

  --model <id[,id…]>      Default: claude-opus-4-7
  --format <fmt[,fmt…]>   Default: all (json,yaml,xml,markdown,text)
  --empirical             Use provider countTokens APIs (free, exact)
  --max-spend <usd>       Hard ceiling for empirical mode (default 0.05)
  --offline               Force offline (overrides --empirical)
  -h, --help
  -v, --version
```

## License

MIT
44 changes: 30 additions & 14 deletions packages/cli/package.json
@@ -1,8 +1,32 @@
 {
   "name": "tokenometer",
   "version": "0.0.1",
-  "description": "Empirical token-cost benchmarking CLI for LLM prompts.",
+  "description": "Empirical token-cost benchmarking CLI for LLM prompts. Tells you what your prompt actually costs across Claude, GPT-4o, and Gemini, in every format.",
   "license": "MIT",
+  "author": "Faraazuddin Mohammed <mohdfaraaz1@gmail.com>",
+  "homepage": "https://tokenometer.vercel.app",
+  "repository": {
+    "type": "git",
+    "url": "git+https://github.com/faraa2m/tokenometer.git",
+    "directory": "packages/cli"
+  },
+  "bugs": {
+    "url": "https://github.com/faraa2m/tokenometer/issues"
+  },
+  "keywords": [
+    "ai",
+    "anthropic",
+    "claude",
+    "cli",
+    "cost",
+    "gemini",
+    "gpt",
+    "llm",
+    "openai",
+    "prompt",
+    "token",
+    "tokenizer"
+  ],
   "type": "module",
   "main": "./dist/index.js",
   "types": "./dist/index.d.ts",
@@ -16,6 +40,10 @@
     }
   },
   "files": ["dist", "README.md"],
+  "publishConfig": {
+    "access": "public",
+    "registry": "https://registry.npmjs.org/"
+  },
   "scripts": {
     "build": "tsc -b",
     "clean": "rm -rf dist"
@@ -27,17 +55,5 @@
     "@types/node": "^22.10.5",
     "typescript": "^5.7.2",
     "vitest": "^3.0.0"
-  },
-  "keywords": [
-    "ai",
-    "anthropic",
-    "claude",
-    "cost",
-    "gpt",
-    "llm",
-    "openai",
-    "prompt",
-    "token",
-    "tokenizer"
-  ]
+  }
 }
66 changes: 66 additions & 0 deletions packages/core/README.md
@@ -0,0 +1,66 @@
# @tokenometer/core

> Core library powering [tokenometer](https://www.npmjs.com/package/tokenometer): tokenizer dispatch, format converters, versioned cost rate matrix, and an empirical-mode `countTokens` adapter for Anthropic, OpenAI, and Google.

[**Live playground**](https://tokenometer.vercel.app) · [Source](https://github.com/faraa2m/tokenometer) · MIT

If you just want a CLI, `npm install -g tokenometer`. This package is for programmatic use.

## API

```ts
import {
  tokenize,
  tokenizeMatrix,
  tokenizeEmpirical,
  tokenizeMatrixEmpirical,
  countTokens,
  toFormat,
  isFormat,
  allFormats,
  KNOWN_MODELS,
  RATES,
  RATES_VERSION,
  getModel,
  getRate,
} from '@tokenometer/core';
```

### Offline (deterministic, no API key)

```ts
const result = tokenize({
  prompt: '{"hello": "world"}',
  format: 'yaml',
  modelId: 'claude-opus-4-7',
});
// {
//   model: 'claude-opus-4-7',
//   provider: 'anthropic',
//   format: 'yaml',
//   tokenizer: 'cl100k_base',
//   inputTokens: 12,
//   inputCost: 0.00018,
//   approximate: true   // ← Anthropic does not publish a public Claude 3+ tokenizer
// }
```

### Empirical (real provider counts, free)

```ts
const result = await tokenizeEmpirical({
  prompt: '{"hello": "world"}',
  format: 'yaml',
  modelId: 'claude-opus-4-7',
  env: { anthropicApiKey: process.env.ANTHROPIC_API_KEY! },
});
// approximate: false   ← uses Anthropic's messages.countTokens
```

### Rate table

`RATES` is a `Record<modelId, { inputPer1k, outputPer1k, cachedInputPer1k? }>`. `RATES_VERSION` ships as a date string so consumers can pin or audit.
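
As a rough illustration of how a per-1k rate turns a token count into a cost, the sketch below uses a local stand-in for one `RATES` entry. The field names follow the description above; `estimateCost` and `exampleRate` are hypothetical names, the `inputPer1k` value is inferred from the offline example (12 tokens costing $0.00018), and the `outputPer1k` value is a placeholder assumption.

```typescript
// Stand-in for one RATES entry; field names follow the README, values are assumptions.
interface Rate {
  inputPer1k: number;
  outputPer1k: number;
  cachedInputPer1k?: number;
}

// Cost in USD for the given token counts at a per-1k-token rate.
function estimateCost(rate: Rate, inputTokens: number, outputTokens = 0): number {
  return (inputTokens / 1000) * rate.inputPer1k + (outputTokens / 1000) * rate.outputPer1k;
}

// inputPer1k is inferred from the offline example; outputPer1k is a placeholder.
const exampleRate: Rate = { inputPer1k: 0.015, outputPer1k: 0.075 };

console.log(estimateCost(exampleRate, 12).toFixed(6));
```

Keeping rates per 1k tokens means the same function works for any entry in the table, and pinning `RATES_VERSION` alongside a stored cost records which prices produced it.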

## License

MIT
31 changes: 30 additions & 1 deletion packages/core/package.json
@@ -1,8 +1,33 @@
 {
   "name": "@tokenometer/core",
   "version": "0.0.1",
-  "description": "Core: tokenizer dispatch, format conversion, cost rate matrix.",
+  "description": "Empirical token-cost benchmarking for LLM prompts — core library (tokenizers, format converters, rate matrix, empirical countTokens dispatch).",
   "license": "MIT",
+  "author": "Faraazuddin Mohammed <mohdfaraaz1@gmail.com>",
+  "homepage": "https://tokenometer.vercel.app",
+  "repository": {
+    "type": "git",
+    "url": "git+https://github.com/faraa2m/tokenometer.git",
+    "directory": "packages/core"
+  },
+  "bugs": {
+    "url": "https://github.com/faraa2m/tokenometer/issues"
+  },
+  "keywords": [
+    "ai",
+    "anthropic",
+    "claude",
+    "cl100k",
+    "cost",
+    "gemini",
+    "gpt",
+    "llm",
+    "o200k",
+    "openai",
+    "prompt",
+    "token",
+    "tokenizer"
+  ],
   "type": "module",
   "main": "./dist/index.js",
   "types": "./dist/index.d.ts",
@@ -13,6 +38,10 @@
     }
   },
   "files": ["dist", "README.md"],
+  "publishConfig": {
+    "access": "public",
+    "registry": "https://registry.npmjs.org/"
+  },
   "scripts": {
     "build": "tsc -b",
     "clean": "rm -rf dist"