Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 11 additions & 0 deletions AGENTS.md
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,17 @@ runaway costs.
- Prefer features that reduce tokens structurally (output caps, compaction) over
features that merely expose knobs for users to tune manually.

## Model catalog (`src/models/catalog.ts`)
- A curated, offline list of coding models with pricing, context window, and a
relative coding score. It drives USD cost estimates and priority-based model
selection (`performance` / `cost` / `balanced`).
- Keep it current: when adding/repricing a model, update its entry **and**
`CATALOG_AS_OF`. Anthropic pricing comes from the bundled claude-api reference;
verify Gemini pricing against Google's published rates. Don't guess prices.
- `priority` defaults to `performance`, which preserves the historical default
models (Opus for Anthropic, Gemini 2.5 Pro for Gemini). Don't change the
default without updating the config tests that assert those ids.

## Boundaries
- No business logic. This is a general-purpose tool.
- Don't add a second state paradigm or heavy dependencies without a clear reason.
Expand Down
34 changes: 34 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,7 @@ In the REPL: type a request, watch it work. Mutating actions (writes, edits,
shell commands) prompt for approval unless pre-approved in config.

- `/help` — list commands
- `/models` — show known models, pricing, and the active one (see below)
- `/improve` — reflect on the session and propose an improvement PR (see below)
- `/<name> [args]` — run a custom command (see below)
- `/exit` — quit
Expand Down Expand Up @@ -84,6 +85,7 @@ CLI flags.
{
"provider": "anthropic",
"model": "claude-opus-4-8",
"priority": "performance",
"maxTokens": 16000,
"thinking": true,
"effort": "high",
Expand Down Expand Up @@ -128,6 +130,38 @@ the savings where it can:
(see `TODO.md`), which will keep input-token counts from compounding across
many turns without any user action.

## Model awareness & cost control

tiny-code ships a small, curated catalog of coding models
(`src/models/catalog.ts`) with each model's pricing, context window, and a
relative coding-aptitude score. It uses this to turn raw token counts into real
money and to pick a model that fits your cost/performance preference.

- **Dollar cost, not just tokens.** Per-turn usage and the session total show an
estimated USD cost next to the token counts, priced from the active model's
rate — so the bill is visible as you work, not a surprise later.
- **`/models`** lists the catalog (cheapest first) with pricing and scores,
marks the active model, and shows the session's running cost.
- **Priority-driven selection.** When you don't pin a `model`, tiny-code picks
one for you based on `priority`:

| `priority` | Picks |
| --------------- | ----------------------------------------------------------- |
| `performance` | The most capable model (the default — current behavior). |
| `cost` | The cheapest still-capable model. |
| `balanced` | The best capability-per-dollar among capable models. |

```json
{ "priority": "balanced" }
```

Or per-session with `TINY_CODE_PRIORITY=cost`. Pinning `model` (config, env,
or `--model`) always overrides the recommendation.

The catalog is curated and offline (tiny-code has no live model-discovery yet —
see `TODO.md`), so its prices carry an "as of" date; keep it current as vendors
ship new models and change pricing.

## Self-improvement

tiny-code can learn from how it's used. When a session ends (or when you run
Expand Down
9 changes: 9 additions & 0 deletions TODO.md
Original file line number Diff line number Diff line change
Expand Up @@ -51,6 +51,15 @@ Save/restore `AgentLoop.getMessages()` to disk; `--resume` to continue a session
Pair with the compaction feature above so resumed sessions don't carry a bloated
history.

## Live model catalog refresh
The model catalog (`src/models/catalog.ts`) is curated and offline, so its
pricing and model list drift until a human updates them. **Approach:** an opt-in
refresh that pulls current models/pricing from the provider APIs (Anthropic's
`GET /v1/models` for capabilities; a pricing source for rates) and Gemini's
equivalent, caching to disk with the `CATALOG_AS_OF` date. Gate behind a flag so
the default stays offline and deterministic. Pairs with the existing
priority-based selection — fresher data, same `recommendModel` logic.

## ripgrep-backed grep
The `grep` tool currently walks the tree in JS. **Approach:** detect `rg` on
PATH and shell out for speed + .gitignore awareness, falling back to the JS
Expand Down
6 changes: 4 additions & 2 deletions src/cli.ts
Original file line number Diff line number Diff line change
Expand Up @@ -19,8 +19,10 @@ Options:
-h, --help Show this help

Environment:
ANTHROPIC_API_KEY Required for the Anthropic provider
GEMINI_API_KEY Required for the Gemini provider
ANTHROPIC_API_KEY Required for the Anthropic provider
GEMINI_API_KEY Required for the Gemini provider
TINY_CODE_PRIORITY performance | cost | balanced — auto-picks a model when
none is pinned (default: performance)
`;

function main(): void {
Expand Down
18 changes: 17 additions & 1 deletion src/config/load.ts
Original file line number Diff line number Diff line change
Expand Up @@ -2,9 +2,12 @@ import { readFileSync, existsSync } from 'node:fs';
import { homedir } from 'node:os';
import { join } from 'node:path';
import { z } from 'zod';
import type { Priority } from '../models/catalog.js';
import { recommendModel } from '../models/catalog.js';

export type Provider = 'anthropic' | 'gemini';
export type Effort = 'low' | 'medium' | 'high' | 'xhigh' | 'max';
export type { Priority } from '../models/catalog.js';

/** Auto-approval rules that bypass the interactive permission prompt. */
export interface AllowRules {
Expand All @@ -19,6 +22,8 @@ export interface AllowRules {
export interface ResolvedConfig {
provider: Provider;
model: string;
/** Cost/performance bias used to auto-pick a model when none is pinned. */
priority: Priority;
anthropicApiKey: string | undefined;
geminiApiKey: string | undefined;
maxTokens: number;
Expand Down Expand Up @@ -55,6 +60,7 @@ const FileConfigSchema = z
.object({
provider: z.enum(['anthropic', 'gemini']).optional(),
model: z.string().optional(),
priority: z.enum(['performance', 'cost', 'balanced']).optional(),
maxTokens: z.number().int().positive().optional(),
thinking: z.boolean().optional(),
effort: z.enum(['low', 'medium', 'high', 'xhigh', 'max']).optional(),
Expand Down Expand Up @@ -107,8 +113,17 @@ export function loadConfig(overrides: CliOverrides = {}, cwd: string = process.c
file.provider ??
(anthropicApiKey ? 'anthropic' : geminiApiKey ? 'gemini' : 'anthropic');

const priority: Priority =
(env.TINY_CODE_PRIORITY as Priority | undefined) ?? file.priority ?? 'performance';

// When the user pins a model, honor it. Otherwise let the catalog pick the
// best fit for the cost/performance priority, falling back to a static
// default if the catalog has no entry for the provider.
const pinnedModel = overrides.model ?? env.TINY_CODE_MODEL ?? file.model;
const model =
overrides.model ?? env.TINY_CODE_MODEL ?? file.model ?? DEFAULT_MODELS[provider];
pinnedModel ??
recommendModel({ provider, priority })?.id ??
DEFAULT_MODELS[provider];

const maxTokens = env.TINY_CODE_MAX_TOKENS
? Number(env.TINY_CODE_MAX_TOKENS)
Expand All @@ -124,6 +139,7 @@ export function loadConfig(overrides: CliOverrides = {}, cwd: string = process.c
return {
provider,
model,
priority,
anthropicApiKey,
geminiApiKey,
maxTokens,
Expand Down
13 changes: 12 additions & 1 deletion src/index.ts
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,18 @@ export { PermissionGate } from './permissions/gate.js';
export type { PermissionPrompt, PermissionRequest, PermissionChoice } from './permissions/gate.js';

export { loadConfig } from './config/load.js';
export type { ResolvedConfig, CliOverrides, Provider, Effort, AllowRules } from './config/load.js';
export type { ResolvedConfig, CliOverrides, Provider, Effort, Priority, AllowRules } from './config/load.js';

export {
MODEL_CATALOG,
CATALOG_AS_OF,
getModelInfo,
estimateCostUsd,
formatUsd,
blendedCostPerMTok,
recommendModel,
} from './models/catalog.js';
export type { ModelInfo, RecommendOptions } from './models/catalog.js';
export { loadProjectContext } from './config/context.js';

export { loadCommands, renderCommand } from './commands/loader.js';
Expand Down
132 changes: 132 additions & 0 deletions src/models/catalog.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,132 @@
import type { Provider } from '../config/load.js';
import type { Usage } from '../providers/types.js';

/**
* How to weigh cost vs. capability when auto-selecting a model.
* - `performance`: most capable model (maximize quality, ignore price)
* - `cost`: cheapest capable model (maximize savings)
* - `balanced`: best capability-per-dollar among genuinely capable models
*/
export type Priority = 'performance' | 'cost' | 'balanced';

/**
* Curated facts about a coding model. Pricing is USD per 1,000,000 tokens for
* the standard (non-cached, ≤200K-context) tier — the common case for an
* interactive coding session. `codingScore` is a curated 0–100 estimate of
* relative aptitude on coding/agentic tasks, used only to rank models against
* each other; it is not a vendor benchmark.
*/
export interface ModelInfo {
id: string;
provider: Provider;
label: string;
inputPricePerMTok: number;
outputPricePerMTok: number;
contextWindow: number;
codingScore: number;
}

/**
* The date this catalog's pricing and model list were last verified. Models and
* prices move; keep this current when updating entries. Anthropic figures come
* from the bundled claude-api reference; Gemini figures from Google's published
* API pricing.
*/
export const CATALOG_AS_OF = '2026-06-08';

/**
* The known coding models, newest/most-capable first within each provider.
* Keep this list curated — tiny-code runs offline-first, so it can't discover
* models at runtime. Update it (and {@link CATALOG_AS_OF}) as vendors ship.
*/
export const MODEL_CATALOG: ModelInfo[] = [
// Anthropic — pricing per the claude-api model table.
{ id: 'claude-opus-4-8', provider: 'anthropic', label: 'Claude Opus 4.8', inputPricePerMTok: 5, outputPricePerMTok: 25, contextWindow: 1_000_000, codingScore: 99 },
{ id: 'claude-opus-4-7', provider: 'anthropic', label: 'Claude Opus 4.7', inputPricePerMTok: 5, outputPricePerMTok: 25, contextWindow: 1_000_000, codingScore: 96 },
{ id: 'claude-opus-4-6', provider: 'anthropic', label: 'Claude Opus 4.6', inputPricePerMTok: 5, outputPricePerMTok: 25, contextWindow: 1_000_000, codingScore: 93 },
{ id: 'claude-sonnet-4-6', provider: 'anthropic', label: 'Claude Sonnet 4.6', inputPricePerMTok: 3, outputPricePerMTok: 15, contextWindow: 1_000_000, codingScore: 88 },
{ id: 'claude-haiku-4-5', provider: 'anthropic', label: 'Claude Haiku 4.5', inputPricePerMTok: 1, outputPricePerMTok: 5, contextWindow: 200_000, codingScore: 75 },

// Gemini — standard-tier pricing (prompts ≤200K tokens) from Google AI pricing.
{ id: 'gemini-2.5-pro', provider: 'gemini', label: 'Gemini 2.5 Pro', inputPricePerMTok: 1.25, outputPricePerMTok: 10, contextWindow: 1_048_576, codingScore: 90 },
{ id: 'gemini-2.5-flash', provider: 'gemini', label: 'Gemini 2.5 Flash', inputPricePerMTok: 0.3, outputPricePerMTok: 2.5, contextWindow: 1_048_576, codingScore: 72 },
{ id: 'gemini-2.5-flash-lite', provider: 'gemini', label: 'Gemini 2.5 Flash-Lite', inputPricePerMTok: 0.1, outputPricePerMTok: 0.4, contextWindow: 1_048_576, codingScore: 55 },
];

/** Look up catalog facts for a model id, or `undefined` if it's not tracked. */
export function getModelInfo(id: string): ModelInfo | undefined {
return MODEL_CATALOG.find((m) => m.id === id);
}

/** Estimate the USD cost of a token usage given a model's pricing. */
export function estimateCostUsd(usage: Usage, info: ModelInfo): number {
return (
(usage.inputTokens / 1_000_000) * info.inputPricePerMTok +
(usage.outputTokens / 1_000_000) * info.outputPricePerMTok
);
}

/** Format a USD amount with precision that stays readable for tiny costs. */
export function formatUsd(amount: number): string {
return `$${amount.toFixed(amount < 1 ? 4 : 2)}`;
}

/**
* Coding sessions are input-heavy (history is resent every turn), so blend
* pricing 80% input / 20% output to compare models on a single cost number.
*/
export function blendedCostPerMTok(info: ModelInfo): number {
return info.inputPricePerMTok * 0.8 + info.outputPricePerMTok * 0.2;
}

/**
* Minimum coding aptitude to consider, per priority. Keeps `balanced`/`cost`
* from collapsing onto the cheapest-but-weakest model — score-per-dollar always
* favors the cheapest, so a capability floor is what makes the tradeoff useful.
*/
const DEFAULT_MIN_SCORE: Record<Priority, number> = {
performance: 0,
balanced: 80,
cost: 60,
};

export interface RecommendOptions {
provider: Provider;
priority: Priority;
/** Reject models below this coding score. Defaults per-priority. */
minCodingScore?: number;
/** Reject models whose context window is smaller than this. */
minContextWindow?: number;
}

/**
* Pick the model that best fits a cost/performance priority. Returns the single
* best candidate, or `undefined` if the constraints exclude every model for the
* provider (callers should fall back to a static default).
*/
export function recommendModel(opts: RecommendOptions): ModelInfo | undefined {
const minScore = opts.minCodingScore ?? DEFAULT_MIN_SCORE[opts.priority];
const candidates = MODEL_CATALOG.filter(
(m) =>
m.provider === opts.provider &&
m.codingScore >= minScore &&
(opts.minContextWindow === undefined || m.contextWindow >= opts.minContextWindow),
);
if (candidates.length === 0) return undefined;

const score = (m: ModelInfo): number => {
switch (opts.priority) {
case 'performance':
// Highest aptitude; break ties toward the cheaper option.
return m.codingScore - blendedCostPerMTok(m) / 1000;
case 'cost':
// Cheapest; break ties toward the more capable option.
return -blendedCostPerMTok(m) + m.codingScore / 1000;
case 'balanced':
// Best capability per dollar.
return m.codingScore / blendedCostPerMTok(m);
}
};

return candidates.reduce((best, m) => (score(m) > score(best) ? m : best));
}
Loading
Loading