rili-live · zizzle6717 · Jun 8, 2026 · Jun 8, 2026
diff --git a/AGENTS.md b/AGENTS.md
@@ -22,6 +22,17 @@ runaway costs.
 - Prefer features that reduce tokens structurally (output caps, compaction) over
   features that merely expose knobs for users to tune manually.
 
+## Model catalog (`src/models/catalog.ts`)
+- A curated, offline list of coding models with pricing, context window, and a
+  relative coding score. It drives USD cost estimates and priority-based model
+  selection (`performance` / `cost` / `balanced`).
+- Keep it current: when adding/repricing a model, update its entry **and**
+  `CATALOG_AS_OF`. Anthropic pricing comes from the bundled claude-api reference;
+  verify Gemini pricing against Google's published rates. Don't guess prices.
+- `priority` defaults to `performance`, which preserves the historical default
+  models (Opus for Anthropic, Gemini 2.5 Pro for Gemini). Don't change the
+  default without updating the config tests that assert those ids.
+
 ## Boundaries
 - No business logic. This is a general-purpose tool.
 - Don't add a second state paradigm or heavy dependencies without a clear reason.

diff --git a/README.md b/README.md
@@ -45,6 +45,7 @@ In the REPL: type a request, watch it work. Mutating actions (writes, edits,
 shell commands) prompt for approval unless pre-approved in config.
 
 - `/help` — list commands
+- `/models` — show known models, pricing, and the active one (see below)
 - `/improve` — reflect on the session and propose an improvement PR (see below)
 - `/<name> [args]` — run a custom command (see below)
 - `/exit` — quit
@@ -84,6 +85,7 @@ CLI flags.
 {
   "provider": "anthropic",
   "model": "claude-opus-4-8",
+  "priority": "performance",
   "maxTokens": 16000,
   "thinking": true,
   "effort": "high",
@@ -128,6 +130,38 @@ the savings where it can:
 (see `TODO.md`), which will keep input-token counts from compounding across
 many turns without any user action.
 
+## Model awareness & cost control
+
+tiny-code ships a small, curated catalog of coding models
+(`src/models/catalog.ts`) with each model's pricing, context window, and a
+relative coding-aptitude score. It uses this to turn raw token counts into real
+money and to pick a model that fits your cost/performance preference.
+
+- **Dollar cost, not just tokens.** Per-turn usage and the session total show an
+  estimated USD cost next to the token counts, priced from the active model's
+  rate — so the bill is visible as you work, not a surprise later.
+- **`/models`** lists the catalog (cheapest first) with pricing and scores,
+  marks the active model, and shows the session's running cost.
+- **Priority-driven selection.** When you don't pin a `model`, tiny-code picks
+  one for you based on `priority`:
+
+  | `priority`      | Picks                                                        |
+  | --------------- | ----------------------------------------------------------- |
+  | `performance`   | The most capable model (the default — current behavior).    |
+  | `cost`          | The cheapest still-capable model.                           |
+  | `balanced`      | The best capability-per-dollar among capable models.        |
+
+  ```json
+  { "priority": "balanced" }
+  ```
+
+  Or per-session with `TINY_CODE_PRIORITY=cost`. Pinning `model` (config, env,
+  or `--model`) always overrides the recommendation.
+
+The catalog is curated and offline (tiny-code has no live model-discovery yet —
+see `TODO.md`), so its prices carry an "as of" date; keep it current as vendors
+ship new models and change pricing.
+
 ## Self-improvement
 
 tiny-code can learn from how it's used. When a session ends (or when you run

diff --git a/TODO.md b/TODO.md
@@ -51,6 +51,15 @@ Save/restore `AgentLoop.getMessages()` to disk; `--resume` to continue a session
 Pair with the compaction feature above so resumed sessions don't carry a bloated
 history.
 
+## Live model catalog refresh
+The model catalog (`src/models/catalog.ts`) is curated and offline, so its
+pricing and model list drift until a human updates them. **Approach:** an opt-in
+refresh that pulls current models/pricing from the provider APIs (Anthropic's
+`GET /v1/models` for capabilities; a pricing source for rates) and Gemini's
+equivalent, caching to disk with the `CATALOG_AS_OF` date. Gate behind a flag so
+the default stays offline and deterministic. Pairs with the existing
+priority-based selection — fresher data, same `recommendModel` logic.
+
 ## ripgrep-backed grep
 The `grep` tool currently walks the tree in JS. **Approach:** detect `rg` on
 PATH and shell out for speed + .gitignore awareness, falling back to the JS

diff --git a/src/cli.ts b/src/cli.ts
@@ -19,8 +19,10 @@ Options:
   -h, --help          Show this help
 
 Environment:
-  ANTHROPIC_API_KEY   Required for the Anthropic provider
-  GEMINI_API_KEY      Required for the Gemini provider
+  ANTHROPIC_API_KEY    Required for the Anthropic provider
+  GEMINI_API_KEY       Required for the Gemini provider
+  TINY_CODE_PRIORITY   performance | cost | balanced — auto-picks a model when
+                       none is pinned (default: performance)
 `;
 
 function main(): void {

diff --git a/src/config/load.ts b/src/config/load.ts
@@ -2,9 +2,12 @@ import { readFileSync, existsSync } from 'node:fs';
 import { homedir } from 'node:os';
 import { join } from 'node:path';
 import { z } from 'zod';
+import type { Priority } from '../models/catalog.js';
+import { recommendModel } from '../models/catalog.js';
 
 export type Provider = 'anthropic' | 'gemini';
 export type Effort = 'low' | 'medium' | 'high' | 'xhigh' | 'max';
+export type { Priority } from '../models/catalog.js';
 
 /** Auto-approval rules that bypass the interactive permission prompt. */
 export interface AllowRules {
@@ -19,6 +22,8 @@ export interface AllowRules {
 export interface ResolvedConfig {
   provider: Provider;
   model: string;
+  /** Cost/performance bias used to auto-pick a model when none is pinned. */
+  priority: Priority;
   anthropicApiKey: string | undefined;
   geminiApiKey: string | undefined;
   maxTokens: number;
@@ -55,6 +60,7 @@ const FileConfigSchema = z
   .object({
     provider: z.enum(['anthropic', 'gemini']).optional(),
     model: z.string().optional(),
+    priority: z.enum(['performance', 'cost', 'balanced']).optional(),
     maxTokens: z.number().int().positive().optional(),
     thinking: z.boolean().optional(),
     effort: z.enum(['low', 'medium', 'high', 'xhigh', 'max']).optional(),
@@ -107,8 +113,17 @@ export function loadConfig(overrides: CliOverrides = {}, cwd: string = process.c
     file.provider ??
     (anthropicApiKey ? 'anthropic' : geminiApiKey ? 'gemini' : 'anthropic');
 
+  const priority: Priority =
+    (env.TINY_CODE_PRIORITY as Priority | undefined) ?? file.priority ?? 'performance';
+
+  // When the user pins a model, honor it. Otherwise let the catalog pick the
+  // best fit for the cost/performance priority, falling back to a static
+  // default if the catalog has no entry for the provider.
+  const pinnedModel = overrides.model ?? env.TINY_CODE_MODEL ?? file.model;
   const model =
-    overrides.model ?? env.TINY_CODE_MODEL ?? file.model ?? DEFAULT_MODELS[provider];
+    pinnedModel ??
+    recommendModel({ provider, priority })?.id ??
+    DEFAULT_MODELS[provider];
 
   const maxTokens = env.TINY_CODE_MAX_TOKENS
     ? Number(env.TINY_CODE_MAX_TOKENS)
@@ -124,6 +139,7 @@ export function loadConfig(overrides: CliOverrides = {}, cwd: string = process.c
   return {
     provider,
     model,
+    priority,
     anthropicApiKey,
     geminiApiKey,
     maxTokens,

diff --git a/src/index.ts b/src/index.ts
@@ -20,7 +20,18 @@ export { PermissionGate } from './permissions/gate.js';
 export type { PermissionPrompt, PermissionRequest, PermissionChoice } from './permissions/gate.js';
 
 export { loadConfig } from './config/load.js';
-export type { ResolvedConfig, CliOverrides, Provider, Effort, AllowRules } from './config/load.js';
+export type { ResolvedConfig, CliOverrides, Provider, Effort, Priority, AllowRules } from './config/load.js';
+
+export {
+  MODEL_CATALOG,
+  CATALOG_AS_OF,
+  getModelInfo,
+  estimateCostUsd,
+  formatUsd,
+  blendedCostPerMTok,
+  recommendModel,
+} from './models/catalog.js';
+export type { ModelInfo, RecommendOptions } from './models/catalog.js';
 export { loadProjectContext } from './config/context.js';
 
 export { loadCommands, renderCommand } from './commands/loader.js';

diff --git a/src/models/catalog.ts b/src/models/catalog.ts
@@ -0,0 +1,132 @@
+import type { Provider } from '../config/load.js';
+import type { Usage } from '../providers/types.js';
+
+/**
+ * How to weigh cost vs. capability when auto-selecting a model.
+ * - `performance`: most capable model (maximize quality, ignore price)
+ * - `cost`: cheapest capable model (maximize savings)
+ * - `balanced`: best capability-per-dollar among genuinely capable models
+ */
+export type Priority = 'performance' | 'cost' | 'balanced';
+
+/**
+ * Curated facts about a coding model. Pricing is USD per 1,000,000 tokens for
+ * the standard (non-cached, ≤200K-context) tier — the common case for an
+ * interactive coding session. `codingScore` is a curated 0–100 estimate of
+ * relative aptitude on coding/agentic tasks, used only to rank models against
+ * each other; it is not a vendor benchmark.
+ */
+export interface ModelInfo {
+  id: string;
+  provider: Provider;
+  label: string;
+  inputPricePerMTok: number;
+  outputPricePerMTok: number;
+  contextWindow: number;
+  codingScore: number;
+}
+
+/**
+ * The date this catalog's pricing and model list were last verified. Models and
+ * prices move; keep this current when updating entries. Anthropic figures come
+ * from the bundled claude-api reference; Gemini figures from Google's published
+ * API pricing.
+ */
+export const CATALOG_AS_OF = '2026-06-08';
+
+/**
+ * The known coding models, newest/most-capable first within each provider.
+ * Keep this list curated — tiny-code runs offline-first, so it can't discover
+ * models at runtime. Update it (and {@link CATALOG_AS_OF}) as vendors ship.
+ */
+export const MODEL_CATALOG: ModelInfo[] = [
+  // Anthropic — pricing per the claude-api model table.
+  { id: 'claude-opus-4-8', provider: 'anthropic', label: 'Claude Opus 4.8', inputPricePerMTok: 5, outputPricePerMTok: 25, contextWindow: 1_000_000, codingScore: 99 },
+  { id: 'claude-opus-4-7', provider: 'anthropic', label: 'Claude Opus 4.7', inputPricePerMTok: 5, outputPricePerMTok: 25, contextWindow: 1_000_000, codingScore: 96 },
+  { id: 'claude-opus-4-6', provider: 'anthropic', label: 'Claude Opus 4.6', inputPricePerMTok: 5, outputPricePerMTok: 25, contextWindow: 1_000_000, codingScore: 93 },
+  { id: 'claude-sonnet-4-6', provider: 'anthropic', label: 'Claude Sonnet 4.6', inputPricePerMTok: 3, outputPricePerMTok: 15, contextWindow: 1_000_000, codingScore: 88 },
+  { id: 'claude-haiku-4-5', provider: 'anthropic', label: 'Claude Haiku 4.5', inputPricePerMTok: 1, outputPricePerMTok: 5, contextWindow: 200_000, codingScore: 75 },
+
+  // Gemini — standard-tier pricing (prompts ≤200K tokens) from Google AI pricing.
+  { id: 'gemini-2.5-pro', provider: 'gemini', label: 'Gemini 2.5 Pro', inputPricePerMTok: 1.25, outputPricePerMTok: 10, contextWindow: 1_048_576, codingScore: 90 },
+  { id: 'gemini-2.5-flash', provider: 'gemini', label: 'Gemini 2.5 Flash', inputPricePerMTok: 0.3, outputPricePerMTok: 2.5, contextWindow: 1_048_576, codingScore: 72 },
+  { id: 'gemini-2.5-flash-lite', provider: 'gemini', label: 'Gemini 2.5 Flash-Lite', inputPricePerMTok: 0.1, outputPricePerMTok: 0.4, contextWindow: 1_048_576, codingScore: 55 },
+];
+
+/** Look up catalog facts for a model id, or `undefined` if it's not tracked. */
+export function getModelInfo(id: string): ModelInfo | undefined {
+  return MODEL_CATALOG.find((m) => m.id === id);
+}
+
+/** Estimate the USD cost of a token usage given a model's pricing. */
+export function estimateCostUsd(usage: Usage, info: ModelInfo): number {
+  return (
+    (usage.inputTokens / 1_000_000) * info.inputPricePerMTok +
+    (usage.outputTokens / 1_000_000) * info.outputPricePerMTok
+  );
+}
+
+/** Format a USD amount with precision that stays readable for tiny costs. */
+export function formatUsd(amount: number): string {
+  return `$${amount.toFixed(amount < 1 ? 4 : 2)}`;
+}
+
+/**
+ * Coding sessions are input-heavy (history is resent every turn), so blend
+ * pricing 80% input / 20% output to compare models on a single cost number.
+ */
+export function blendedCostPerMTok(info: ModelInfo): number {
+  return info.inputPricePerMTok * 0.8 + info.outputPricePerMTok * 0.2;
+}
+
+/**
+ * Minimum coding aptitude to consider, per priority. Keeps `balanced`/`cost`
+ * from collapsing onto the cheapest-but-weakest model — score-per-dollar always
+ * favors the cheapest, so a capability floor is what makes the tradeoff useful.
+ */
+const DEFAULT_MIN_SCORE: Record<Priority, number> = {
+  performance: 0,
+  balanced: 80,
+  cost: 60,
+};
+
+export interface RecommendOptions {
+  provider: Provider;
+  priority: Priority;
+  /** Reject models below this coding score. Defaults per-priority. */
+  minCodingScore?: number;
+  /** Reject models whose context window is smaller than this. */
+  minContextWindow?: number;
+}
+
+/**
+ * Pick the model that best fits a cost/performance priority. Returns the single
+ * best candidate, or `undefined` if the constraints exclude every model for the
+ * provider (callers should fall back to a static default).
+ */
+export function recommendModel(opts: RecommendOptions): ModelInfo | undefined {
+  const minScore = opts.minCodingScore ?? DEFAULT_MIN_SCORE[opts.priority];
+  const candidates = MODEL_CATALOG.filter(
+    (m) =>
+      m.provider === opts.provider &&
+      m.codingScore >= minScore &&
+      (opts.minContextWindow === undefined || m.contextWindow >= opts.minContextWindow),
+  );
+  if (candidates.length === 0) return undefined;
+
+  const score = (m: ModelInfo): number => {
+    switch (opts.priority) {
+      case 'performance':
+        // Highest aptitude; break ties toward the cheaper option.
+        return m.codingScore - blendedCostPerMTok(m) / 1000;
+      case 'cost':
+        // Cheapest; break ties toward the more capable option.
+        return -blendedCostPerMTok(m) + m.codingScore / 1000;
+      case 'balanced':
+        // Best capability per dollar.
+        return m.codingScore / blendedCostPerMTok(m);
+    }
+  };
+
+  return candidates.reduce((best, m) => (score(m) > score(best) ? m : best));
+}