14 changes: 14 additions & 0 deletions AGENTS.md
@@ -8,6 +8,20 @@ This file is loaded into the agent's system prompt when it runs in this repo.
- Add unit tests for new tools, providers, and loop behavior (Vitest).
- Run `npm run lint && npm run typecheck && npm test` before considering a change done.

## Token minimalism
Keeping token counts low is a core design concern, not an afterthought. Savings
should be automatic — the user should not need custom configuration to avoid
runaway costs.

- Keep the system prompt short. Tool descriptions are generated from Zod schemas;
don't add redundant prose.
- Tool output must be bounded. Always cap result sets (grep matches, glob hits,
file lines). Prefer targeted reads over full-file slurps.
- Surface usage automatically. Token counts appear after every turn and as a
session total on exit; no opt-in required.
- Prefer features that reduce tokens structurally (output caps, compaction) over
features that merely expose knobs for users to tune manually.

## Boundaries
- No business logic. This is a general-purpose tool.
- Don't add a second state paradigm or heavy dependencies without a clear reason.
39 changes: 35 additions & 4 deletions README.md
@@ -1,9 +1,11 @@
# tiny-code

-A small, extensible CLI coding agent. Interactive terminal REPL, interchangeable
-**Anthropic** and **Gemini** models, and just the core features you actually use:
-read/write/edit files, run shell commands, search code, and a custom
-commands/skills system. No business logic baked in.
+A small, extensible CLI coding agent built around one constraint: **keep token
+usage low**. As coding-agent costs climb, tiny-code automates the savings so
+you don't have to. Interactive terminal REPL, interchangeable **Anthropic** and
+**Gemini** models, and just the core features you actually use: read/write/edit
+files, run shell commands, search code, and a custom commands/skills system.
+No business logic baked in.

> Status: early (v0.x). Published as `@therr/tiny-code`; the binary is
> `tiny-code`. Names may change before the first npm publish.
@@ -96,6 +98,35 @@ CLI flags.
`allow` pre-approves mutating actions so they skip the confirmation prompt:
`bash` matches command prefixes, `write` matches path globs for write/edit.

## Token efficiency

Minimizing token usage is a first-class goal — coding-agent bills grow fast,
and you shouldn't need a complex setup to control them. tiny-code automates
the savings where it can:

- **Usage visible by default.** Every assistant turn prints `↑ in ↓ out tokens`
with no configuration. On exit you get a session total.
- **Bounded tool output.** grep caps at 200 matches, glob at 500 files, and
`read_file` supports `offset`/`limit` to pull only the lines you need —
preventing runaway context growth automatically.
- **Minimal system prompt.** The built-in persona is kept short. Tool schemas
are generated from Zod (no duplicate prose). Project context is opt-in.
- **Concise agent instructions.** The agent is explicitly told to avoid
restating the task or narrating completed steps.
- **Effort control.** For Anthropic models, `effort` tunes the adaptive
thinking budget. Drop it from the default `"high"` to `"medium"` or `"low"`
for simpler, cheaper tasks:

```json
{ "effort": "medium" }
```

Or set it per-session with `TINY_CODE_EFFORT=medium`.
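
A minimal sketch of how the two settings above might be reconciled, assuming the environment variable takes precedence over the config file for the current session and that unknown values fall back to the documented default of `"high"`. The `resolveEffort` helper and its precedence rule are hypothetical, not taken from tiny-code's source.

```typescript
// Hypothetical sketch: the helper name and the env-over-config precedence are assumptions.
type Effort = 'low' | 'medium' | 'high';

/** Narrow an arbitrary value to one of the three effort levels named in the README. */
function isEffort(value: unknown): value is Effort {
  return value === 'low' || value === 'medium' || value === 'high';
}

/**
 * Pick the effort level for a session.
 * Assumed order: TINY_CODE_EFFORT value, then the config file's `effort`, then "high".
 */
function resolveEffort(configEffort: unknown, envEffort: string | undefined): Effort {
  if (isEffort(envEffort)) return envEffort;
  if (isEffort(configEffort)) return configEffort;
  return 'high';
}

// Config says "medium", the session override says "low".
console.log(resolveEffort('medium', 'low')); // low
```

Passing the environment value in as a parameter, rather than reading `process.env` inside the function, keeps the sketch easy to unit test.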

**Coming:** automatic conversation compaction once histories grow long
(see `TODO.md`), which will keep input-token counts from compounding across
many turns without any user action.
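
The bounded-output idea from the list above can be sketched as a small generic helper. This is an illustration under assumptions, not tiny-code's implementation: `capResults` and `formatCapped` are hypothetical names, and the only figure taken from the README is the 200-match grep cap.

```typescript
// Hypothetical helpers for illustration; tiny-code's real tools may cap differently.
interface Capped<T> {
  items: T[];
  omitted: number; // how many results were dropped by the cap
}

/** Keep at most `max` results and record how many were left out. */
function capResults<T>(results: readonly T[], max: number): Capped<T> {
  if (!Number.isInteger(max) || max < 0) {
    throw new RangeError('max must be a non-negative integer');
  }
  const items = results.slice(0, max);
  return { items, omitted: results.length - items.length };
}

/** Render capped results and tell the model when output was truncated. */
function formatCapped(capped: Capped<string>): string {
  const body = capped.items.join('\n');
  return capped.omitted > 0
    ? `${body}\n[${capped.omitted} more results omitted]`
    : body;
}

// Example: 250 grep matches capped at 200 (the limit quoted above).
const matches = Array.from({ length: 250 }, (_, i) => `src/file${i}.ts:1: match`);
const cappedMatches = capResults(matches, 200);
console.log(cappedMatches.items.length, cappedMatches.omitted); // 200 50
```

Reporting the omitted count lets the model narrow its next search instead of assuming it has seen every result.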

## Development

```bash
18 changes: 16 additions & 2 deletions TODO.md
@@ -3,6 +3,20 @@
Deferred features, roughly in priority order. Each entry notes the rationale and
a rough approach so it can be picked up later.

Token efficiency is a first-class goal. Features that reduce token usage
automatically (without user configuration) are prioritized over features that
add capability at higher cost.

## Conversation compaction
Input tokens compound quickly in long sessions because the full message history
is resent every turn. Compaction trims the history automatically once it grows
past a threshold, keeping costs from ballooning without any user action.
**Approach:** track cumulative `inputTokens` from `AgentLoop.getUsage()`; when
it crosses a configurable threshold (e.g. 50k), summarize earlier messages into
a single condensed block. For Anthropic use the compaction beta; for Gemini
summarize via a lightweight call to a cheap model. Pair with conversation
persistence so compacted sessions can be resumed.
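
The approach above could look roughly like the following sketch. Everything here is an assumption for illustration: `SketchMessage` is a simplified stand-in for the repo's `Message` type, `shouldCompact` and `compactHistory` are hypothetical names, and the summarizer is synchronous so the example stays self-contained (a real one would call a model and be asynchronous).

```typescript
// Simplified stand-ins; the real Usage and Message types in tiny-code carry more detail.
interface Usage {
  inputTokens: number;
  outputTokens: number;
}
interface SketchMessage {
  role: 'user' | 'assistant';
  text: string;
}

/** True once cumulative input tokens reach the threshold (50k is the example figure above). */
function shouldCompact(usage: Usage, threshold = 50000): boolean {
  return usage.inputTokens >= threshold;
}

/**
 * Replace everything except the last `keepRecent` messages with one summary message.
 * `summarize` is injected so the caller decides how the summary is produced.
 */
function compactHistory(
  messages: readonly SketchMessage[],
  keepRecent: number,
  summarize: (older: readonly SketchMessage[]) => string,
): SketchMessage[] {
  const keep = Math.max(0, Math.floor(keepRecent));
  if (messages.length <= keep) return [...messages];
  const cut = messages.length - keep;
  const summary = summarize(messages.slice(0, cut));
  return [
    { role: 'user', text: `Summary of earlier conversation:\n${summary}` },
    ...messages.slice(cut),
  ];
}

// Example with a trivial summarizer that only counts the messages it replaced.
const history: SketchMessage[] = [
  { role: 'user', text: 'first question' },
  { role: 'assistant', text: 'first answer' },
  { role: 'user', text: 'second question' },
  { role: 'assistant', text: 'second answer' },
];
if (shouldCompact({ inputTokens: 60000, outputTokens: 900 })) {
  const compacted = compactHistory(history, 2, (older) => `${older.length} earlier messages`);
  console.log(compacted.length); // 3
}
```

Keeping the most recent messages verbatim is a design choice in this sketch, made so the model still sees the immediate context in full; the TODO entry itself does not specify how much history survives.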

## Sub-agents
Spawn isolated agent runs for parallel exploration/research (like a lightweight
Explore/Plan agent). **Approach:** a `spawn_agent` tool whose `execute` constructs
@@ -34,8 +48,8 @@ exits; permission gate falls back to allowlist-only (no prompts).

## Conversation persistence / resume
Save/restore `AgentLoop.getMessages()` to disk; `--resume` to continue a session.
-Pair with token-budget-aware compaction once histories get long (Anthropic
-compaction beta; manual summarization for Gemini).
+Pair with the compaction feature above so resumed sessions don't carry a bloated
+history.

## ripgrep-backed grep
The `grep` tool currently walks the tree in JS. **Approach:** detect `rg` on
10 changes: 10 additions & 0 deletions src/agent/loop.ts
@@ -4,6 +4,8 @@ import type { PermissionGate } from '../permissions/gate.js';
import type { ToolResult } from '../tools/types.js';
import type { Message, ToolResultBlock, ToolUseBlock } from './types.js';

export type { Usage };

/** Sink for everything the loop wants to surface. The REPL provides the real one. */
export interface AgentUI {
onText(delta: string): void;
@@ -39,6 +41,7 @@ export class AgentLoop {
private readonly cwd: string;
private readonly maxIterations: number;
private readonly messages: Message[] = [];
private sessionUsage: Usage = { inputTokens: 0, outputTokens: 0 };

constructor(opts: AgentLoopOptions) {
this.provider = opts.provider;
@@ -55,6 +58,11 @@ export class AgentLoop {
return this.messages;
}

/** Cumulative token usage across all turns in this session. */
getUsage(): Usage {
return { ...this.sessionUsage };
}

/** Run one user turn to completion (through any number of tool round-trips). */
async run(userInput: string): Promise<void> {
this.messages.push({ role: 'user', content: [{ type: 'text', text: userInput }] });
@@ -75,6 +83,8 @@ export class AgentLoop {
} else if (event.type === 'tool_call') {
toolCalls.push({ type: 'tool_use', id: event.id, name: event.name, input: event.input });
} else {
this.sessionUsage.inputTokens += event.usage.inputTokens;
this.sessionUsage.outputTokens += event.usage.outputTokens;
this.ui.onUsage(event.usage);
}
}
4 changes: 3 additions & 1 deletion src/agent/systemPrompt.ts
@@ -13,7 +13,9 @@ Guidelines:
- Read files before editing them. Prefer small, targeted edits.
- When you run commands, explain briefly what you are doing only if it is non-obvious.
- Match the conventions of the surrounding code.
-- When the task is complete, give a short summary of what changed. Do not narrate routine steps.`;
+- When the task is complete, give a short summary of what changed. Do not narrate routine steps.
+- Be concise. Avoid restating the task or narrating completed steps. Every output token has a cost.
+- Prefer targeted reads (offset/limit) and filtered searches (glob patterns) over reading entire files or scanning broadly.`;

/** Compose the system prompt from persona, environment, tools, and project context. */
export function buildSystemPrompt(params: SystemPromptParams): string {
9 changes: 8 additions & 1 deletion src/repl.ts
@@ -111,7 +111,14 @@ export async function startRepl(overrides: CliOverrides): Promise<void> {
};

rl.on('close', () => {
-console.log(pc.dim('\nBye.'));
+const usage = agent.getUsage();
+if (usage.inputTokens > 0 || usage.outputTokens > 0) {
+const fmtN = (n: number) => n.toLocaleString('en-US');
+console.log(
+pc.dim(`\nSession: ↑ ${fmtN(usage.inputTokens)} ↓ ${fmtN(usage.outputTokens)} tokens total`),
+);
+}
+console.log(pc.dim('Bye.'));
process.exit(0);
});

8 changes: 6 additions & 2 deletions src/ui/render.ts
@@ -10,6 +10,10 @@ function preview(name: string, input: unknown): string {
return JSON.stringify(obj);
}

function fmtN(n: number): string {
return n.toLocaleString('en-US');
}

function truncate(s: string, n: number): string {
const oneLine = s.replace(/\s*\n\s*/g, ' ').trim();
return oneLine.length > n ? `${oneLine.slice(0, n)}…` : oneLine;
@@ -46,8 +50,8 @@ export function createTerminalUI(): AgentUI {
ensureNewline();
write(pc.yellow(` ⊘ ${name} denied\n`));
},
-onUsage() {
-// Token usage is available here; kept silent to reduce noise in the MVP.
+onUsage(usage) {
+write(pc.dim(` ↑ ${fmtN(usage.inputTokens)} ↓ ${fmtN(usage.outputTokens)} tokens\n`));
},
onAssistantEnd() {
ensureNewline();
25 changes: 25 additions & 0 deletions tests/agent/loop.test.ts
@@ -165,6 +165,31 @@ describe('AgentLoop', () => {
}
});

it('accumulates token usage across a single turn', async () => {
const provider = new ScriptedProvider([
[
{ type: 'text', delta: 'hi' },
{ type: 'done', usage: { inputTokens: 10, outputTokens: 5 }, stopReason: 'end_turn' },
],
]);
const { ui } = recordingUI();
const loop = makeLoop(provider, ui, gateWith('yes'));
await loop.run('hello');
expect(loop.getUsage()).toEqual({ inputTokens: 10, outputTokens: 5 });
});

it('accumulates token usage across multiple run() calls', async () => {
const provider = new ScriptedProvider([
[{ type: 'done', usage: { inputTokens: 10, outputTokens: 5 }, stopReason: 'end_turn' }],
[{ type: 'done', usage: { inputTokens: 20, outputTokens: 8 }, stopReason: 'end_turn' }],
]);
const { ui } = recordingUI();
const loop = makeLoop(provider, ui, gateWith('yes'));
await loop.run('first');
await loop.run('second');
expect(loop.getUsage()).toEqual({ inputTokens: 30, outputTokens: 13 });
});

it('stops at the iteration guard when tools never stop', async () => {
const looping: ProviderEvent[][] = [];
for (let i = 0; i < 10; i += 1) {
9 changes: 9 additions & 0 deletions tests/ui/render.test.ts
@@ -39,6 +39,15 @@ describe('createTerminalUI', () => {
expect(out).toContain('max iterations');
});

it('displays token usage per turn', () => {
const out = capture(() => {
const ui = createTerminalUI();
ui.onUsage({ inputTokens: 1234, outputTokens: 567 });
});
expect(out).toContain('1,234');
expect(out).toContain('567');
});

it('previews path- and pattern-based tools', () => {
const out = capture(() => {
const ui = createTerminalUI();