Merged
7 changes: 7 additions & 0 deletions .env.example
@@ -22,3 +22,10 @@ GPU_HOST=
GPU_SHUTDOWN_TOKEN=
MAX_A2A_URL=
IOS_DEVICE_ID=

# Context sizing — compact history before it eats the model's rate limit.
# Defaults target Claude subscription safety (150K token budget before compact).
# For Gemini direct or 1M-context models, raise MAX_CONTEXT_WINDOW.
MAX_CONTEXT_WINDOW=200000
MAX_COMPACT_THRESHOLD=0.75
MAX_KEEP_RECENT=6
19 changes: 19 additions & 0 deletions README.md
@@ -102,6 +102,25 @@ Requirements:
- `claude` CLI installed and authenticated on the host
- `ANTHROPIC_BASE_URL` should point to your AgentWeave proxy if you want subagent LLM calls visible in AgentWeave

### Context sizing

Max automatically compacts older history once the running token estimate crosses the configured threshold. The defaults target a Claude subscription, where every input token counts against the 5-hour rate limit:

| Env var | Default | Meaning |
|---|---|---|
| `MAX_CONTEXT_WINDOW` | `200000` | Upper bound used for budget math |
| `MAX_COMPACT_THRESHOLD` | `0.75` | Fraction of the window before compaction kicks in (default = 150K tokens) |
| `MAX_KEEP_RECENT` | `6` | Messages always kept intact at the tail |
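
The budget math is simply window × threshold, floored. A minimal sketch, mirroring the `envNumber` parsing added to `src/context.ts` in this PR (positive finite numbers only, falling back to the default otherwise):

```typescript
// Parse a positive number from the environment, falling back on absent or bad input.
function envNumber(name: string, fallback: number): number {
	const raw = process.env[name];
	if (!raw) return fallback;
	const n = Number(raw);
	return Number.isFinite(n) && n > 0 ? n : fallback;
}

const window = envNumber("MAX_CONTEXT_WINDOW", 200_000);
const threshold = envNumber("MAX_COMPACT_THRESHOLD", 0.75);

// With the defaults, compaction kicks in at 150,000 estimated tokens.
console.log(Math.floor(window * threshold));
```

Note that a malformed value (e.g. `MAX_CONTEXT_WINDOW=lots`) silently falls back to the default rather than erroring.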

For Gemini direct or other cheap-long-context providers, raise the window:

```bash
MAX_CONTEXT_WINDOW=1000000
MAX_COMPACT_THRESHOLD=0.8
```

The effective values are logged once on the first `transformContext` call.
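
The token count compared against this budget is a rough estimate, about 4 characters per token for text, rather than a real tokenizer pass. A sketch of that heuristic (the function name here is illustrative, not the exported API):

```typescript
// Rough token estimate: ~4 characters per token. A cheap heuristic,
// not a tokenizer, so treat the budget as approximate.
function estimateTokens(text: string): number {
	return Math.ceil(text.length / 4);
}

// A 600-character message counts as ~150 tokens against the budget.
console.log(estimateTokens("x".repeat(600)));
```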

### Development

```bash
35 changes: 31 additions & 4 deletions src/context.ts
@@ -3,11 +3,37 @@ import type { AssistantMessage, Message } from "@mariozechner/pi-ai";
import { getModel, getEnvApiKey, streamSimple } from "@mariozechner/pi-ai";
import { log } from "./logger.js";

-const CONTEXT_WINDOW = 1_000_000;
-const COMPACT_THRESHOLD = 0.8; // compact at 80%
/**
* Context-sizing knobs.
*
* Defaults target a Claude subscription path, where every input token counts
* against the 5-hour rate limit — compact at ~150K before a long session can
* exhaust quota.
*
* Override via env for providers with cheaper long context (e.g. Gemini direct):
* MAX_CONTEXT_WINDOW=1000000 MAX_COMPACT_THRESHOLD=0.8 MAX_KEEP_RECENT=6
*/
function envNumber(name: string, fallback: number): number {
const raw = process.env[name];
if (!raw) return fallback;
const n = Number(raw);
return Number.isFinite(n) && n > 0 ? n : fallback;
}

const CONTEXT_WINDOW = envNumber("MAX_CONTEXT_WINDOW", 200_000);
const COMPACT_THRESHOLD = envNumber("MAX_COMPACT_THRESHOLD", 0.75);
const TOKEN_LIMIT = Math.floor(CONTEXT_WINDOW * COMPACT_THRESHOLD);
// Keep at least the last N messages untouched during compaction
-const KEEP_RECENT = 6;
+const KEEP_RECENT = Math.floor(envNumber("MAX_KEEP_RECENT", 6));

let loggedConfig = false;
function logConfigOnce(): void {
if (loggedConfig) return;
loggedConfig = true;
log(
"info",
`Context sizing: window=${CONTEXT_WINDOW} threshold=${TOKEN_LIMIT} (${Math.round(COMPACT_THRESHOLD * 100)}%) keepRecent=${KEEP_RECENT}`
);
}

/** Rough token estimate: ~4 chars per token for text, actual usage for assistant messages */
function estimateMessageTokens(msg: AgentMessage): number {
@@ -77,6 +103,7 @@ export function getContextStats(messages: AgentMessage[]): ContextStats {
* - Replace with a single compact user message containing the summary
*/
export async function transformContext(messages: AgentMessage[]): Promise<AgentMessage[]> {
logConfigOnce();
const totalTokens = messages.reduce((sum, m) => sum + estimateMessageTokens(m), 0);

if (totalTokens <= TOKEN_LIMIT) {