Stateless dialogue compression based on natural forgetting curve.
LLM conversations grow linearly. MosaicCompress keeps them bounded — automatically, invisibly, and without the user ever knowing what a "Session" is.
Your message array (R rounds, oldest → newest):
Round 1 ────→ Round (R-50) │ Heavy zone → ALL → 2 msgs
Round (R-49) → Round (R-30) │ Light zone → distill each, count unchanged
Round (R-29) ──→ Round R │ Raw zone → keep as-is
Steady state: always 82 messages, whether at round 60 or round 15,000. The compression ratio approaches 100%.
npm install mosaic-compressimport { mosaicCompress, type MosaicConfig } from 'mosaic-compress';
const config: MosaicConfig = {
lightStart: 30, // keep 30 most recent rounds raw
lightWindow: 10, // compress every 10 rounds
heavyStart: 50, // rounds before this get heavy compression
heavyWindow: 10, // same cadence as light
callLLM: async (systemPrompt, userInput) => {
// Wire to OpenAI, Anthropic, or any LLM provider
const res = await openai.chat.completions.create({
model: 'gpt-4o-mini',
messages: [
{ role: 'system', content: systemPrompt },
{ role: 'user', content: userInput },
],
});
return res.choices[0].message.content ?? '';
},
};
// Call every turn — zero cost below threshold, ~1-2s delay at compression milestones
const compressed = await mosaicCompress(messages, config);- Stateless & Idempotent — same input always yields same output
- Zero-cost below threshold — returns immediately if no compression is due
- Anti-jitter — compression only at configurable window boundaries
- LLM-agnostic — bring your own
callLLMfunction (OpenAI, Anthropic, local models…) - Tool-call safe — tool messages don't break round counting
- Graceful degradation — LLM failures don't block the conversation
| Param | Type | Description |
|---|---|---|
messages |
Message[] |
Full message array. System prompt at [0] is preserved as-is. |
config |
MosaicConfig |
Compression config (see below). |
| Returns | Promise<Message[]> |
Compressed message array. |
| Field | Type | Default | Description |
|---|---|---|---|
lightStart |
number |
30 |
Most recent N rounds kept raw |
lightWindow |
number |
10 |
Anti-jitter: compress every N rounds |
heavyStart |
number |
50 |
Rounds beyond this → Heavy zone |
heavyWindow |
number |
10 |
Anti-jitter for heavy compression |
callLLM |
(sys: string, user: string) => Promise<string> |
required | Your LLM call function |
interface Message {
role: 'system' | 'user' | 'assistant' | 'tool';
content: string;
tool_call_id?: string;
tool_calls?: { id: string; type: 'function'; function: { name: string; arguments: string } }[];
}| Rounds | Uncompressed | Compressed | Reduction |
|---|---|---|---|
| 100 | 100K tokens | 33.7K | 66% |
| 500 | 500K tokens | 33.7K | 93% |
| 5,000 | 5M tokens | 33.7K | 99.3% |
| 15,000 | 15M tokens | 33.7K | 99.8% |
From round 60 onward, the compressed size is completely constant.
Read the full design document (English) or 中文设计文档.
# Run tests (zero LLM cost — uses mock responses)
npm test
# Or directly:
npx tsx tests/index.test.tsMIT — TuringCorp | iAsk@turingcorp.net