(V)(;,,;)(V)
The Token-Pinching Crustacean for OpenClaw
"I like tokens the way I like me money... SAVED."
Mr. Krab hooks into OpenClaw's prompt lifecycle and pinches every wasted token from your LLM calls. He's got two claws — one for conversation history, one for tool definitions — and he uses them on every single call.
He's query-aware: ask about GitHub issues and he snips away the Spotify docs, the deployment chatter, and that Slack conversation from 3 hours ago. Only what matters gets through. Everything else? snip snip.
Powered by ScaleDown's compression API. Falls back gracefully on failure — Mr. Krab is cheap, not reckless.
Run the demo:

```sh
SCALEDOWN_API_KEY="your-key" npx tsx demo.ts
```

```
      ___            ___
     /  /\          /  /\
    /  /::|        /  /::|
   /  /:|:|       /  /:|:|
  /  /:/|:|__    /  /:/|:|__
 /__/:/ |:| /\  /__/:/ |:| /\
 \__\/  |:|/:/  \__\/  |:|/:/
     |  |:/:/       |  |:/:/
     |__|::/        |__|::/
     \__\/          \__\/

           MR. KRAB x OpenClaw
  "I like tokens the way I like me money...
                 SAVED."

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 (V)(;,,;)(V)  Mr. Krab's Token-Pinching Demo for OpenClaw
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

The Prompt (before Mr. Krab gets his claws on it):

  Conversation: 15 messages      (~1,767 tokens)
  Tool defs:    17 tools & skills (~1,829 tokens)
  TOTAL:        ~3,596 tokens per LLM call

"3,596 tokens?! That's like flushing doubloons down the drain!
 Let me get me claws on this..."

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 LEFT CLAW — Session History Compression
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

  Before: 1,826 tokens (Vercel, auth bugs, Sentry, Slack, Spotify...)
  After:    409 tokens (just the GitHub-relevant bits)
  Saved:  1,417 tokens (77.6%)

  █████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░

Agagagaga! Now THAT'S what I call a bargain!

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 RIGHT CLAW — Tool & Skill Definition Compression
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

  Before: 1,743 tokens (browser, spotify, slack, calendar, email...)
  After:    318 tokens (github_list_issues + schemas intact)
  Saved:  1,425 tokens (81.8%)

  ███████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░

*eyes pop out* THAT'S BEAUTIFUL! More savings than me secret formula!

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 (V)(;,,;)(V)  MR. KRAB'S FINAL TALLY
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

  Total before: 3,569 tokens
  Total after:    727 tokens
  Total saved:  2,842 tokens (79.6%)

  ████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░

Money saved at GPT-4o rates ($2.50/1M input tokens):

  Per call:        ~$0.0071
  Per 1,000 calls:   ~$7.11
  Per 100k calls:  ~$710.50

"80% savings! That's even better than me employee discount
 at the Krusty Krab! Agagagagaga!"
```
Mr. Krab watched a 15-message conversation about deploying Next.js, fixing auth bugs, setting up CI/CD, configuring Sentry, and playing Spotify — then the user asked about GitHub issues. His left claw kept the GitHub context and snipped away everything else. His right claw kept the `github_list_issues` schema intact and compressed away 16 irrelevant tool definitions. 80% savings. Every call.
Every time OpenClaw sends a message to an LLM, it packs in:
- Full conversation history — every message across WhatsApp, Slack, Discord, Telegram
- 52+ tool/skill definitions — SOUL.md, TOOLS.md, SKILL.md, AGENTS.md, all fully expanded
- System prompts — personality, rules, knowledge base
On a busy session, that's 8,000-15,000+ tokens per call. Most of it irrelevant to what the user just asked.
"That's like ordering the entire menu when you just want a Krabby Patty!"
```
Before Mr. Krab:  ~12,000 input tokens per call
After Mr. Krab:    ~3,500 input tokens per call
─────────────────────────────────────────────
          ~70% savings, every single call
```
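Those per-call figures are easy to sanity-check with the common "~4 characters per token" heuristic. A rough estimator (illustrative only; the extension's own `utils.ts` and the model's real tokenizer may count differently):

```typescript
// Rough token estimate: ~4 characters per token. Illustrative only;
// real counts depend on the target model's tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// 52 tool definitions at ~700 characters each is already ~9,100 tokens,
// before any conversation history or system prompt is added.
const toolDefChars = 52 * 700;
console.log(estimateTokens("x".repeat(toolDefChars))); // 9100
```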
Mr. Krab's compression is query-aware. He knows what the user is asking right now and only keeps what's relevant:
| User asks about... | Mr. Krab keeps | Mr. Krab snips away |
|---|---|---|
| "Deploy to production" | Deployment discussions, CI/CD tool docs | Spotify skill, Slack commands, unrelated chats |
| "Play some music" | Spotify skill schema, music preferences | GitHub Issues docs, deployment history |
| "That bug from yesterday" | Yesterday's debugging messages | Today's small talk, unrelated tool definitions |
1. Get your API key from ScaleDown
2. Install & configure
```sh
export SCALEDOWN_API_KEY="your-api-key"
```

Add to your `~/.openclaw/openclaw.json`:

```json
{
  "extensions": ["@scaledown/openclaw-extension"]
}
```

3. That's it. Mr. Krab gets to work on every LLM call. Check your logs:

```
[mr.krab] Claws ready — session: true, tools: true, rate: auto
[mr.krab:left-claw] 3200 -> 1120 tokens (65.0% saved)
[mr.krab:right-claw] 8500 -> 2890 tokens (66.0% saved)
```
```
User sends message -> OpenClaw Gateway
       |
       |-- before_prompt_build hook fires
       |      |
       |      |-- LEFT CLAW ---------> ScaleDown API -----> compressed history
       |      |   (conversation)                            (query-relevant only)
       |      |
       |      |-- RIGHT CLAW --------> ScaleDown API -----> compressed tools
       |          (52+ tool defs)                           (relevant tools only)
       |          (preserve_keywords)                       (names & schemas intact)
       |
       |   Both claws work in parallel -- minimal latency
       |
       |-- Compressed prompt sent to LLM (40-80% fewer tokens)
       |
       '-- llm_output hook -> Mr. Krab logs the savings
```
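That flow can be sketched as a hook handler. Everything below is a hypothetical sketch: the `Prompt` shape, `beforePromptBuild`, and the toy `scaleDownCompress` stand-in are illustrative assumptions, not OpenClaw's actual plugin API.

```typescript
// Hypothetical sketch of the before_prompt_build flow: both claws run in
// parallel, and each falls back to its uncompressed input on any failure.
type Prompt = { history: string; toolDefs: string; query: string };

// Toy stand-in for the ScaleDown API: keeps only lines mentioning the query
// or a preserved keyword. The real service does semantic compression.
async function scaleDownCompress(text: string, query: string, keep: string[]): Promise<string> {
  const needles = [query, ...keep].map((s) => s.toLowerCase());
  return text
    .split("\n")
    .filter((line) => needles.some((n) => line.toLowerCase().includes(n)))
    .join("\n");
}

async function compressOrFallback(text: string, query: string, keep: string[] = []): Promise<string> {
  try {
    return await scaleDownCompress(text, query, keep);
  } catch {
    return text; // graceful fallback: never break the LLM call
  }
}

async function beforePromptBuild(prompt: Prompt, preserveKeywords: string[] = []): Promise<Prompt> {
  const [history, toolDefs] = await Promise.all([
    compressOrFallback(prompt.history, prompt.query),                    // left claw
    compressOrFallback(prompt.toolDefs, prompt.query, preserveKeywords), // right claw
  ]);
  return { ...prompt, history, toolDefs };
}
```

For example, `beforePromptBuild({ history, toolDefs, query: "github" }, ["github_list_issues"])` would keep only the GitHub-relevant lines in both halves of the prompt.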
Left Claw (Session History) pinches your conversation transcript per query. A 30-message cross-platform conversation gets snipped down to just the messages that matter for the current question.
Right Claw (Tool/Skill Definitions) pinches the system prompt containing all tool definitions. OpenClaw loads dozens of skills — browser, GitHub, Spotify, Slack, memory, calendar, and more. Mr. Krab compresses the documentation while keeping tool names, parameter schemas, and critical syntax intact via `preserve_keywords`. He pinches tokens, not tool names.
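A right-claw request with `preserve_keywords` might look like the sketch below. The endpoint URL, header, and field names are illustrative assumptions, not ScaleDown's documented API; the timeout-with-`AbortController` pattern mirrors the extension's safety behavior described later.

```typescript
// Hypothetical ScaleDown compression request. Endpoint and field names are
// placeholders for illustration, not the documented API.
interface CompressRequest {
  context: string;               // the text to compress (history or tool defs)
  prompt: string;                // the user's current query, for query-aware relevance
  rate: number | "auto";         // how hard to pinch
  preserve_keywords?: string[];  // tool names / syntax that must survive intact
}

async function compress(req: CompressRequest, apiKey: string): Promise<string> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), 15_000); // default timeout
  try {
    const res = await fetch("https://api.scaledown.example/compress", {
      method: "POST",
      headers: { "Content-Type": "application/json", "x-api-key": apiKey },
      body: JSON.stringify(req),
      signal: controller.signal,
    });
    if (!res.ok) throw new Error(`ScaleDown responded ${res.status}`);
    const data = (await res.json()) as { compressed: string };
    return data.compressed;
  } finally {
    clearTimeout(timer);
  }
}
```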
All optional. Mr. Krab works great out of the box.
```json
{
  "extensions": {
    "@scaledown/openclaw-extension": {
      "apiKey": "sk-...",
      "rate": "auto",
      "targetModel": "gpt-4o",
      "enableSessionCompression": true,
      "enableToolCompression": true,
      "minTokenThreshold": 500,
      "timeout": 15000
    }
  }
}
```

| Option | Default | Description |
|---|---|---|
| `apiKey` | `$SCALEDOWN_API_KEY` | Your ScaleDown API key |
| `rate` | `"auto"` | How aggressively Mr. Krab pinches. `"auto"` recommended. `0.3` = aggressive, `0.7` = gentle. |
| `targetModel` | `"gpt-4o"` | Target LLM for tokenizer-aware compression |
| `enableSessionCompression` | `true` | Left claw on/off |
| `enableToolCompression` | `true` | Right claw on/off |
| `minTokenThreshold` | `500` | Mr. Krab doesn't bother with small fry below this count |
| `timeout` | `15000` | API timeout in ms |
Mr. Krab is cheap, not reckless:
- Graceful fallback — If the ScaleDown API is down, slow, or returns an error, Mr. Krab silently falls back to uncompressed context. Your assistant keeps working. He'd rather miss a savings opportunity than break the LLM call.
- Threshold gating — Small contexts below `minTokenThreshold` aren't worth pinching (even Mr. Krab knows some things cost more to save than they're worth).
- Timeout protection — Configurable timeout (default 15s) with `AbortController` prevents hanging requests.
- No data storage — ScaleDown compresses in-flight. Your conversation data is not stored. "I'm cheap, not shady."
```sh
npm install        # Install dependencies
npm run typecheck  # Type check
npm run build      # Build to dist/
npm run dev        # Watch mode
npm test           # Run all 53 tests
npm run demo       # Watch Mr. Krab in action
```

```
openclaw-scaledown-extension/
├── demo.ts                    # Mr. Krab's live demo
├── openclaw.plugin.json       # Plugin manifest & config schema
├── package.json
├── tsconfig.json
├── src/
│   ├── index.ts               # Plugin entry — hook registration
│   ├── scaledown-client.ts    # ScaleDown REST API client
│   ├── session-compressor.ts  # Left claw: conversation compression
│   ├── tool-compressor.ts     # Right claw: tool definition compression
│   ├── types.ts               # TypeScript types & defaults
│   └── utils.ts               # Token estimation, serialization, chunking
└── tests/
    ├── utils.test.ts
    ├── scaledown-client.test.ts
    ├── session-compressor.test.ts
    ├── tool-compressor.test.ts
    └── plugin-integration.test.ts
```
Unlike static prompt trimming or naive truncation, Mr. Krab uses query-aware semantic compression powered by ScaleDown. He understands what the user is asking and preserves the context that matters — producing better LLM responses with fewer tokens.
- Not truncation — doesn't chop off old messages blindly
- Not summarization — doesn't lose critical details or tool syntax
- Query-aware — adapts compression to every single message
- Two claws — handles both conversation history AND tool definitions
- Never breaks — graceful fallback on any failure
"Are you gonna pinch tokens, or are you gonna waste 'em like Squidward wastes his talent?"
MIT
(V)(;,,;)(V)