fix(wiki-compile): adaptive truncation for clusters > model context #6
Open
GuyMannDude wants to merge 1 commit into
Hot entities accumulated across deep history produce clusters that exceed the reasoning model's context window. Surfaced today on artforge while backfilling 14 days into entities/guy (4.7M tokens), entities/igor (3.9M), entities/opie (4.8M), and entities/rocky; Gemini 2.5 Flash caps at 1M. The per-cluster try/except caught the 400s and kept the rest of the run going, but those four pages silently stopped updating. See issue #5 for the full repro and design tradeoffs.

Fix mirrors the existing adaptive-truncation pattern in agentb/vec.py:embed_with_adaptive_truncation: halve and retry until either the call succeeds or we hit the min_memories floor and re-raise for the per-cluster handler.

New behavior:

- `compile_topic_adaptive(section, slug, memories, existing, min_memories=1)`: sort newest-first, build prompt, call LLM. On a context-overflow 400 (or the rare `KeyError('choices')` shape some providers return when the prompt is so oversized that the response JSON is malformed, observed on artforge's entities/rocky cluster), halve to `len(current) // 2` and retry. Stop when the call succeeds or when the cluster reaches min_memories and still fails. In the latter case, re-raise so the existing per-cluster try/except logs the topic as failed (same behavior as today, just for the genuinely unfit-at-any-size case).
- Caller in `main()` switches from `call_llm(prompt)` to `compile_topic_adaptive(...)`, receives back the memories actually used, and passes both used and total to `render_page`.
- `render_page(section, slug, body, memories, total_memories=None)`: when `total_memories > len(memories)`, surfaces the truncation in three places: a visible header line ("Source memories: N of M (⚠️ K older dropped — see footer)"), front matter (`cluster-truncated`, `cluster-total`, `cluster-dropped` for machine readers), and an expanded footer note pointing readers at `mnemo_recall` for the dropped entries (vapor truth: the page operates on partial data, so say so).
- Non-overflow errors (500, 401, auth, network) re-raise immediately without halving. Only context-length 400s trigger the retry loop.
- Detection helper `_is_context_overflow_error(err)` is factored out so future provider-specific shapes are one place to extend.

Verified locally with four smoke scenarios:

- Oversized cluster (100 memories, fail-above-25): halves 100 → 50 → 25, succeeds; rendered page surfaces 25 of 100 with the dropped count in header + footer + front matter.
- Small cluster (3 memories, never fails): one call, no truncation noise in the output (footer matches prior format byte-for-byte).
- Non-overflow 500 / 401: re-raises immediately without retry.
- Persistent overflow at floor (every call returns 400): halves 10 → 5 → 2 → 1, then re-raises rather than looping forever.

Also verified: `KeyError('choices')` is treated as an overflow signal, so artforge's entities/rocky failure mode (where the response JSON was malformed rather than returning a clean 400) recovers correctly.

Token cost note: a successful halve-twice run uses ~3x the LLM calls of the original single attempt, but a failed compile would have produced no page update at all. Net: pages on hot entities stay fresh at the cost of a few extra LLM calls per oversize cluster per night.

Doesn't address the legacy-paths issue tracked separately in our internal brain. That issue is about WHAT the wiki harvests; this fixes WHAT IT CAN COMPILE once harvested.

Closes #5.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
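The front-matter part of the truncation surfacing could look roughly like the sketch below. It is an illustration only, not the code in this commit: `truncation_front_matter` is a hypothetical helper name, and only the three key names and the "emit nothing unless `total_memories > len(memories)`" rule come from the description above.

```python
from typing import List, Optional


def truncation_front_matter(used: int, total: Optional[int] = None) -> List[str]:
    """Return front-matter lines describing a truncated cluster.

    Hypothetical helper: `used` is the number of memories actually sent to
    the LLM, `total` the size of the full cluster. Returns an empty list
    when nothing was dropped, so untruncated pages gain no extra keys.
    """
    # total=None (the default) or total <= used means no truncation happened.
    if total is None or total <= used:
        return []
    return [
        "cluster-truncated: true",
        f"cluster-total: {total}",
        f"cluster-dropped: {total - used}",
    ]
```

For the oversized smoke scenario, `truncation_front_matter(25, 100)` yields the three keys with a dropped count of 75, while `truncation_front_matter(3)` yields nothing.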
Summary
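- Hot-entity clusters exceeded the reasoning model's context window on the artforge `--days 14` backfill.
- Mirrors the existing adaptive-truncation pattern in `agentb/vec.py:embed_with_adaptive_truncation`. Sort newest-first, try full cluster, halve on 400-context-length, retry until success or floor.
- Truncated pages say so and point readers at `mnemo_recall` for the dropped entries.

What this PR does NOT fix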
The legacy-paths issue: the wiki harvests from legacy paths (`~/.agentb/agents/<agent>/memory/*.json` + `~/.mnemo-v2/mnemo.sqlite3`) rather than the v3 server's storage. That's a separate concern (what gets harvested) and not what #5, "wiki-compile fails on hot-entity clusters that exceed reasoning model context (no chunking/truncation)", is about (how to handle what's harvested when it's too big for one LLM call).

Why halve-and-retry (and not chunk-merge)
Issue #5 lists four fix options ranked by effort. This PR picks option 1 (adaptive truncation) because:
- It mirrors a pattern already in the codebase (`agentb/vec.py`), so reviewers don't need to evaluate a new strategy.
- Dropped older memories stay reachable through `mnemo_recall` and the page footer says so explicitly.

Chunk-merge (option 3) is the better long-term answer for the highest-fidelity wiki pages but a much bigger surface change. This PR is the "stop the silent breakage" fix; option 3 can land as a follow-up if truncation turns out to drop too much.
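For reviewers who want the shape of option 1 at a glance, here is a minimal sketch of a halve-and-retry loop. It is not the PR's implementation: `LLMHTTPError`, `is_context_overflow_error`, and `compile_with_halving` are hypothetical names, the `call_llm` callable takes the memory list directly instead of a built prompt, and matching on "context" or "token" in the 400 message is an assumption about provider error text.

```python
from typing import Callable, List, Tuple


class LLMHTTPError(Exception):
    """Hypothetical stand-in for whatever HTTP error the real client raises."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message


def is_context_overflow_error(err: Exception) -> bool:
    """Return True only for errors that a smaller cluster could fix."""
    # Malformed response with no 'choices' key: treated as overflow.
    if isinstance(err, KeyError) and err.args == ("choices",):
        return True
    # Assumption: overflow 400s mention "context" or "token" in the message.
    if isinstance(err, LLMHTTPError) and err.status == 400:
        text = err.message.lower()
        return "context" in text or "token" in text
    return False


def compile_with_halving(
    memories: List[str],
    call_llm: Callable[[List[str]], str],
    min_memories: int = 1,
) -> Tuple[str, List[str]]:
    """Try the full cluster, halving on overflow until success or the floor.

    `memories` is assumed to be sorted newest-first already, so slicing
    from the front keeps the most recent entries. Returns the LLM output
    together with the memories that were actually used.
    """
    current = list(memories)
    while True:
        try:
            return call_llm(current), current
        except Exception as err:
            # Non-overflow errors (500, 401, network) propagate untouched.
            if not is_context_overflow_error(err):
                raise
            # At the floor and still overflowing: re-raise for the caller.
            if len(current) <= min_memories:
                raise
            current = current[: max(min_memories, len(current) // 2)]
```

With a fake `call_llm` that raises a context-length 400 above 25 memories, a 100-memory cluster goes 100 → 50 → 25 and succeeds on the third call. A fake that always overflows goes 10 → 5 → 2 → 1 and then re-raises, so the loop always terminates.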
Behavior changes
- Truncated cluster: the page header shows `**Source memories:** N of M (⚠️ K older entries dropped — see footer)`. Front matter gains `cluster-truncated: true`, `cluster-total: M`, `cluster-dropped: K`. Footer explains why and points at `mnemo_recall`.
- Cluster still overflowing at the min_memories floor: re-raises (`main` logs as failed topic; same as today).
- `render_page` gains an optional `total_memories: int | None` parameter (default None preserves prior behavior byte-for-byte).

Test plan
All four smoke scenarios verified locally:
- Oversized cluster (100 memories, fail-above-25): halves `100 → 50 → 25`, succeeds. Rendered page surfaces "25 of 100" in header + footer + front matter `cluster-truncated/total/dropped` keys.
- Small cluster (3 memories, never fails): one call, no truncation noise in the output.
- Non-overflow 500 / 401: re-raises immediately without retry.
- Persistent overflow at floor (every call returns 400): halves `10 → 5 → 2 → 1`, then re-raises rather than looping forever.

Also verified:
- `KeyError('choices')` shape (the artforge entities/rocky failure mode where the response JSON was malformed rather than a clean 400): correctly treated as an overflow signal, halves and recovers.
- `python -m py_compile mnemo-wiki-compile.py` clean.

Production verification: needs the next nightly cron or a manual `--days 14` run on a deployment with hot entities. Today's backfill on artforge had 4 such failures and would be the natural validation; happy to run it from my end once this lands and report back, or hand off.

Closes #5.
🤖 Generated with Claude Code