Problem
Two hygiene gaps:
1. find_contradictions() is a tool nobody calls
The tool exists but agents rarely invoke it proactively. Over weeks, memories can drift and contradict (e.g. 'use library X' vs 'we migrated off library X to Y'). The contradiction is only caught the next time the user catches the agent applying the stale rule.
2. recall ranking is opaque
recall() returns memories with score + decay + trust, but the agent has no insight into WHY a memory ranked above another. When two contradictory memories both have score ~0.1, the agent can pick the wrong one and not know.
Proposal
Inline contradiction detection
recall() runs a lightweight contradiction check on its result set (top-K memories) and surfaces conflicts inline:
[working] (project) score=0.11 decay=1.00 trust=0.50
Use library X for date formatting.
id=aaa-...
[working] (project) score=0.10 decay=1.00 trust=0.50
We migrated off library X to Y on 2026-04-15.
id=bbb-...
⚠️ Contradiction detected between aaa-... and bbb-...
Prefer the newer memory (bbb-...) or call resolve_contradictions(aaa, bbb) to pick one.
Optional debug mode for ranking
recall(query, debug=True) includes per-memory:
score: 0.115
breakdown:
semantic_similarity: 0.81 (cos)
keyword_overlap: 0.42 (BM25)
hrr_match: 0.55
decay_factor: 1.00
trust_factor: 0.50
final = semantic*w1 + keyword*w2 + hrr*w3 - age_penalty
The agent (and the user inspecting traces) can see why a result ranked higher and trust the ordering.
Why this matters
Memories accumulate. Without proactive hygiene, the same human-rule will appear in 3 versions over 6 months and the agent will keep guessing.
Acceptance
recall() flags contradicting pairs in its result set with their ids.
recall(debug=True) returns the ranking breakdown for each result.
Problem
Two hygiene gaps:
1.
find_contradictions()is a tool nobody callsThe tool exists but agents rarely invoke it proactively. Over weeks, memories can drift and contradict (e.g. 'use library X' vs 'we migrated off library X to Y'). The contradiction is only caught the next time the user catches the agent applying the stale rule.
2. recall ranking is opaque
recall()returns memories with score + decay + trust, but the agent has no insight into WHY a memory ranked above another. When two contradictory memories both have score ~0.1, the agent can pick the wrong one and not know.Proposal
Inline contradiction detection
recall()runs a lightweight contradiction check on its result set (top-K memories) and surfaces conflicts inline:Optional debug mode for ranking
recall(query, debug=True)includes per-memory:The agent (and the user inspecting traces) can see why a result ranked higher and trust the ordering.
Why this matters
Memories accumulate. Without proactive hygiene, the same human-rule will appear in 3 versions over 6 months and the agent will keep guessing.
Acceptance
recall()flags contradicting pairs in its result set with their ids.recall(debug=True)returns the ranking breakdown for each result.