fix(auto-recall): Tier 1 memory counter fix (ms-based suppression + lazy-heal)#712
fix(auto-recall): Tier 1 memory counter fix (ms-based suppression + lazy-heal)#712raw34 wants to merge 4 commits intoCortexReach:masterfrom
Conversation
rwmjhb
left a comment
There was a problem hiding this comment.
Review action: COMMENT
Thanks for the PR. GitHub currently reports this branch as not mergeable (mergeable=CONFLICTING, merge_state_status=DIRTY), so I am deferring deep review until the diff can be reviewed against the current base.
Please rebase onto the latest base branch, resolve the merge conflicts, and push the updated branch. Once it is cleanly mergeable again, I will re-run the full review on the updated diff.
…azy-heal) Rewrite the Tier 1 auto-recall lifecycle to fix counter bugs that blocked decay-engine, access-tracker reinforcement, and future tier-manager promotion. See Issue CortexReach#633 / RFC CortexReach#445. Schema: - Add optional `suppressed_until_ms?: number` field to SmartMemoryMetadata. parseSmartMetadata preserves the `undefined` sentinel for "never touched by Tier 1" vs "touched, no active suppression". Includes guard comment + NaN test to prevent accidental clampCount refactors. Config: - Add `autoRecallBadRecallDecayMs` (default 24h, Option C decay window) and `autoRecallSuppressionDurationMs` (default 30min). Both intentionally omit a `maximum` bound (documented via $comment). Auto-recall inline patch: - Increment access_count and last_accessed_at on every injection — the core fix unblocking downstream lifecycle components. - Lazy-heal legacy pollution: memories with no suppressed_until_ms AND non-zero bad_recall_count / suppressed_until_turn get reset on first Tier 1 touch. - Option C decay: if gap since last_injected_at exceeds badRecallDecayMs, reset bad_recall_count to 0 before stale-injection increment. Negative gaps from clock skew fall through as "no decay" (conservative). - Write ms-based suppressed_until_ms (replacing turn-number suppression); gateway restart no longer creates false "just expired" windows. - Always zero legacy suppressed_until_turn to prevent stale-number leaks. Governance filter: - Replace `suppressed_until_turn` vs `currentTurn` check with absolute-time comparison against `suppressed_until_ms`. Suppression anchor is now wall-clock, not session-local. Out of scope (intentionally unchanged): - staleInjected judgment (Proposal A / PR CortexReach#597 territory) - Manual memory_recall path - Tier-manager activation Tests: - New test/tier1-counters.test.mjs asserts access_count increments 0->1 via the shared parseSmartMetadata plumbing. - Registered in npm test chain, ci-test-manifest, and verify-ci-test-manifest baseline. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
0212126 to
0841c4a
Compare
rwmjhb
left a comment
There was a problem hiding this comment.
PR #712 Review: fix(auto-recall): Tier 1 memory counter fix (ms-based suppression + lazy-heal)
Verdict: REQUEST-CHANGES | 6 rounds completed | Value: 67% | Size: LARGE | Author: raw34
Value Assessment
Problem: Auto-recall memories can accumulate broken recall/suppression counters and fail to update access counters, which undermines recall availability and downstream reinforcement/decay behavior. This PR attempts to move suppression from session turn numbers to absolute millisecond timestamps, lazily heal legacy counter pollution, and increment access metadata on injection.
| Dimension | Assessment |
|---|---|
| Value Score | 67% |
| Value Verdict | review |
| Issue Linked | true |
| Project Aligned | true |
| Duplicate | false |
| AI Slop Score | 1/6 |
| User Impact | high |
| Urgency | medium |
Scope Drift: 1 flag(s)
- test/smart-memory-lifecycle.mjs and test/tier1-counters.test.mjs duplicate the production patch logic in local helpers rather than exercising the actual index.ts auto-recall injection path
AI Slop Signals:
- The new tests mirror production logic in local computeTier1Patch/isSuppressed helpers and include a comment saying index.ts wiring is verified by visual inspection, so part of the claimed fix is not directly tested.
Open Questions:
- The provided linked issue data has no labels, assignment, or maintainer comments, so maintainer acknowledgement of #633 cannot be confirmed from this context.
- PR #597 overlaps with recall governance and last_confirmed_use_at behavior; maintainers should confirm whether this Tier 1 ms-suppression approach should land independently or after Proposal A.
- The full suite currently fails in test/smart-extractor-branches.mjs; that may be unrelated, but it must be resolved or classified before merge.
- Maintainers should decide whether autoRecallBadRecallDecayMs and autoRecallSuppressionDurationMs need to be public config or could remain internal constants until behavior stabilizes.
Summary
Auto-recall memories can accumulate broken recall/suppression counters and fail to update access counters, which undermines recall availability and downstream reinforcement/decay behavior. This PR attempts to move suppression from session turn numbers to absolute millisecond timestamps, lazily heal legacy counter pollution, and increment access metadata on injection.
Evaluation Signals
| Signal | Value |
|---|---|
| Blockers | 0 |
| Warnings | 0 |
| PR Size | LARGE |
| Verdict Floor | request-changes |
| Risk Level | high |
| Value Model | codex |
| Primary Model | codex |
| Adversarial Model | claude |
Must Fix
- EF1: Full test suite is failing
Nice to Have
- F2: Manual recall no longer clears active auto-recall suppression
- F3: Debug output reports the retired suppression field
- F4: New counter tests duplicate production logic instead of exercising it
Recommended Action
Good direction — problem is worth solving. Author should address must-fix findings, then this is ready to merge.
Reviewed at 2026-05-05T03:31:29Z | 6 rounds | Value: codex | Primary: codex | Adversarial: claude
Per rwmjhb's CHANGES_REQUESTED review (2026-05-05): F2: manual `memory_recall` now clears `suppressed_until_ms` alongside `suppressed_until_turn` in src/tools.ts:594-614. Without this, after the governance check moved to ms-based suppression, a memory the user just explicitly searched for would remain suppressed — a regression vs pre-Tier-1 semantics where zeroing the turn field cleared the only suppression mechanism. Same fix applied to `memory_promote` at line ~1939, which carries identical positive-signal semantics. F3: `memory_explain_rank` debug output now reports `suppressedUntilMs` instead of the retired `suppressedUntilTurn` (which is always 0 after Tier 1 touches a memory). Cosmetic but eliminates a misleading metric. F4: extracted Tier 1 patch logic to `src/auto-recall-tier1.ts` as pure functions (`isSuppressed`, `computeTier1Patch`, plus default constants). - index.ts auto-recall path now imports and calls these helpers, collapsing ~85 lines of inline logic into a single computeTier1Patch call site. - test/tier1-counters.test.mjs and test/smart-memory-lifecycle.mjs now import the same helpers via jiti, replacing the local mirrored copies the reviewer flagged. Tests now drive the production code path directly, so any future drift surfaces immediately. EF1 (test suite failure) is the pre-existing `smart-extractor-branches.mjs` flake on master, verified against upstream/master CI runs. Out of scope. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Addressed in F2 (manual recall no longer clears active suppression) — fixed. F3 (debug output reports retired field) — fixed. F4 (tests duplicate production logic) — refactored. Pulled the Tier 1 logic out of
EF1 (full test suite failing) — not caused by this PR. The failure is Open questions noted:
Verification: |
|
CI rerun shows two failures, both inherited from master (introduced by #716, merged ~3 hours ago):
Verified the same failures on master at No new code from this PR is implicated:
Will be unblocked when master fixes #716. Happy to post the one-line |
rwmjhb
left a comment
There was a problem hiding this comment.
PR #712 Review: fix(auto-recall): Tier 1 memory counter fix (ms-based suppression + lazy-heal)
Verdict: REQUEST-CHANGES | 6 rounds completed | Value: 64% | Size: XL | Author: raw34
Value Assessment
Problem: Auto-recall can leave memory access and suppression counters in a broken state, causing recalled memories to stop appearing or fail to reinforce downstream decay/access tracking. The PR replaces session-turn suppression with millisecond-based suppression, lazily heals legacy counter pollution, and increments access metadata on Tier 1 injection.
| Dimension | Assessment |
|---|---|
| Value Score | 64% |
| Value Verdict | review |
| Issue Linked | true |
| Project Aligned | true |
| Duplicate | false |
| AI Slop Score | 0/6 |
| User Impact | high |
| Urgency | medium |
Scope Drift: 2 flag(s)
- openclaw.plugin.json exposes new public configuration for decay and suppression duration; this may be justified operationally, but it creates long-term API surface beyond the narrow bug fix
- test/smart-memory-lifecycle.mjs appends a helper-level assertion after the existing lifecycle pass message, which is slightly awkward test structure but still related to the fix
Open Questions:
- Linked issue #633 has no labels, assignment, or maintainer comments in the provided context, so maintainer acknowledgement cannot be confirmed.
- Maintainers should confirm whether PR #712 should land independently of PR #597 or wait for the broader Proposal A recall-governance direction.
- Maintainers should decide whether autoRecallBadRecallDecayMs and autoRecallSuppressionDurationMs should remain public config or start as internal constants.
- The full suite failure appears inherited from master according to the timeline, but CI should be green or the inherited failure should be clearly documented before merge.
Summary
Auto-recall can leave memory access and suppression counters in a broken state, causing recalled memories to stop appearing or fail to reinforce downstream decay/access tracking. The PR replaces session-turn suppression with millisecond-based suppression, lazily heals legacy counter pollution, and increments access metadata on Tier 1 injection.
Evaluation Signals
| Signal | Value |
|---|---|
| Blockers | 0 |
| Warnings | 0 |
| PR Size | XL |
| Verdict Floor | approve |
| Risk Level | high |
| Value Model | codex |
| Primary Model | codex |
| Adversarial Model | claude |
Must Fix
- F1: New Tier 1 config fields are dropped during config parsing
Nice to Have
- EF1: Full test suite fails in smart extraction branch coverage
- MR1: Read path silently retires legacy suppressed_until_turn — currently-suppressed memories surface immediately on deploy
- MR2: buildSmartMetadata loses lazy-heal sentinel if any persistence layer maps undefined → null
- MR3: TIER1_BAD_RECALL_SUPPRESSION_THRESHOLD remains hardcoded while companion knobs become public config
Recommended Action
Good direction — problem is worth solving. Author should address must-fix findings, then this is ready to merge.
Reviewed at 2026-05-05T10:30:44Z | 6 rounds | Value: codex | Primary: codex | Adversarial: claude
…+ MR2 + MR3) F1 (must-fix): parsePluginConfig was dropping the two new Tier 1 config fields. The interface declared them and index.ts read them, but the return-object construction never copied them through, so user config was always silently discarded and the runtime always saw undefined. - Add a parseNonNegativeInt helper (parsePositiveInt rejects 0, but 0 is a meaningful sentinel for both fields: disable decay / collapse suppression to a no-op). - Wire autoRecallBadRecallDecayMs and autoRecallSuppressionDurationMs through parsePluginConfig. - Add 5 regression tests in tier1-counters.test.mjs covering: field propagation, the 0 sentinel, undefined fallback, and negative-input rejection. MR2 (nice-to-have): parseSmartMetadata used `!== undefined` to detect the lazy-heal sentinel. If any persistence layer ever round-trips undefined → null on the metadata JSON, the sentinel was lost (null fell through to clampCount → 0, marking the memory as "Tier 1 touched" incorrectly). - Switch to `!= null` (covers both undefined and null) in both parseSmartMetadata and the patch path in buildSmartMetadata. - Add a regression test for the null round-trip case. MR3 (nice-to-have): document why TIER1_BAD_RECALL_SUPPRESSION_THRESHOLD is a constant while companion knobs are public config — "3 strikes" is a behavioral design choice; decay/duration are ops tuning. Note the existing `minRepeated` opt seam is available if tuning is ever needed. EF1: pre-existing master flake, documented in earlier comment. MR1 (legacy turn-based suppression at deploy): deferred — old turn numbers were never deploy-stable in the pre-Tier-1 design (turn counter resets per session), so "currently-suppressed memories" do not have well-defined cross-deploy semantics. Lazy-heal is the intended migration. Will document in the follow-up PR comment. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Second-round review addressed in F1 (must-fix) — parsePluginConfig dropped new Tier 1 config fields: confirmed and fixed. The MR2 (nice-to-have) — parseSmartMetadata loses lazy-heal sentinel on MR3 (nice-to-have) — threshold hardcoded while knobs are public config: documented the rationale in MR1 (nice-to-have) — read path silently retires EF1 — full test suite failure: still the pre-existing master flake ( Verification: |
Pre-fix the smart-memory-lifecycle test printed "OK: passed" before the appended Tier 1 integration assertion ran, so a Tier 1 failure would surface as "OK" followed by an AssertionError — confusing under log skimming. Move the print to the end and reword it to reflect both the legacy lifecycle and Tier 1 integration coverage. Caught by self-audit before the next reviewer round; reviewer flagged the same as a Scope Drift signal in the previous review. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Rewrites the Tier 1 auto-recall lifecycle to fix counter bugs that blocked
decay-engine, access-tracker reinforcement, and future tier-manager promotion.
See Issue #633 / RFC #445.
Key changes
Schema (
src/smart-metadata.ts)suppressed_until_ms?: numberfield with presence semantics(undefined = never touched by Tier 1; 0/number = touched, with optional active suppression)
Config (
openclaw.plugin.json)autoRecallBadRecallDecayMs— Option C decay window, default 24hautoRecallSuppressionDurationMs— suppression duration, default 30minAuto-recall inline patch (
index.ts)access_countandlast_accessed_aton injection (the core fix)bad_recall_countif gap > decay window; clock-skew safesuppressed_until_ms, replacing turn-number suppressionsuppressed_until_turnto prevent stale-number leaksGovernance filter (
index.ts)suppressed_until_turnvscurrentTurnwith absolute-time checkagainst
suppressed_until_ms. Gateway restart no longer creates false"just expired" windows.
Out of scope
staleInjectedjudgment (Proposal A / PR feat(proposal-a): Phase 1 recall governance (Issue #569) #597 territory)memory_recallpathTest plan
test/tier1-counters.test.mjs(22 tests) — asserts access_count0→1, suppressed_until_ms semantics, decay, lazy-heal idempotence
npm test,ci-test-manifest.mjs, andverify-ci-test-manifest.mjsbaselinetest/smart-memory-lifecycle.mjs— Tier 1 access_count integration assertiontest/smart-metadata-v2.mjs— schema regressiontest/clawteam-scope.test.mjs+per-agent-auto-recall+scope-access-undefined(65 tests) — scope regression