fix(auto-recall): Tier 1 memory counter fix (ms-based suppression + lazy-heal) by raw34 · Pull Request #712 · CortexReach/memory-lancedb-pro

raw34 · 2026-04-28T10:33:24Z

Summary

Rewrites the Tier 1 auto-recall lifecycle to fix counter bugs that blocked
decay-engine, access-tracker reinforcement, and future tier-manager promotion.
See Issue #633 / RFC #445.

Key changes

Schema (src/smart-metadata.ts)

Adds optional suppressed_until_ms?: number field with presence semantics
(undefined = never touched by Tier 1; 0/number = touched, with optional active suppression)
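The presence semantics can be read as a three-state value. A minimal sketch, assuming a trimmed-down metadata shape (the `Tier1FieldsSketch` interface and `classifyTier1State` helper are hypothetical names for illustration, not part of the PR):

```typescript
// Hypothetical, trimmed-down shape; the real SmartMemoryMetadata carries more fields.
interface Tier1FieldsSketch {
  suppressed_until_ms?: number;
}

type Tier1State = "never-touched" | "touched-not-suppressed" | "suppressed";

// undefined  -> Tier 1 has never patched this memory
// 0 or past  -> Tier 1 has patched it, no active suppression
// future ms  -> Tier 1 has patched it, suppression still active
function classifyTier1State(meta: Tier1FieldsSketch, nowMs: number): Tier1State {
  if (meta.suppressed_until_ms === undefined) return "never-touched";
  return meta.suppressed_until_ms > nowMs ? "suppressed" : "touched-not-suppressed";
}
```

Keeping `undefined` distinct from `0` is what lets lazy-heal tell a legacy memory apart from one that Tier 1 has already cleaned.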

Config (openclaw.plugin.json)

autoRecallBadRecallDecayMs — Option C decay window, default 24h
autoRecallSuppressionDurationMs — suppression duration, default 30min
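For reference, the two defaults in milliseconds, with a sketch of how a fallback could be resolved. Only the two config keys and their default durations come from the PR; the constant names and `resolveTier1Config` are assumptions for illustration:

```typescript
// Hypothetical constant names; the values follow the defaults stated above.
const DEFAULT_BAD_RECALL_DECAY_MS = 24 * 60 * 60 * 1000; // 24h = 86400000 ms
const DEFAULT_SUPPRESSION_DURATION_MS = 30 * 60 * 1000; // 30min = 1800000 ms

interface Tier1ConfigSketch {
  autoRecallBadRecallDecayMs?: number;
  autoRecallSuppressionDurationMs?: number;
}

// `??` only falls back on undefined/null, so an explicit 0 is preserved.
function resolveTier1Config(cfg: Tier1ConfigSketch): Required<Tier1ConfigSketch> {
  return {
    autoRecallBadRecallDecayMs: cfg.autoRecallBadRecallDecayMs ?? DEFAULT_BAD_RECALL_DECAY_MS,
    autoRecallSuppressionDurationMs:
      cfg.autoRecallSuppressionDurationMs ?? DEFAULT_SUPPRESSION_DURATION_MS,
  };
}
```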

Auto-recall inline patch (index.ts)

Always increment access_count and last_accessed_at on injection (the core fix)
Lazy-heal legacy pollution on first Tier 1 touch
Option C decay: reset bad_recall_count if gap > decay window; clock-skew safe
Write ms-based suppressed_until_ms, replacing turn-number suppression
Always zero suppressed_until_turn to prevent stale-number leaks
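The five steps above can be sketched as one pure function. This is an illustrative reading of the description, not the PR's code: the shapes, the `sketchTier1Patch` name, the threshold option, and the choice to write `0` when no suppression is triggered are all assumptions.

```typescript
// Hypothetical, trimmed-down metadata shape; field names follow the PR description.
interface Tier1MetaSketch {
  access_count: number;
  last_accessed_at?: number;
  last_injected_at?: number;
  bad_recall_count: number;
  suppressed_until_turn: number;
  suppressed_until_ms?: number; // undefined = never touched by Tier 1
}

interface Tier1OptsSketch {
  nowMs: number;
  staleInjected: boolean; // judged elsewhere; out of scope for this sketch
  badRecallDecayMs: number; // assumed: 0 disables decay
  suppressionDurationMs: number;
  threshold: number; // assumed number of bad recalls that triggers suppression
}

function sketchTier1Patch(meta: Tier1MetaSketch, opts: Tier1OptsSketch): Tier1MetaSketch {
  let bad = meta.bad_recall_count;

  // Lazy-heal: first Tier 1 touch of a memory that carries legacy counter pollution.
  const neverTouched = meta.suppressed_until_ms == null;
  if (neverTouched && (bad !== 0 || meta.suppressed_until_turn !== 0)) {
    bad = 0;
  }

  // Option C decay: a negative gap (clock skew) never exceeds the window, so no decay.
  if (opts.badRecallDecayMs > 0 && meta.last_injected_at !== undefined) {
    const gap = opts.nowMs - meta.last_injected_at;
    if (gap > opts.badRecallDecayMs) bad = 0;
  }

  if (opts.staleInjected) bad += 1;

  const suppress = bad >= opts.threshold;
  return {
    ...meta,
    access_count: meta.access_count + 1, // always incremented on injection
    last_accessed_at: opts.nowMs,
    last_injected_at: opts.nowMs,
    bad_recall_count: bad,
    suppressed_until_ms: suppress ? opts.nowMs + opts.suppressionDurationMs : 0,
    suppressed_until_turn: 0, // legacy field is always zeroed
  };
}
```

Because the function takes `nowMs` as an argument instead of reading the clock, decay and clock-skew cases can be tested with fixed timestamps.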

Governance filter (index.ts)

Replace suppressed_until_turn vs currentTurn with absolute-time check
against suppressed_until_ms. Gateway restart no longer creates false
"just expired" windows.
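A minimal sketch of an absolute-time governance filter, assuming a strict "greater than now" comparison and a simplified candidate shape (`CandidateSketch`, `isSuppressedSketch`, and `filterInjectable` are hypothetical names):

```typescript
interface CandidateSketch {
  id: string;
  suppressed_until_ms?: number | null;
}

// A memory is suppressed only while its stored deadline lies in the future.
// Missing values (never touched by Tier 1) are treated as not suppressed.
function isSuppressedSketch(
  meta: { suppressed_until_ms?: number | null },
  nowMs: number,
): boolean {
  return meta.suppressed_until_ms != null && meta.suppressed_until_ms > nowMs;
}

function filterInjectable(candidates: CandidateSketch[], nowMs: number): CandidateSketch[] {
  return candidates.filter((c) => !isSuppressedSketch(c, nowMs));
}
```

Since the deadline is a wall-clock timestamp, the result does not depend on any per-session counter and is the same before and after a gateway restart.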

Out of scope

staleInjected judgment (territory of Proposal A / PR #597, "feat(proposal-a): Phase 1 recall governance (Issue #569)")
Manual memory_recall path
Tier-manager activation

Test plan

New test/tier1-counters.test.mjs (22 tests) — asserts access_count
0→1, suppressed_until_ms semantics, decay, lazy-heal idempotence
Registered in npm test, ci-test-manifest.mjs, and
verify-ci-test-manifest.mjs baseline
test/smart-memory-lifecycle.mjs — Tier 1 access_count integration assertion
test/smart-metadata-v2.mjs — schema regression
test/clawteam-scope.test.mjs + per-agent-auto-recall + scope-access-undefined (65 tests) — scope regression

rwmjhb

Review action: COMMENT

Thanks for the PR. GitHub currently reports this branch as not mergeable (mergeable=CONFLICTING, merge_state_status=DIRTY), so I am deferring deep review until the diff can be reviewed against the current base.

Please rebase onto the latest base branch, resolve the merge conflicts, and push the updated branch. Once it is cleanly mergeable again, I will re-run the full review on the updated diff.

…azy-heal)

Rewrite the Tier 1 auto-recall lifecycle to fix counter bugs that blocked decay-engine, access-tracker reinforcement, and future tier-manager promotion. See Issue CortexReach#633 / RFC CortexReach#445.

Schema:
- Add optional `suppressed_until_ms?: number` field to SmartMemoryMetadata. parseSmartMetadata preserves the `undefined` sentinel for "never touched by Tier 1" vs "touched, no active suppression". Includes guard comment + NaN test to prevent accidental clampCount refactors.

Config:
- Add `autoRecallBadRecallDecayMs` (default 24h, Option C decay window) and `autoRecallSuppressionDurationMs` (default 30min). Both intentionally omit a `maximum` bound (documented via $comment).

Auto-recall inline patch:
- Increment access_count and last_accessed_at on every injection. This is the core fix unblocking downstream lifecycle components.
- Lazy-heal legacy pollution: memories with no suppressed_until_ms AND non-zero bad_recall_count / suppressed_until_turn get reset on first Tier 1 touch.
- Option C decay: if gap since last_injected_at exceeds badRecallDecayMs, reset bad_recall_count to 0 before stale-injection increment. Negative gaps from clock skew fall through as "no decay" (conservative).
- Write ms-based suppressed_until_ms (replacing turn-number suppression); gateway restart no longer creates false "just expired" windows.
- Always zero legacy suppressed_until_turn to prevent stale-number leaks.

Governance filter:
- Replace `suppressed_until_turn` vs `currentTurn` check with absolute-time comparison against `suppressed_until_ms`. Suppression anchor is now wall-clock, not session-local.

Out of scope (intentionally unchanged):
- staleInjected judgment (Proposal A / PR CortexReach#597 territory)
- Manual memory_recall path
- Tier-manager activation

Tests:
- New test/tier1-counters.test.mjs asserts access_count increments 0->1 via the shared parseSmartMetadata plumbing.
- Registered in npm test chain, ci-test-manifest, and verify-ci-test-manifest baseline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

rwmjhb

PR #712 Review: fix(auto-recall): Tier 1 memory counter fix (ms-based suppression + lazy-heal)

Verdict: REQUEST-CHANGES | 6 rounds completed | Value: 67% | Size: LARGE | Author: raw34

Value Assessment

Problem: Auto-recall memories can accumulate broken recall/suppression counters and fail to update access counters, which undermines recall availability and downstream reinforcement/decay behavior. This PR attempts to move suppression from session turn numbers to absolute millisecond timestamps, lazily heal legacy counter pollution, and increment access metadata on injection.

Dimension	Assessment
Value Score	67%
Value Verdict	review
Issue Linked	true
Project Aligned	true
Duplicate	false
AI Slop Score	1/6
User Impact	high
Urgency	medium

Scope Drift: 1 flag(s)

test/smart-memory-lifecycle.mjs and test/tier1-counters.test.mjs duplicate the production patch logic in local helpers rather than exercising the actual index.ts auto-recall injection path

AI Slop Signals:

The new tests mirror production logic in local computeTier1Patch/isSuppressed helpers and include a comment saying index.ts wiring is verified by visual inspection, so part of the claimed fix is not directly tested.

Open Questions:

The provided linked issue data has no labels, assignment, or maintainer comments, so maintainer acknowledgement of #633 cannot be confirmed from this context.
PR #597 overlaps with recall governance and last_confirmed_use_at behavior; maintainers should confirm whether this Tier 1 ms-suppression approach should land independently or after Proposal A.
The full suite currently fails in test/smart-extractor-branches.mjs; that may be unrelated, but it must be resolved or classified before merge.
Maintainers should decide whether autoRecallBadRecallDecayMs and autoRecallSuppressionDurationMs need to be public config or could remain internal constants until behavior stabilizes.

Summary

Auto-recall memories can accumulate broken recall/suppression counters and fail to update access counters, which undermines recall availability and downstream reinforcement/decay behavior. This PR attempts to move suppression from session turn numbers to absolute millisecond timestamps, lazily heal legacy counter pollution, and increment access metadata on injection.

Evaluation Signals

Signal	Value
Blockers	0
Warnings	0
PR Size	LARGE
Verdict Floor	request-changes
Risk Level	high
Value Model	codex
Primary Model	codex
Adversarial Model	claude

Must Fix

EF1: Full test suite is failing

Nice to Have

F2: Manual recall no longer clears active auto-recall suppression
F3: Debug output reports the retired suppression field
F4: New counter tests duplicate production logic instead of exercising it

Recommended Action

Good direction — problem is worth solving. Author should address must-fix findings, then this is ready to merge.

Reviewed at 2026-05-05T03:31:29Z | 6 rounds | Value: codex | Primary: codex | Adversarial: claude

Per rwmjhb's CHANGES_REQUESTED review (2026-05-05):

F2: manual `memory_recall` now clears `suppressed_until_ms` alongside `suppressed_until_turn` in src/tools.ts:594-614. Without this, after the governance check moved to ms-based suppression, a memory the user just explicitly searched for would remain suppressed, a regression vs pre-Tier-1 semantics where zeroing the turn field cleared the only suppression mechanism. Same fix applied to `memory_promote` at line ~1939, which carries identical positive-signal semantics.

F3: `memory_explain_rank` debug output now reports `suppressedUntilMs` instead of the retired `suppressedUntilTurn` (which is always 0 after Tier 1 touches a memory). Cosmetic but eliminates a misleading metric.

F4: extracted Tier 1 patch logic to `src/auto-recall-tier1.ts` as pure functions (`isSuppressed`, `computeTier1Patch`, plus default constants).
- index.ts auto-recall path now imports and calls these helpers, collapsing ~85 lines of inline logic into a single computeTier1Patch call site.
- test/tier1-counters.test.mjs and test/smart-memory-lifecycle.mjs now import the same helpers via jiti, replacing the local mirrored copies the reviewer flagged. Tests now drive the production code path directly, so any future drift surfaces immediately.

EF1 (test suite failure) is the pre-existing `smart-extractor-branches.mjs` flake on master, verified against upstream/master CI runs. Out of scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

raw34 · 2026-05-05T06:15:37Z

Addressed in 97cffbe.

F2 (manual recall no longer clears active suppression) — fixed. memory_recall and memory_promote patches now zero suppressed_until_ms alongside suppressed_until_turn, restoring pre-Tier-1 semantics where a positive recall signal clears active suppression. (src/tools.ts:600-614, src/tools.ts:1933-1947)
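As an illustration of the F2 fix, a positive-signal patch has to clear both fields, because the governance filter now reads only the ms-based one. The `clearSuppression` helper and the field shape below are a hypothetical sketch, not the code in src/tools.ts:

```typescript
interface SuppressionFieldsSketch {
  suppressed_until_turn: number;
  suppressed_until_ms?: number;
}

// Applied on a positive signal (manual recall or promote): zero both the legacy
// turn field and the ms field so no suppression mechanism stays active.
function clearSuppression<T extends SuppressionFieldsSketch>(meta: T): T {
  return { ...meta, suppressed_until_turn: 0, suppressed_until_ms: 0 };
}
```

Zeroing only `suppressed_until_turn`, as the pre-Tier-1 patch did, would leave a future `suppressed_until_ms` in place and the memory would stay filtered out.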

F3 (debug output reports retired field) — fixed. memory_explain_rank now reports suppressedUntilMs=… instead of the retired suppressedUntilTurn (which is always 0 after Tier 1 touch). (src/tools.ts:2201)

F4 (tests duplicate production logic) — refactored. Pulled the Tier 1 logic out of index.ts into src/auto-recall-tier1.ts as two pure functions:

isSuppressed(meta, nowMs) — used by the governance filter
computeTier1Patch(meta, opts) — used by the auto-recall injection patch loop

test/tier1-counters.test.mjs and test/smart-memory-lifecycle.mjs now import these same helpers via jiti rather than maintaining local mirrors. The 22 unit tests + lifecycle integration assertion now drive the production code path directly, so any future drift surfaces immediately. The "verified by visual inspection" comment is gone.

EF1 (full test suite failing) — not caused by this PR. The failure is test/smart-extractor-branches.mjs (Smart extraction should trigger on turn 2 with cumulative count >= 2 + Failed to generate embedding from 127.0.0.1:XXXXX: Connection error), which is a pre-existing flake on master. Latest master CI run 25315749827 (commit 47b635d, the base this PR rebased onto) fails on the identical assertion, alongside update-consistency-lancedb (also flaky). The failure path is in smart-extractor / embedder code, which this PR does not touch. Out of scope here; would suggest a separate issue if the env-coupling needs hardening.

Open questions noted:

PR #597 ("feat(proposal-a): Phase 1 recall governance (Issue #569)") overlap: the staleInjected judgment is preserved verbatim and now lives as a private helper in src/auto-recall-tier1.ts, with a comment pointing to PR #597 / Proposal A as the owner of any future changes. Tier 1's seam is stable.
Public config: keeping autoRecallBadRecallDecayMs and autoRecallSuppressionDurationMs user-configurable seems right — 24h decay and 30min suppression are operationally tunable and worth exposing for ops. Defaults remain conservative.

Verification: tier1-counters (22/22), smart-memory-lifecycle (incl. integration assertion), core-regression (22/22 excluding the master flake), packaging-and-workflow, storage-and-schema, cli-smoke all green locally.

raw34 · 2026-05-05T06:25:27Z

CI rerun shows two failures, both inherited from master (introduced by #716, merged ~3 hours ago):

packaging-and-workflow: Error: unexpected manifest entry: test/issue606_sdk-migration.test.mjs. PR #716 ("fix(sdk): migrate Bug 2 to api.runtime.agent (Issue #606)") added the file to scripts/ci-test-manifest.mjs (line 63) but did not add the matching entry to scripts/verify-ci-test-manifest.mjs EXPECTED_BASELINE, so the lockstep check fails.
core-regression: test/issue606_sdk-migration.test.mjs exited 1 with Cannot find module 'src/store.js' imported from index.ts and F1: circuit breaker opens… not-ok. The test itself is failing on master.
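The packaging-and-workflow failure follows from how a lockstep check behaves when one list gains an entry and the other does not. A rough sketch, assuming the check simply compares the manifest against the baseline (`findUnexpectedEntries` and `verifyLockstep` are hypothetical; the real scripts may differ):

```typescript
// Hypothetical helper: manifest entries that the expected baseline does not list.
function findUnexpectedEntries(manifest: string[], expectedBaseline: string[]): string[] {
  const expected = new Set(expectedBaseline);
  return manifest.filter((entry) => !expected.has(entry));
}

// Fails as soon as the manifest contains an entry missing from the baseline.
function verifyLockstep(manifest: string[], expectedBaseline: string[]): void {
  const unexpected = findUnexpectedEntries(manifest, expectedBaseline);
  if (unexpected.length > 0) {
    throw new Error(`unexpected manifest entry: ${unexpected[0]}`);
  }
}
```

Under this reading, adding the missing path to the baseline list is enough to make the check pass again.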

Verified the same failures on master at 09965e3: CI run 25359252529 has the identical two-line failure pattern. My PR is auto-merged with master before CI runs (default actions/checkout@v4 behavior on pull_request events), which is why the master-side breakage surfaces here.

No new code from this PR is implicated:

git merge --no-ff upstream/master against 97cffbe resolves cleanly with no conflicts.
All Tier 1 helpers and tests still pass locally — tier1-counters (22/22), smart-memory-lifecycle, core-regression minus the master-broken issue606_sdk-migration test, packaging-and-workflow minus the master EXPECTED_BASELINE mismatch.

Will be unblocked when master fixes #716. Happy to post the one-line EXPECTED_BASELINE patch as a separate PR if that helps.

rwmjhb

PR #712 Review: fix(auto-recall): Tier 1 memory counter fix (ms-based suppression + lazy-heal)

Verdict: REQUEST-CHANGES | 6 rounds completed | Value: 64% | Size: XL | Author: raw34

Value Assessment

Problem: Auto-recall can leave memory access and suppression counters in a broken state, causing recalled memories to stop appearing or fail to reinforce downstream decay/access tracking. The PR replaces session-turn suppression with millisecond-based suppression, lazily heals legacy counter pollution, and increments access metadata on Tier 1 injection.

Dimension	Assessment
Value Score	64%
Value Verdict	review
Issue Linked	true
Project Aligned	true
Duplicate	false
AI Slop Score	0/6
User Impact	high
Urgency	medium

Scope Drift: 2 flag(s)

openclaw.plugin.json exposes new public configuration for decay and suppression duration; this may be justified operationally, but it creates long-term API surface beyond the narrow bug fix
test/smart-memory-lifecycle.mjs appends a helper-level assertion after the existing lifecycle pass message, which is slightly awkward test structure but still related to the fix

Open Questions:

Linked issue #633 has no labels, assignment, or maintainer comments in the provided context, so maintainer acknowledgement cannot be confirmed.
Maintainers should confirm whether PR #712 should land independently of PR #597 or wait for the broader Proposal A recall-governance direction.
Maintainers should decide whether autoRecallBadRecallDecayMs and autoRecallSuppressionDurationMs should remain public config or start as internal constants.
The full suite failure appears inherited from master according to the timeline, but CI should be green or the inherited failure should be clearly documented before merge.

Summary

Auto-recall can leave memory access and suppression counters in a broken state, causing recalled memories to stop appearing or fail to reinforce downstream decay/access tracking. The PR replaces session-turn suppression with millisecond-based suppression, lazily heals legacy counter pollution, and increments access metadata on Tier 1 injection.

Evaluation Signals

Signal	Value
Blockers	0
Warnings	0
PR Size	XL
Verdict Floor	approve
Risk Level	high
Value Model	codex
Primary Model	codex
Adversarial Model	claude

Must Fix

F1: New Tier 1 config fields are dropped during config parsing

Nice to Have

EF1: Full test suite fails in smart extraction branch coverage
MR1: Read path silently retires legacy suppressed_until_turn — currently-suppressed memories surface immediately on deploy
MR2: buildSmartMetadata loses lazy-heal sentinel if any persistence layer maps undefined → null
MR3: TIER1_BAD_RECALL_SUPPRESSION_THRESHOLD remains hardcoded while companion knobs become public config

Recommended Action

Good direction — problem is worth solving. Author should address must-fix findings, then this is ready to merge.

Reviewed at 2026-05-05T10:30:44Z | 6 rounds | Value: codex | Primary: codex | Adversarial: claude

…+ MR2 + MR3)

F1 (must-fix): parsePluginConfig was dropping the two new Tier 1 config fields. The interface declared them and index.ts read them, but the return-object construction never copied them through, so user config was always silently discarded and the runtime always saw undefined.
- Add a parseNonNegativeInt helper (parsePositiveInt rejects 0, but 0 is a meaningful sentinel for both fields: disable decay / collapse suppression to a no-op).
- Wire autoRecallBadRecallDecayMs and autoRecallSuppressionDurationMs through parsePluginConfig.
- Add 5 regression tests in tier1-counters.test.mjs covering: field propagation, the 0 sentinel, undefined fallback, and negative-input rejection.

MR2 (nice-to-have): parseSmartMetadata used `!== undefined` to detect the lazy-heal sentinel. If any persistence layer ever round-trips undefined → null on the metadata JSON, the sentinel was lost (null fell through to clampCount → 0, marking the memory as "Tier 1 touched" incorrectly).
- Switch to `!= null` (covers both undefined and null) in both parseSmartMetadata and the patch path in buildSmartMetadata.
- Add a regression test for the null round-trip case.

MR3 (nice-to-have): document why TIER1_BAD_RECALL_SUPPRESSION_THRESHOLD is a constant while companion knobs are public config: "3 strikes" is a behavioral design choice; decay/duration are ops tuning. Note the existing `minRepeated` opt seam is available if tuning is ever needed.

EF1: pre-existing master flake, documented in earlier comment.

MR1 (legacy turn-based suppression at deploy): deferred. Old turn numbers were never deploy-stable in the pre-Tier-1 design (turn counter resets per session), so "currently-suppressed memories" do not have well-defined cross-deploy semantics. Lazy-heal is the intended migration. Will document in the follow-up PR comment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

raw34 · 2026-05-05T11:42:11Z

Second-round review addressed in 49693bc.

F1 (must-fix) — parsePluginConfig dropped new Tier 1 config fields: confirmed and fixed. The PluginConfig interface declared the fields and index.ts read them, but parsePluginConfig's return-object construction never copied them through, so user config was silently discarded and the runtime always saw undefined. Added a parseNonNegativeInt helper (the existing parsePositiveInt rejects 0, but 0 is a meaningful sentinel for both fields — disable decay / collapse suppression to a no-op) and wired both fields through. 5 new regression tests in tier1-counters.test.mjs cover: field propagation, the 0 sentinel, undefined fallback, and negative-input rejection.
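The difference between the two parsers matters only at zero. A minimal sketch, assuming both helpers accept unknown input and return `undefined` on rejection (the real helpers in the plugin may accept other input types or behave differently):

```typescript
// Assumed behaviour of the existing helper: 0 is rejected.
function parsePositiveIntSketch(value: unknown): number | undefined {
  return typeof value === "number" && Number.isInteger(value) && value > 0 ? value : undefined;
}

// Assumed behaviour of the new helper: 0 is accepted as a meaningful sentinel
// (disable decay / collapse suppression to a no-op); negatives are still rejected.
function parseNonNegativeIntSketch(value: unknown): number | undefined {
  return typeof value === "number" && Number.isInteger(value) && value >= 0 ? value : undefined;
}
```

Returning `undefined` for rejected input lets the caller fall back to the default, which matches the "undefined fallback" and "negative-input rejection" cases listed in the regression tests.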

MR2 (nice-to-have) — parseSmartMetadata loses lazy-heal sentinel on null: confirmed and fixed. parsed.suppressed_until_ms !== undefined failed to catch null, which falls through to clampCount(null, 0) === 0 and incorrectly marks the memory as "Tier 1 touched". Switched both the parse path and buildSmartMetadata's patch path to != null. Added a regression test for the null round-trip case.
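The null round-trip can be reproduced in isolation. In this sketch `clampCountSketch` is a hypothetical stand-in for the plugin's clampCount, written only so that `null` coerces to 0 as described above:

```typescript
// Hypothetical stand-in: coerces to a non-negative integer, else uses the fallback.
function clampCountSketch(value: unknown, fallback: number): number {
  const n = Number(value); // Number(null) === 0, Number(undefined) is NaN
  return Number.isFinite(n) && n >= 0 ? Math.floor(n) : fallback;
}

type RawMetaSketch = { suppressed_until_ms?: number | null };

// Before: null passes the strict check and is clamped to 0 ("Tier 1 touched").
function parseStrict(raw: RawMetaSketch): number | undefined {
  return raw.suppressed_until_ms !== undefined
    ? clampCountSketch(raw.suppressed_until_ms, 0)
    : undefined;
}

// After: `!= null` treats both undefined and null as "never touched".
function parseLoose(raw: RawMetaSketch): number | undefined {
  return raw.suppressed_until_ms != null
    ? clampCountSketch(raw.suppressed_until_ms, 0)
    : undefined;
}
```

With the strict check, a `null` written by a persistence layer would be read back as `0`, and lazy-heal would skip a memory it should have repaired.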

MR3 (nice-to-have) — threshold hardcoded while knobs are public config: documented the rationale in src/auto-recall-tier1.ts. The "3 strikes" rule is a behavioral design choice that should hold across deployments; decay window and suppression duration are operational tuning parameters that ops may legitimately tune. The minRepeated opt on computeTier1Patch already provides the seam if real tuning need surfaces.

MR1 (nice-to-have) — read path silently retires suppressed_until_turn at deploy: I'd push back on this one. Pre-Tier-1 turn-based suppression was never deploy-stable — the turn counter is per-session and resets at every gateway restart, so a memory with suppressed_until_turn=999 from a previous session was already meaningless against a new session's currentTurn=1 (it would over-suppress for ~999 turns of the new session, an accidental behavior). Post-Tier-1 the legacy field is correctly ignored on read; lazy-heal is the migration path on next injection. The transition exposes some memories that were accidentally over-suppressed by the old logic, which is a better semantic, not a regression. Happy to add a short migration note in the PR description if maintainers want it documented for users.
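The argument can be made concrete with a small comparison. The legacy check below is an assumption inferred from the over-suppression example in this comment (suppressed while `currentTurn` is below the stored turn); both function names are hypothetical:

```typescript
// Assumed legacy check: depends on a per-session turn counter that resets on restart.
function legacyTurnSuppressed(suppressedUntilTurn: number, currentTurn: number): boolean {
  return currentTurn < suppressedUntilTurn;
}

// Wall-clock check: depends only on the stored deadline and the current time.
function wallClockSuppressed(suppressedUntilMs: number, nowMs: number): boolean {
  return nowMs < suppressedUntilMs;
}

// A stale value from a previous session keeps suppressing in a new session
// until the fresh counter catches up with it.
const staleTurn = 999;
const newSessionTurns = [1, 500, 998, 999];
const legacyResults = newSessionTurns.map((t) => legacyTurnSuppressed(staleTurn, t));

// A 30-minute wall-clock suppression expires on schedule regardless of restarts.
const setAtMs = 1_700_000_000_000;
const untilMs = setAtMs + 30 * 60 * 1000;
const afterRestartMs = setAtMs + 31 * 60 * 1000;
const wallClockResult = wallClockSuppressed(untilMs, afterRestartMs);
```

Under this assumption, `legacyResults` is true for turns 1, 500, and 998 and false only at turn 999, while `wallClockResult` is false one minute after the deadline.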

EF1 — full test suite failure: still the pre-existing master flake (smart-extractor-branches.mjs + issue606_sdk-migration.test.mjs introduced by #716). Noted in earlier comment.

Verification: tier1-counters (28/28 — was 22, +6 new for F1/MR2), smart-metadata-v2, clawteam-scope + per-agent-auto-recall + scope-access-undefined (65/65), core-regression (22/22 minus master flake) all green locally.

Before this fix, the smart-memory-lifecycle test printed "OK: passed" before the appended Tier 1 integration assertion ran, so a Tier 1 failure would surface as "OK" followed by an AssertionError, which is confusing when skimming logs. Move the print to the end and reword it to reflect both the legacy lifecycle and Tier 1 integration coverage.

Caught by self-audit before the next reviewer round; the reviewer flagged the same issue as a Scope Drift signal in the previous review.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

rwmjhb reviewed May 4, 2026

raw34 force-pushed the fix/tier1-memory-counters branch from 0212126 to 0841c4a Compare May 4, 2026 12:10

rwmjhb requested changes May 5, 2026

raw34 wants to merge 4 commits into CortexReach:master from raw34:fix/tier1-memory-counters
