
fix: auto-capture cumulative turn counting for smart extraction (issue #417)#518

Closed
jlin53882 wants to merge 3 commits into CortexReach:master from jlin53882:fix/issue-417-extract-min-messages

Conversation

@jlin53882
Contributor

Summary

Fixes the cumulative turn counting for smart extraction in auto-capture. Resolves issue #417.

Problem

With extractMinMessages: 2 + smartExtraction: true, single-turn DM conversations (1 user message per agent_end) always fall through to regex fallback, writing dirty data (l0_abstract == text, no LLM distillation).

Root cause: autoCaptureSeenTextCount was being overwritten on every event (always set to the current event's message count of 1) instead of accumulating. Two fix paths were identified:

  • Path A: autoCaptureSeenTextCount diffing (fix: accumulate instead of overwrite)
  • Path B: buildAutoCaptureConversationKeyFromIngress returned null for DM (no conversationId), so pendingIngressTexts was never written (fix: fallback to channel for DM)
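Path A can be sketched in a few lines. Only the map name autoCaptureSeenTextCount is taken from the actual code; recordTurn and the rest are hypothetical helpers for illustration:

```typescript
// Hypothetical sketch of Path A (accumulate, don't overwrite).
// Only autoCaptureSeenTextCount mirrors the real identifier;
// everything else is illustrative.
const autoCaptureSeenTextCount = new Map<string, number>();

function recordTurn(sessionKey: string, eligibleTextCount: number): number {
  // Before the fix: autoCaptureSeenTextCount.set(sessionKey, eligibleTextCount)
  // -- for single-turn DMs this was always 1, so the threshold never passed.
  const previous = autoCaptureSeenTextCount.get(sessionKey) ?? 0;
  const cumulative = previous + eligibleTextCount;
  autoCaptureSeenTextCount.set(sessionKey, cumulative);
  return cumulative;
}
```

With this shape, two single-message DM turns reach a cumulative count of 2 and can satisfy extractMinMessages: 2.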

Changes

Fix 1: buildAutoCaptureConversationKeyFromIngress (Path B)

DM: conversationId is undefined → returns channel instead of null. Now matches the key extracted by regex in agent_end.
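A minimal sketch of the fallback, assuming hypothetical field names (conversationId, channelId) on the ingress event payload:

```typescript
// Hypothetical sketch of Fix 1: fall back to the channel when a DM
// ingress event carries no conversationId. Field names are assumptions.
interface IngressEvent {
  conversationId?: string;
  channelId?: string;
}

function buildConversationKey(event: IngressEvent): string | null {
  // Before the fix, DMs (undefined conversationId) returned null here,
  // so pendingIngressTexts was never written for them.
  return event.conversationId ?? event.channelId ?? null;
}
```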

Fix 2: autoCaptureSeenTextCount cumulative counting (Path A)

Changed from set(sessionKey, eligibleTexts.length) (overwrite) to set(sessionKey, currentCumulativeCount) (accumulate).

Fix 3: Smart extraction threshold uses cumulative count

currentCumulativeCount >= minMessages instead of cleanTexts.length >= minMessages.

Fix 4: isExplicitRememberCommand guard

Preserves explicit remember command behavior in DM context.

Fix 5: extractMinMessages cap

Math.min(config.extractMinMessages ?? 4, 100) — prevents misconfiguration.

Fix 6: MAX_MESSAGE_LENGTH = 5000 guard

Prevents OOM from super-long messages in pendingIngressTexts.
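As a sketch, the guard could truncate oversized messages before queuing them; whether the real fix truncates or drops them is not stated here, so the truncation policy below is an assumption:

```typescript
// Hypothetical sketch of Fix 6: bound the size of each queued message
// so pendingIngressTexts cannot hold arbitrarily large strings.
// Truncation (rather than dropping the message) is an assumed policy.
const MAX_MESSAGE_LENGTH = 5000;

function guardMessage(text: string): string {
  return text.length > MAX_MESSAGE_LENGTH
    ? text.slice(0, MAX_MESSAGE_LENGTH)
    : text;
}
```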

Semantic Change (Needs Discussion)

The fix changes extractMinMessages semantics from "per-event message count" to "cumulative conversation turns". This aligns with the design intent (AliceLJY confirmed this is the correct direction for beta.11), but existing users who configured extractMinMessages based on the old semantics may see different behavior.

Open Questions for Maintainer Review

Q1: extractMinMessages semantic change acceptable?

Before: extractMinMessages=N means each agent_end event needs N messages
After: extractMinMessages=N means smart extraction triggers at turn N

AliceLJY confirmed this is the correct direction. Is this semantic change acceptable as a bug fix, or should it be a separate config option?

Q2: isExplicitRememberCommand behavior after Fix 3

When pendingIngressTexts.length > 0 (multi-turn), texts.length is typically > 1, so texts.length === 1 && isExplicitRememberCommand(...) guard rarely triggers. The fix preserves this by checking lastPending fallback. Is this the intended behavior?

Q3: Window size for cumulative counting

The rolling window for pendingIngressTexts is the last 6 messages (slice(-6)). For DM with extractMinMessages=2, this is sufficient. But if extractMinMessages is set to a very large value (e.g., 50), the window may not hold enough history. Should there be a dynamic adjustment?
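One possible dynamic adjustment, sketched under the assumption that the window floor stays at 6 and the cap mirrors the extractMinMessages cap of 100; this is a suggestion, not the PR's behavior:

```typescript
// Hypothetical sketch for Q3: derive the pendingIngressTexts window
// from extractMinMessages instead of hard-coding slice(-6).
// The floor (6) and cap (100) are assumptions taken from this PR.
function pendingWindowSize(extractMinMessages: number): number {
  return Math.min(Math.max(extractMinMessages, 6), 100);
}
```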

…CortexReach#417)

- Fix #1: buildAutoCaptureConversationKeyFromIngress — DM fallback to channelId
  (fixes pendingIngressTexts never being written for Discord DM)
- Fix #2: cumulative counting — autoCaptureSeenTextCount accumulates, not overwrites
  (fixes eligibleTexts.length always 1 for DM, extractMinMessages never satisfied)
- Fix #3: REPLACE vs APPEND — use pendingIngressTexts as-is when present
  (avoids deduplication issues from text appearing in both sources)
- Fix #5: isExplicitRememberCommand guard with lastPending fallback
  (preserves explicit remember command behavior in DM context)
- Fix #6: Math.min cap on extractMinMessages (max 100) — prevents misconfiguration
- Fix #7: MAX_MESSAGE_LENGTH=5000 guard in message_received hook
- Smart extraction threshold now uses currentCumulativeCount (turn count)
  instead of cleanTexts.length (per-event message count)
- Debug logs updated to show cumulative count context

All 29 test suites pass. Based on official latest (5669b08).

James added 2 commits April 4, 2026 21:25
…turn counting test + changelog

- Fix #1: buildAutoCaptureConversationKeyFromIngress DM fallback
- Fix #2: currentCumulativeCount (cumulative per-event counting)
- Fix #3: REPLACE vs APPEND + cum count threshold for smart extraction
- Fix #4: remove pendingIngressTexts.delete()
- Fix #5: isExplicitRememberCommand lastPending guard
- Fix #6: Math.min extractMinMessages cap (max 100)
- Fix #7: MAX_MESSAGE_LENGTH=5000 guard
- Add test: 2 sequential agent_end events with extractMinMessages=2
- Add changelog: Unreleased section with issue details
Collaborator

@rwmjhb rwmjhb left a comment


Review: fix: auto-capture cumulative turn counting for smart extraction (#417)

The problem diagnosis is accurate: single-turn DMs always fall through to the regex fallback because autoCaptureSeenTextCount is overwritten on every event, producing dirty data. That said, the implementation has a few points that need attention:

Must Fix

  1. Semantic confusion: autoCaptureSeenTextCount now mixes two semantics (transcript length and cumulative turns), and currentCumulativeCount grows monotonically over the whole session with no reset, so every turn after turn N triggers smart extraction.

  2. Build failure: the TypeScript compilation fails; this must be fixed first.

Questions

  • autoCapturePendingIngressTexts.delete(conversationKey) was removed, but the PR description does not mention it. Was this intentional? If so, what mechanism prevents stale pending texts from being consumed repeatedly?
  • The isExplicitRememberCommand guard is unreachable in multi-turn DMs (this PR's target scenario). Is that by design, or a bug?
  • The new test passes a fifth argument to createMockApi, but the function signature does not accept one.

jlin53882 added a commit to jlin53882/memory-lancedb-pro that referenced this pull request Apr 5, 2026
…move dead isExplicitRememberCommand guard (PR CortexReach#518 review fixes)
@jlin53882
Contributor Author

Phase 2: Design Questions for Maintainer Input

Phase 1 cleanup (PR #534) is ready — fixes createMockApi interface and removes dead code.

Before Phase 2 (semantic fixes), we need maintainer input on these 5 questions:


Q1: Smart Extraction Re-trigger Policy

Issue: currentCumulativeCount is monotonic (no reset). Once it crosses extractMinMessages=N, every subsequent turn triggers smart extraction.

Current behavior (with Phase 1):

  • Turn 1: cumulative=1, skip
  • Turn 2: cumulative=2, trigger ✓
  • Turn 3: cumulative=3, trigger again ⚠️
  • Turn 4: cumulative=4, trigger again ⚠️

Is this the intended behavior? Should there be a session-level once-only flag?
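If a once-only policy is wanted, a session-level flag could gate re-triggering. This is one possible answer to the question, sketched with illustrative names, not the current behavior:

```typescript
// Hypothetical sketch for Q1: fire smart extraction only on the first
// crossing of the threshold per session. Names are illustrative.
const extractedSessions = new Set<string>();

function shouldTriggerSmartExtraction(
  sessionKey: string,
  cumulative: number,
  minMessages: number,
): boolean {
  if (cumulative < minMessages) return false;          // below threshold: skip
  if (extractedSessions.has(sessionKey)) return false; // already fired once
  extractedSessions.add(sessionKey);
  return true;
}
```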


Q2: REPLACE vs APPEND Strategy

PR #518 chose REPLACE (newTexts = pendingIngressTexts) to avoid deduplication issues when the same text appears in both pendingIngressTexts and eligibleTexts.

Alternative: Keep APPEND but deduplicate before joining.

Which strategy is preferred?
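The APPEND-with-dedup alternative can be sketched in one line; this assumes exact string equality is a sufficient dedup key:

```typescript
// Hypothetical sketch of Q2's alternative: append eligibleTexts to
// pendingIngressTexts, then drop exact duplicates while keeping order.
function mergeTexts(pending: string[], eligible: string[]): string[] {
  return [...new Set([...pending, ...eligible])];
}
```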


Q3: Stale pendingIngressTexts Prevention

Issue: autoCapturePendingIngressTexts.delete(conversationKey) was removed in PR #518, relying on REPLACE + slice(-6) to prevent stale data.

Concern: If agent_end fires without a preceding message_received (e.g., direct trigger, retry/replay), stale texts could be consumed.

Is there a mechanism to ensure message_received always precedes agent_end for a given conversationKey?


Q4: extractMinMessages Semantic Change

PR #518 changes semantics from "per-event message count" to "cumulative conversation turns".

This is a breaking change for users who configured extractMinMessages based on the old semantics.

Is this semantic change acceptable as a bug fix, or should it be a separate config option?


Q5: isExplicitRememberCommand Guard (Dead Code)

Phase 1 removes this guard as unreachable under REPLACE, but I want to confirm this is correct:

Original intent: Handle single "請記住..." ("please remember...") messages that lack context on their own, by appending priorRecentTexts.

Why unreachable now: Under REPLACE, texts = pendingIngressTexts (rolling window of up to 6 messages), so texts.length === 1 can never be true in multi-turn scenarios.

Should this behavior be preserved? If so, where should the context-enrichment logic live?


PR #534 (Phase 1): #534
PR #518 (main PR): #518

@jlin53882
Contributor Author

This PR has been superseded by PR #549, which resolves all blocking concerns from the PR #534 review (Must Fix #1, #2, #3, and Minor item #4) and additionally corrects a placement bug in Fix #1 identified during adversarial review.

Summary of fixes in #549:

Please review #549 instead.

jlin53882 added a commit to jlin53882/memory-lancedb-pro that referenced this pull request Apr 7, 2026
…move dead isExplicitRememberCommand guard (PR CortexReach#518 review fixes)
jlin53882 added a commit to jlin53882/memory-lancedb-pro that referenced this pull request Apr 8, 2026
…move dead isExplicitRememberCommand guard (PR CortexReach#518 review fixes)
@rwmjhb rwmjhb closed this Apr 9, 2026
jlin53882 added a commit to jlin53882/memory-lancedb-pro that referenced this pull request Apr 9, 2026
…move dead isExplicitRememberCommand guard (PR CortexReach#518 review fixes)
jlin53882 added a commit to jlin53882/memory-lancedb-pro that referenced this pull request Apr 16, 2026
jlin53882 added a commit to jlin53882/memory-lancedb-pro that referenced this pull request Apr 20, 2026
…move dead isExplicitRememberCommand guard (PR CortexReach#518 review fixes)
rwmjhb pushed a commit that referenced this pull request May 3, 2026
…rsedes #518, #534)  (#549)

* fix: auto-capture cumulative turn counting for smart extraction (issue #417)

- Fix #1: buildAutoCaptureConversationKeyFromIngress — DM fallback to channelId
  (fixes pendingIngressTexts never being written for Discord DM)
- Fix #2: cumulative counting — autoCaptureSeenTextCount accumulates, not overwrites
  (fixes eligibleTexts.length always 1 for DM, extractMinMessages never satisfied)
- Fix #3: REPLACE vs APPEND — use pendingIngressTexts as-is when present
  (avoids deduplication issues from text appearing in both sources)
- Fix #5: isExplicitRememberCommand guard with lastPending fallback
  (preserves explicit remember command behavior in DM context)
- Fix #6: Math.min cap on extractMinMessages (max 100) — prevents misconfiguration
- Fix #7: MAX_MESSAGE_LENGTH=5000 guard in message_received hook
- Smart extraction threshold now uses currentCumulativeCount (turn count)
  instead of cleanTexts.length (per-event message count)
- Debug logs updated to show cumulative count context

All 29 test suites pass. Based on official latest (5669b08).

* fix: re-apply all 7 fixes for issue #417 + add cumulative turn counting test + changelog

- Fix #1: buildAutoCaptureConversationKeyFromIngress DM fallback
- Fix #2: currentCumulativeCount (cumulative per-event counting)
- Fix #3: REPLACE vs APPEND + cum count threshold for smart extraction
- Fix #4: remove pendingIngressTexts.delete()
- Fix #5: isExplicitRememberCommand lastPending guard
- Fix #6: Math.min extractMinMessages cap (max 100)
- Fix #7: MAX_MESSAGE_LENGTH=5000 guard
- Add test: 2 sequential agent_end events with extractMinMessages=2
- Add changelog: Unreleased section with issue details

* docs: update changelog - add test file reference and improve breaking change label

* fix: Phase 1 - createMockApi accepts pluginConfigOverrides param + remove dead isExplicitRememberCommand guard (PR #518 review fixes)

* fix: resolve all Must Fix items from PR #534 review (issue #417)

* fix: move currentCumulativeCount reset inside success block (Fix #9)

* fix: add try-catch around extractAndPersist to prevent hook crash on extraction failure (Fix #10)

* fix: clear pendingIngressTexts in catch block on extraction failure (Fix #10 extended)

* fix: add conversationKey guard to Fix #8 + restore test comment

* fix: Must Fix 1/2/5 from PR #549 review - counter reset always, newTexts counting, Fix#8 assertion

* fix: Must Fix 1 revised - reset counter to previousSeenCount on all-dedup (reviewer suggestion)

* fix: revert Must Fix #2 (eligibleTexts.length counting restored) - preserves extractMinMessages semantics

* fix: correct test expectation - collected 1 not 2 text(s) after counter formula revert (e5b5e5b)

* fix: replace throw in hook with safe return (Fix-Must5)

* fix: remove unreachable conversationKey guard (Claude Code review)

* fix(issue-417): skip regex fallback when all candidates skipped with no boundary texts (Fix-Must1b)

* test(issue-417): add Fix-Must1b DM fallback regression test

* fix(issue-417): F1 success block counter reset + rate limiter inside success path (rwmjhb review)

* fix(issue-417): document intentional non-reset of counter after regex fallback

* fix(issue-417): MR1 counter inflation + MR2 unreasonable cap (implements Codex adversarial review)

- MR1: currentCumulativeCount now counts newTexts.length instead of
  eligibleTexts.length, preventing duplicate full-history payloads
  from inflating the counter
- MR2: extract an AUTO_CAPTURE_PENDING_WINDOW=6 constant so the three
  call sites queue.slice(-6), slice(-6), and Math.min(..., 100) share
  one constant, eliminating the magic number and aligning with the
  threshold cap

* test(issue-417): F5 counter reset success-path regression test

Adds runCounterResetSuccessScenario() to cover Fix #9 (the counter resets after a successful extraction).

- Turn 1: cumulative=1 < 2, skip
- Turn 2: cumulative=2 >= 2, trigger extraction, LLM returns SUCCESS
  -> Fix #9: counter resets to 0
- Turn 3: cumulative restarts from 0 -> +1 = 1 < 2, skip

Key assertions:
1. The LLM is called only once (turn 3 does not trigger again after turn 2 succeeds)
2. The turn-2 success log appears
3. Turn 3 observes cumulative=1 < minMessages=2 and correctly skips

* fix(issue-417): fix maintainer-review findings (test mock schema + remove runtime cap)

* fix(issue-417): below-threshold return + CHANGELOG sync (rwmjhb review fix)

* fix(issue-417): remove stale [Fix #6] comment + fix CHANGELOG PR number

* fix(issue-417): Issue2 export fn + Issue3 Fix#5 explicit remember guard + Issue2 unit test

* test(issue-417): add R2 Stage 2 LLM dedup + R3 DM key fallback integration tests

* fix(issue-417): correct misleading comment — counter uses newTexts.length not eligibleTexts.length

Fixes rwmjhb Nice-to-have: comment at line ~2830 stated the counter
uses eligibleTexts.length, but the actual code (since MR1 commit 2ac682d)
uses newTexts.length. Updated comment to accurately describe the
newTexts.length approach and explain why it is correct vs eligibleTexts.length.

* fix(issue-417): MF1 explicit-remember prepend, MF3 counter based on texts.length

* fix(issue-417): MF1 v2 - avoid lastPending duplicate in REPLACE mode

* fix(issue-417): MF3 move let texts before counter; fix MF1 typo

* fix(issue-417): MF1 v3 - includes() check; revert MF3 to newTexts.length

* fix(issue-417-mustfixes): MF2 - move R2 dedup scenario to module scope

* fix(pr549/issue-417): export buildAutoCaptureConversationKeyFromIngress, DM fallback, MAX_MESSAGE_LENGTH guard, cumulative counting, counter reset, try-catch handlers

* fix(test): remove non-existent log assertion in multi-round scenario (Must-Fix 3)

Root cause: line 501 asserted a log message that never existed in production code.
The 'created [preferences]...' format does not exist in smart-extractor.ts or index.ts.
Preserved the actual log assertions: 'merged [preferences]' and 'skipped [preferences]'.
Also preserved entry content validation (entries[0].text).

---------

Co-authored-by: James <james53882@users.noreply.github.com>
Co-authored-by: OpenClaw Agent <agent@openclaw.ai>
Co-authored-by: jlin53882 <jlin53882@users.noreply.github.com>
Co-authored-by: OpenClaw Agent <agent@openclaw.dev>