
fix: auto-capture cumulative turn counting for smart extraction (issue #417)#518

Closed
jlin53882 wants to merge 3 commits into CortexReach:master from jlin53882:fix/issue-417-extract-min-messages

Conversation

@jlin53882
Contributor

Summary

Fixes the cumulative turn counting for smart extraction in auto-capture. Resolves issue #417.

Problem

With extractMinMessages: 2 + smartExtraction: true, single-turn DM conversations (1 user message per agent_end) always fall through to regex fallback, writing dirty data (l0_abstract == text, no LLM distillation).

Root cause: autoCaptureSeenTextCount was being overwritten on every event (always set to the current event's message count of 1) instead of accumulating. Two fix paths were identified:

  • Path A: autoCaptureSeenTextCount diffing (fix: accumulate instead of overwrite)
  • Path B: buildAutoCaptureConversationKeyFromIngress returned null for DM (no conversationId), so pendingIngressTexts was never written (fix: fallback to channel for DM)
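Path A can be sketched in a few lines. Only the map name autoCaptureSeenTextCount is taken from the actual code; recordTurn and the rest are hypothetical helpers for illustration:

```typescript
// Hypothetical sketch of Path A (accumulate, don't overwrite).
// Only autoCaptureSeenTextCount mirrors the real identifier;
// everything else is illustrative.
const autoCaptureSeenTextCount = new Map<string, number>();

function recordTurn(sessionKey: string, eligibleTextCount: number): number {
  // Before the fix: autoCaptureSeenTextCount.set(sessionKey, eligibleTextCount)
  // -- for single-turn DMs this was always 1, so the threshold never passed.
  const previous = autoCaptureSeenTextCount.get(sessionKey) ?? 0;
  const cumulative = previous + eligibleTextCount;
  autoCaptureSeenTextCount.set(sessionKey, cumulative);
  return cumulative;
}
```

With this shape, two single-message DM turns reach a cumulative count of 2 and can satisfy extractMinMessages: 2.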

Changes

Fix 1: buildAutoCaptureConversationKeyFromIngress (Path B)

DM: conversationId is undefined → returns channel instead of null. Now matches the key extracted by regex in agent_end.
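A minimal sketch of the fallback, assuming hypothetical field names (conversationId, channelId) on the ingress event payload:

```typescript
// Hypothetical sketch of Fix 1: fall back to the channel when a DM
// ingress event carries no conversationId. Field names are assumptions.
interface IngressEvent {
  conversationId?: string;
  channelId?: string;
}

function buildConversationKey(event: IngressEvent): string | null {
  // Before the fix, DMs (undefined conversationId) returned null here,
  // so pendingIngressTexts was never written for them.
  return event.conversationId ?? event.channelId ?? null;
}
```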

Fix 2: autoCaptureSeenTextCount cumulative counting (Path A)

Changed from set(sessionKey, eligibleTexts.length) (overwrite) to set(sessionKey, currentCumulativeCount) (accumulate).

Fix 3: Smart extraction threshold uses cumulative count

currentCumulativeCount >= minMessages instead of cleanTexts.length >= minMessages.

Fix 4: isExplicitRememberCommand guard

Preserves explicit remember command behavior in DM context.

Fix 5: extractMinMessages cap

Math.min(config.extractMinMessages ?? 4, 100) — prevents misconfiguration.

Fix 6: MAX_MESSAGE_LENGTH = 5000 guard

Prevents OOM from super-long messages in pendingIngressTexts.
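As a sketch, the guard could truncate oversized messages before queuing them; whether the real fix truncates or drops them is not stated here, so the truncation policy below is an assumption:

```typescript
// Hypothetical sketch of Fix 6: bound the size of each queued message
// so pendingIngressTexts cannot hold arbitrarily large strings.
// Truncation (rather than dropping the message) is an assumed policy.
const MAX_MESSAGE_LENGTH = 5000;

function guardMessage(text: string): string {
  return text.length > MAX_MESSAGE_LENGTH
    ? text.slice(0, MAX_MESSAGE_LENGTH)
    : text;
}
```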

Semantic Change (Needs Discussion)

The fix changes extractMinMessages semantics from "per-event message count" to "cumulative conversation turns". This aligns with the design intent (AliceLJY confirmed this is the correct direction for beta.11), but existing users who configured extractMinMessages based on the old semantics may see different behavior.

Open Questions for Maintainer Review

Q1: extractMinMessages semantic change acceptable?

Before: extractMinMessages=N means each agent_end event needs N messages
After: extractMinMessages=N means smart extraction triggers at turn N

AliceLJY confirmed this is the correct direction. Is this semantic change acceptable as a bug fix, or should it be a separate config option?

Q2: isExplicitRememberCommand behavior after Fix 3

When pendingIngressTexts.length > 0 (multi-turn), texts.length is typically > 1, so texts.length === 1 && isExplicitRememberCommand(...) guard rarely triggers. The fix preserves this by checking lastPending fallback. Is this the intended behavior?

Q3: Window size for cumulative counting

The rolling window for pendingIngressTexts is the last 6 messages (slice(-6)). For DM with extractMinMessages=2, this is sufficient. But if extractMinMessages is set to a very large value (e.g., 50), the window may not hold enough history. Should there be a dynamic adjustment?
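One possible dynamic adjustment, sketched under the assumption that the window floor stays at 6 and the cap mirrors the extractMinMessages cap of 100; this is a suggestion, not the PR's behavior:

```typescript
// Hypothetical sketch for Q3: derive the pendingIngressTexts window
// from extractMinMessages instead of hard-coding slice(-6).
// The floor (6) and cap (100) are assumptions taken from this PR.
function pendingWindowSize(extractMinMessages: number): number {
  return Math.min(Math.max(extractMinMessages, 6), 100);
}
```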

…CortexReach#417)

- Fix #1: buildAutoCaptureConversationKeyFromIngress — DM fallback to channelId
  (fixes pendingIngressTexts never being written for Discord DM)
- Fix #2: cumulative counting — autoCaptureSeenTextCount accumulates, not overwrites
  (fixes eligibleTexts.length always 1 for DM, extractMinMessages never satisfied)
- Fix #3: REPLACE vs APPEND — use pendingIngressTexts as-is when present
  (avoids deduplication issues from text appearing in both sources)
- Fix #5: isExplicitRememberCommand guard with lastPending fallback
  (preserves explicit remember command behavior in DM context)
- Fix #6: Math.min cap on extractMinMessages (max 100) — prevents misconfiguration
- Fix #7: MAX_MESSAGE_LENGTH=5000 guard in message_received hook
- Smart extraction threshold now uses currentCumulativeCount (turn count)
  instead of cleanTexts.length (per-event message count)
- Debug logs updated to show cumulative count context

All 29 test suites pass. Based on official latest (5669b08).

James added 2 commits April 4, 2026 21:25
…turn counting test + changelog

- Fix #1: buildAutoCaptureConversationKeyFromIngress DM fallback
- Fix #2: currentCumulativeCount (cumulative per-event counting)
- Fix #3: REPLACE vs APPEND + cum count threshold for smart extraction
- Fix #4: remove pendingIngressTexts.delete()
- Fix #5: isExplicitRememberCommand lastPending guard
- Fix #6: Math.min extractMinMessages cap (max 100)
- Fix #7: MAX_MESSAGE_LENGTH=5000 guard
- Add test: 2 sequential agent_end events with extractMinMessages=2
- Add changelog: Unreleased section with issue details
Collaborator

@rwmjhb rwmjhb left a comment


Review: fix: auto-capture cumulative turn counting for smart extraction (#417)

The problem diagnosis is accurate: single-turn DMs always fall through to the regex fallback because autoCaptureSeenTextCount is overwritten on every event, producing dirty data. That said, the implementation has a few points that need attention:

Must Fix

  1. Semantic confusion: autoCaptureSeenTextCount now mixes two semantics (transcript length and cumulative turns), and currentCumulativeCount grows monotonically over the whole session with no reset, so every turn after turn N triggers smart extraction.

  2. Build failure: the TypeScript compilation fails; this must be fixed first.

Questions

  • autoCapturePendingIngressTexts.delete(conversationKey) was removed, but the PR description does not mention it. Was this intentional? If so, what mechanism prevents stale pending texts from being consumed repeatedly?
  • The isExplicitRememberCommand guard is unreachable in multi-turn DMs (this PR's target scenario). Is that by design, or a bug?
  • The new test passes a fifth argument to createMockApi, but the function signature does not accept one.

jlin53882 added a commit to jlin53882/memory-lancedb-pro that referenced this pull request Apr 5, 2026
…move dead isExplicitRememberCommand guard (PR CortexReach#518 review fixes)
@jlin53882
Contributor Author

Phase 2: Design Questions for Maintainer Input

Phase 1 cleanup (PR #534) is ready — fixes createMockApi interface and removes dead code.

Before Phase 2 (semantic fixes), we need maintainer input on these 5 questions:


Q1: Smart Extraction Re-trigger Policy

Issue: currentCumulativeCount is monotonic (no reset). Once it crosses extractMinMessages=N, every subsequent turn triggers smart extraction.

Current behavior (with Phase 1):

  • Turn 1: cumulative=1, skip
  • Turn 2: cumulative=2, trigger ✓
  • Turn 3: cumulative=3, trigger again ⚠️
  • Turn 4: cumulative=4, trigger again ⚠️

Is this the intended behavior? Should there be a session-level once-only flag?
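If a once-only policy is wanted, a session-level flag could gate re-triggering. This is one possible answer to the question, sketched with illustrative names, not the current behavior:

```typescript
// Hypothetical sketch for Q1: fire smart extraction only on the first
// crossing of the threshold per session. Names are illustrative.
const extractedSessions = new Set<string>();

function shouldTriggerSmartExtraction(
  sessionKey: string,
  cumulative: number,
  minMessages: number,
): boolean {
  if (cumulative < minMessages) return false;          // below threshold: skip
  if (extractedSessions.has(sessionKey)) return false; // already fired once
  extractedSessions.add(sessionKey);
  return true;
}
```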


Q2: REPLACE vs APPEND Strategy

PR #518 chose REPLACE (newTexts = pendingIngressTexts) to avoid deduplication issues when the same text appears in both pendingIngressTexts and eligibleTexts.

Alternative: Keep APPEND but deduplicate before joining.

Which strategy is preferred?
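The APPEND-with-dedup alternative can be sketched in one line; this assumes exact string equality is a sufficient dedup key:

```typescript
// Hypothetical sketch of Q2's alternative: append eligibleTexts to
// pendingIngressTexts, then drop exact duplicates while keeping order.
function mergeTexts(pending: string[], eligible: string[]): string[] {
  return [...new Set([...pending, ...eligible])];
}
```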


Q3: Stale pendingIngressTexts Prevention

Issue: autoCapturePendingIngressTexts.delete(conversationKey) was removed in PR #518, relying on REPLACE + slice(-6) to prevent stale data.

Concern: If agent_end fires without a preceding message_received (e.g., direct trigger, retry/replay), stale texts could be consumed.

Is there a mechanism to ensure message_received always precedes agent_end for a given conversationKey?


Q4: extractMinMessages Semantic Change

PR #518 changes semantics from "per-event message count" to "cumulative conversation turns".

This is a breaking change for users who configured extractMinMessages based on the old semantics.

Is this semantic change acceptable as a bug fix, or should it be a separate config option?


Q5: isExplicitRememberCommand Guard (Dead Code)

Phase 1 removes this guard as unreachable under REPLACE, but I want to confirm this is correct:

Original intent: Handle single "請記住..." ("please remember...") messages that lack context on their own, by appending priorRecentTexts.

Why unreachable now: Under REPLACE, texts = pendingIngressTexts (rolling window of up to 6 messages), so texts.length === 1 can never be true in multi-turn scenarios.

Should this behavior be preserved? If so, where should the context-enrichment logic live?


PR #534 (Phase 1): #534
PR #518 (main PR): #518

@jlin53882
Contributor Author

This PR has been superseded by PR #549, which resolves all blocking concerns from the PR #534 review (Must Fix #1, #2, #3, and Minor item #4) and additionally corrects a placement bug in Fix #1 identified during adversarial review.

Summary of fixes in #549:

Please review #549 instead.

jlin53882 added a commit to jlin53882/memory-lancedb-pro that referenced this pull request Apr 7, 2026
…move dead isExplicitRememberCommand guard (PR CortexReach#518 review fixes)
jlin53882 added a commit to jlin53882/memory-lancedb-pro that referenced this pull request Apr 8, 2026
…move dead isExplicitRememberCommand guard (PR CortexReach#518 review fixes)
@rwmjhb rwmjhb closed this Apr 9, 2026
jlin53882 added a commit to jlin53882/memory-lancedb-pro that referenced this pull request Apr 9, 2026
…move dead isExplicitRememberCommand guard (PR CortexReach#518 review fixes)
jlin53882 added a commit to jlin53882/memory-lancedb-pro that referenced this pull request Apr 16, 2026
jlin53882 added a commit to jlin53882/memory-lancedb-pro that referenced this pull request Apr 20, 2026
…move dead isExplicitRememberCommand guard (PR CortexReach#518 review fixes)
rwmjhb pushed a commit that referenced this pull request May 3, 2026
…rsedes #518, #534)  (#549)

* fix: auto-capture cumulative turn counting for smart extraction (issue #417)

- Fix #1: buildAutoCaptureConversationKeyFromIngress — DM fallback to channelId
  (fixes pendingIngressTexts never being written for Discord DM)
- Fix #2: cumulative counting — autoCaptureSeenTextCount accumulates, not overwrites
  (fixes eligibleTexts.length always 1 for DM, extractMinMessages never satisfied)
- Fix #3: REPLACE vs APPEND — use pendingIngressTexts as-is when present
  (avoids deduplication issues from text appearing in both sources)
- Fix #5: isExplicitRememberCommand guard with lastPending fallback
  (preserves explicit remember command behavior in DM context)
- Fix #6: Math.min cap on extractMinMessages (max 100) — prevents misconfiguration
- Fix #7: MAX_MESSAGE_LENGTH=5000 guard in message_received hook
- Smart extraction threshold now uses currentCumulativeCount (turn count)
  instead of cleanTexts.length (per-event message count)
- Debug logs updated to show cumulative count context

All 29 test suites pass. Based on official latest (5669b08).

* fix: re-apply all 7 fixes for issue #417 + add cumulative turn counting test + changelog

- Fix #1: buildAutoCaptureConversationKeyFromIngress DM fallback
- Fix #2: currentCumulativeCount (cumulative per-event counting)
- Fix #3: REPLACE vs APPEND + cum count threshold for smart extraction
- Fix #4: remove pendingIngressTexts.delete()
- Fix #5: isExplicitRememberCommand lastPending guard
- Fix #6: Math.min extractMinMessages cap (max 100)
- Fix #7: MAX_MESSAGE_LENGTH=5000 guard
- Add test: 2 sequential agent_end events with extractMinMessages=2
- Add changelog: Unreleased section with issue details

* docs: update changelog - add test file reference and improve breaking change label

* fix: Phase 1 - createMockApi accepts pluginConfigOverrides param + remove dead isExplicitRememberCommand guard (PR #518 review fixes)

* fix: resolve all Must Fix items from PR #534 review (issue #417)

* fix: move currentCumulativeCount reset inside success block (Fix #9)

* fix: add try-catch around extractAndPersist to prevent hook crash on extraction failure (Fix #10)

* fix: clear pendingIngressTexts in catch block on extraction failure (Fix #10 extended)

* fix: add conversationKey guard to Fix #8 + restore test comment

* fix: Must Fix 1/2/5 from PR #549 review - counter reset always, newTexts counting, Fix#8 assertion

* fix: Must Fix 1 revised - reset counter to previousSeenCount on all-dedup (reviewer suggestion)

* fix: revert Must Fix #2 (eligibleTexts.length counting restored) - preserves extractMinMessages semantics

* fix: correct test expectation - collected 1 not 2 text(s) after counter formula revert (e5b5e5b)

* fix: replace throw in hook with safe return (Fix-Must5)

* fix: remove unreachable conversationKey guard (Claude Code review)

* fix(issue-417): skip regex fallback when all candidates skipped with no boundary texts (Fix-Must1b)

* test(issue-417): add Fix-Must1b DM fallback regression test

* fix(issue-417): F1 success block counter reset + rate limiter inside success path (rwmjhb review)

* fix(issue-417): document intentional non-reset of counter after regex fallback

* fix(issue-417): MR1 counter inflation + MR2 unreasonable cap (implements Codex adversarial review)

- MR1: currentCumulativeCount now counts newTexts.length instead of
  eligibleTexts.length, preventing duplicate full-history payloads
  from inflating the counter
- MR2: extract an AUTO_CAPTURE_PENDING_WINDOW=6 constant so the three
  call sites queue.slice(-6), slice(-6), and Math.min(..., 100) share
  one constant, eliminating the magic number and aligning with the
  threshold cap

* test(issue-417): F5 counter reset success-path regression test

Adds runCounterResetSuccessScenario() to cover Fix #9 (the counter resets after a successful extraction).

- Turn 1: cumulative=1 < 2, skip
- Turn 2: cumulative=2 >= 2, trigger extraction, LLM returns SUCCESS
  -> Fix #9: counter resets to 0
- Turn 3: cumulative restarts from 0 -> +1 = 1 < 2, skip

Key assertions:
1. The LLM is called only once (turn 3 does not trigger again after turn 2 succeeds)
2. The turn-2 success log appears
3. Turn 3 observes cumulative=1 < minMessages=2 and correctly skips

* fix(issue-417): fix maintainer-review findings (test mock schema + remove runtime cap)

* fix(issue-417): below-threshold return + CHANGELOG sync (rwmjhb review fix)

* fix(issue-417): remove stale [Fix #6] comment + fix CHANGELOG PR number

* fix(issue-417): Issue2 export fn + Issue3 Fix#5 explicit remember guard + Issue2 unit test

* test(issue-417): add R2 Stage 2 LLM dedup + R3 DM key fallback integration tests

* fix(issue-417): correct misleading comment — counter uses newTexts.length not eligibleTexts.length

Fixes rwmjhb Nice-to-have: comment at line ~2830 stated the counter
uses eligibleTexts.length, but the actual code (since MR1 commit 2ac682d)
uses newTexts.length. Updated comment to accurately describe the
newTexts.length approach and explain why it is correct vs eligibleTexts.length.

* fix(issue-417): MF1 explicit-remember prepend, MF3 counter based on texts.length

* fix(issue-417): MF1 v2 - avoid lastPending duplicate in REPLACE mode

* fix(issue-417): MF3 move let texts before counter; fix MF1 typo

* fix(issue-417): MF1 v3 - includes() check; revert MF3 to newTexts.length

* fix(issue-417-mustfixes): MF2 - move R2 dedup scenario to module scope

* fix(pr549/issue-417): export buildAutoCaptureConversationKeyFromIngress, DM fallback, MAX_MESSAGE_LENGTH guard, cumulative counting, counter reset, try-catch handlers

* fix(test): remove non-existent log assertion in multi-round scenario (Must-Fix 3)

Root cause: line 501 asserted a log message that never existed in production code.
The 'created [preferences]...' format does not exist in smart-extractor.ts or index.ts.
Preserved the actual log assertions: 'merged [preferences]' and 'skipped [preferences]'.
Also preserved entry content validation (entries[0].text).

---------

Co-authored-by: James <james53882@users.noreply.github.com>
Co-authored-by: OpenClaw Agent <agent@openclaw.ai>
Co-authored-by: jlin53882 <jlin53882@users.noreply.github.com>
Co-authored-by: OpenClaw Agent <agent@openclaw.dev>