
prependContext + showInjected causes prefix cache hit rate regression on OpenAI-compatible providers #120

@Enominera

Description


Problem

Prompt cache hit rates for OpenAI-compatible providers (DeepSeek, MiMo) dropped significantly after the memory-tencentdb plugin was enabled. The plugin rollout coincided with the OpenClaw 2026.5.19 → 2026.5.28 upgrade.

Environment

  • OpenClaw 2026.5.28 (upgraded from 2026.5.19 on May 30)
  • Providers: DeepSeek V4 Pro, MiMo V2.5 Pro (both use the openai-completions API and rely on prefix-matching caching)
  • memory-tencentdb plugin deployed on May 30

Cache Hit Rate Comparison

| Date | OpenClaw | memory-tencentdb | MiMo hit rate | DeepSeek hit rate |
|---|---|---|---|---|
| May 29 | 5.19 | ❌ Off | 91.1% | 95.7% |
| May 31 | 5.28 | ✅ On | 63.5% | 83.3% |
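The hit rates alone understate how much more input was billed at the uncached rate. A quick calculation of the uncached share (100% minus the hit rate) makes this concrete. It assumes the hit rate is measured as cached input tokens over total input tokens, and it compares shares only, not absolute token volumes across the two days.

```python
# Cache hit rates (%) from the comparison table: (May 29, May 31).
hit_rates = {
    "MiMo": (91.1, 63.5),
    "DeepSeek": (95.7, 83.3),
}


def miss_multiplier(before_hit: float, after_hit: float) -> float:
    """Factor by which the uncached share of input tokens grew."""
    return (100.0 - after_hit) / (100.0 - before_hit)


for provider, (before, after) in hit_rates.items():
    print(f"{provider}: uncached {100 - before:.1f}% -> {100 - after:.1f}% "
          f"({miss_multiplier(before, after):.1f}x)")
```

For MiMo the uncached share rises from 8.9% to 36.5% (about 4.1x). For DeepSeek it rises from 4.3% to 16.7% (about 3.9x).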

Root Cause

Primary: prependContext → context bloat → prefix cache invalidation

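  1. On every turn, memory-tencentdb prepends prependContext (recalled memories, roughly 500-1700 tokens) to the user message. With showInjected=true, this content is written permanently into the conversation history. It is therefore re-sent with every subsequent request and accumulates turn after turn.
  2. As a result, the history grows quickly over multiple turns, and the bloat triggers tool result truncation more often.
  3. The truncation amount is computed dynamically from the token budget, so it differs from turn to turn. This means earlier messages in the history are rewritten between requests, and the request prefix no longer matches the previous one. A prefix-matching cache can only reuse tokens up to the first difference, so everything after the rewritten message becomes a cache miss.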
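The mechanism in steps 1-3 can be reproduced with a toy model. This sketch makes several assumptions:

- It uses an idealized token-level prefix match (real providers may match in larger blocks).
- It represents tokens as plain strings.
- It replaces OpenClaw's actual budget logic with a hypothetical truncation parameter, tool_keep.

The memory tokens stand in for a prependContext that showInjected has frozen into the history.

```python
from typing import List, Tuple

Message = Tuple[str, List[str]]  # (role, tokens); tokens are plain strings here


def flatten(messages: List[Message], tool_keep: int) -> List[str]:
    """Serialize a history into one token list, cutting each tool result to tool_keep tokens.

    tool_keep is a hypothetical stand-in for a budget-driven truncation policy.
    """
    out: List[str] = []
    for role, tokens in messages:
        if role == "tool":
            tokens = tokens[:tool_keep]
        out.append(f"<{role}>")
        out.extend(tokens)
    return out


def shared_prefix_len(prev: List[str], curr: List[str]) -> int:
    """Tokens a prefix-matching cache could reuse: everything before the first difference."""
    n = 0
    for a, b in zip(prev, curr):
        if a != b:
            break
        n += 1
    return n


memory = ["mem"] * 20  # stands in for prependContext frozen into history (showInjected=true)
history: List[Message] = [
    ("system", ["sys"] * 50),
    ("user", memory + ["q1"]),
    ("tool", ["res"] * 40),
    ("assistant", ["ans"] * 100),
]
next_turn = history + [("user", memory + ["q2"])]

previous = flatten(history, tool_keep=30)
same_budget = flatten(next_turn, tool_keep=30)  # truncation unchanged
new_budget = flatten(next_turn, tool_keep=25)   # history grew, budget tightened

print(len(previous))                             # 205
print(shared_prefix_len(previous, same_budget))  # 205: full reuse
print(shared_prefix_len(previous, new_budget))   # 99: cache breaks inside the tool result
```

With a constant truncation length, the next request can reuse the entire previous request. Shrinking the kept tool result from 30 to 25 tokens cuts the reusable prefix to 99 tokens, and everything after the truncated tool result has to be recomputed.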

Secondary: appendSystemContext placed after CACHE_BOUNDARY

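composeSystemPromptWithHookContext concatenates the appendSystemContext output (persona + scene navigation, ~4000 characters) directly after the CACHE_BOUNDARY marker in the system prompt. It does not call the existing prependSystemPromptAdditionAfterCacheBoundary. As a result, this stable content is re-sent and billed as uncached input tokens on every turn.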
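Here is a minimal sketch of why placement matters for a prefix-matching cache. It assumes that in the current layout some per-turn text precedes the appended persona. The issue only states that the content ends up after CACHE_BOUNDARY and is not cached, so this ordering is an assumption. The marker text, the compose functions and the per-turn strings are all hypothetical. os.path.commonprefix serves as a character-level stand-in for the provider's prefix match.

```python
import os

# Hypothetical marker text; the real CACHE_BOUNDARY representation in OpenClaw may differ.
CACHE_BOUNDARY = "<<CACHE_BOUNDARY>>"


def compose_persona_after(core: str, per_turn: str, persona: str) -> str:
    """Assumed current layout: persona lands after the boundary and after per-turn text."""
    return core + CACHE_BOUNDARY + per_turn + persona


def compose_persona_before(core: str, per_turn: str, persona: str) -> str:
    """Suggested layout: stable persona kept ahead of the boundary."""
    return core + persona + CACHE_BOUNDARY + per_turn


def reusable_chars(a: str, b: str) -> int:
    """Length of the common prefix of two prompts (character-level approximation)."""
    return len(os.path.commonprefix([a, b]))


core = "S" * 1000     # stable base system prompt
persona = "P" * 4000  # ~4000 chars of persona + scene navigation, per the issue
turn1, turn2 = "turn=1;", "turn=2;"  # hypothetical per-turn content

after = reusable_chars(compose_persona_after(core, turn1, persona),
                       compose_persona_after(core, turn2, persona))
before = reusable_chars(compose_persona_before(core, turn1, persona),
                        compose_persona_before(core, turn2, persona))
print(after, before)  # 1023 5023
```

In this model, moving the ~4000 stable characters ahead of the first per-turn difference grows the reusable prefix by exactly that amount.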

Suggestions

  1. Place stable persona content before CACHE_BOUNDARY so that it is part of the cached prefix.
  2. Evaluate the long-term impact of showInjected on conversation history growth.
  3. Consider session-level deduplication of stable system prompt additions.
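Suggestion 3 could take the shape of a per-session store. The store records each stable addition once and renders the additions in first-seen order. Repeated hook calls then neither duplicate the text nor reorder it, and the rendered block stays byte-identical across turns. SessionStableAdditions and its methods are hypothetical illustrations, not existing OpenClaw APIs.

```python
from typing import Dict


class SessionStableAdditions:
    """Hypothetical per-session registry of stable system prompt additions."""

    def __init__(self) -> None:
        # dict keys keep insertion order (Python 3.7+), giving a stable render order
        self._items: Dict[str, None] = {}

    def add(self, text: str) -> bool:
        """Record text once; return False if identical text was already recorded."""
        if text in self._items:
            return False
        self._items[text] = None
        return True

    def render(self) -> str:
        """Join additions in first-seen order; identical input yields identical output."""
        return "\n\n".join(self._items)


persona = "Persona: ..."             # placeholder text
scene_nav = "Scene navigation: ..."  # placeholder text

store = SessionStableAdditions()
rendered = []
for _turn in range(3):
    # hooks re-submit the same stable content every turn
    store.add(persona)
    store.add(scene_nav)
    rendered.append(store.render())

print(len(set(rendered)))  # 1: byte-identical on every turn
```

Deduplication alone does not restore caching if the rendered block still sits after per-turn content. It is meant to complement suggestion 1, not replace it.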

Metadata

Assignees: No one assigned
Labels: No labels
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests