prependContext + showInjected cause a prompt cache hit rate regression on OpenAI-compatible providers (prefix cache) #120

## Problem

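Prompt cache hit rates for OpenAI-compatible providers (DeepSeek, MiMo) dropped significantly after two changes that both landed on May 30: the memory-tencentdb plugin was enabled and OpenClaw was upgraded from 2026.5.19 to 2026.5.28. Between May 29 and May 31, the MiMo hit rate fell from 91.1% to 63.5% and the DeepSeek hit rate from 95.7% to 83.3%.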

### Environment
- OpenClaw 2026.5.28 (upgraded from 2026.5.19 on May 30)
- Providers: DeepSeek V4 Pro, MiMo V2.5 Pro (both `openai-completions` API, prefix-matching cache)
- memory-tencentdb plugin deployed on May 30

### Cache Hit Rate Comparison

| Date | OpenClaw version | memory-tencentdb plugin | MiMo hit rate | DeepSeek hit rate |
|------|------------------|-------------------------|---------------|-------------------|
| May 29 | 2026.5.19 | ❌ Not deployed | 91.1% | 95.7% |
| May 31 | 2026.5.28 | ✅ Fully rolled out | 63.5% | 83.3% |
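
Reading the table as miss rates makes the size of the regression easier to see. The following is a minimal sketch, assuming the reported hit rate is the share of prompt tokens served from cache (the issue does not state how it is measured). The names `samples`, `round1`, and `missChanges` are illustrative only.

```typescript
// Hit rates (percent) copied from the table above.
const samples = [
  { provider: "MiMo", before: 91.1, after: 63.5 },
  { provider: "DeepSeek", before: 95.7, after: 83.3 },
];

interface MissChange {
  provider: string;
  missBefore: number; // percent of the prompt NOT served from cache on May 29
  missAfter: number;  // same figure on May 31
  ratio: number;      // missAfter / missBefore
}

// Round to one decimal place to hide floating-point noise (e.g. 100 - 91.1).
const round1 = (x: number): number => Math.round(x * 10) / 10;

function missChanges(): MissChange[] {
  return samples.map((s) => {
    const missBefore = round1(100 - s.before);
    const missAfter = round1(100 - s.after);
    return {
      provider: s.provider,
      missBefore,
      missAfter,
      ratio: round1(missAfter / missBefore),
    };
  });
}

for (const c of missChanges()) {
  console.log(`${c.provider}: miss ${c.missBefore}% -> ${c.missAfter}% (x${c.ratio})`);
}
```

Under that assumption, the uncached share of the prompt roughly quadrupled for both providers, which is a larger change than the drop in hit rate alone suggests.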



### Root Cause

**Primary: prependContext → context bloat → prefix cache invalidation**

1. On every turn, TencentDB prepends `prependContext` (recalled memories, ~500-1700 tokens) to the user message. With `showInjected=true`, this content is frozen into the conversation history.
2. Over multiple turns the context grows quickly, and the bloat triggers tool result truncation more often.
3. The amount truncated differs from turn to turn because it is computed dynamically from the token budget. The conversation history prefix therefore changes between requests, which invalidates the prefix-matching cache.
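
The effect of a changing truncation budget on prefix reuse can be illustrated with a small simulation. This is a sketch under stated assumptions, not OpenClaw's actual truncation logic: the `Message` shape, `truncateToolResults`, and the character-based budget are hypothetical, and the comparison works at message granularity, whereas real provider caches match at token or block level.

```typescript
// Hypothetical message shape; real provider payloads carry more fields.
interface Message {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

// Number of leading messages that are identical in both requests.
// A prefix-matching cache can reuse at most this much of the earlier request.
function sharedPrefixLength(prev: Message[], next: Message[]): number {
  let i = 0;
  while (
    i < prev.length &&
    i < next.length &&
    prev[i].role === next[i].role &&
    prev[i].content === next[i].content
  ) {
    i++;
  }
  return i;
}

// Hypothetical truncation: keep the first `budget` characters of each tool result.
function truncateToolResults(history: Message[], budget: number): Message[] {
  return history.map((m) =>
    m.role === "tool" ? { ...m, content: m.content.slice(0, budget) } : m
  );
}

const history: Message[] = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "[recalled memory] list the files" },
  { role: "tool", content: "x".repeat(1000) },
  { role: "assistant", content: "Here are the files." },
];
const nextTurn: Message[] = [
  ...history,
  { role: "user", content: "[recalled memory] open the first one" },
];

// Same budget on both turns: the whole earlier request is a reusable prefix.
const stable = sharedPrefixLength(
  truncateToolResults(history, 400),
  truncateToolResults(nextTurn, 400)
);

// Budget shrinks as the context grows: the tool result is re-truncated,
// so reuse stops at the message just before it.
const drifting = sharedPrefixLength(
  truncateToolResults(history, 400),
  truncateToolResults(nextTurn, 300)
);

console.log({ stable, drifting });
```

With a fixed budget all four earlier messages remain a shared prefix. Once the budget changes, only the two messages before the tool result match, and everything from the tool result onward has to be processed as new input.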

**Secondary: appendSystemContext placed after CACHE_BOUNDARY**

`composeSystemPromptWithHookContext` appends the persona and scene navigation (~4000 chars) directly after the CACHE_BOUNDARY marker in the system prompt, without calling the existing `prependSystemPromptAdditionAfterCacheBoundary`. As a result, this stable content is treated and billed as fresh tokens on every turn.
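
One way to picture the placement issue is a composer that keeps session-stable additions on the cacheable side of the boundary. The sketch below is illustrative only: the marker string, `PromptParts`, `composeSystemPrompt`, and `cacheablePrefix` are hypothetical and do not reflect OpenClaw's real implementation or the actual value of CACHE_BOUNDARY.

```typescript
// Hypothetical marker; the real CACHE_BOUNDARY value in OpenClaw may differ.
const CACHE_BOUNDARY = "<!-- CACHE_BOUNDARY -->";

interface PromptParts {
  base: string;       // static system prompt
  stable: string[];   // session-stable additions, e.g. persona, scene navigation
  volatile: string[]; // per-turn additions
}

// Stable additions go before the boundary so they belong to the cacheable prefix;
// only per-turn content is placed after it.
function composeSystemPrompt(parts: PromptParts): string {
  const cacheable = [parts.base, ...parts.stable].join("\n\n");
  const tail = parts.volatile.join("\n\n");
  return `${cacheable}\n${CACHE_BOUNDARY}\n${tail}`;
}

// Everything before the marker, i.e. the part expected to stay identical across turns.
function cacheablePrefix(prompt: string): string {
  return prompt.slice(0, prompt.indexOf(CACHE_BOUNDARY));
}

const turn1 = composeSystemPrompt({
  base: "Base prompt.",
  stable: ["Persona: concise engineer."],
  volatile: ["Turn 1 context"],
});
const turn2 = composeSystemPrompt({
  base: "Base prompt.",
  stable: ["Persona: concise engineer."],
  volatile: ["Turn 2 context"],
});

// The persona sits inside the prefix, and the prefix is identical on both turns.
console.log(cacheablePrefix(turn1).includes("Persona"));
console.log(cacheablePrefix(turn1) === cacheablePrefix(turn2));
```

In this arrangement only the per-turn tail differs between requests, so the persona text stays part of the unchanged prefix.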

### Suggestions

1. Place stable persona content before CACHE_BOUNDARY so that it is included in the cached prefix.
2. Evaluate the long-term impact of `showInjected` on conversation history growth.
3. Consider session-level deduplication of stable system prompt additions.
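
Session-level deduplication could look roughly like the sketch below. It assumes "dedup" means registering each stable addition once per session and rendering the set in a fixed order. `SessionAdditions` is a hypothetical class, not an existing OpenClaw or memory-tencentdb API.

```typescript
// Hypothetical per-session registry of stable system prompt additions.
class SessionAdditions {
  private readonly seen = new Set<string>();
  private readonly ordered: string[] = [];

  // Registers an addition once per session; empty text and repeats are ignored.
  // Returns true if the addition was new.
  add(text: string): boolean {
    const key = text.trim();
    if (key === "" || this.seen.has(key)) return false;
    this.seen.add(key);
    this.ordered.push(key);
    return true;
  }

  // First-seen order is preserved, so the rendered block is identical on
  // every turn once the set of additions stops changing.
  render(): string {
    return this.ordered.join("\n\n");
  }
}

const session = new SessionAdditions();
session.add("Persona: concise engineer.");
session.add("Scene navigation: docs, code, tickets.");
const afterTurn1 = session.render();

// The hook re-sends the same persona on the next turn; it is ignored.
const wasNew = session.add("Persona: concise engineer.");
const afterTurn2 = session.render();

console.log(wasNew, afterTurn1 === afterTurn2);
```

Because the rendered block does not change when a hook repeats the same content, it can sit in the stable part of the system prompt without disturbing the prefix.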
