Skip to content

subc-cache doc#32

Open
luohongyin wants to merge 1 commit into
mainfrom
subc-cache
Open

subc-cache doc#32
luohongyin wants to merge 1 commit into
mainfrom
subc-cache

Conversation

@luohongyin
Copy link
Copy Markdown
Contributor

add page

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new Mintlify feature documentation page describing “Subconscious Cache” and how to disable auto-compaction to trigger cache behavior manually.

Changes:

  • Introduces a new features/subconscious-cache.mdx page explaining cache matching rules (A/B/C ↔ A/C/D).
  • Documents how to disable auto-compaction via chat_template_kwargs with Python/Node.js/cURL examples.
  • Provides guidance on when to use auto-compaction vs manual context control.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

icon: "brain"
---

Subconscious cache helps the inference system detect context engineering happens in agent reasoning runs by matching prefix and suffix of cached tokens against new inputs. The goal is to preserve the memory of the pruned tokens implicitly wihtin the latent states of suffix tokens, and improve the cache hit rate.

To hit the subconscious cache, the cached tokens and new inputs need to satisfy two criteria:
1. The cached chain can be precisely split into three sections `A, B, C`
2. The new input chain can be precisely split into three sections `A, C, D`, where `A` and `C` matches prefix `A` and suffix `C` in the cache, where `len(C) > threshold`. We usually set threshold=8 tokens to avoid matching the suffix of chat templates.

</CodeGroup>

## When to Turn off Auto compaction

Subconscious LLM API enables auto-compaction by default. Under the auto-compaction mode, developers should just send any message list to the LLM API and the inference system will detect prunable messages. Message pruning happened in the auto-mode can automatically hit subconscious cache.

If you want to manully hit subconscious by controlling the context by yourself instead of auto-compaction, simply disable auto compaction in the chat kwards

Subconscious LLM API enables auto-compaction by default. Under the auto-compaction mode, developers should just send any message list to the LLM API and the inference system will detect prunable messages. Message pruning happened in the auto-mode can automatically hit subconscious cache.

If you want to manully hit subconscious by controlling the context by yourself instead of auto-compaction, simply disable auto compaction in the chat kwards

## When to Turn off Auto compaction

if you turn off auto compaction, you need to manually construct inputs that can hit the subconscious cache. Just make sure you only prune **one** continuous token sequence from the message list. If there is no context pruning, the new input will simply hit prefix cache. If more than one chunks are pruned, we cannot find suffix tokens satisfying the subconscious rules.

## When to Turn off Auto compaction

if you turn off auto compaction, you need to manually construct inputs that can hit the subconscious cache. Just make sure you only prune **one** continuous token sequence from the message list. If there is no context pruning, the new input will simply hit prefix cache. If more than one chunks are pruned, we cannot find suffix tokens satisfying the subconscious rules.
if you turn off auto compaction, you need to manually construct inputs that can hit the subconscious cache. Just make sure you only prune **one** continuous token sequence from the message list. If there is no context pruning, the new input will simply hit prefix cache. If more than one chunks are pruned, we cannot find suffix tokens satisfying the subconscious rules.

**Use Auto Compaction for:**
- Programming tasks, where assistant-tool-user messages keeps growing in a message list

**Use Auto Compaction for:**
- Programming tasks, where assistant-tool-user messages keeps growing in a message list
- Multi-turn conversaion - rigid context pruning rule cannot handle arbitrary user inputs
Comment on lines +1 to +5
---
title: "Subconscious Cache"
description: "Disable auto compaction and trigger subconscious cache manually."
icon: "brain"
---
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants