subc-cache doc#32
Open
luohongyin wants to merge 1 commit into
Open
Conversation
There was a problem hiding this comment.
Pull request overview
Adds a new Mintlify feature documentation page describing “Subconscious Cache” and how to disable auto-compaction to trigger cache behavior manually.
Changes:
- Introduces a new
features/subconscious-cache.mdxpage explaining cache matching rules (A/B/C ↔ A/C/D). - Documents how to disable auto-compaction via
chat_template_kwargswith Python/Node.js/cURL examples. - Provides guidance on when to use auto-compaction vs manual context control.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| icon: "brain" | ||
| --- | ||
|
|
||
| Subconscious cache helps the inference system detect context engineering happens in agent reasoning runs by matching prefix and suffix of cached tokens against new inputs. The goal is to preserve the memory of the pruned tokens implicitly wihtin the latent states of suffix tokens, and improve the cache hit rate. |
|
|
||
| To hit the subconscious cache, the cached tokens and new inputs need to satisfy two criteria: | ||
| 1. The cached chain can be precisely split into three sections `A, B, C` | ||
| 2. The new input chain can be precisely split into three sections `A, C, D`, where `A` and `C` matches prefix `A` and suffix `C` in the cache, where `len(C) > threshold`. We usually set threshold=8 tokens to avoid matching the suffix of chat templates. |
|
|
||
| </CodeGroup> | ||
|
|
||
| ## When to Turn off Auto compaction |
|
|
||
| Subconscious LLM API enables auto-compaction by default. Under the auto-compaction mode, developers should just send any message list to the LLM API and the inference system will detect prunable messages. Message pruning happened in the auto-mode can automatically hit subconscious cache. | ||
|
|
||
| If you want to manully hit subconscious by controlling the context by yourself instead of auto-compaction, simply disable auto compaction in the chat kwards |
|
|
||
| Subconscious LLM API enables auto-compaction by default. Under the auto-compaction mode, developers should just send any message list to the LLM API and the inference system will detect prunable messages. Message pruning happened in the auto-mode can automatically hit subconscious cache. | ||
|
|
||
| If you want to manully hit subconscious by controlling the context by yourself instead of auto-compaction, simply disable auto compaction in the chat kwards |
|
|
||
| ## When to Turn off Auto compaction | ||
|
|
||
| if you turn off auto compaction, you need to manually construct inputs that can hit the subconscious cache. Just make sure you only prune **one** continuous token sequence from the message list. If there is no context pruning, the new input will simply hit prefix cache. If more than one chunks are pruned, we cannot find suffix tokens satisfying the subconscious rules. |
|
|
||
| ## When to Turn off Auto compaction | ||
|
|
||
| if you turn off auto compaction, you need to manually construct inputs that can hit the subconscious cache. Just make sure you only prune **one** continuous token sequence from the message list. If there is no context pruning, the new input will simply hit prefix cache. If more than one chunks are pruned, we cannot find suffix tokens satisfying the subconscious rules. |
| if you turn off auto compaction, you need to manually construct inputs that can hit the subconscious cache. Just make sure you only prune **one** continuous token sequence from the message list. If there is no context pruning, the new input will simply hit prefix cache. If more than one chunks are pruned, we cannot find suffix tokens satisfying the subconscious rules. | ||
|
|
||
| **Use Auto Compaction for:** | ||
| - Programming tasks, where assistant-tool-user messages keeps growing in a message list |
|
|
||
| **Use Auto Compaction for:** | ||
| - Programming tasks, where assistant-tool-user messages keeps growing in a message list | ||
| - Multi-turn conversaion - rigid context pruning rule cannot handle arbitrary user inputs |
Comment on lines
+1
to
+5
| --- | ||
| title: "Subconscious Cache" | ||
| description: "Disable auto compaction and trigger subconscious cache manually." | ||
| icon: "brain" | ||
| --- |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
add page