subc-cache doc by luohongyin · Pull Request #32 · subconscious-systems/subconscious-docs

luohongyin · 2026-05-25T22:59:36Z

add page

Copilot

Pull request overview

Adds a new Mintlify feature documentation page describing “Subconscious Cache” and how to disable auto-compaction to trigger cache behavior manually.

Changes:

Introduces a new features/subconscious-cache.mdx page explaining cache matching rules (A/B/C ↔ A/C/D).
Documents how to disable auto-compaction via chat_template_kwargs with Python/Node.js/cURL examples.
Provides guidance on when to use auto-compaction vs manual context control.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+icon: "brain"
+---
+
+Subconscious cache helps the inference system detect context engineering happens in agent reasoning runs by matching prefix and suffix of cached tokens against new inputs. The goal is to preserve the memory of the pruned tokens implicitly wihtin the latent states of suffix tokens, and improve the cache hit rate.


+
+To hit the subconscious cache, the cached tokens and new inputs need to satisfy two criteria:
+1. The cached chain can be precisely split into three sections `A, B, C`
+2. The new input chain can be precisely split into three sections `A, C, D`, where `A` and `C` matches prefix `A` and suffix `C` in the cache, where `len(C) > threshold`. We usually set threshold=8 tokens to avoid matching the suffix of chat templates.


+
+</CodeGroup>
+
+## When to Turn off Auto compaction


+
+Subconscious LLM API enables auto-compaction by default. Under the auto-compaction mode, developers should just send any message list to the LLM API and the inference system will detect prunable messages. Message pruning happened in the auto-mode can automatically hit subconscious cache.
+
+If you want to manully hit subconscious by controlling the context by yourself instead of auto-compaction, simply disable auto compaction in the chat kwards


+
+Subconscious LLM API enables auto-compaction by default. Under the auto-compaction mode, developers should just send any message list to the LLM API and the inference system will detect prunable messages. Message pruning happened in the auto-mode can automatically hit subconscious cache.
+
+If you want to manully hit subconscious by controlling the context by yourself instead of auto-compaction, simply disable auto compaction in the chat kwards


+
+## When to Turn off Auto compaction
+
+if you turn off auto compaction, you need to manually construct inputs that can hit the subconscious cache. Just make sure you only prune **one** continuous token sequence from the message list. If there is no context pruning, the new input will simply hit prefix cache. If more than one chunks are pruned, we cannot find suffix tokens satisfying the subconscious rules.


+
+## When to Turn off Auto compaction
+
+if you turn off auto compaction, you need to manually construct inputs that can hit the subconscious cache. Just make sure you only prune **one** continuous token sequence from the message list. If there is no context pruning, the new input will simply hit prefix cache. If more than one chunks are pruned, we cannot find suffix tokens satisfying the subconscious rules.


+if you turn off auto compaction, you need to manually construct inputs that can hit the subconscious cache. Just make sure you only prune **one** continuous token sequence from the message list. If there is no context pruning, the new input will simply hit prefix cache. If more than one chunks are pruned, we cannot find suffix tokens satisfying the subconscious rules.
+
+**Use Auto Compaction for:**
+- Programming tasks, where assistant-tool-user messages keeps growing in a message list


+
+**Use Auto Compaction for:**
+- Programming tasks, where assistant-tool-user messages keeps growing in a message list
+- Multi-turn conversaion - rigid context pruning rule cannot handle arbitrary user inputs


+---
+title: "Subconscious Cache"
+description: "Disable auto compaction and trigger subconscious cache manually."
+icon: "brain"
+---


subc-cache doc

a649521

luohongyin requested review from Copilot and jfobrien29 May 25, 2026 22:59

Copilot started reviewing on behalf of luohongyin May 25, 2026 22:59 View session

Copilot AI reviewed May 25, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

subc-cache doc#32

subc-cache doc#32
luohongyin wants to merge 1 commit into
mainfrom
subc-cache

luohongyin commented May 25, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants


		Subconscious LLM API enables auto-compaction by default. Under the auto-compaction mode, developers should just send any message list to the LLM API and the inference system will detect prunable messages. Message pruning happened in the auto-mode can automatically hit subconscious cache.

		If you want to manully hit subconscious by controlling the context by yourself instead of auto-compaction, simply disable auto compaction in the chat kwards


		## When to Turn off Auto compaction

		if you turn off auto compaction, you need to manually construct inputs that can hit the subconscious cache. Just make sure you only prune one continuous token sequence from the message list. If there is no context pruning, the new input will simply hit prefix cache. If more than one chunks are pruned, we cannot find suffix tokens satisfying the subconscious rules.

Conversation

luohongyin commented May 25, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants