Here is the mathematical foundation behind the DeepSeek-Codec-Plugin, covering the core principles of prompt compression and the specific algorithms used in each backend.
The fundamental principle is Shannon's source coding theorem: the minimum average number of bits required to encode a message without loss is given by its entropy.
For a sequence of tokens $X = (x_1, x_2, \dots, x_n)$ drawn from a distribution $P$, the entropy is

$$H(X) = -\sum_{x} P(x) \log_2 P(x)$$

measured in bits per token.
Natural language has low entropy due to redundancy. A typical LLM token has an entropy of ~4-6 bits, but is stored using 16-32 bits. Prompt compression exploits this redundancy.
Compression Ratio:

$$\rho = \frac{|X_{\text{original}}|}{|X_{\text{compressed}}|}$$

Token Efficiency (fraction of tokens saved):

$$\eta = 1 - \frac{|X_{\text{compressed}}|}{|X_{\text{original}}|} = 1 - \frac{1}{\rho}$$
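As a quick sanity check of the two metrics, a minimal sketch (the token counts are illustrative):

```python
def compression_ratio(n_original: int, n_compressed: int) -> float:
    """rho = |X_original| / |X_compressed|, measured in tokens."""
    return n_original / n_compressed

def token_efficiency(n_original: int, n_compressed: int) -> float:
    """Fraction of tokens saved: eta = 1 - |X_compressed| / |X_original|."""
    return 1.0 - n_compressed / n_original

# A 1,000-token prompt compressed to 250 tokens:
print(compression_ratio(1000, 250))  # 4.0
print(token_efficiency(1000, 250))   # 0.75
```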
LLMLingua uses a small language model to estimate the importance of each token.
Step 1: Perplexity Calculation
For a token $x_i$, the small model assigns a conditional probability $P(x_i \mid x_1, \dots, x_{i-1})$.

The perplexity contribution of token $x_i$ is its surprisal:

$$\text{PPL}(x_i) = -\log P(x_i \mid x_{<i})$$
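A minimal sketch of the surprisal computation, with a toy add-one-smoothed bigram model standing in for the small language model (the corpus, vocabulary size, and smoothing are illustrative assumptions, not the plugin's actual model):

```python
import math
from collections import Counter

# Toy bigram model standing in for the small LM.
corpus = "the cat sat on the mat the cat ran".split()
bigrams = Counter(zip(corpus, corpus[1:]))
context_counts = Counter(corpus[:-1])

def neg_log_prob(token: str, prev: str, vocab_size: int = 10) -> float:
    """Surprisal -log P(x_i | x_{i-1}) with add-one smoothing."""
    num = bigrams[(prev, token)] + 1
    den = context_counts[prev] + vocab_size
    return -math.log(num / den)

# Frequent continuations are cheap; rarer ones are expensive (more informative).
assert neg_log_prob("cat", "the") < neg_log_prob("mat", "the")
```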
Step 2: Token Importance Scoring
Tokens with high perplexity are less predictable and thus more informative. LLMLingua defines importance as:

$$I(x_i) = -\log P_{\text{small}}(x_i \mid x_{<i})$$

where $P_{\text{small}}$ is the small compression model's conditional probability.
Step 3: Iterative Token Pruning
Given a target compression rate $\tau \in (0, 1]$, tokens are ranked by importance $I(x_i)$ and the lowest-scoring tokens are pruned iteratively until at most $\lceil \tau \cdot n \rceil$ tokens remain.
Step 4: Context-Aware Preservation
LLMLingua also preserves tokens that are structurally important (e.g., punctuation, line breaks) via a force token set $\mathcal{F}$: any token in $\mathcal{F}$ is retained regardless of its importance score.
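Steps 3 and 4 can be sketched together: rank tokens by importance, keep the top fraction, and always retain force-set tokens. The tokens, scores, and force set below are hypothetical:

```python
import math

def prune(tokens, importance, tau, force=frozenset()):
    """Keep ceil(tau * n) tokens: force-set tokens first, then the
    highest-importance remainder; original order is preserved."""
    n_keep = math.ceil(tau * len(tokens))
    # Indices sorted by importance, highest first.
    ranked = sorted(range(len(tokens)), key=lambda i: importance[i], reverse=True)
    # Force-set tokens are kept unconditionally.
    keep = {i for i, t in enumerate(tokens) if t in force}
    for i in ranked:
        if len(keep) >= n_keep:
            break
        keep.add(i)
    return [tokens[i] for i in sorted(keep)]

tokens = ["please", "note", ":", "the", "deadline", "is", "friday", "."]
scores = [0.2, 1.1, 0.1, 0.3, 2.5, 0.4, 2.8, 0.1]
print(prune(tokens, scores, tau=0.5, force={":", "."}))
# [':', 'deadline', 'friday', '.']
```

One design choice in this sketch: forced tokens count toward the budget, so the effective compression rate never exceeds $\tau$ even when the force set is large.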
LLMLingua2 improves upon the original by using a contrastive approach. It compares each token's probability under a small model vs. a larger reference model:

$$\Delta\text{PPL}_i = \log \frac{P_{\text{large}}(x_i \mid x_{<i})}{P_{\text{small}}(x_i \mid x_{<i})}$$

Tokens where the small model is more surprised than the large model are informative and should be kept:

$$\text{keep}(x_i) \iff \Delta\text{PPL}_i > \theta$$

for a threshold $\theta$.
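A sketch of the contrastive keep rule, using hypothetical per-token probabilities from the two models:

```python
import math

def contrastive_keep(p_small, p_large, threshold=0.0):
    """Keep token i when dPPL_i = log(P_large / P_small) > threshold,
    i.e. the small model is more surprised than the large one."""
    return [math.log(pl / ps) > threshold for ps, pl in zip(p_small, p_large)]

# Token 0: both models equally confident -> drop.
# Token 1: small model surprised, large model not -> keep.
p_small = [0.90, 0.02]
p_large = [0.90, 0.40]
print(contrastive_keep(p_small, p_large))  # [False, True]
```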
The Heuristic backend uses deterministic rules based on linguistic redundancy.
Abbreviation Mapping:
$$f_{\text{abbr}}(w) = \begin{cases} \text{abbr}(w) & \text{if } w \in \mathcal{A} \\ w & \text{otherwise} \end{cases}$$

where $\mathcal{A}$ is the abbreviation dictionary.
Filler Word Removal: $$X' = \{x_i \mid x_i \notin \mathcal{W}_{\text{filler}}\}$$ where $\mathcal{W}_{\text{filler}}$ is a set of low-information words.
Whitespace Normalization: consecutive whitespace characters (spaces, tabs, newlines) are collapsed into a single space.
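The three rules can be sketched as one pure function; the abbreviation and filler tables below are illustrative, not the plugin's actual dictionaries:

```python
import re

# Illustrative tables, not the plugin's actual dictionaries.
ABBREVIATIONS = {"information": "info", "documentation": "docs", "configuration": "config"}
FILLERS = {"basically", "actually", "very", "really", "just"}

def heuristic_compress(text: str) -> str:
    # Whitespace normalization: collapse any run of whitespace to one space.
    text = re.sub(r"\s+", " ", text).strip()
    out = []
    for word in text.split(" "):
        if word.lower() in FILLERS:                        # filler word removal
            continue
        out.append(ABBREVIATIONS.get(word.lower(), word))  # abbreviation mapping
    return " ".join(out)

print(heuristic_compress("Please  read the\n documentation it is basically very helpful"))
# Please read the docs it is helpful
```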
The OCR backend converts text to an image and uses a Vision-Language Model to extract compressed meaning.
Text-to-Image Encoding: the input text $X$ is rendered into an image $I = \text{render}(X)$ at a fixed font size and resolution.
Visual Tokenization:
A VLM encodes the image into a sequence of visual tokens $V = (v_1, \dots, v_m)$, where $m$ is typically much smaller than the original token count $n$.
Information Density: the compression factor is the ratio $d = n / m$, the number of text tokens represented per visual token.
Minify JSON by removing non-semantic whitespace.
Remove redundant blank lines and normalize headings in Markdown.
Strip comments from code (a simplified, line-based rule).
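The three optimizers can be sketched as follows; the comment stripper is the simplified line-based rule and would mis-handle `#` characters inside strings or docstrings:

```python
import json
import re

def minify_json(text: str) -> str:
    """Re-serialize JSON without non-semantic whitespace."""
    return json.dumps(json.loads(text), separators=(",", ":"))

def normalize_markdown(text: str) -> str:
    """Collapse runs of blank lines; normalize '#Heading' to '# Heading'."""
    text = re.sub(r"\n{3,}", "\n\n", text)
    return re.sub(r"^(#+)\s*", r"\1 ", text, flags=re.MULTILINE)

def strip_comments(code: str) -> str:
    """Simplified: drop whole-line '#' comments (ignores '#' in strings)."""
    lines = [l for l in code.split("\n") if not l.lstrip().startswith("#")]
    return "\n".join(lines)

print(minify_json('{ "a": 1,  "b": [2, 3] }'))  # {"a":1,"b":[2,3]}
```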
The system prompt $S$ is never compressed. The final prompt is the protected concatenation

$$P = S \oplus C(X)$$

where $\oplus$ denotes concatenation and $C$ is the selected compression backend, applied only to the user context $X$.
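A sketch of the protected concatenation, with a trivial stand-in compressor (the every-other-word rule is illustrative only):

```python
def build_prompt(system_prompt: str, user_context: str, compress) -> str:
    """The system prompt bypasses compression; only user context is compressed."""
    return system_prompt + "\n\n" + compress(user_context)

# Stand-in compressor: keep every other word (illustrative only).
toy_compress = lambda s: " ".join(s.split()[::2])

print(build_prompt("You are a helpful assistant.", "one two three four", toy_compress))
```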
The target compression rate $\tau$ controls how aggressively each backend prunes: the output must satisfy

$$|X_{\text{compressed}}| \le \tau \cdot |X_{\text{original}}|$$

where $|X|$ denotes the token count of $X$.
| Component | Mathematical Core | Key Formula |
|---|---|---|
| LLMLingua | Perplexity-based importance | $I(x_i) = -\log P_{\text{small}}(x_i \mid x_{<i})$ |
| LLMLingua2 | Contrastive perplexity | $\Delta\text{PPL}_i = \log \frac{P_{\text{large}}(x_i \mid x_{<i})}{P_{\text{small}}(x_i \mid x_{<i})}$ |
| Heuristic | Rule-based filtering | $X' = \{x_i \mid x_i \notin \mathcal{W}_{\text{filler}}\}$ |
| OCR | Visual token compression | $d = n / m$ |
| Format Optimizers | Structural minification | JSON, Markdown, Code normalizers |
| System Prompt | Protected concatenation | $P = S \oplus C(X)$ |
These mathematical foundations enable the DeepSeek-Codec-Plugin to achieve 2-20x token reduction while preserving semantic fidelity.