Commit 9aa6464

Merge pull request #25 from sharpninja/copilot/add-bucketing-implementation-plan
feat: Chain-Bucket Speculative Decoding + Training-Time Sequence Compression (bucketing)
2 parents 1d326d0 + 7746f56 commit 9aa6464

13 files changed, +1130 -14 lines changed

docs/README.md

Lines changed: 3 additions & 0 deletions
@@ -9,6 +9,7 @@ BitNet b1.58 Sharp is a .NET 10 C# reference implementation of the paper-aligned
 - Microsoft Agent Framework-oriented hosting in `/src/BitNetSharp.App`
 - BenchmarkDotNet-based local model comparison in `/src/BitNetSharp.App`
 - DataGen synthetic dataset generation from JSON seed examples
+- Chain-Bucket Speculative Decoding and Training-Time Sequence Compression via the bucketing subsystem
 - Default American English interaction behavior
 - Seeded transformer inspection and ternary weight summaries
 - GitBook-formatted project documentation in `/docs`
@@ -27,6 +28,8 @@ dotnet test BitNet-b1.58-Sharp.slnx

 - [Architecture](architecture.md)
 - [Benchmarking and model comparison](benchmarking.md)
+- [Bucketing guide](bucketing-guide.md)
+- [Bucketing implementation plan v1.0](bucketing-implementation-plan-v1.0.md)
 - [DataGen guide](datagen-guide.md)
 - [Implementation plan](implementation-plan-v3.md)
 - [Releases and packaging](releases-and-packaging.md)

docs/SUMMARY.md

Lines changed: 2 additions & 0 deletions
@@ -2,6 +2,8 @@

 - [BitNet b1.58 Sharp](README.md)
 - [Architecture](architecture.md)
+- [Bucketing guide](bucketing-guide.md)
+- [Bucketing implementation plan v1.0](bucketing-implementation-plan-v1.0.md)
 - [DataGen guide](datagen-guide.md)
 - [Implementation plan v3 (active)](implementation-plan-v3.md)
 - [Implementation plan v2 (archived)](implementation-plan-v2.md)

docs/bucketing-guide.md

Lines changed: 109 additions & 0 deletions
@@ -0,0 +1,109 @@
# Bucketing Guide

Bucketing is a core optimization in BitNet b1.58 Sharp that accelerates inference via **Chain-Bucket Speculative Decoding** and reduces training cost via **Training-Time Sequence Compression**.

---

## How It Works

### Chain-Bucket Speculative Decoding (Inference)

A `ChainBucketTable` stores up to 256 frequent n-gram chains (length 2–8) mined from a training corpus. During generation:

1. After each normally generated token, the last 1–3 context tokens are looked up in the table.
2. If a matching chain is found, the model speculatively emits the chain's continuation tokens.
3. Each speculative token is verified: if the model's top-1 prediction matches, the token is accepted.
4. Accepted tokens are appended to the context at once, reducing the number of full forward passes.

This is safe: no token is accepted without model verification.
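
The acceptance rule in steps 3 and 4 can be sketched as follows. This is a minimal illustration, not the code in `BitNetPaperModel`: the class name `SpeculativeAcceptSketch`, the method `AcceptChain`, and the `predictTop1` delegate (a stand-in for one model forward pass) are all hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class SpeculativeAcceptSketch
{
    // Appends chain tokens to the context for as long as the model's top-1
    // prediction agrees with the next proposed token. Returns the accepted count.
    // predictTop1 is a hypothetical stand-in for a model forward pass.
    public static int AcceptChain(
        List<int> context,
        IReadOnlyList<int> continuation,
        Func<IReadOnlyList<int>, int> predictTop1)
    {
        var accepted = 0;
        foreach (var candidate in continuation)
        {
            if (predictTop1(context) != candidate)
            {
                break; // First mismatch: reject this token and the rest of the chain.
            }

            context.Add(candidate);
            accepted++;
        }

        return accepted;
    }
}
```

The sketch calls `predictTop1` once per candidate, so it shows only why no unverified token can enter the context. Any reduction in forward passes depends on how verification is batched, which this sketch does not model.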

### Training-Time Sequence Compression

When compression is enabled, the prompt context passed to the forward pass is shortened by replacing known chain n-grams with the first token of each chain. The loss target is unchanged. This reduces the effective context length and speeds up each training step.
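
As a rough illustration of the replacement rule, the sketch below scans left to right and collapses each matched chain to its first token. The names `SequenceCompressionSketch` and `Compress` are hypothetical, and preferring the longest chain at each position is an assumption; the source does not say how overlapping chains are resolved.

```csharp
using System.Collections.Generic;

public static class SequenceCompressionSketch
{
    // Replaces every chain occurrence with the chain's first token.
    // Assumption: when several chains match at a position, the longest wins.
    public static List<int> Compress(IReadOnlyList<int> tokens, IReadOnlyList<int[]> chains)
    {
        var output = new List<int>(tokens.Count);
        var i = 0;
        while (i < tokens.Count)
        {
            var matched = 0; // Length of the longest chain starting at position i.
            foreach (var chain in chains)
            {
                if (chain.Length > matched && StartsWithAt(tokens, i, chain))
                {
                    matched = chain.Length;
                }
            }

            // A matching chain starts with tokens[i], so emitting tokens[i]
            // covers both the matched and the unmatched case.
            output.Add(tokens[i]);
            i += matched > 0 ? matched : 1;
        }

        return output;
    }

    private static bool StartsWithAt(IReadOnlyList<int> tokens, int start, int[] chain)
    {
        if (chain.Length == 0 || start + chain.Length > tokens.Count)
        {
            return false;
        }

        for (var k = 0; k < chain.Length; k++)
        {
            if (tokens[start + k] != chain[k])
            {
                return false;
            }
        }

        return true;
    }
}
```

With chains `[1, 2, 3]` and `[1, 2]`, the input `[5, 1, 2, 3, 7, 1, 2]` compresses to `[5, 1, 7, 1]`: seven tokens become four, while the training target stays whatever it was before compression.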

---

## Quick Start

### Via CLI (automatic corpus mining)

```bash
# Chat with chain-bucket speculative decoding active
dotnet run --project src/BitNetSharp.App -- chat "hello" --enable-bucketing

# Train with sequence compression active
dotnet run --project src/BitNetSharp.App -- train --enable-bucketing
```

The `--enable-bucketing` flag mines a `ChainBucketTable` from the default training corpus at startup and activates both `EnableChainBuckets` and `EnableSequenceCompression`.

### Via code (programmatic setup)

```csharp
// Create a model with bucketing options enabled
var model = BitNetBootstrap.CreatePaperModel(
    verbosity: VerbosityLevel.Normal,
    enableChainBuckets: true,
    enableSequenceCompression: true);

// Mine buckets from your own training examples
var examples = MyCorpus.LoadExamples();
var table = model.MineAndLoadBuckets(examples);
Console.WriteLine($"Mined {table.Count} chain buckets.");

// Generate with speculative decoding active
var result = model.GenerateResponse("What is BitNet?");
```

### Via `BucketMiner` directly (advanced)

```csharp
using BitNetSharp.Core.Bucketing;

// Provide tokenized integer sequences
IReadOnlyList<int>[] sequences = GetTokenizedCorpus();
var table = BucketMiner.Mine(sequences, maxBuckets: 256);

model.LoadBucketTable(table);
```

---

## Configuration Options

The following properties are added to `BitNetOptions`:

| Property | Default | Description |
|----------|---------|-------------|
| `EnableChainBuckets` | `false` | Activates chain-bucket speculative decoding during inference. |
| `EnableSequenceCompression` | `false` | Activates training-time prompt compression using chain buckets. |

---

## Expected Performance

| Metric | Without Bucketing | With Bucketing |
|--------|-------------------|----------------|
| Tokens/sec (inference) | baseline | ≥ 1.8× (≥ 70 % acceptance rate) |
| Effective sequence length (training) | baseline | 20–35 % shorter |
| Training time per epoch | baseline | 20–35 % faster |
| Output quality | baseline | no regression (verified) |

Actual gains depend on corpus repetition patterns and chain acceptance rates.

---

## Architecture

See the full design in [Bucketing Implementation Plan v1.0](bucketing-implementation-plan-v1.0.md).

Key source files:

| File | Description |
|------|-------------|
| `src/BitNetSharp.Core/Bucketing/ChainBucket.cs` | Record for a single n-gram chain bucket. |
| `src/BitNetSharp.Core/Bucketing/ChainBucketTable.cs` | 256-entry lookup table with prefix matching. |
| `src/BitNetSharp.Core/Bucketing/BucketMiner.cs` | N-gram mining and scoring service. |
| `src/BitNetSharp.Core/BitNetOptions.cs` | `EnableChainBuckets`, `EnableSequenceCompression`. |
| `src/BitNetSharp.Core/BitNetPaperModel.cs` | Integrated speculative decoding and compression. |
| `src/BitNetSharp.App/Program.cs` | `--enable-bucketing` CLI flag. |

docs/bucketing-implementation-plan-v1.0.md

Lines changed: 216 additions & 0 deletions
@@ -0,0 +1,216 @@
# BitNet-b1.58-Sharp: Bucketing Implementation Plan v1.0
**Chain-Bucket Speculative Decoding + Training-Time Sequence Compression**
**Core Feature for Inference Speedup and Training Efficiency**

**Version:** 1.0
**Date:** March 20, 2026
**Status:** Production-ready blueprint

---

## Table of Contents
1. Executive Summary & Success Criteria
2. Prerequisites & Integration Points
3. Overall Architecture
4. Phase 1: Offline Bucket Mining Pipeline (5–7 days)
5. Phase 2: Inference-Time Chain-Bucket Speculative Decoding (7–10 days)
6. Phase 3: Training-Time Sequence Compression with Super-Tokens (8–12 days)
7. Phase 4: Quality Safeguards, Evaluation & Benchmarks (5–7 days)
8. Phase 5: CLI, Documentation & Release (3–5 days)
9. Full UML Catalog (Object & Logic Examples)
10. Risk Register & Mitigation
11. Timeline, Milestones & Effort Estimates
12. Future Extensions

---

## 1. Executive Summary & Success Criteria
Goal: Add **bucketing** as a core optimization that accelerates both inference (via speculative multi-token jumps) and training (via compressed token sequences using super-tokens).

**Success Criteria**
- Inference: ≥ 1.8× tokens/sec uplift with ≥ 70 % chain acceptance rate
- Training: ≥ 25 % reduction in effective sequence length and training time
- Zero quality regression (verified by perplexity and downstream metrics)
- Fully optional via `BitNetOptions` (disabled by default; `EnableChainBuckets` and `EnableSequenceCompression` both default to `false`)
- Works with any tokenizer and any BitNet checkpoint

---

## 2. Prerequisites & Integration Points
- Existing `BitNetTransformer`, `BitNetPaperModel`, and training loop
- `BitNetOptions` class (for toggles)
- Existing tokenizer and training corpus
- Benchmark suite (TinyLlama-1.1B + perplexity)

---

## 3. Overall Architecture

```mermaid
graph TD
    BitNetPaperModel --> ChainBucketTable
    BucketMiner --> ChainBucketTable
    ChainBucketTable --> InferencePath[Inference: Speculative Decoding]
    ChainBucketTable --> TrainingPath[Training: Sequence Compression]
```

---

## 4. Phase 1: Offline Bucket Mining Pipeline (5–7 days)
1. Create `BucketMiner` service that scans tokenized corpora.
2. Extract frequent n-grams (n=2 to n=8).
3. Score candidates by frequency × conditional probability.
4. Pack the top candidates into at most 256 buckets, so that each chain ID fits in one byte.
5. Store: `byte ChainID → TokenID[] chain + float confidence`.
6. Output: `ChainBucketTable` (versioned, < 50 KB).

**Implementation:** `src/BitNetSharp.Core/Bucketing/BucketMiner.cs`
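
The mining steps above can be sketched roughly as below. This is not the contents of `BucketMiner.cs`: the `BucketMiningSketch` class and its `Candidate` record are hypothetical, the conditional probability is assumed to be count(chain) / count(first token), and the minimum frequency of 2 is an arbitrary cutoff chosen for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BucketMiningSketch
{
    // Hypothetical candidate shape; the repository's ChainBucket record may differ.
    public sealed record Candidate(int[] Tokens, int Frequency, double Score);

    public static List<Candidate> Mine(
        IEnumerable<IReadOnlyList<int>> sequences,
        int maxBuckets = 256,
        int minN = 2,
        int maxN = 8)
    {
        var ngramCounts = new Dictionary<string, (int[] Tokens, int Count)>();
        var firstTokenCounts = new Dictionary<int, int>();

        foreach (var seq in sequences)
        {
            for (var start = 0; start < seq.Count; start++)
            {
                // Count how often each token appears (used as the denominator below).
                firstTokenCounts[seq[start]] = firstTokenCounts.GetValueOrDefault(seq[start]) + 1;

                // Count every n-gram of length minN..maxN that starts here.
                for (var n = minN; n <= maxN && start + n <= seq.Count; n++)
                {
                    var tokens = new int[n];
                    for (var k = 0; k < n; k++)
                    {
                        tokens[k] = seq[start + k];
                    }

                    var key = string.Join(",", tokens);
                    ngramCounts[key] = ngramCounts.TryGetValue(key, out var entry)
                        ? (entry.Tokens, entry.Count + 1)
                        : (tokens, 1);
                }
            }
        }

        // Score = frequency x P(chain | first token), where the probability is
        // assumed to be count(chain) / count(first token). Keep the top maxBuckets.
        return ngramCounts.Values
            .Where(e => e.Count >= 2) // Assumed cutoff for "frequent".
            .Select(e => new Candidate(
                e.Tokens,
                e.Count,
                e.Count * ((double)e.Count / firstTokenCounts[e.Tokens[0]])))
            .OrderByDescending(c => c.Score)
            .ThenByDescending(c => c.Tokens.Length)
            .Take(maxBuckets)
            .ToList();
    }
}
```

Ties are broken in favor of longer chains here because a longer accepted chain skips more tokens. That is a design choice of this sketch, not something the plan specifies.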

---

## 5. Phase 2: Inference-Time Chain-Bucket Speculative Decoding (7–10 days)
**Core flow:**
1. After each token, check last 1–3 tokens against bucket prefixes.
2. If match found, speculatively emit continuation tokens from the matching chain.
3. Run parallel verification pass: confirm model top-1 prediction matches each chain token.
4. Accept tokens sequentially until first mismatch (classic speculative safety).
5. Context window updated once for the entire accepted chain.

**Integration:**
- Extend `BitNetPaperModel.GenerateResponse()` with optional bucketing path.
- Add `ChainBucketTable` loaded via `MineAndLoadBuckets()` or `LoadBucketTable()`.
- Configurable via `BitNetOptions.EnableChainBuckets` and `MaxChainLength`.

**Implementation:** `src/BitNetSharp.Core/BitNetPaperModel.cs`
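
Steps 1 and 2 of the core flow amount to a prefix lookup, sketched below. The names `PrefixLookupSketch` and `TryFindContinuation` are hypothetical and do not mirror the real `ChainBucketTable.TryLookupPrefix` signature. Trying the longest context tail first and scanning the chains linearly are assumptions of this sketch.

```csharp
using System;
using System.Collections.Generic;

public static class PrefixLookupSketch
{
    // Looks for a chain whose first tokens equal the last 1..maxPrefixLength
    // context tokens. Assumption: longer tails are tried first.
    public static bool TryFindContinuation(
        IReadOnlyList<int> context,
        IReadOnlyList<int[]> chains,
        out int[] continuation,
        int maxPrefixLength = 3)
    {
        var longest = Math.Min(maxPrefixLength, context.Count);
        for (var prefixLength = longest; prefixLength >= 1; prefixLength--)
        {
            foreach (var chain in chains)
            {
                // The chain must extend past the prefix, or there is nothing to propose.
                if (chain.Length > prefixLength && TailMatches(context, chain, prefixLength))
                {
                    continuation = chain[prefixLength..];
                    return true;
                }
            }
        }

        continuation = Array.Empty<int>();
        return false;
    }

    private static bool TailMatches(IReadOnlyList<int> context, int[] chain, int prefixLength)
    {
        var offset = context.Count - prefixLength;
        for (var k = 0; k < prefixLength; k++)
        {
            if (context[offset + k] != chain[k])
            {
                return false;
            }
        }

        return true;
    }
}
```

With chains `[1, 2, 3, 4]` and `[2, 9]`, the context `[7, 1, 2]` matches the first chain on its two-token tail and yields the continuation `[3, 4]`, which would then go through verification before any token is accepted.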

---

## 6. Phase 3: Training-Time Sequence Compression with Super-Tokens (8–12 days)
**New capability:** During training, replace frequent n-grams with a single first-token placeholder to shorten sequences.

**Steps:**
1. Before each training batch forward pass, scan the prompt sequence for chains.
2. Replace matching n-grams with just the first token of the chain.
3. During forward pass, the model sees compressed sequences (shorter context = faster training).
4. Loss is still computed against the original first target token.
5. Periodic re-mining at startup or on demand adapts to corpus content.

**BitNet specifics:**
- Compression is applied to the INPUT context only; target tokens are unchanged.
- Re-quantization schedule unchanged.
- Expected benefit: 20–35 % reduction in training tokens processed per epoch.

**Configuration:** `BitNetOptions.EnableSequenceCompression = true`

**Implementation:** `src/BitNetSharp.Core/BitNetPaperModel.cs` (`CompressSequence` helper)

---

## 7. Phase 4: Quality Safeguards, Evaluation & Benchmarks (5–7 days)
1. Add a verification step: every speculated chain token must match the model's top-1 prediction before it is accepted.
2. Perplexity check on compressed vs uncompressed validation set.
3. Benchmark suite extension:
   - Tokens/sec with/without bucketing
   - Training time per epoch with/without sequence compression
   - Acceptance rate and compression ratio metrics
4. Add to existing TinyLlama-1.1B benchmark pipeline.
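
A minimal sketch of the two ratio metrics in item 3 follows. The definitions are assumptions, since the plan names the metrics without defining them: acceptance rate is taken as accepted tokens over proposed tokens, and compression is expressed as the fraction by which a sequence shrinks. The class `BucketingMetricsSketch` is hypothetical; the 70 % and 25 % thresholds come from the success criteria in section 1.

```csharp
public static class BucketingMetricsSketch
{
    // Fraction of speculatively proposed tokens that passed verification.
    public static double AcceptanceRate(int acceptedTokens, int proposedTokens) =>
        proposedTokens == 0 ? 0.0 : (double)acceptedTokens / proposedTokens;

    // Fraction by which compression shortened the sequence (0.25 = 25 % shorter).
    public static double LengthReduction(int originalLength, int compressedLength) =>
        originalLength == 0 ? 0.0 : 1.0 - (double)compressedLength / originalLength;

    // Checks measurements against the plan's targets: at least 70 % acceptance
    // and at least 25 % shorter sequences.
    public static bool MeetsTargets(double acceptanceRate, double lengthReduction) =>
        acceptanceRate >= 0.70 && lengthReduction >= 0.25;
}
```

For example, 700 accepted out of 1,000 proposed tokens and a sequence cut from 1,000 to 750 tokens sit exactly on both targets.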

---

## 8. Phase 5: CLI, Documentation & Release (3–5 days)
1. CLI commands:
   - `dotnet run -- chat "hello" --enable-bucketing`
   - `dotnet run -- train --enable-bucketing`
   - `dotnet run -- datagen --domain code --count 10 --output data.jsonl`
2. Update `/docs/bucketing-guide.md` with usage, expected speedups, and quality notes.
3. Add to main README as core optimization feature.
4. Release with pre-mined bucket tables for common tokenizers.

**Implementation:** `src/BitNetSharp.App/Program.cs`

---

## 9. Full UML Catalog (Object & Logic Examples)

**Inference-Time Flow**

```mermaid
flowchart TD
    A[Last 1-3 Tokens] --> B[Bucket Table Lookup]
    B --> C[Chain Candidate Found?]
    C -->|Yes| D[Expand + Verify Each Token]
    D --> E[Accept Until Mismatch]
    E --> F[Context Updated for Full Accepted Chain]
    C -->|No| G[Normal Single-Token Generation]
```

**Training-Time Compression Flow**

```mermaid
flowchart TD
    A[Raw Token Sequence] --> B[CompressSequence]
    B --> C[Replace n-grams with Chain First Token]
    C --> D[Compressed Sequence → BitNet Forward]
    D --> E[Loss Computed on Original Target Token]
    E --> F[Backprop on Compressed Sequence]
```

**Class Structure**

```mermaid
classDiagram
    class ChainBucket {
        +byte ChainId
        +int[] TokenIds
        +float Confidence
        +int Length
    }
    class ChainBucketTable {
        +int Count
        +IReadOnlyList~ChainBucket~ Buckets
        +TryLookupPrefix(contextTail, out chain) bool
        +GetById(chainId) ChainBucket?
    }
    class BucketMiner {
        +Mine(sequences, maxBuckets) ChainBucketTable$
    }
    class BitNetPaperModel {
        +ChainBucketTable? BucketTable
        +BitNetOptions Options
        +LoadBucketTable(table)
        +MineAndLoadBuckets(examples) ChainBucketTable
        +GenerateResponse(prompt, maxTokens) BitNetGenerationResult
        +Train(examples, epochs) TrainingReport
    }
    BitNetPaperModel --> ChainBucketTable
    BucketMiner --> ChainBucketTable
    ChainBucketTable "1" *-- "0..256" ChainBucket
```

---

## 10. Risk Register & Mitigation
| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| Quality regression from compression | Medium | High | Strong verification + perplexity guardrails |
| Bucket table staleness | Low | Medium | Periodic re-mining during training |
| Increased memory for table | Low | Low | 256 buckets only (~few KB) |
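
The "few KB" figure in the last row can be checked with a back-of-envelope bound. The sketch assumes the field types shown in the `ChainBucket` class diagram (`byte` ID, `int` token IDs, `float` confidence), every chain at the maximum length of 8, and no object or serialization overhead. The class `BucketTableSizeSketch` is hypothetical.

```csharp
public static class BucketTableSizeSketch
{
    // Upper bound on raw payload bytes for a full table, ignoring any
    // per-object or file-format overhead.
    public static int UpperBoundBytes(int buckets = 256, int maxChainLength = 8)
    {
        const int bytesPerTokenId = sizeof(int);      // 4
        const int bytesPerConfidence = sizeof(float); // 4
        const int bytesPerChainId = sizeof(byte);     // 1

        var bytesPerBucket = maxChainLength * bytesPerTokenId + bytesPerConfidence + bytesPerChainId;
        return buckets * bytesPerBucket;
    }
}
```

Under those assumptions a full table is 256 × (8 × 4 + 4 + 1) = 9,472 bytes, about 9.25 KB, which is well inside the < 50 KB budget stated in Phase 1.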

---

## 11. Timeline, Milestones & Effort Estimates (Solo Developer)
- Phase 1: 5–7 days → "Bucket Mining Ready"
- Phase 2: 7–10 days → "Inference Bucketing Live"
- Phase 3: 8–12 days → "Training Compression Live"
- Phase 4–5: 8–12 days → "Full Release"

**Total estimated effort:** 28–41 days, the sum of the phase estimates above (highly parallelizable with existing training loop).

---

## 12. Future Extensions
- Dynamic bucket updating during training
- Multi-byte chain IDs for >256 buckets
- Integration with DataGen SLM for bucket-aware synthetic data

**End of Document**

src/BitNetSharp.App/HostedAgentModelFactory.cs

Lines changed: 5 additions & 3 deletions
@@ -10,7 +10,9 @@ public static class HostedAgentModelFactory
     public static IHostedAgentModel Create(
         string? specifier,
         VerbosityLevel verbosity = VerbosityLevel.Normal,
-        IEnumerable<TrainingExample>? trainingExamples = null)
+        IEnumerable<TrainingExample>? trainingExamples = null,
+        bool enableChainBuckets = false,
+        bool enableSequenceCompression = false)
     {
         var value = string.IsNullOrWhiteSpace(specifier)
             ? DefaultModelId
@@ -25,8 +27,8 @@ public static IHostedAgentModel Create(
         {
             DefaultModelId => new BitNetHostedAgentModel(
                 trainingExamples is null
-                    ? BitNetBootstrap.CreatePaperModel(verbosity)
-                    : BitNetBootstrap.CreatePaperModel(trainingExamples, verbosity)),
+                    ? BitNetBootstrap.CreatePaperModel(verbosity, enableChainBuckets, enableSequenceCompression)
+                    : BitNetBootstrap.CreatePaperModel(trainingExamples, verbosity, enableChainBuckets, enableSequenceCompression)),
             TraditionalLocalModelId => new TraditionalLocalHostedAgentModel(verbosity, trainingExamples),
             _ => throw new ArgumentException(
                 $"Unknown model specifier '{value}'. Use '{DefaultModelId}', '{TraditionalLocalModelId}', or an absolute path to a local command model JSON file.",
