sharpninja · Mar 20, 2026
diff --git a/docs/benchmarking.md b/docs/benchmarking.md
@@ -20,6 +20,7 @@ The manual GitHub Actions benchmark report workflow runs the same benchmark suit
 - efficacy, measured as non-empty responses across the shared default query script
 - accuracy, measured as exact-match and expected-token recall against the default corpus responses
 - performance, measured from the exported BenchmarkDotNet results
+- a paper-alignment audit for the canonical BitNet model so the report shows both implemented architecture guarantees and still-pending paper reproduction work
 
 ## Run the built-in comparison benchmark
 
@@ -40,6 +41,7 @@ This command writes a static report site with:
 - `index.html` for GitHub Pages publishing
 - `comparison-report.md` and `comparison-report.json` summaries
 - raw BenchmarkDotNet HTML, CSV, and GitHub-flavored Markdown exports under `BenchmarkDotNet.Artifacts/results/`
+- a paper-alignment audit section for `bitnet-b1.58-sharp`
 
 The repository also includes a manual trigger workflow at `.github/workflows/benchmark-report.yml` that builds, tests, generates the same report, uploads it as an artifact, and deploys it with GitHub Pages.
 

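The efficacy and accuracy measures listed in `docs/benchmarking.md` above could be computed roughly as follows. This is a minimal sketch, not the repository's benchmark code: the helper names (`Tokenize`, `IsNonEmpty`, `IsExactMatch`, `ExpectedTokenRecall`), the whitespace tokenization, and the sample strings are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: lower-case whitespace tokenization (an assumption,
// the real tokenizer is not shown in this diff).
static string[] Tokenize(string text) =>
    text.ToLowerInvariant()
        .Split(new[] { ' ', '\t', '\n', '\r' }, StringSplitOptions.RemoveEmptyEntries);

// Efficacy: a response counts when it is non-empty after trimming.
static bool IsNonEmpty(string response) => !string.IsNullOrWhiteSpace(response);

// Accuracy, part 1: exact match on the trimmed strings.
static bool IsExactMatch(string expected, string actual) =>
    string.Equals(expected.Trim(), actual.Trim(), StringComparison.Ordinal);

// Accuracy, part 2: fraction of distinct expected tokens found in the response.
static double ExpectedTokenRecall(string expected, string actual)
{
    var expectedTokens = Tokenize(expected).Distinct().ToArray();
    if (expectedTokens.Length == 0)
    {
        return 1.0; // nothing was expected, so nothing is missing
    }

    var actualTokens = new HashSet<string>(Tokenize(actual));
    var found = expectedTokens.Count(token => actualTokens.Contains(token));
    return (double)found / expectedTokens.Length;
}

// Illustrative strings, not taken from the default corpus.
var expected = "ternary weights use three values";
var actual = "BitNet uses ternary weights";

Console.WriteLine($"non-empty: {IsNonEmpty(actual)}");
Console.WriteLine($"exact match: {IsExactMatch(expected, actual)}");
Console.WriteLine($"token recall: {ExpectedTokenRecall(expected, actual):F2}");
```

Recall is computed over distinct expected tokens so that a repeated word in the reference answer does not count twice; whether the real report does the same is not stated in the docs.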
diff --git a/docs/datagen-guide.md b/docs/datagen-guide.md
@@ -1,6 +1,8 @@
 # DataGen guide
 
-DataGen is the repository's offline synthetic dataset bootstrapper for the paper-aligned BitNet b1.58 runtime. It takes a small set of seed examples, applies deterministic variation patterns, and uses the built-in BitNet transformer to condition each batch with lightweight next-token cues.
+## Overview
+
+`datagen` is the repository's offline synthetic dataset bootstrapper for the paper-aligned BitNet b1.58 runtime. The merged implementation combines deterministic variation patterns with the repository's prompt-template system so each generated JSONL record carries both a reusable DataGen prompt and structured training metadata. Seeds remain optional at runtime, but when supplied they are used for grounding, prompt rendering, and output attribution.
 
 ## Generate a dataset
 
@@ -10,10 +12,25 @@ dotnet run --project src/BitNetSharp.App/BitNetSharp.App.csproj -- datagen \
   --count 50000 \
   --seeds examples/seed-examples.json \
   --output data/synthetic-medical.jsonl \
+  --constraint "Use American English" \
   --lora medical-lora.bin
 ```
 
-The command writes one JSON object per line so the output can flow directly into local fine-tuning or evaluation jobs.
+## Supported options
+
+- `--domain=...` required target domain
+- `--count=...` number of accepted examples to write
+- `--output=...` output JSONL file
+- `--seeds=...` optional JSON array of seed examples; each entry needs one of `instruction`, `prompt`, or `input` plus one of `response`, `output`, or `answer`
+- `--task-type=...` optional task family label such as `instruction-response`, `qa`, or `classification`
+- `--constraint=...` repeatable natural-language constraints
+- `--constraints=a,b,c` comma-separated constraint shorthand
+- `--output-schema=...` optional schema description injected into the prompt template
+- `--template=...` optional absolute or relative path to a JSON prompt template
+- `--lora=...` optional LoRA artifact path recorded in output metadata
+- `--candidate-count=...` number of self-consistency passes per example, default `3`
+- `--min-quality=...` acceptance threshold between `0.0` and `1.0`, default `0.45`
+- `--max-tokens=...` optional model output limit
 
 ## Seed format
 
@@ -37,32 +54,59 @@ Example:
 ]
 ```
 
+When `--seeds` is omitted, the command synthesizes a neutral seed from the requested domain and task type so small bootstrap runs still work.
+
 ## Output schema
 
-Each JSONL line includes:
+Each JSONL line includes the instruction-response pair plus generation metadata:
+
+```json
+{
+  "instruction": "Create a medical-diagnosis task that starts from this seed: Summarize the patient's main complaint and likely differential diagnosis. [sample 1]",
+  "response": "Use the seed response as the baseline: Restate the complaint, list the most likely causes, and flag any immediate safety concerns. Then adapt it for medical-diagnosis work with extra attention to complaint, diagnosis, safety.",
+  "prompt": "You are DataGen, a domain-agnostic synthetic data generator...",
+  "domain": "medical-diagnosis",
+  "taskType": "instruction-response",
+  "qualityScore": 0.8167,
+  "generationTimestamp": "2026-03-20T00:00:00+00:00",
+  "groundingContext": [
+    "Summarize the patient's main complaint and likely differential diagnosis."
+  ],
+  "lora": "/absolute/path/to/medical-lora.bin",
+  "seedInstruction": "Summarize the patient's main complaint and likely differential diagnosis.",
+  "seedResponse": "Restate the complaint, list the most likely causes, and flag any immediate safety concerns.",
+  "variation": "pattern-1",
+  "generatorModel": "bitnet-b1.58-sharp",
+  "tags": [
+    "synthetic",
+    "offline",
+    "pattern-1",
+    "medical",
+    "diagnosis"
+  ]
+}
+```
+
+The trailing `[sample N]` suffix in `instruction` is part of the actual generated output. It gives each accepted example a stable batch-local ordinal so preview runs and large exports can be inspected without losing their original generation order.
 
-- `domain`
-- `instruction`
-- `response`
-- `seedInstruction`
-- `seedResponse`
-- `variation`
-- `generatorModel`
-- `loraAdapter`
-- `tags`
+## Templates
 
-The optional `--lora` argument is recorded in output metadata so runs can stay attributable even when adapter-conditioned execution is handled outside the CLI.
+The repository ships with a default JSON template at `templates/datagen/default.json`. Templates expose the placeholders `{domain}`, `{task_type}`, `{seed_examples}`, `{constraints}`, `{output_schema}`, `{count}`, `{sample_number}`, `{variation}`, `{seed_instruction}`, and `{seed_response}`. The built-in variation patterns from the core generator are injected into that template, so the two prompt systems stay merged rather than diverging. The emitted JSON uses camelCase metadata fields such as `taskType`, `qualityScore`, `generationTimestamp`, and `generatorModel`.
 
 ## Quality controls
 
-- Start from diverse seeds that already match the tone and structure you need.
-- Generate a smaller preview set first, then inspect the JSONL output before scaling up.
-- Filter or deduplicate generated samples before fine-tuning if your target pipeline requires stricter curation.
+The current implementation applies lightweight quality scoring to every accepted example:
+
+1. basic prompt/response presence checks
+2. self-consistency scoring across repeated BitNet cue generations
+3. lexical diversity scoring against previously accepted responses
+
+Use a smaller preview run first, inspect the JSONL output, and then scale up counts once the prompt template and constraints match your target domain.
 
 ## Integration notes
 
 DataGen is intentionally local-first:
 
 - generation runs entirely offline
 - output stays in your working directory
-- the same built-in BitNet model ID is recorded with every example for traceability
+- the same built-in BitNet model ID and optional LoRA path are recorded with every example for traceability
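
The three quality controls described in the DataGen guide above (presence checks, self-consistency across repeated generations, and lexical diversity against accepted responses) can be illustrated with a small scoring sketch. The diff does not show the real scoring formula, so everything here is an assumption: the Jaccard token overlap, the equal weighting of consistency and diversity, and the names `Tokens`, `Jaccard`, and `Score` are hypothetical. Only the `0.45` threshold comes from the documented `--min-quality` default.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: lower-cased whitespace token set.
static HashSet<string> Tokens(string text) =>
    new HashSet<string>(text.ToLowerInvariant()
        .Split(new[] { ' ', '\t', '\n', '\r' }, StringSplitOptions.RemoveEmptyEntries));

// Jaccard similarity between two token sets, in [0, 1].
static double Jaccard(HashSet<string> a, HashSet<string> b)
{
    if (a.Count == 0 && b.Count == 0)
    {
        return 1.0;
    }

    var intersection = a.Count(token => b.Contains(token));
    var union = a.Count + b.Count - intersection;
    return (double)intersection / union;
}

// Assumed scoring: average of self-consistency and lexical diversity.
static double Score(
    string instruction,
    IReadOnlyList<string> candidates,
    IReadOnlyList<string> acceptedResponses)
{
    // 1. Presence check: an empty instruction or candidate scores zero.
    if (string.IsNullOrWhiteSpace(instruction)
        || candidates.Count == 0
        || candidates.Any(candidate => string.IsNullOrWhiteSpace(candidate)))
    {
        return 0.0;
    }

    var chosen = Tokens(candidates[0]);

    // 2. Self-consistency: mean overlap between the first candidate and the rest.
    var consistency = candidates.Count == 1
        ? 1.0
        : candidates.Skip(1).Average(candidate => Jaccard(chosen, Tokens(candidate)));

    // 3. Lexical diversity: penalize overlap with the closest accepted response.
    var diversity = acceptedResponses.Count == 0
        ? 1.0
        : 1.0 - acceptedResponses.Max(response => Jaccard(chosen, Tokens(response)));

    return (consistency + diversity) / 2.0;
}

const double minQuality = 0.45; // documented default of --min-quality

var candidates = new[]
{
    "rest and fluids help recovery",
    "rest and fluids help recovery",
    "rest and fluids aid recovery",
};

var fresh = Score("Suggest care for a mild cold.", candidates, new[] { "check blood pressure daily" });
var repeated = Score("Suggest care for a mild cold.", candidates, new[] { "rest and fluids help recovery" });

Console.WriteLine($"fresh:    {fresh:F3} accepted={fresh >= minQuality}");
Console.WriteLine($"repeated: {repeated:F3} accepted={repeated >= minQuality}");
```

In this sketch a response that duplicates an already accepted one drops below the threshold even though its candidates agree with each other, which is the behavior the diversity check is meant to provide.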
diff --git a/docs/training-and-visualization.md b/docs/training-and-visualization.md
@@ -8,9 +8,11 @@ The repository runtime now only uses the paper-aligned BitNet transformer path.
 
 ```bash
 dotnet run --project src/BitNetSharp.App/BitNetSharp.App.csproj -- visualize
+dotnet run --project src/BitNetSharp.App/BitNetSharp.App.csproj -- paper-audit
 ```
 
 This command prints the current paper-model configuration and an aggregated ternary weight histogram across every `BitLinear` projection in the seeded transformer.
+The `paper-audit` command adds a structured checklist on top of that inspection output so the repository can report which paper-aligned architecture requirements are currently implemented and which end-to-end reproduction items are still pending.
 
 ## Inspect next-token predictions
 

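The aggregated ternary weight histogram mentioned in `docs/training-and-visualization.md` above amounts to counting the values -1, 0, and +1 across every projection. The sketch below assumes each projection's weights are available as an `sbyte[]`; that representation, the `CountTernary` name, and the sample data are illustrative assumptions rather than the `BitLinear` API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical aggregation: counts -1, 0, and +1 across all projections.
static (int Negative, int Zero, int Positive) CountTernary(IEnumerable<sbyte[]> projections)
{
    int negative = 0, zero = 0, positive = 0;

    foreach (var projection in projections)
    {
        foreach (var weight in projection)
        {
            switch (weight)
            {
                case -1: negative++; break;
                case 0: zero++; break;
                case 1: positive++; break;
                default:
                    // Anything outside {-1, 0, +1} is not a ternary weight.
                    throw new ArgumentOutOfRangeException(
                        nameof(projections), $"Non-ternary weight: {weight}");
            }
        }
    }

    return (negative, zero, positive);
}

// Two made-up projections standing in for the transformer's BitLinear layers.
var sampleProjections = new[]
{
    new sbyte[] { -1, 0, 1, 1 },
    new sbyte[] { 0, 0, -1, 0 },
};

var histogram = CountTernary(sampleProjections);
Console.WriteLine($"-1: {histogram.Negative}, 0: {histogram.Zero}, +1: {histogram.Positive}");
```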
diff --git a/docs/usage.md b/docs/usage.md
@@ -35,9 +35,12 @@ This command confirms that the application is wired for Microsoft Agent Framewor
 
 ```bash
 dotnet run --project /home/runner/work/BitNet-b1.58-Sharp/BitNet-b1.58-Sharp/src/BitNetSharp.App/BitNetSharp.App.csproj -- visualize
+dotnet run --project /home/runner/work/BitNet-b1.58-Sharp/BitNet-b1.58-Sharp/src/BitNetSharp.App/BitNetSharp.App.csproj -- paper-audit
 ```
 
-This command prints the current model summary. When the selected model is the paper-aligned BitNet transformer, it also prints the ternary weight histogram across the transformer's `BitLinear` projections.
+The `visualize` command prints the current model summary. When the selected model is the paper-aligned BitNet transformer, it also prints the ternary weight histogram across the transformer's `BitLinear` projections.
+
+The `paper-audit` command turns the paper checklist into an executable report. It confirms the architecture requirements that the repository currently satisfies and lists the paper-reproduction work that is still pending, such as end-to-end training, perplexity measurement, zero-shot task evaluation, and external checkpoint interoperability.
 
 ## Benchmark
 
@@ -54,15 +57,16 @@ dotnet run --configuration Release --project src/BitNetSharp.App/BitNetSharp.App
 ```
 
 This command runs the BenchmarkDotNet suite, evaluates both built-in models against the shared default training corpus/query script, and writes HTML, Markdown, and JSON comparison reports to the selected output directory.
+For the paper-aligned BitNet model, the generated report also includes a paper-alignment audit section with architecture checks and pending canonical workflow items.
 
 ## DataGen
 
 ```bash
-dotnet run --project src/BitNetSharp.App/BitNetSharp.App.csproj -- datagen --domain "medical-diagnosis" --count 25 --seeds examples/seed-examples.json --output data/synthetic-medical.jsonl
-dotnet run --project src/BitNetSharp.App/BitNetSharp.App.csproj -- datagen --domain "medical-diagnosis" --count 25 --seeds examples/seed-examples.json --output data/synthetic-medical.jsonl --lora medical-lora.bin
+dotnet run --project src/BitNetSharp.App/BitNetSharp.App.csproj -- datagen --domain "medical-diagnosis" --count 25 --output data/synthetic-medical.jsonl
+dotnet run --project src/BitNetSharp.App/BitNetSharp.App.csproj -- datagen --domain "medical-diagnosis" --count 25 --seeds examples/seed-examples.json --output data/synthetic-medical.jsonl --constraint "Use American English" --lora medical-lora.bin
 ```
 
-This command reads a JSON array of seed examples, expands them into synthetic instruction-response pairs, and writes JSONL output for downstream local fine-tuning or evaluation. See the [DataGen guide](datagen-guide.md) for accepted seed aliases and the output schema.
+This command reads optional seed examples, merges the built-in pattern prompts with the repository template, and writes JSONL output for downstream local fine-tuning or evaluation. Optional flags include `--task-type`, `--constraint`, `--constraints`, `--output-schema`, `--template`, `--candidate-count`, `--min-quality`, `--max-tokens`, and `--lora`. The emitted JSONL includes both the core generator fields (`seedInstruction`, `variation`, `generatorModel`, `tags`) and the merged prompt metadata (`prompt`, `taskType`, `qualityScore`, `generationTimestamp`, `groundingContext`). See the [DataGen guide](datagen-guide.md) for accepted seed aliases and the merged output schema.
 
 ## Train the traditional comparison model
 

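Because the DataGen output described in `docs/usage.md` above is one JSON object per line, a downstream job can filter it with `System.Text.Json` before fine-tuning. The sketch below is an assumption about how such a consumer might look: the in-memory sample lines stand in for a real JSONL file, and the `0.6` cutoff is an arbitrary illustrative value. Only the field names `instruction` and `qualityScore` come from the documented schema.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Stand-in for lines read from a DataGen JSONL file; the values are made up.
var lines = new[]
{
    "{\"instruction\":\"A\",\"response\":\"B\",\"qualityScore\":0.82,\"variation\":\"pattern-1\"}",
    "{\"instruction\":\"C\",\"response\":\"D\",\"qualityScore\":0.47,\"variation\":\"pattern-2\"}",
    "",
};

const double threshold = 0.6; // illustrative cutoff, stricter than --min-quality
var kept = new List<string>();

foreach (var line in lines)
{
    if (string.IsNullOrWhiteSpace(line))
    {
        continue; // tolerate blank lines such as a trailing newline
    }

    using var document = JsonDocument.Parse(line);
    var root = document.RootElement;

    // Keep the record only when qualityScore is a number at or above the cutoff.
    if (root.TryGetProperty("qualityScore", out var score)
        && score.ValueKind == JsonValueKind.Number
        && score.GetDouble() >= threshold
        && root.TryGetProperty("instruction", out var instruction))
    {
        kept.Add(instruction.GetString() ?? string.Empty);
    }
}

Console.WriteLine($"kept {kept.Count} of {lines.Length - 1} records");
```

Reading line by line keeps memory use flat for large exports, which matters for runs such as the `--count 50000` example in the DataGen guide.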
diff --git a/src/BitNetSharp.App/BitNetPaperAuditCommand.cs b/src/BitNetSharp.App/BitNetPaperAuditCommand.cs
@@ -0,0 +1,36 @@
+using System.Text;
+using BitNetSharp.Core;
+
+namespace BitNetSharp.App;
+
+public static class BitNetPaperAuditCommand
+{
+    public static string FormatReport(BitNetPaperAuditReport report)
+    {
+        ArgumentNullException.ThrowIfNull(report);
+
+        var builder = new StringBuilder();
+        builder.AppendLine($"Paper-alignment audit: {report.ModelId}");
+        builder.AppendLine(report.DisplayName);
+        builder.AppendLine($"Passed: {report.PassedCount}");
+        builder.AppendLine($"Pending: {report.PendingCount}");
+        builder.AppendLine($"Failed: {report.FailedCount}");
+        builder.AppendLine();
+
+        foreach (var check in report.Checks)
+        {
+            builder.AppendLine($"[{FormatStatus(check.Status)}] {check.Area} - {check.Requirement}");
+            builder.AppendLine($"  {check.Details}");
+        }
+
+        return builder.ToString().TrimEnd();
+    }
+
+    private static string FormatStatus(BitNetPaperAuditStatus status) => status switch
+    {
+        BitNetPaperAuditStatus.Passed => "PASS",
+        BitNetPaperAuditStatus.Pending => "PENDING",
+        BitNetPaperAuditStatus.Failed => "FAIL",
+        _ => status.ToString().ToUpperInvariant()
+    };
+}
diff --git a/src/BitNetSharp.App/BitNetSharp.App.csproj b/src/BitNetSharp.App/BitNetSharp.App.csproj
@@ -10,6 +10,13 @@
     <PackageReference Include="Microsoft.Agents.AI.Hosting" Version="1.0.0-preview.260311.1" />
   </ItemGroup>
 
+  <ItemGroup>
+    <Content Include="..\..\templates\datagen\**\*.*">
+      <Link>templates\datagen\%(RecursiveDir)%(Filename)%(Extension)</Link>
+      <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
+    </Content>
+  </ItemGroup>
+
   <PropertyGroup>
     <OutputType>Exe</OutputType>
     <TargetFramework>net10.0</TargetFramework>