Merged
2 changes: 2 additions & 0 deletions docs/benchmarking.md
@@ -20,6 +20,7 @@ The manual GitHub Actions benchmark report workflow runs the same benchmark suit
- efficacy, measured as non-empty responses across the shared default query script
- accuracy, measured as exact-match and expected-token recall against the default corpus responses
- performance, measured from the exported BenchmarkDotNet results
- a paper-alignment audit for the canonical BitNet model so the report shows both implemented architecture guarantees and still-pending paper reproduction work

## Run the built-in comparison benchmark

@@ -40,6 +41,7 @@ This command writes a static report site with:
- `index.html` for GitHub Pages publishing
- `comparison-report.md` and `comparison-report.json` summaries
- raw BenchmarkDotNet HTML, CSV, and GitHub-flavored Markdown exports under `BenchmarkDotNet.Artifacts/results/`
- a paper-alignment audit section for `bitnet-b1.58-sharp`

The repository also includes a manual trigger workflow at `.github/workflows/benchmark-report.yml` that builds, tests, generates the same report, uploads it as an artifact, and deploys it with GitHub Pages.

78 changes: 61 additions & 17 deletions docs/datagen-guide.md
@@ -1,6 +1,8 @@
# DataGen guide

DataGen is the repository's offline synthetic dataset bootstrapper for the paper-aligned BitNet b1.58 runtime. It takes a small set of seed examples, applies deterministic variation patterns, and uses the built-in BitNet transformer to condition each batch with lightweight next-token cues.
## Overview

`datagen` is the repository's offline synthetic dataset bootstrapper for the paper-aligned BitNet b1.58 runtime. The merged implementation combines deterministic variation patterns with the repository's prompt-template system so each generated JSONL record carries both a reusable DataGen prompt and structured training metadata. Seeds remain optional at runtime, but when supplied they are used for grounding, prompt rendering, and output attribution.

## Generate a dataset

@@ -10,10 +12,25 @@ dotnet run --project src/BitNetSharp.App/BitNetSharp.App.csproj -- datagen \
--count 50000 \
--seeds examples/seed-examples.json \
--output data/synthetic-medical.jsonl \
--constraint "Use American English" \
--lora medical-lora.bin
```

The command writes one JSON object per line so the output can flow directly into local fine-tuning or evaluation jobs.
## Supported options

- `--domain=...` required target domain
- `--count=...` number of accepted examples to write
- `--output=...` output JSONL file
- `--seeds=...` optional JSON array of seed examples using either `instruction`, `prompt`, or `input` plus `response`, `output`, or `answer`
- `--task-type=...` optional task family label such as `instruction-response`, `qa`, or `classification`
- `--constraint=...` repeatable natural-language constraints
- `--constraints=a,b,c` comma-separated constraint shorthand
- `--output-schema=...` optional schema description injected into the prompt template
- `--template=...` optional absolute or relative path to a JSON prompt template
- `--lora=...` optional LoRA artifact path recorded in output metadata
- `--candidate-count=...` number of self-consistency passes per example, default `3`
- `--min-quality=...` acceptance threshold between `0.0` and `1.0`, default `0.45`
- `--max-tokens=...` optional model output limit

## Seed format

@@ -37,32 +54,59 @@ Example:
]
```

When `--seeds` is omitted, the command synthesizes a neutral seed from the requested domain and task type so small bootstrap runs still work.
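
As a minimal sketch of how the accepted seed aliases could be normalized, the helper below maps each seed object to one instruction/response pair. `SeedAliasSketch`, the alias priority order, and the choice to skip entries that lack either field are assumptions made for illustration; the repository's own loader may behave differently.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper for illustration only; not the repository's seed loader.
public static class SeedAliasSketch
{
    // Alias names come from the --seeds option description; the order is an assumption.
    private static readonly string[] InstructionKeys = { "instruction", "prompt", "input" };
    private static readonly string[] ResponseKeys = { "response", "output", "answer" };

    public static List<(string Instruction, string Response)> Parse(string json)
    {
        var seeds = new List<(string Instruction, string Response)>();
        using var document = JsonDocument.Parse(json);
        if (document.RootElement.ValueKind != JsonValueKind.Array)
        {
            throw new FormatException("Seed file must contain a JSON array.");
        }

        foreach (var element in document.RootElement.EnumerateArray())
        {
            var instruction = FirstString(element, InstructionKeys);
            var response = FirstString(element, ResponseKeys);

            // Assumption: entries missing either side are skipped rather than rejected.
            if (instruction is not null && response is not null)
            {
                seeds.Add((instruction, response));
            }
        }

        return seeds;
    }

    // Returns the first non-blank string property found under any of the given keys.
    private static string? FirstString(JsonElement element, string[] keys)
    {
        if (element.ValueKind != JsonValueKind.Object)
        {
            return null;
        }

        foreach (var key in keys)
        {
            if (element.TryGetProperty(key, out var value)
                && value.ValueKind == JsonValueKind.String
                && !string.IsNullOrWhiteSpace(value.GetString()))
            {
                return value.GetString();
            }
        }

        return null;
    }
}
```

With this sketch, a seed written as `{"prompt": "...", "answer": "..."}` yields the same pair as one written with `instruction` and `response`.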

## Output schema

Each JSONL line includes:
Each JSONL line includes the instruction-response pair plus generation metadata:

```json
{
  "instruction": "Create a medical-diagnosis task that starts from this seed: Summarize the patient's main complaint and likely differential diagnosis. [sample 1]",
  "response": "Use the seed response as the baseline: Restate the complaint, list the most likely causes, and flag any immediate safety concerns. Then adapt it for medical-diagnosis work with extra attention to complaint, diagnosis, safety.",
  "prompt": "You are DataGen, a domain-agnostic synthetic data generator...",
  "domain": "medical-diagnosis",
  "taskType": "instruction-response",
  "qualityScore": 0.8167,
  "generationTimestamp": "2026-03-20T00:00:00+00:00",
  "groundingContext": [
    "Summarize the patient's main complaint and likely differential diagnosis."
  ],
  "lora": "/absolute/path/to/medical-lora.bin",
  "seedInstruction": "Summarize the patient's main complaint and likely differential diagnosis.",
  "seedResponse": "Restate the complaint, list the most likely causes, and flag any immediate safety concerns.",
  "variation": "pattern-1",
  "generatorModel": "bitnet-b1.58-sharp",
  "tags": [
    "synthetic",
    "offline",
    "pattern-1",
    "medical",
    "diagnosis"
  ]
}
```

The trailing `[sample N]` suffix in `instruction` is part of the actual generated output. It gives each accepted example a stable batch-local ordinal so preview runs and large exports can be inspected without losing their original generation order.
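
For downstream inspection, a consumer could read the JSONL output line by line and keep only records above a chosen quality threshold. The sketch below is one way to do that with `System.Text.Json`; `DataGenRecordSketch` and `JsonlFilterSketch` are hypothetical names, and the record models only a subset of the fields shown in the example above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

// Hypothetical consumer-side type covering a subset of the documented fields.
public sealed record DataGenRecordSketch(
    string Instruction,
    string Response,
    string Domain,
    string TaskType,
    double QualityScore,
    string Variation);

public static class JsonlFilterSketch
{
    // Web defaults map camelCase JSON names such as "qualityScore" onto the record.
    private static readonly JsonSerializerOptions Options = new(JsonSerializerDefaults.Web);

    public static IEnumerable<DataGenRecordSketch> ReadAccepted(TextReader reader, double minQuality)
    {
        string? line;
        while ((line = reader.ReadLine()) is not null)
        {
            // Blank lines are tolerated so hand-edited preview files still load.
            if (string.IsNullOrWhiteSpace(line))
            {
                continue;
            }

            var record = JsonSerializer.Deserialize<DataGenRecordSketch>(line, Options);
            if (record is not null && record.QualityScore >= minQuality)
            {
                yield return record;
            }
        }
    }
}
```

Fields the record does not declare, such as `tags` or `groundingContext`, are ignored during deserialization, so the sketch keeps working if the emitted schema gains more metadata.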

- `domain`
- `instruction`
- `response`
- `seedInstruction`
- `seedResponse`
- `variation`
- `generatorModel`
- `loraAdapter`
- `tags`
## Templates

The optional `--lora` argument is recorded in output metadata so runs can stay attributable even when adapter-conditioned execution is handled outside the CLI.
The repository ships with a default JSON template at `/templates/datagen/default.json`. Templates expose the placeholders `{domain}`, `{task_type}`, `{seed_examples}`, `{constraints}`, `{output_schema}`, `{count}`, `{sample_number}`, `{variation}`, `{seed_instruction}`, and `{seed_response}`. The built-in variation patterns from the core generator are injected into that template so the two prompt systems stay merged rather than diverging, while the emitted JSON uses camelCase metadata fields such as `taskType`, `qualityScore`, `generationTimestamp`, and `generatorModel`.
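
To illustrate how `{name}` placeholders of this kind can be filled in, the sketch below performs a single-pass substitution with a regular expression. It assumes the template's prompt text is available as a plain string; `TemplateRenderSketch` is a hypothetical name, and the repository's actual renderer and template layout may differ.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical renderer for illustration; not the repository's template engine.
public static class TemplateRenderSketch
{
    // Matches placeholders made of lowercase letters and underscores, e.g. {task_type}.
    private static readonly Regex Placeholder = new(@"\{([a-z_]+)\}", RegexOptions.Compiled);

    public static string Render(string templateText, IReadOnlyDictionary<string, string> values) =>
        // A single pass means substituted values are never re-scanned for placeholders.
        // Unknown placeholders are left in place so they are easy to spot in the output.
        Placeholder.Replace(templateText, match =>
            values.TryGetValue(match.Groups[1].Value, out var value) ? value : match.Value);
}
```

For example, rendering `"Generate {count} {task_type} examples for {domain}."` with `count = "25"`, `task_type = "qa"`, and `domain = "medical-diagnosis"` produces `"Generate 25 qa examples for medical-diagnosis."`.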

## Quality controls

- Start from diverse seeds that already match the tone and structure you need.
- Generate a smaller preview set first, then inspect the JSONL output before scaling up.
- Filter or deduplicate generated samples before fine-tuning if your target pipeline requires stricter curation.
The current implementation applies lightweight quality scoring to every accepted example:

1. prompt/response schema validation
Copilot AI Mar 20, 2026

This section claims the implementation performs “prompt/response schema validation,” but the current ComputeQualityScore logic only checks for non-empty prompt/response and doesn’t validate against --output-schema (or any structured schema) before accepting. Either implement schema/format validation or adjust this doc wording to match the current behavior.

Suggested change
1. prompt/response schema validation
1. basic prompt/response presence checks

2. self-consistency scoring across repeated BitNet cue generations
3. lexical diversity scoring against previously accepted responses

Use a smaller preview run first, inspect the JSONL output, and then scale up counts once the prompt template and constraints match your target domain.
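
The lexical diversity step can be pictured as comparing each candidate response with the responses already accepted. The sketch below uses Jaccard similarity over whitespace-separated tokens, which is an assumption chosen for illustration; the source does not state which measure `ComputeQualityScore` uses, and `DiversitySketch` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical scoring helper; the repository's real metric is not shown in this diff.
public static class DiversitySketch
{
    private static readonly char[] Separators = { ' ', '\t', '\r', '\n' };

    // Returns 1.0 for a response unlike anything accepted so far and 0.0 for a token-level duplicate.
    public static double Score(string candidate, IEnumerable<string> accepted)
    {
        var candidateTokens = Tokenize(candidate);
        var maxSimilarity = 0.0;

        foreach (var prior in accepted)
        {
            var priorTokens = Tokenize(prior);

            var union = new HashSet<string>(candidateTokens);
            union.UnionWith(priorTokens);
            if (union.Count == 0)
            {
                continue; // Both responses are empty; nothing to compare.
            }

            var intersection = new HashSet<string>(candidateTokens);
            intersection.IntersectWith(priorTokens);

            // Jaccard similarity: shared tokens divided by all distinct tokens.
            var similarity = (double)intersection.Count / union.Count;
            maxSimilarity = Math.Max(maxSimilarity, similarity);
        }

        return 1.0 - maxSimilarity;
    }

    private static HashSet<string> Tokenize(string text) =>
        new(text.ToLowerInvariant().Split(Separators, StringSplitOptions.RemoveEmptyEntries));
}
```

A score like this could be blended with the other checks and compared against `--min-quality`, although how the repository weights the individual signals is not described here.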

## Integration notes

DataGen is intentionally local-first:

- generation runs entirely offline
- output stays in your working directory
- the same built-in BitNet model ID is recorded with every example for traceability
- the same built-in BitNet model ID and optional LoRA path are recorded with every example for traceability
2 changes: 2 additions & 0 deletions docs/training-and-visualization.md
@@ -8,9 +8,11 @@ The repository runtime now only uses the paper-aligned BitNet transformer path.

```bash
dotnet run --project src/BitNetSharp.App/BitNetSharp.App.csproj -- visualize
dotnet run --project src/BitNetSharp.App/BitNetSharp.App.csproj -- paper-audit
```

This command prints the current paper-model configuration and an aggregated ternary weight histogram across every `BitLinear` projection in the seeded transformer.
The `paper-audit` command adds a structured checklist on top of that inspection output so the repository can report which paper-aligned architecture requirements are currently implemented and which end-to-end reproduction items are still pending.

## Inspect next-token predictions

12 changes: 8 additions & 4 deletions docs/usage.md
@@ -35,9 +35,12 @@ This command confirms that the application is wired for Microsoft Agent Framewor

```bash
dotnet run --project /home/runner/work/BitNet-b1.58-Sharp/BitNet-b1.58-Sharp/src/BitNetSharp.App/BitNetSharp.App.csproj -- visualize
dotnet run --project /home/runner/work/BitNet-b1.58-Sharp/BitNet-b1.58-Sharp/src/BitNetSharp.App/BitNetSharp.App.csproj -- paper-audit
```

This command prints the current model summary. When the selected model is the paper-aligned BitNet transformer, it also prints the ternary weight histogram across the transformer's `BitLinear` projections.
The `visualize` command prints the current model summary. When the selected model is the paper-aligned BitNet transformer, it also prints the ternary weight histogram across the transformer's `BitLinear` projections.

The `paper-audit` command turns the paper checklist into an executable report. It confirms the implemented architecture requirements that the repository currently satisfies and explicitly lists the remaining paper-reproduction work that is still pending, such as end-to-end training, perplexity measurement, zero-shot task evaluation, and external checkpoint interoperability.

## Benchmark

@@ -54,15 +57,16 @@ dotnet run --configuration Release --project src/BitNetSharp.App/BitNetSharp.App
```

This command runs the BenchmarkDotNet suite, evaluates both built-in models against the shared default training corpus/query script, and writes HTML, Markdown, and JSON comparison reports to the selected output directory.
For the paper-aligned BitNet model, the generated report also includes a paper-alignment audit section with architecture checks and pending canonical workflow items.

## DataGen

```bash
dotnet run --project src/BitNetSharp.App/BitNetSharp.App.csproj -- datagen --domain "medical-diagnosis" --count 25 --seeds examples/seed-examples.json --output data/synthetic-medical.jsonl
dotnet run --project src/BitNetSharp.App/BitNetSharp.App.csproj -- datagen --domain "medical-diagnosis" --count 25 --seeds examples/seed-examples.json --output data/synthetic-medical.jsonl --lora medical-lora.bin
dotnet run --project src/BitNetSharp.App/BitNetSharp.App.csproj -- datagen --domain "medical-diagnosis" --count 25 --output data/synthetic-medical.jsonl
dotnet run --project src/BitNetSharp.App/BitNetSharp.App.csproj -- datagen --domain "medical-diagnosis" --count 25 --seeds examples/seed-examples.json --output data/synthetic-medical.jsonl --constraint "Use American English" --lora medical-lora.bin
```

This command reads a JSON array of seed examples, expands them into synthetic instruction-response pairs, and writes JSONL output for downstream local fine-tuning or evaluation. See the [DataGen guide](datagen-guide.md) for accepted seed aliases and the output schema.
This command reads optional seed examples, merges the built-in pattern prompts with the repository template, and writes JSONL output for downstream local fine-tuning or evaluation. Optional flags include `--task-type`, `--constraint`, `--constraints`, `--output-schema`, `--template`, `--candidate-count`, `--min-quality`, `--max-tokens`, and `--lora`. The emitted JSONL includes both the core generator fields (`seedInstruction`, `variation`, `generatorModel`, `tags`) and the merged prompt metadata (`prompt`, `taskType`, `qualityScore`, `generationTimestamp`, `groundingContext`). See the [DataGen guide](datagen-guide.md) for accepted seed aliases and the merged output schema.

## Train the traditional comparison model

36 changes: 36 additions & 0 deletions src/BitNetSharp.App/BitNetPaperAuditCommand.cs
@@ -0,0 +1,36 @@
using System.Text;
using BitNetSharp.Core;

namespace BitNetSharp.App;

public static class BitNetPaperAuditCommand
{
    public static string FormatReport(BitNetPaperAuditReport report)
    {
        ArgumentNullException.ThrowIfNull(report);

        var builder = new StringBuilder();
        builder.AppendLine($"Paper-alignment audit: {report.ModelId}");
        builder.AppendLine(report.DisplayName);
        builder.AppendLine($"Passed: {report.PassedCount}");
        builder.AppendLine($"Pending: {report.PendingCount}");
        builder.AppendLine($"Failed: {report.FailedCount}");
        builder.AppendLine();

        foreach (var check in report.Checks)
        {
            builder.AppendLine($"[{FormatStatus(check.Status)}] {check.Area} - {check.Requirement}");
            builder.AppendLine($" {check.Details}");
        }

        return builder.ToString().TrimEnd();
    }

    private static string FormatStatus(BitNetPaperAuditStatus status) => status switch
    {
        BitNetPaperAuditStatus.Passed => "PASS",
        BitNetPaperAuditStatus.Pending => "PENDING",
        BitNetPaperAuditStatus.Failed => "FAIL",
        _ => status.ToString().ToUpperInvariant()
    };
}
7 changes: 7 additions & 0 deletions src/BitNetSharp.App/BitNetSharp.App.csproj
@@ -10,6 +10,13 @@
    <PackageReference Include="Microsoft.Agents.AI.Hosting" Version="1.0.0-preview.260311.1" />
  </ItemGroup>

  <ItemGroup>
    <Content Include="..\..\templates\datagen\**\*.*">
      <Link>templates\datagen\%(RecursiveDir)%(Filename)%(Extension)</Link>
      <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
    </Content>
  </ItemGroup>

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net10.0</TargetFramework>