Keep published benchmark reports aligned with BitNet training support#18
Keep published benchmark reports aligned with BitNet training support#18sharpninja merged 3 commits intomainfrom
Conversation
Co-authored-by: sharpninja <16146732+sharpninja@users.noreply.github.com>
Co-authored-by: sharpninja <16146732+sharpninja@users.noreply.github.com> Agent-Logs-Url: https://github.com/sharpninja/BitNet-b1.58-Sharp/sessions/bf14f461-8ce5-40da-b0d3-3d3b122141fa
There was a problem hiding this comment.
Pull request overview
Updates the benchmark-report publication pipeline and related runtime surfaces so published GitHub Pages benchmark reports stay in sync with the paper-aligned BitNet model’s training/coverage status.
Changes:
- Make
benchmark-reportworkflow run automatically onmainpushes affecting core/app/tests (in addition to manual dispatch). - Add/adjust tests to lock in report rendering for “Completed (6 examples, 3 epochs)” training status and paper-audit “no pending” coverage.
- Extend the paper-aligned model/audit surface with output-head fine-tuning, hidden-state forwarding, and checkpoint round-trip validation.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| .github/workflows/benchmark-report.yml | Triggers benchmark report publishing on main pushes for relevant paths. |
| src/BitNetSharp.App/BitNetHostedAgentModel.cs | Exposes paper-aligned model training via ITrainableHostedAgentModel. |
| src/BitNetSharp.App/HostedAgentBenchmarks.cs | Treats bitnet-b1.58-sharp as trainable in benchmark parameter selection. |
| src/BitNetSharp.App/Program.cs | Updates CLI training messaging when a model is not trainable. |
| src/BitNetSharp.Core/BitNetPaperAudit.cs | Shifts audit items from “pending roadmap” to runtime/benchmark coverage checks (training/perplexity/zero-shot/checkpoint). |
| src/BitNetSharp.Core/BitNetPaperCheckpoint.cs | Adds checkpoint save/load + round-trip validation for the paper model. |
| src/BitNetSharp.Core/BitNetPaperModel.cs | Adds a repository-local training routine and exposes internal helpers used by audit/training. |
| src/BitNetSharp.Core/Models/BitNetTransformer.cs | Splits forwarding into hidden-state vs logits paths. |
| tests/BitNetSharp.Tests/BitNetPaperAuditTests.cs | Updates assertions for “runtime coverage” (no pending checks). |
| tests/BitNetSharp.Tests/Features/PaperAlignedRuntime.feature | Updates audit step wording and adds BitNet model to training examples. |
| tests/BitNetSharp.Tests/HostedAgentBenchmarksExecutionTests.cs | Adds a focused test that exercises the BitNet training benchmark path. |
| tests/BitNetSharp.Tests/HostedAgentBenchmarkReportRunnerTests.cs | Adds regression test ensuring report renders “Completed (6 examples, 3 epochs)” and not “Not supported”. |
| tests/BitNetSharp.Tests/Steps/PaperAlignedRuntimeSteps.cs | Updates SpecFlow step to assert no pending paper-audit items. |
| docs/benchmarking.md | Documents that the benchmark-report workflow is automatic + manual. |
| docs/usage.md | Updates CLI/paper-audit/training documentation to reflect new runtime coverage. |
| var stats = GetTernaryWeightStats(); | ||
| return new TrainingReport( | ||
| lossHistory, | ||
| trainingSet.Count * epochs, | ||
| epochs, | ||
| stats.NegativeCount, | ||
| stats.ZeroCount, | ||
| stats.PositiveCount); |
There was a problem hiding this comment.
TrainingReport.SamplesSeen is currently computed as trainingSet.Count * epochs, but this method can continue when a response tokenizes to zero tokens, so the number of training updates actually performed can be lower. Track the number of observations actually used (e.g., accumulate observations across epochs) and report that instead so SamplesSeen aligns with the loss history.
| var document = new BitNetPaperCheckpointDocument( | ||
| FormatName, | ||
| model.ModelId, | ||
| BootstrapSeed, | ||
| model.Config, |
There was a problem hiding this comment.
Checkpoint save hard-codes BootstrapSeed = 42 rather than persisting the actual seed used to initialize the model/transformer. Since BitNetPaperModel has a public seed constructor parameter, this makes Save/Load unable to round-trip models created with non-default seeds. Persist the model’s actual seed (or remove the seed parameter / make the constraint explicit and validated) so the checkpoint format is correct for all supported instances.
| var result = new float[matrix.Length, matrix[0].Length]; | ||
| for (var row = 0; row < matrix.Length; row++) | ||
| { | ||
| for (var column = 0; column < matrix[row].Length; column++) | ||
| { |
There was a problem hiding this comment.
ToMatrix assumes a rectangular float[][] and sizes the 2D array from matrix[0].Length, but then iterates up to matrix[row].Length. If any row is longer than the first (or null), this will throw or produce a partially initialized matrix. Validate that all rows are non-null and have the same length (and throw a clear exception) before allocating/copying.
| var result = new float[matrix.Length, matrix[0].Length]; | |
| for (var row = 0; row < matrix.Length; row++) | |
| { | |
| for (var column = 0; column < matrix[row].Length; column++) | |
| { | |
| var firstRow = matrix[0] | |
| ?? throw new ArgumentException("The matrix contains a null row at index 0.", nameof(matrix)); | |
| var columns = firstRow.Length; | |
| for (var row = 0; row < matrix.Length; row++) | |
| { | |
| var currentRow = matrix[row] | |
| ?? throw new ArgumentException($"The matrix contains a null row at index {row}.", nameof(matrix)); | |
| if (currentRow.Length != columns) | |
| { | |
| throw new ArgumentException( | |
| $"All rows must have the same length. Expected {columns} columns but row {row} has {currentRow.Length}.", | |
| nameof(matrix)); | |
| } | |
| } | |
| var result = new float[matrix.Length, columns]; | |
| for (var row = 0; row < matrix.Length; row++) | |
| { | |
| for (var column = 0; column < columns; column++) | |
| { |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4780986fae
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| var document = new BitNetPaperCheckpointDocument( | ||
| FormatName, | ||
| model.ModelId, | ||
| BootstrapSeed, |
There was a problem hiding this comment.
Persist the model seed in paper checkpoints
Save always writes BootstrapSeed as the constant 42, so checkpoints created from BitNetPaperModel instances initialized with a different seed reload with different embeddings/layers and no longer represent the original model state. This breaks the public Save/Load contract for custom-seeded models (a valid constructor path) and can change generation/metrics after reload.
Useful? React with 👍 / 👎.
| document.PrimaryLanguage), | ||
| document.Config, | ||
| document.BootstrapSeed); | ||
| model.ImportOutputHeadWeights(ToMatrix(document.OutputHeadWeights)); |
There was a problem hiding this comment.
Avoid re-quantizing checkpointed output-head weights
Load feeds document.OutputHeadWeights back through ImportOutputHeadWeights, which re-quantizes weights that were already quantized by Save (ExportOutputHeadWeights). Because re-quantization recomputes Gamma, repeated save/load cycles drift the output-head scale (especially with zero-valued ternary entries), so checkpoint round-trips are not numerically idempotent and can skew downstream perplexity/training behavior even when top-token text still matches.
Useful? React with 👍 / 👎.
The published benchmark report could still show
bitnet-b1.58-sharpasNot supportedfor training even though the current hosted model implements the trainable surface. The gap was in report publication freshness, not in the active report-generation logic.Report publication
.github/workflows/benchmark-report.ymlto run on pushes tomainwhen benchmark/runtime/test paths change, in addition to manual dispatch.Regression coverage
Completed (6 examples, 3 epochs)Not supportedHostedAgentBenchmarkReportRunnerTests.Docs
docs/benchmarking.mdthat the benchmark report workflow is now both automatic and manually triggerable.Example of the rendered report state now covered by test:
💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.