Add manual benchmark report pipeline with GitHub Pages publishing#13

Merged
sharpninja merged 4 commits into main from copilot/create-manual-trigger-pipeline
Mar 18, 2026
Conversation

Contributor

Copilot AI commented Mar 18, 2026

This change adds a manually triggered benchmark pipeline for comparing bitnet-b1.58-sharp and traditional-local on the same built-in training corpus and query script. It produces a publishable report that combines integration-style efficacy/accuracy results with BenchmarkDotNet performance output and deploys that report through GitHub Pages.

  • CLI/report generation

    • Adds a benchmark-report command to BitNetSharp.App
    • Runs the existing BenchmarkDotNet suites for the selected models
    • Evaluates both models against the shared default query set from BitNetTrainingCorpus.CreateDefaultExamples()
    • Emits a static report bundle:
      • index.html
      • comparison-report.md
      • comparison-report.json
      • raw BenchmarkDotNet HTML/CSV/Markdown exports
  • Report contents

    • Captures:
      • efficacy: non-empty response rate across the shared query script
      • accuracy: exact-match rate and expected-token recall against corpus responses
      • performance: BenchmarkDotNet host/query/stream/train measurements
    • Preserves the current model behavior:
      • BitNet remains inference-only
      • training benchmarks stay scoped to trainable models
  • GitHub Actions / Pages

    • Adds a dedicated manual workflow at .github/workflows/benchmark-report.yml
    • Workflow:
      • builds/tests the solution
      • runs benchmark-report
      • uploads the generated report as an artifact
      • deploys the static site to GitHub Pages
  • Tests

    • Adds focused coverage for:
      • parsing BenchmarkDotNet markdown tables into comparison rows
      • writing the static HTML/Markdown/JSON report site
  • Docs

    • Documents the new benchmark-report command
    • Documents the manual benchmark-report workflow and generated artifact layout
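
The efficacy and accuracy metrics described under "Report contents" can be sketched roughly as follows. This is a hypothetical Python illustration only; the actual implementation is C# inside HostedAgentBenchmarkReportRunner, and these helper names are invented:

```python
def efficacy(responses):
    """Non-empty response rate across the shared query script."""
    return sum(1 for r in responses if r and r.strip()) / len(responses)

def exact_match_rate(expected, actual):
    """Share of queries whose response matches the corpus response exactly."""
    return sum(1 for e, a in zip(expected, actual) if e == a) / len(expected)

def expected_token_recall(expected, actual):
    """Fraction of expected tokens that appear in the model's response,
    averaged over queries (token = whitespace-split word here)."""
    recalls = []
    for e, a in zip(expected, actual):
        want = set(e.lower().split())
        got = set(a.lower().split())
        recalls.append(len(want & got) / len(want) if want else 1.0)
    return sum(recalls) / len(recalls)
```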

Example:

dotnet run --configuration Release \
  --project src/BitNetSharp.App/BitNetSharp.App.csproj -- \
  benchmark-report \
  --model=bitnet-b1.58-sharp \
  --compare-model=traditional-local \
  --output=/absolute/path/to/benchmark-report

This produces a Pages-ready report directory with a top-level index.html plus linked raw BenchmarkDotNet results.

  • Example output shape
    benchmark-report/
    ├── index.html
    ├── comparison-report.md
    ├── comparison-report.json
    └── BenchmarkDotNet.Artifacts/
        └── results/
            ├── *-report.html
            ├── *-report.csv
            └── *-report-github.md
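
The bundle layout above can be approximated with a short sketch. This is an illustrative Python version, not the real C# runner, which emits richer HTML and links the raw BenchmarkDotNet exports:

```python
import json
import pathlib

def write_report_bundle(out_dir, summary):
    """Write the same top-level artifact layout the report directory uses."""
    out = pathlib.Path(out_dir)
    # Mirror the directory where BenchmarkDotNet exports land.
    (out / "BenchmarkDotNet.Artifacts" / "results").mkdir(parents=True, exist_ok=True)
    # Machine-readable summary.
    (out / "comparison-report.json").write_text(json.dumps(summary, indent=2))
    # Human-readable Markdown summary.
    (out / "comparison-report.md").write_text(
        "# Benchmark comparison\n\n"
        + "\n".join(f"- **{key}**: {value}" for key, value in summary.items())
        + "\n")
    # Entry page linking the other artifacts.
    (out / "index.html").write_text(
        "<html><body><h1>Benchmark comparison</h1>"
        "<p><a href='comparison-report.md'>Markdown</a> | "
        "<a href='comparison-report.json'>JSON</a></p></body></html>")
```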
    

Generated benchmark report UI:
https://github.com/user-attachments/assets/3ae2bfa5-0508-45f4-afd4-524fd6993420



Co-authored-by: sharpninja <16146732+sharpninja@users.noreply.github.com>
Copilot AI and others added 2 commits March 18, 2026 04:07
Copilot AI changed the title from "[WIP] Add manual trigger pipeline for model integration testing" to "Add manual benchmark report pipeline with GitHub Pages publishing" on Mar 18, 2026
sharpninja marked this pull request as ready for review March 18, 2026 04:21
Copilot AI review requested due to automatic review settings March 18, 2026 04:21
sharpninja merged commit 568e2b7 into main Mar 18, 2026
3 checks passed
sharpninja deleted the copilot/create-manual-trigger-pipeline branch March 18, 2026 04:21
Contributor

Copilot AI left a comment


Pull request overview

Adds a new “benchmark-report” entry point to the BitNetSharp.App CLI and CI to generate a static benchmark comparison site (HTML + Markdown + JSON) by combining BenchmarkDotNet exports with a shared corpus/query-script evaluation.

Changes:

  • Introduces HostedAgentBenchmarkReportRunner to run benchmarks, parse BenchmarkDotNet markdown tables, and write HTML/Markdown/JSON report artifacts.
  • Adds benchmark-report command wiring in Program.cs plus a manual GitHub Actions workflow to publish the report via GitHub Pages.
  • Adds tests for markdown table parsing and report artifact generation; updates docs with usage instructions.
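
Parsing a BenchmarkDotNet `*-report-github.md` table into comparison rows can be sketched like this. A hedged Python illustration, assuming typical BDN GitHub-Markdown output; the actual parser is C# in HostedAgentBenchmarkReportRunner:

```python
def parse_benchmark_table(markdown):
    """Parse a pipe-delimited Markdown results table into row dicts."""
    rows = []
    header = None
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip prose and the BDN host-info preamble
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells  # first pipe row is the column header
        elif set(cells[0]) <= set("-: "):
            continue  # alignment separator row like |---|---:|
        else:
            rows.append(dict(zip(header, cells)))
    return rows
```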

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Summary per file:

  • tests/BitNetSharp.Tests/HostedAgentBenchmarkReportRunnerTests.cs — adds unit tests for performance-row parsing and report site artifact generation.
  • src/BitNetSharp.App/Program.cs — adds benchmark-report command handling to generate the comparison report output directory.
  • src/BitNetSharp.App/HostedAgentBenchmarkReportRunner.cs — implements report generation (benchmark execution, corpus/query evaluation, parsing BDN exports, writing HTML/MD/JSON).
  • docs/usage.md — documents the new benchmark-report command usage.
  • docs/benchmarking.md — expands benchmarking docs to describe the comparison report site and GitHub Pages workflow.
  • .github/workflows/benchmark-report.yml — adds a manual workflow that builds/tests, generates the report, uploads the artifact, and deploys to Pages.

Comment on lines +194 to +198 (truncated review context):

    }

    reports.Add(new HostedAgentBenchmarkModelReport(
        model.ModelId,
        …

    queryResults.Add(new HostedAgentBenchmarkQueryResult(
        example.Prompt,
        example.Response,
        response.Text,
        !string.IsNullOrWhiteSpace(response.Text),
        …