Add manual benchmark report pipeline with GitHub Pages publishing #13
Merged
sharpninja merged 4 commits into main on Mar 18, 2026
Co-authored-by: sharpninja <16146732+sharpninja@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add manual trigger pipeline for model integration testing" to "Add manual benchmark report pipeline with GitHub Pages publishing" on Mar 18, 2026
sharpninja approved these changes on Mar 18, 2026
Pull request overview
Adds a new “benchmark-report” entry point to the BitNetSharp.App CLI and CI to generate a static benchmark comparison site (HTML + Markdown + JSON) by combining BenchmarkDotNet exports with a shared corpus/query-script evaluation.
Changes:
- Introduces `HostedAgentBenchmarkReportRunner` to run benchmarks, parse BenchmarkDotNet markdown tables, and write HTML/Markdown/JSON report artifacts.
- Adds `benchmark-report` command wiring in `Program.cs` plus a manual GitHub Actions workflow to publish the report via GitHub Pages.
- Adds tests for markdown table parsing and report artifact generation; updates docs with usage instructions.
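The markdown table parsing mentioned above can be illustrated with a minimal sketch. This is not the code in `HostedAgentBenchmarkReportRunner`; the class name `MarkdownTableSketch` and its method names are hypothetical, and the sketch assumes the BenchmarkDotNet export is a pipe-delimited table with one header row, one separator row, and data rows that each start and end with `|`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not the PR's implementation.
public static class MarkdownTableSketch
{
    // Parses a pipe-delimited markdown table into one dictionary per data row,
    // keyed by the header cell text. Lines that are not table rows are ignored.
    public static List<Dictionary<string, string>> ParseRows(string markdown)
    {
        var tableLines = markdown
            .Split('\n')
            .Select(line => line.Trim())
            .Where(line => line.Length > 1 && line.StartsWith("|") && line.EndsWith("|"))
            .ToList();

        var rows = new List<Dictionary<string, string>>();
        if (tableLines.Count < 3)
        {
            return rows; // need a header, a separator, and at least one data row
        }

        var headers = SplitCells(tableLines[0]);
        foreach (var line in tableLines.Skip(2)) // skip the header and separator rows
        {
            var cells = SplitCells(line);
            if (cells.Length != headers.Length)
            {
                continue; // skip malformed rows rather than guessing at alignment
            }

            var row = new Dictionary<string, string>();
            for (var i = 0; i < headers.Length; i++)
            {
                row[headers[i]] = cells[i];
            }
            rows.Add(row);
        }
        return rows;
    }

    // Drops the outer pipes, then splits on the inner pipes and trims each cell.
    private static string[] SplitCells(string line) =>
        line.Substring(1, line.Length - 2).Split('|').Select(cell => cell.Trim()).ToArray();
}
```

Values are kept as strings (for example `1.5 ns`) so that unit handling stays a separate step.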
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/BitNetSharp.Tests/HostedAgentBenchmarkReportRunnerTests.cs | Adds unit tests for performance-row parsing and report site artifact generation. |
| src/BitNetSharp.App/Program.cs | Adds benchmark-report command handling to generate the comparison report output directory. |
| src/BitNetSharp.App/HostedAgentBenchmarkReportRunner.cs | Implements report generation (benchmark execution, corpus/query evaluation, parsing BDN exports, writing HTML/MD/JSON). |
| docs/usage.md | Documents the new benchmark-report command usage. |
| docs/benchmarking.md | Expands benchmarking docs to describe the comparison report site and GitHub Pages workflow. |
| .github/workflows/benchmark-report.yml | Adds a manual workflow that builds/tests, generates the report, uploads artifact, and deploys to Pages. |
    }

    reports.Add(new HostedAgentBenchmarkModelReport(
        model.ModelId,
Comment on lines +194 to +198

    queryResults.Add(new HostedAgentBenchmarkQueryResult(
        example.Prompt,
        example.Response,
        response.Text,
        !string.IsNullOrWhiteSpace(response.Text),
This change adds a manually triggered benchmark pipeline for comparing `bitnet-b1.58-sharp` and `traditional-local` on the same built-in training corpus and query script. It produces a publishable report that combines integration-style efficacy/accuracy results with BenchmarkDotNet performance output and deploys that report through GitHub Pages.

CLI/report generation
- `benchmark-report` command to `BitNetSharp.App`
- `BitNetTrainingCorpus.CreateDefaultExamples()`
- `index.html`
- `comparison-report.md`
- `comparison-report.json`

Report contents

GitHub Actions / Pages
- `.github/workflows/benchmark-report.yml`
- `benchmark-report`

Tests

Docs
- `benchmark-report` command

Example:

This produces a Pages-ready report directory with a top-level `index.html` plus linked raw BenchmarkDotNet results.

Generated benchmark report UI:
https://github.com/user-attachments/assets/3ae2bfa5-0508-45f4-afd4-524fd6993420
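As a rough illustration of how the JSON and Markdown artifacts named above could be written, here is a minimal sketch. It is not the PR's implementation: the `ModelSummarySketch` record, its fields, and the `ReportWriterSketch` class are hypothetical, and only the file names `comparison-report.json` and `comparison-report.md` come from the description. The real report also includes `index.html` and the raw BenchmarkDotNet results, which this sketch leaves out.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.Json;

// Hypothetical summary shape for illustration; the PR's report types are richer.
public sealed record ModelSummarySketch(string ModelId, int QueriesRun, int NonEmptyResponses);

public static class ReportWriterSketch
{
    // Writes comparison-report.json and comparison-report.md into outputDirectory,
    // creating the directory if it does not exist yet.
    public static void Write(string outputDirectory, IReadOnlyList<ModelSummarySketch> models)
    {
        Directory.CreateDirectory(outputDirectory);

        // Machine-readable artifact.
        var json = JsonSerializer.Serialize(models, new JsonSerializerOptions { WriteIndented = true });
        File.WriteAllText(Path.Combine(outputDirectory, "comparison-report.json"), json);

        // Human-readable artifact: one table row per model.
        var markdown = new StringBuilder();
        markdown.AppendLine("| Model | Queries | Non-empty responses |");
        markdown.AppendLine("|---|---:|---:|");
        foreach (var model in models)
        {
            markdown.AppendLine($"| {model.ModelId} | {model.QueriesRun} | {model.NonEmptyResponses} |");
        }
        File.WriteAllText(Path.Combine(outputDirectory, "comparison-report.md"), markdown.ToString());
    }
}
```

Keeping the JSON and Markdown outputs derived from the same in-memory list means the two artifacts cannot drift apart.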