Add manual benchmark report pipeline with GitHub Pages publishing#13

Merged
sharpninja merged 4 commits into main from copilot/create-manual-trigger-pipeline
Mar 18, 2026
Conversation

Contributor

Copilot AI commented Mar 18, 2026

This change adds a manually triggered benchmark pipeline for comparing bitnet-b1.58-sharp and traditional-local on the same built-in training corpus and query script. It produces a publishable report that combines integration-style efficacy/accuracy results with BenchmarkDotNet performance output and deploys that report through GitHub Pages.

  • CLI/report generation

    • Adds a benchmark-report command to BitNetSharp.App
    • Runs the existing BenchmarkDotNet suites for the selected models
    • Evaluates both models against the shared default query set from BitNetTrainingCorpus.CreateDefaultExamples()
    • Emits a static report bundle:
      • index.html
      • comparison-report.md
      • comparison-report.json
      • raw BenchmarkDotNet HTML/CSV/Markdown exports
  • Report contents

    • Captures:
      • efficacy: non-empty response rate across the shared query script
      • accuracy: exact-match rate and expected-token recall against corpus responses
      • performance: BenchmarkDotNet host/query/stream/train measurements
    • Preserves the current model behavior:
      • BitNet remains inference-only
      • training benchmarks stay scoped to trainable models
  • GitHub Actions / Pages

    • Adds a dedicated manual workflow at .github/workflows/benchmark-report.yml
    • Workflow:
      • builds/tests the solution
      • runs benchmark-report
      • uploads the generated report as an artifact
      • deploys the static site to GitHub Pages
  • Tests

    • Adds focused coverage for:
      • parsing BenchmarkDotNet markdown tables into comparison rows
      • writing the static HTML/Markdown/JSON report site
  • Docs

    • Documents the new benchmark-report command
    • Documents the manual benchmark-report workflow and generated artifact layout
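
The efficacy and accuracy metrics described under "Report contents" can be sketched roughly as follows. This is a hypothetical Python illustration only; the actual implementation is C# inside HostedAgentBenchmarkReportRunner, and these helper names are invented:

```python
def efficacy(responses):
    """Non-empty response rate across the shared query script."""
    return sum(1 for r in responses if r and r.strip()) / len(responses)

def exact_match_rate(expected, actual):
    """Share of queries whose response matches the corpus response exactly."""
    return sum(1 for e, a in zip(expected, actual) if e == a) / len(expected)

def expected_token_recall(expected, actual):
    """Fraction of expected tokens that appear in the model's response,
    averaged over queries (token = whitespace-split word here)."""
    recalls = []
    for e, a in zip(expected, actual):
        want = set(e.lower().split())
        got = set(a.lower().split())
        recalls.append(len(want & got) / len(want) if want else 1.0)
    return sum(recalls) / len(recalls)
```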

Example:

dotnet run --configuration Release \
  --project src/BitNetSharp.App/BitNetSharp.App.csproj -- \
  benchmark-report \
  --model=bitnet-b1.58-sharp \
  --compare-model=traditional-local \
  --output=/absolute/path/to/benchmark-report

This produces a Pages-ready report directory with a top-level index.html plus linked raw BenchmarkDotNet results.

  • Example output shape
    benchmark-report/
    ├── index.html
    ├── comparison-report.md
    ├── comparison-report.json
    └── BenchmarkDotNet.Artifacts/
        └── results/
            ├── *-report.html
            ├── *-report.csv
            └── *-report-github.md
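
The bundle layout above can be approximated with a short sketch. This is an illustrative Python version, not the real C# runner, which emits richer HTML and links the raw BenchmarkDotNet exports:

```python
import json
import pathlib

def write_report_bundle(out_dir, summary):
    """Write the same top-level artifact layout the report directory uses."""
    out = pathlib.Path(out_dir)
    # Mirror the directory where BenchmarkDotNet exports land.
    (out / "BenchmarkDotNet.Artifacts" / "results").mkdir(parents=True, exist_ok=True)
    # Machine-readable summary.
    (out / "comparison-report.json").write_text(json.dumps(summary, indent=2))
    # Human-readable Markdown summary.
    (out / "comparison-report.md").write_text(
        "# Benchmark comparison\n\n"
        + "\n".join(f"- **{key}**: {value}" for key, value in summary.items())
        + "\n")
    # Entry page linking the other artifacts.
    (out / "index.html").write_text(
        "<html><body><h1>Benchmark comparison</h1>"
        "<p><a href='comparison-report.md'>Markdown</a> | "
        "<a href='comparison-report.json'>JSON</a></p></body></html>")
```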
    

Generated benchmark report UI:
https://github.com/user-attachments/assets/3ae2bfa5-0508-45f4-afd4-524fd6993420



Co-authored-by: sharpninja <16146732+sharpninja@users.noreply.github.com>
Copilot AI and others added 2 commits March 18, 2026 04:07
Copilot AI changed the title from "[WIP] Add manual trigger pipeline for model integration testing" to "Add manual benchmark report pipeline with GitHub Pages publishing" on Mar 18, 2026
sharpninja marked this pull request as ready for review March 18, 2026 04:21
Copilot AI review requested due to automatic review settings March 18, 2026 04:21
sharpninja merged commit 568e2b7 into main Mar 18, 2026
3 checks passed
sharpninja deleted the copilot/create-manual-trigger-pipeline branch March 18, 2026 04:21
Contributor

Copilot AI left a comment


Pull request overview

Adds a new “benchmark-report” entry point to the BitNetSharp.App CLI and CI to generate a static benchmark comparison site (HTML + Markdown + JSON) by combining BenchmarkDotNet exports with a shared corpus/query-script evaluation.

Changes:

  • Introduces HostedAgentBenchmarkReportRunner to run benchmarks, parse BenchmarkDotNet markdown tables, and write HTML/Markdown/JSON report artifacts.
  • Adds benchmark-report command wiring in Program.cs plus a manual GitHub Actions workflow to publish the report via GitHub Pages.
  • Adds tests for markdown table parsing and report artifact generation; updates docs with usage instructions.
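
Parsing a BenchmarkDotNet `*-report-github.md` table into comparison rows can be sketched like this. A hedged Python illustration, assuming typical BDN GitHub-Markdown output; the actual parser is C# in HostedAgentBenchmarkReportRunner:

```python
def parse_benchmark_table(markdown):
    """Parse a pipe-delimited Markdown results table into row dicts."""
    rows = []
    header = None
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip prose and the BDN host-info preamble
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells  # first pipe row is the column header
        elif set(cells[0]) <= set("-: "):
            continue  # alignment separator row like |---|---:|
        else:
            rows.append(dict(zip(header, cells)))
    return rows
```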

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Summary per file:

  • tests/BitNetSharp.Tests/HostedAgentBenchmarkReportRunnerTests.cs — adds unit tests for performance-row parsing and report site artifact generation.
  • src/BitNetSharp.App/Program.cs — adds benchmark-report command handling to generate the comparison report output directory.
  • src/BitNetSharp.App/HostedAgentBenchmarkReportRunner.cs — implements report generation (benchmark execution, corpus/query evaluation, parsing BDN exports, writing HTML/MD/JSON).
  • docs/usage.md — documents the new benchmark-report command usage.
  • docs/benchmarking.md — expands benchmarking docs to describe the comparison report site and GitHub Pages workflow.
  • .github/workflows/benchmark-report.yml — adds a manual workflow that builds/tests, generates the report, uploads the artifact, and deploys to Pages.

Comment on lines +194 to +198 (truncated review context):

    }

    reports.Add(new HostedAgentBenchmarkModelReport(
        model.ModelId,
        …

    queryResults.Add(new HostedAgentBenchmarkQueryResult(
        example.Prompt,
        example.Response,
        response.Text,
        !string.IsNullOrWhiteSpace(response.Text),
        …