sharpninja
diff --git a/.github/workflows/benchmark-report.yml b/.github/workflows/benchmark-report.yml
Lines changed: 71 additions & 0 deletions
diff --git a/docs/benchmarking.md b/docs/benchmarking.md
Lines changed: 20 additions & 0 deletions
diff --git a/docs/usage.md b/docs/usage.md
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,71 @@
+name: Benchmark report
+
+on:
+  workflow_dispatch:
+
+permissions:
+  contents: read
+  pages: write
+  id-token: write
+
+concurrency:
+  group: benchmark-report-pages
+  cancel-in-progress: true
+
+jobs:
+  benchmark:
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+
+      - name: Setup .NET SDK
+        uses: actions/setup-dotnet@v4
+        with:
+          global-json-file: global.json
+
+      - name: Restore
+        run: dotnet restore BitNet-b1.58-Sharp.slnx
+
+      - name: Build
+        run: dotnet build BitNet-b1.58-Sharp.slnx --configuration Release --no-restore
+
+      - name: Test
+        run: dotnet test BitNet-b1.58-Sharp.slnx --configuration Release --no-build --no-restore
+
+      - name: Generate benchmark comparison report
+        run: >
+          dotnet run --configuration Release
+          --project "${{ github.workspace }}/src/BitNetSharp.App/BitNetSharp.App.csproj" --
+          benchmark-report
+          --model=bitnet-b1.58-sharp
+          --compare-model=traditional-local
+          --output="${{ github.workspace }}/artifacts/benchmark-report"
+
+      - name: Upload benchmark report artifact
+        uses: actions/upload-artifact@v4
+        with:
+          name: benchmark-report
+          path: ${{ github.workspace }}/artifacts/benchmark-report
+          if-no-files-found: error
+
+      - name: Setup GitHub Pages
+        uses: actions/configure-pages@v5
+
+      - name: Upload GitHub Pages artifact
+        uses: actions/upload-pages-artifact@v3
+        with:
+          path: ${{ github.workspace }}/artifacts/benchmark-report
+
+  deploy:
+    needs: benchmark
+    runs-on: ubuntu-latest
+    environment:
+      name: github-pages
+      url: ${{ steps.deployment.outputs.page_url }}
+
+    steps:
+      - name: Deploy GitHub Pages artifact
+        id: deployment
+        uses: actions/deploy-pages@v4
@@ -15,6 +15,12 @@ The benchmark command uses BenchmarkDotNet to measure the same hosted-model oper
 - streaming a response for a prompt
 - building the agent host
 
+The manual GitHub Actions benchmark report workflow runs the same benchmark suite for both built-in models, then publishes a static comparison site through GitHub Pages. That report combines:
+
+- efficacy, measured as non-empty responses across the shared default query script
+- accuracy, measured as exact-match and expected-token recall against the default corpus responses
+- performance, measured from the exported BenchmarkDotNet results
+
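The exact scoring rules are not shown in this diff. As a rough illustration only, the sketch below assumes a case-sensitive whole-string comparison for exact match and whitespace tokenization for expected-token recall. `exact_match` and `token_recall` are hypothetical helper names, not functions from the repository.

```bash
# Hypothetical helpers for illustration; the repository's real scoring code may differ.

# Exact match: succeeds only when the response equals the expected text exactly.
exact_match() {
  [ "$1" = "$2" ]
}

# Expected-token recall: integer percentage (rounded down) of whitespace-separated
# expected tokens that also appear in the response.
# The body runs in a subshell so that `set -f` (disable globbing) does not leak out.
token_recall() (
  set -f
  expected="$1"
  response="$2"
  total=0
  hits=0
  for token in $expected; do
    total=$((total + 1))
    for candidate in $response; do
      if [ "$token" = "$candidate" ]; then
        hits=$((hits + 1))
        break
      fi
    done
  done
  if [ "$total" -eq 0 ]; then
    # No expected tokens: report 0 rather than dividing by zero.
    echo 0
  else
    echo $((hits * 100 / total))
  fi
)
```

For example, `token_recall "the quick brown fox" "a quick fox"` prints `50`, since two of the four expected tokens appear in the response.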
 ## Run the built-in comparison benchmark
 
 ```bash
@@ -23,6 +29,20 @@ dotnet run --configuration Release --project /home/runner/work/BitNet-b1.58-Shar
 
 This runs the BenchmarkDotNet suite over both local models so their hosted response and host-construction costs can be compared directly.
 
+## Generate the comparison report site
+
+```bash
+dotnet run --configuration Release --project src/BitNetSharp.App/BitNetSharp.App.csproj -- benchmark-report --model=bitnet-b1.58-sharp --compare-model=traditional-local --output=/absolute/path/to/benchmark-report
+```
+
+This command writes a static report site with:
+
+- `index.html` for GitHub Pages publishing
+- `comparison-report.md` and `comparison-report.json` summaries
+- raw BenchmarkDotNet HTML, CSV, and GitHub-flavored Markdown exports under `BenchmarkDotNet.Artifacts/results/`
+
+The repository also includes a manual trigger workflow at `.github/workflows/benchmark-report.yml` that builds, tests, generates the same report, uploads it as an artifact, and deploys it with GitHub Pages.
+
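One way to confirm that a run produced the files listed above is a small shell check. This is only a sketch: `check_report_dir` is a hypothetical helper, and it assumes that `BenchmarkDotNet.Artifacts/results/` sits directly under the chosen `--output` directory, which this diff does not state explicitly.

```bash
# Hypothetical helper (not part of the repository): verify that a generated
# report directory contains the files the documentation lists.
check_report_dir() {
  report_dir="$1"
  missing=0

  # Top-level files named in the docs.
  for name in index.html comparison-report.md comparison-report.json; do
    if [ ! -f "$report_dir/$name" ]; then
      echo "missing file: $report_dir/$name" >&2
      missing=1
    fi
  done

  # Assumption: raw BenchmarkDotNet exports live under this subdirectory.
  if [ ! -d "$report_dir/BenchmarkDotNet.Artifacts/results" ]; then
    echo "missing directory: $report_dir/BenchmarkDotNet.Artifacts/results" >&2
    missing=1
  fi

  # Exit status 0 when everything is present, 1 otherwise.
  return "$missing"
}
```

After generating a report, `check_report_dir /absolute/path/to/benchmark-report` returns a non-zero status and names each missing entry on stderr if the output is incomplete.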
 ## Train the traditional local model
 
 ```bash
 
@@ -47,6 +47,14 @@ dotnet run --configuration Release --project /home/runner/work/BitNet-b1.58-Shar
 
 This command runs BenchmarkDotNet over the same hosted-model operations covered by the SpecFlow scenarios so you can compare local models under one agent wrapper.
 
+## Benchmark report
+
+```bash
+dotnet run --configuration Release --project src/BitNetSharp.App/BitNetSharp.App.csproj -- benchmark-report --model=bitnet-b1.58-sharp --compare-model=traditional-local --output=/absolute/path/to/benchmark-report
+```
+
+This command runs the BenchmarkDotNet suite, evaluates both built-in models against the shared default training corpus/query script, and writes HTML, Markdown, and JSON comparison reports to the selected output directory.
+
 ## Train the traditional comparison model
 
 ```bash