feat(skill): perf-trend render bench history report across recent main CI runs #396
Merged
…n CI runs

Adds a new perf-trend skill (and supporting scripts/perf-trend.mjs + `npm run perf:trend`) that pulls the `bench-output` artifacts from recent successful ci.yml runs on main, parses criterion bencher output, and renders a markdown trend report.

Why
---
The `bench` job's per-run summary table (in the GitHub job summary) shows only the latest measurement vs budget. It tells you nothing about whether the value is moving, whether the regression is new, or whether other budgeted benches are silently missing from the output. This skill answers all three.

Output sections
---------------
1. **⚠ Missing budgeted benches**: flags any bench named in the workflow's `declare -A BUDGETS` block that produced no output in the window. Root cause is usually an earlier-running bench harness panicking and short-circuiting `cargo bench` for the rest of the suite. The existing CI summary is silent about this; surfacing it is the headline reason this skill exists.
2. **Summary**: per-bench row with latest, budget, window median, % delta vs median, status (within or over budget), and an ASCII sparkline (left=oldest, right=newest).
3. **Movers**: top regressions / improvements (≥10% off median, ≥3 samples), each citing the commit SHA + PR title of the latest run.
4. **Per-bench history**: `<details>` block with the full (date, commit, ns/iter, dev, title) series.

Validation against current main
-------------------------------
Running `npm run perf:trend` on the 14-day window surfaced two latent issues that the per-run CI summary has been silent about:

- `hot_path/get_file_comments_large` has been ~3-4× over its 20 ms budget for at least 8 consecutive runs (~70-83 ms).
- `matching_bench` panics with *"yaml anchors/aliases not allowed in sidecars"* on every run, which short-circuits `cargo bench` and drops 4 of 5 budgeted benches from the output: `matching/50_comments_1000_lines`, `fold_regions/large_100kb_jsonlike/default`, `parse_kql/pipeline_50_steps`, `strip_json_comments/large_100kb`.

Both are pre-existing repo issues; fixing them is out of scope for this PR, but the report makes them impossible to miss next time.

Design notes
------------
- Budgets parsed live from .github/workflows/ci.yml (single source of truth: updates to the workflow's BUDGETS table flow through automatically).
- 14-day window matches the bench-output artifact retention.
- --summary flag appends to the job summary file so a future scheduled-workflow caller can post the same report to a job page without rewriting the renderer.
- Read-only: never writes to the repo, never opens issues. Acting on findings is for the user or a follow-on skill.

Verified
--------
- `npm run lint:skills` (11 skills now, all references resolve)
- `npm run lint:markdown-surfaces`
- `npx eslint scripts/perf-trend.mjs`
- `node scripts/perf-trend.mjs --help`
- `node scripts/perf-trend.mjs --days 14 --runs 10` renders the full report
- `node scripts/perf-trend.mjs --summary` with the job summary file path set writes 35 lines to the summary file

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
What

Adds a new `perf-trend` skill plus the supporting `scripts/perf-trend.mjs` helper and `npm run perf:trend` wrapper. The skill pulls the `bench-output` artifact from recent successful `ci.yml` runs on `main`, parses the criterion bencher output, joins it with each run's commit SHA + PR title, and renders a markdown trend report.

Why
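The `bench` job's per-run job-summary table shows only latest vs budget. It tells you nothing about:

- whether the value is moving,
- whether a regression is new, or
- whether other budgeted benches are silently missing from the output because `cargo bench` panicked.

The `perf-trend` skill answers all three on demand.

Report sections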
1. **⚠ Missing budgeted benches**: any bench named in the workflow's `BUDGETS` table that produced no output in the window. The CI summary step is silent about these; surfacing them is the headline reason this skill exists.
2. **Summary**: per-bench row with latest, budget, window median, % delta vs median, status, and an ASCII sparkline (left=oldest, right=newest).
3. **Movers**: top regressions / improvements (≥10% off median, ≥3 samples), each citing the commit SHA + PR title of the latest run.
4. **Per-bench history**: `<details>` block with the full series.

What it found on current `main`

Running on the 14-day window already surfaces two latent issues the per-run CI summary has been silent about:

- `hot_path/get_file_comments_large` has been ~3-4× over its 20 ms budget (~70-83 ms) for at least 8 consecutive runs.
- `matching_bench` panics with "yaml anchors/aliases not allowed in sidecars" on every run, short-circuiting `cargo bench` and dropping 4 of 5 budgeted benches from the output (`matching/50_comments_1000_lines`, `fold_regions/large_100kb_jsonlike/default`, `parse_kql/pipeline_50_steps`, `strip_json_comments/large_100kb`).

Both are pre-existing repo issues outside the scope of this PR, but the report makes them impossible to miss going forward.
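To make the missing-benches check concrete, here is a minimal sketch of how bencher-style output could be parsed and compared against the budgeted names. It is not the code in `scripts/perf-trend.mjs`: the function names `parseBencherOutput` and `findMissingBenches` are hypothetical, the regex assumes the usual `test <name> ... bench: <n> ns/iter (+/- <dev>)` line shape, and the sample numbers and panic line are illustrative only.

```javascript
// Sketch under stated assumptions; names here are hypothetical, not the PR's API.

// Assumed line shape of bencher-style output, e.g.
//   test some/bench ... bench:    1,234 ns/iter (+/- 56)
const BENCH_LINE =
  /^test\s+(.+?)\s+\.\.\.\s+bench:\s+([\d,]+)\s+ns\/iter\s+\(\+\/-\s+([\d,]+)\)/;

// Parse one run's output into a Map of bench name -> { nsPerIter, deviation }.
function parseBencherOutput(text) {
  const results = new Map();
  for (const line of text.split(/\r?\n/)) {
    const m = BENCH_LINE.exec(line.trim());
    if (!m) continue; // panics, warnings, and blank lines are skipped
    results.set(m[1], {
      nsPerIter: Number(m[2].replace(/,/g, "")),
      deviation: Number(m[3].replace(/,/g, "")),
    });
  }
  return results;
}

// A budgeted bench is "missing" if no run in the window reported it.
function findMissingBenches(budgetedNames, runs) {
  const seen = new Set();
  for (const run of runs) {
    for (const name of run.keys()) seen.add(name);
  }
  return budgetedNames.filter((name) => !seen.has(name));
}

// Illustrative input: one bench reports, then a harness panic cuts the suite short.
const sampleRun = parseBencherOutput(
  [
    "test hot_path/get_file_comments_large ... bench:    75,000,000 ns/iter (+/- 1,200,000)",
    "thread 'main' panicked: yaml anchors/aliases not allowed in sidecars",
  ].join("\n")
);

const budgeted = [
  "hot_path/get_file_comments_large",
  "matching/50_comments_1000_lines",
  "fold_regions/large_100kb_jsonlike/default",
  "parse_kql/pipeline_50_steps",
  "strip_json_comments/large_100kb",
];

// Lists the four budgeted benches that never appeared in the window.
console.log(findMissingBenches(budgeted, [sampleRun]));
```

Taking the union across all runs in the window, rather than checking only the latest run, means a bench is flagged only when it is absent from every run, which matches the "produced no output in the window" wording above.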
Design

- Budgets are parsed live from `.github/workflows/ci.yml`, the single source of truth; if the workflow's BUDGETS table changes, the script follows.
- The 14-day window matches the `bench-output` artifact retention.
- The `--summary` flag appends the report to the job summary file so a future scheduled-workflow caller can post the same renderer's output to a job page without rewriting it.

Verified

- `npm run lint:skills` (11 skills now, all `npm run` references resolve)
- `npm run lint:markdown-surfaces`
- `npx eslint scripts/perf-trend.mjs`
- `node scripts/perf-trend.mjs --help`
- `node scripts/perf-trend.mjs --days 14 --runs 10` renders full report with Missing-benches section
- `node scripts/perf-trend.mjs --summary` with the job summary file path set writes 35 lines to the summary file
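
For readers who want a feel for the per-bench numbers in the report, the following is a rough sketch of how a window median, % delta vs median, over-budget status, and a sparkline could be derived from one bench's series. It is an illustration, not the PR's renderer: the names `median`, `sparkline`, and `summarize` are hypothetical, the glyph ramp is an arbitrary ASCII choice, and the mover rule assumes "at least 10% off median with at least 3 samples".

```javascript
// Sketch under stated assumptions; not the actual renderer in scripts/perf-trend.mjs.

// Arbitrary ASCII ramp from lowest to highest (assumed, not the PR's glyph set).
const SPARK = "_.:-=+*#";

function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// One glyph per sample, left = oldest, right = newest, scaled to the window's range.
function sparkline(values) {
  const min = Math.min(...values);
  const max = Math.max(...values);
  const span = max - min;
  return values
    .map((v) => {
      const idx =
        span === 0 ? 0 : Math.round(((v - min) / span) * (SPARK.length - 1));
      return SPARK[idx];
    })
    .join("");
}

// series: ns/iter values ordered oldest -> newest; budgetNs: budget in ns.
function summarize(series, budgetNs) {
  const latest = series[series.length - 1];
  const windowMedian = median(series);
  const deltaPct = ((latest - windowMedian) * 100) / windowMedian;
  return {
    latest,
    windowMedian,
    deltaPct,
    overBudget: latest > budgetNs,
    // Assumed mover rule: >= 3 samples and >= 10% away from the window median.
    isMover: series.length >= 3 && Math.abs(deltaPct) >= 10,
    spark: sparkline(series),
  };
}

// Illustrative values only (ns/iter), checked against a 20 ms budget.
console.log(summarize([70e6, 72e6, 71e6, 83e6], 20e6));
```

Comparing the latest value against the window median rather than the previous run makes the delta less sensitive to a single noisy measurement, which is presumably why the report uses the median as its baseline.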