feat(skill): perf-trend render bench history report across recent main CI runs #396

Merged
dryotta merged 1 commit into main from chore/perf-trend-skill on May 16, 2026

Conversation

@dryotta dryotta commented May 16, 2026

Owner

What

Adds a new perf-trend skill plus the supporting scripts/perf-trend.mjs helper and npm run perf:trend wrapper. The skill pulls the bench-output artifact from recent successful ci.yml runs on main, parses the criterion bencher output, joins it with each run's commit SHA + PR title, and renders a markdown trend report.
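As a rough illustration of the parsing step, the sketch below reads bencher-style lines of the form `test <name> ... bench: <n> ns/iter (+/- <dev>)`. The function name, the return shape, and the sample numbers are assumptions made for this example; they are not taken from scripts/perf-trend.mjs.

```javascript
// Sketch only: parse bencher-style lines such as
//   test hot_path/get_file_comments_large ... bench:    70123456 ns/iter (+/- 1234567)
// parseBencherOutput and its return shape are hypothetical names for illustration.
const BENCH_LINE =
  /^test\s+(\S+)\s+\.\.\.\s+bench:\s+([\d,]+)\s+ns\/iter\s+\(\+\/-\s+([\d,]+)\)/;

function parseBencherOutput(text) {
  const results = [];
  for (const line of text.split(/\r?\n/)) {
    const match = BENCH_LINE.exec(line.trim());
    if (!match) continue; // skip compiler noise, panic traces, blank lines
    results.push({
      name: match[1],
      // Tolerate thousands separators in case the harness prints them.
      nsPerIter: Number(match[2].replace(/,/g, "")),
      deviationNs: Number(match[3].replace(/,/g, "")),
    });
  }
  return results;
}

// Illustrative input; the measurements are made up.
const sample = [
  "running 2 tests",
  "test hot_path/get_file_comments_large ... bench:    70123456 ns/iter (+/- 1234567)",
  "test parse_kql/pipeline_50_steps ... bench:       1,250 ns/iter (+/- 40)",
].join("\n");

console.log(parseBencherOutput(sample));
```

Lines that do not match are skipped rather than treated as errors, so a harness that panics partway through simply yields fewer results.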

Why

The bench job's per-run job-summary table shows only latest vs budget. It tells you nothing about:

  • whether a number is moving over time,
  • whether a regression is new or chronic,
  • whether other budgeted benches are silently missing because an earlier harness in cargo bench panicked.

The perf-trend skill answers all three on demand.

Report sections

  1. ⚠ Missing budgeted benches: names from the workflow's BUDGETS table that produced no output in the window. The CI summary step is silent about these; surfacing them is the headline reason this skill exists.
  2. Summary: per-bench row with latest, budget, window median, Δ vs median, status (within or over budget), and an ASCII sparkline (oldest to newest).
  3. Movers: top regressions / improvements (at least 10% off median, at least 3 samples), each citing the commit SHA + PR title of the latest run.
  4. Per-bench history: <details> block with the full series.
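The Summary row can be pictured with a small sketch like the one below. It assumes a non-empty series of ns/iter values ordered oldest to newest; the helper names, the eight-glyph ASCII ramp, and the sample numbers are all illustrative choices, not the ones used by scripts/perf-trend.mjs.

```javascript
// Hypothetical helpers for illustration; assumes a non-empty numeric series.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Eight ASCII levels from low to high (an assumed glyph set).
const LEVELS = "_.-~=+*#";

// One character per sample, oldest on the left, scaled to the series range.
function sparkline(values) {
  const min = Math.min(...values);
  const span = Math.max(...values) - min;
  return values
    .map((v) => {
      if (span === 0) return LEVELS[0]; // flat series
      return LEVELS[Math.round(((v - min) / span) * (LEVELS.length - 1))];
    })
    .join("");
}

function summaryRow(name, seriesNs, budgetNs) {
  const latest = seriesNs[seriesNs.length - 1];
  const med = median(seriesNs);
  return {
    name,
    latest,
    budget: budgetNs,
    median: med,
    deltaPct: ((latest - med) / med) * 100, // positive means slower than median
    overBudget: latest > budgetNs,
    spark: sparkline(seriesNs),
  };
}

// Made-up numbers: median is 115, latest is 150, budget is 130.
const row = summaryRow("example/bench", [100, 110, 120, 150], 130);
console.log(row.spark, row.deltaPct.toFixed(1) + "%", row.overBudget);
```

Comparing the latest value against the window median rather than the previous run keeps a single noisy sample from dominating the delta.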

What it found on current main

Running on the 14-day window already surfaces two latent issues the per-run CI summary has been silent about:

  • hot_path/get_file_comments_large has been ~3-4× over its 20 ms budget (~70-83 ms) for at least 8 consecutive runs.
  • matching_bench panics with "yaml anchors/aliases not allowed in sidecars" on every run, short-circuiting cargo bench and dropping 4 of 5 budgeted benches from the output (matching/50_comments_1000_lines, fold_regions/large_100kb_jsonlike/default, parse_kql/pipeline_50_steps, strip_json_comments/large_100kb).

Both are pre-existing repo issues outside the scope of this PR, but the report makes them impossible to miss going forward.
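Both kinds of finding reduce to simple checks over the parsed runs. The sketch below is one way to express them, assuming each run carries a `results` array of `{ name, nsPerIter }` objects and that budgets are a name-to-nanoseconds map. Those shapes, the function names, the second budget value, and the sample timings are assumptions for illustration only.

```javascript
// Budgeted bench names that never appear in any run of the window.
function findMissingBenches(budgets, runs) {
  const seen = new Set();
  for (const run of runs) {
    for (const result of run.results) seen.add(result.name);
  }
  return Object.keys(budgets).filter((name) => !seen.has(name));
}

// How many of the most recent runs, counted back from the newest,
// exceeded the budget without interruption.
function overBudgetStreak(seriesNs, budgetNs) {
  let streak = 0;
  for (let i = seriesNs.length - 1; i >= 0 && seriesNs[i] > budgetNs; i--) {
    streak++;
  }
  return streak;
}

// 20 ms budget is from the report; the second budget is a placeholder value.
const budgets = {
  "hot_path/get_file_comments_large": 20000000,
  "matching/50_comments_1000_lines": 1000000,
};

// Illustrative runs, oldest first; timings are made up.
const runs = [
  { results: [{ name: "hot_path/get_file_comments_large", nsPerIter: 15000000 }] },
  { results: [{ name: "hot_path/get_file_comments_large", nsPerIter: 70000000 }] },
  { results: [{ name: "hot_path/get_file_comments_large", nsPerIter: 83000000 }] },
];

console.log(findMissingBenches(budgets, runs));
console.log(
  overBudgetStreak(
    runs.map((run) => run.results[0].nsPerIter),
    budgets["hot_path/get_file_comments_large"],
  ),
);
```

A streak that spans the whole window suggests a chronic problem, while a streak of one points at the latest change.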

Design

  • Budgets parsed live from .github/workflows/ci.yml (single source of truth); if the workflow's BUDGETS table changes, the script follows.
  • 14-day default window matches the bench-output artifact retention.
  • --summary flag appends the report to the job summary file so a future scheduled-workflow caller can post the same renderer's output to a job page without rewriting it.
  • Read-only: never writes to the repo, never opens issues. Acting on findings is for the user or a follow-on skill (per user direction during design).
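Reading budgets straight from the workflow could look roughly like the sketch below. It assumes the table is a bash associative array written as `declare -A BUDGETS=( ["name"]=value ... )` with integer values in nanoseconds; the actual layout and units in ci.yml are not shown in this PR, so treat the regular expressions, the function name, and the sample text as guesses.

```javascript
// Sketch under assumptions: the entry syntax and ns units are not confirmed by the PR.
function parseBudgets(workflowText) {
  // Grab everything between "declare -A BUDGETS=(" and the first ")".
  const block = /declare -A BUDGETS=\(([\s\S]*?)\)/.exec(workflowText);
  if (!block) return {};
  const budgets = {};
  // Match entries like ["bench/name"]=20000000 (quotes optional).
  const entry = /\["?([^"\]]+)"?\]=(\d+)/g;
  for (const m of block[1].matchAll(entry)) {
    budgets[m[1]] = Number(m[2]);
  }
  return budgets;
}

// Hypothetical workflow fragment; the second entry is a placeholder.
const sampleWorkflow = [
  "      - run: |",
  "          declare -A BUDGETS=(",
  '            ["hot_path/get_file_comments_large"]=20000000',
  '            ["example/placeholder_bench"]=5000',
  "          )",
].join("\n");

console.log(parseBudgets(sampleWorkflow));
```

Parsing the workflow text rather than keeping a second copy of the table means the report and the CI gate cannot drift apart.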

Verified

  • npm run lint:skills (11 skills now, all npm run references resolve)
  • npm run lint:markdown-surfaces
  • npx eslint scripts/perf-trend.mjs
  • node scripts/perf-trend.mjs --help
  • node scripts/perf-trend.mjs --days 14 --runs 10: renders the full report with the Missing-benches section
  • node scripts/perf-trend.mjs --summary with the summary file path set: writes 35 lines to the summary file

…n CI runs

Adds a new perf-trend skill (and supporting scripts/perf-trend.mjs + `npm run perf:trend`) that pulls the bench-output artifacts from recent successful ci.yml runs on main, parses criterion bencher output, and renders a markdown trend report.

Why
---

The bench job's per-run summary table (in the GitHub job summary) shows only the latest measurement vs budget. It tells you nothing about whether the value is moving, whether the regression is new, or whether other budgeted benches are silently missing from the output. This skill answers all three.

Output sections
---------------

1. **⚠ Missing budgeted benches**: flags any bench named in the workflow's `declare -A BUDGETS` block that produced no output in the window. Root cause is usually an earlier-running bench harness panicking and short-circuiting `cargo bench` for the rest of the suite. The existing CI summary is silent about this; surfacing it is the headline reason this skill exists.
2. **Summary**: per-bench row with latest, budget, window median, % delta vs median, status (within or over budget), and an ASCII sparkline (left=oldest, right=newest).
3. **Movers**: top regressions / improvements (at least 10% off median, at least 3 samples), each citing the commit SHA + PR title of the latest run.
4. **Per-bench history**: <details> block with the full (date, commit, ns/iter, ± dev, title) series.
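A minimal sketch of the Movers selection follows, assuming each bench maps to a list of `{ ns, sha, title }` samples ordered oldest to newest. The thresholds are read here as "at least 10% off the median" and "at least 3 samples"; that reading, the function names, the `top` limit, and the sample data are all assumptions rather than the script's actual behavior.

```javascript
// Hypothetical helper; assumes a non-empty numeric array.
function medianNs(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

function findMovers(seriesByBench, { minDeltaPct = 10, minSamples = 3, top = 5 } = {}) {
  const movers = [];
  for (const [name, samples] of Object.entries(seriesByBench)) {
    if (samples.length < minSamples) continue; // too little history to judge
    const latest = samples[samples.length - 1];
    const med = medianNs(samples.map((s) => s.ns));
    const deltaPct = ((latest.ns - med) / med) * 100;
    if (Math.abs(deltaPct) < minDeltaPct) continue; // within normal range
    // Cite the commit and PR title of the latest run.
    movers.push({ name, deltaPct, sha: latest.sha, title: latest.title });
  }
  movers.sort((a, b) => b.deltaPct - a.deltaPct); // slowest first
  return {
    regressions: movers.filter((m) => m.deltaPct > 0).slice(0, top),
    // Reverse so the largest improvement comes first.
    improvements: movers.filter((m) => m.deltaPct < 0).reverse().slice(0, top),
  };
}

// Made-up series; SHAs and titles are placeholders.
const series = {
  "bench/slower": [
    { ns: 100, sha: "aaa0001", title: "example PR 1" },
    { ns: 100, sha: "aaa0002", title: "example PR 2" },
    { ns: 130, sha: "aaa0003", title: "example PR 3" },
  ],
  "bench/faster": [
    { ns: 100, sha: "aaa0001", title: "example PR 1" },
    { ns: 100, sha: "aaa0002", title: "example PR 2" },
    { ns: 80, sha: "aaa0003", title: "example PR 3" },
  ],
  "bench/steady": [
    { ns: 100, sha: "aaa0001", title: "example PR 1" },
    { ns: 100, sha: "aaa0002", title: "example PR 2" },
    { ns: 105, sha: "aaa0003", title: "example PR 3" },
  ],
  "bench/short": [
    { ns: 100, sha: "aaa0002", title: "example PR 2" },
    { ns: 200, sha: "aaa0003", title: "example PR 3" },
  ],
};

console.log(findMovers(series));
```

The sample-count floor keeps a bench with only one or two data points from being reported as a mover on thin evidence.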

Validation against current main
-------------------------------

Running `npm run perf:trend` on the 14-day window surfaced two latent issues that the per-run CI summary has been silent about:

- `hot_path/get_file_comments_large` has been ~3-4× over its 20 ms budget for at least 8 consecutive runs (~70-83 ms).
- `matching_bench` panics with *"yaml anchors/aliases not allowed in sidecars"* on every run, which short-circuits `cargo bench` and drops 4 of 5 budgeted benches from the output: `matching/50_comments_1000_lines`, `fold_regions/large_100kb_jsonlike/default`, `parse_kql/pipeline_50_steps`, `strip_json_comments/large_100kb`.

Both are pre-existing repo issues; fixing them is out of scope for this PR, but the report makes them impossible to miss next time.

Design notes
------------

- Budgets parsed live from .github/workflows/ci.yml (single source of truth: updates to the workflow's BUDGETS table flow through automatically).
- 14-day window matches the bench-output artifact retention.
- --summary flag appends to the job summary file so a future scheduled-workflow caller can post the same report to a job page without rewriting the renderer.
- Read-only: never writes to the repo, never opens issues. Acting on findings is for the user or a follow-on skill.

Verified
--------

- `npm run lint:skills` (11 skills now, all references resolve)
- `npm run lint:markdown-surfaces`
- `npx eslint scripts/perf-trend.mjs`
- `node scripts/perf-trend.mjs --help`
- `node scripts/perf-trend.mjs --days 14 --runs 10`: renders the full report
- `node scripts/perf-trend.mjs --summary` with the summary file path set: writes 35 lines to the summary file

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@dryotta dryotta merged commit f8b6641 into main May 16, 2026
15 checks passed
@dryotta dryotta deleted the chore/perf-trend-skill branch May 16, 2026 18:38