chore(runtime): interpreter-throughput benchmark + baseline (measure-first for perf work)#358

Merged: avrabe merged 1 commit into main from perf/interpreter-throughput-baseline on Jun 24, 2026
avrabe (Collaborator) commented Jun 23, 2026

Makes the interpreter's throughput a measured number so speed work can be gated — the prerequisite for every optimization after it (synth discipline: a regression is a number, not a vibe).

Why

The only execution bench (kiln-component/benches/execution_benchmarks.rs) was orphaned: it was never registered as a [[bench]] target, and it read a test_add.wasm from the current working directory that does not exist. As a result, performance was neither measured nor gated.

What

Self-contained criterion bench in kiln-runtime (modules built in-process via wat, no fixture):

  • compute_loop — arithmetic hot path (dispatch + operand stack + locals)
  • memory_loop — same loop + i64.store/i64.load per iter (the per-access lock path)
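As a rough native-Rust analogue of the two workloads above (not the bench's actual wat modules, which this description does not show), the sketch below runs the same arithmetic loop twice: once on locals only, and once with an i64 store and load per iteration that each take two locks. The `Memory` struct, the nested-`Mutex` layout, and every function name here are assumptions made for illustration; the runtime's real locking structure may differ.

```rust
use std::sync::{Arc, Mutex};

// Hypothetical stand-in for a linear memory guarded by two locks.
struct Memory {
    bytes: Mutex<Vec<u8>>,
}

type SharedMemory = Arc<Mutex<Memory>>;

// Build a zeroed memory of `size` bytes behind both locks.
fn new_memory(size: usize) -> SharedMemory {
    Arc::new(Mutex::new(Memory {
        bytes: Mutex::new(vec![0u8; size]),
    }))
}

// Each access acquires the outer lock, then the inner lock (the "double mutex").
fn store_i64(mem: &SharedMemory, addr: usize, value: i64) {
    let outer = mem.lock().unwrap();
    let mut bytes = outer.bytes.lock().unwrap();
    bytes[addr..addr + 8].copy_from_slice(&value.to_le_bytes());
}

fn load_i64(mem: &SharedMemory, addr: usize) -> i64 {
    let outer = mem.lock().unwrap();
    let bytes = outer.bytes.lock().unwrap();
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&bytes[addr..addr + 8]);
    i64::from_le_bytes(buf)
}

// Arithmetic-only loop: everything stays in locals.
fn compute_loop(iters: u64) -> i64 {
    let mut acc: i64 = 0;
    for i in 0..iters {
        acc = acc.wrapping_mul(3).wrapping_add(i as i64);
    }
    acc
}

// Same arithmetic, plus one store and one load per iteration.
fn memory_loop(mem: &SharedMemory, iters: u64) -> i64 {
    let mut acc: i64 = 0;
    for i in 0..iters {
        acc = acc.wrapping_mul(3).wrapping_add(i as i64);
        store_i64(mem, 0, acc);
        acc = load_i64(mem, 0);
    }
    acc
}
```

Both loops return the same value for the same iteration count, so timing them side by side (for example with `std::time::Instant`) isolates the cost of the two locked accesses from the arithmetic.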

Baseline (release, Apple silicon)

| Workload | Throughput | ns/iter |
|---|---|---|
| compute_loop | ~14.3 M iter/s | ~70 |
| memory_loop | ~5.9 M iter/s | ~169 |

The ~99 ns/iter gap for two memory ops (only ~6 extra instructions) quantifies the double-mutex memory-access overhead — the optimization signal for the biggest win the assessment found.
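The arithmetic behind these figures can be checked with a few lines of Rust. This is a sketch using the rounded baseline numbers; the helper names are hypothetical, and splitting the gap evenly across the two memory ops is an assumption, not something the measurement establishes.

```rust
// Convert a throughput in iterations per second to nanoseconds per iteration.
fn ns_per_iter(iters_per_sec: f64) -> f64 {
    1.0e9 / iters_per_sec
}

// Average extra cost per memory op, assuming the gap between the two
// workloads is spread evenly over `ops_per_iter` accesses.
fn per_op_overhead_ns(compute_ns: f64, memory_ns: f64, ops_per_iter: u32) -> f64 {
    (memory_ns - compute_ns) / f64::from(ops_per_iter)
}
```

With the baseline values, `ns_per_iter(14.3e6)` is about 69.9 and `ns_per_iter(5.9e6)` is about 169.5, matching the rounded ~70 and ~169. `per_op_overhead_ns(70.0, 169.0, 2)` gives 49.5, so each store or load costs on the order of 50 ns under the even-split assumption.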

Next (gated on this)

  1. CI regression gate (criterion --save-baseline + compare).
  2. The memory-access fix (cache the memory ref / MemoryAccessor borrow), measured against this baseline.
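A minimal sketch of the pass/fail decision such a gate has to make, assuming a simple relative tolerance on ns/iter. Criterion performs its own statistical comparison against a saved baseline; the `gate` function, the `Verdict` type, and the 5% tolerance used below are hypothetical and only illustrate turning a regression into a number.

```rust
#[derive(Debug, PartialEq)]
enum Verdict {
    Ok,
    Regression,
}

// Flag a regression when the current ns/iter exceeds the baseline by more
// than `tolerance` (a fraction, e.g. 0.05 for 5%). Improvements always pass.
fn gate(baseline_ns: f64, current_ns: f64, tolerance: f64) -> Verdict {
    if current_ns > baseline_ns * (1.0 + tolerance) {
        Verdict::Regression
    } else {
        Verdict::Ok
    }
}
```

Against the compute_loop baseline of ~70 ns/iter with a 5% tolerance, a run at 72 ns/iter passes and a run at 80 ns/iter fails. A memory_loop run that drops from ~169 to 120 ns/iter passes, which is the outcome the memory-access fix would be measured for.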

Trace: SM-PERF-001

🤖 Generated with Claude Code

…ure-first for perf work)

Makes the tree-walker's throughput a measured number so speed work can be
gated (synth discipline: a regression is a number). Self-contained criterion
bench (wat-built modules, no filesystem fixture — the old, orphaned
kiln-component/benches/execution_benchmarks.rs read a missing test_add.wasm):
- compute_loop: arithmetic hot path (dispatch + operand stack + locals)
- memory_loop:  same loop + i64.store/i64.load per iter (the per-access lock path)

Baseline (release, Apple silicon): compute ~14.3 M iter/s (~70 ns/iter);
memory ~5.9 M iter/s (~169 ns/iter). The ~99 ns/iter gap for 2 memory ops
quantifies the double-mutex memory-access overhead — the optimization signal.

Next: a CI regression gate (criterion --save-baseline + compare) and then the
memory-access fix, measured against this. Trace: SM-PERF-001
@github-actions

🔍 Build Diagnostics Report

Summary

| Metric | Base Branch | This PR | Change |
|---|---|---|---|
| Errors | 0 | 0 | 0 |
| Warnings | 5 | 5 | 0 |

🎯 Impact Analysis

Issues in Files You Modified

  • 0 new errors introduced by your changes
  • 0 new warnings introduced by your changes
  • 0 total errors in modified files
  • 0 total warnings in modified files
  • 0 files you modified

Cascading Issues (Your Changes Breaking Other Files)

  • 0 new errors in unchanged files
  • 0 new warnings in unchanged files
  • 0 unchanged files now affected

Note: "Cascading issues" are errors in files you didn't modify, caused by your changes (e.g., breaking API changes, dependency issues).

✅ No Issues Detected

Perfect! Your changes don't introduce any new errors or warnings, and don't break any existing code.


📊 Full diagnostic data available in workflow artifacts

🔧 To reproduce locally:

# Install cargo-kiln
cargo install --path cargo-kiln

# Analyze your changes
cargo-kiln build --output json --filter-severity error
cargo-kiln check --output json --filter-severity warning

codecov Bot commented Jun 23, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

@avrabe avrabe merged commit 0ebd32b into main Jun 24, 2026
18 checks passed
@avrabe avrabe deleted the perf/interpreter-throughput-baseline branch June 24, 2026 03:37