Buddy-MLIR Gemmini performance benchmarks: kernel + ResNet50 validation by ashvin-verma · Pull Request #11 · ucb-bar/merlin

ashvin-verma · 2026-02-09T05:50:10Z

Summary

Add reproducible Buddy-MLIR performance benchmarks on Gemmini (Spike simulator)
Covers matmul workloads (MLP2, MLP1, softmax, iGELU), conv workloads (conv, conv+pool), and ResNet50 conv1 layer validation
All output checksums validated against Gemmini C reference (tiled_matmul_auto / tiled_conv_auto)
Documents the full lowering pipeline from Gemmini MLIR to bare-metal execution in WORKFLOW.md

Conv encoding bug fix

We found and fixed a bug in Buddy-MLIR's Gemmini conv lowering: the im2col encoding was producing incorrect weight matrix layouts, causing checksum mismatches against the Gemmini C reference. Fix contributed upstream as buddy-compiler/buddy-mlir#689. All conv benchmarks here require this fix. The conv1-bad-buddy test case (intentionally wrong stride) is included to verify the validation methodology catches such errors.
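To make the validation idea concrete, here is a minimal sketch of how a checksum comparison can catch both a wrong weight layout and a wrong stride. It is not the Buddy-MLIR lowering or the Gemmini C harness: the single-channel convolution, the `transpose_weights` flag, the linear test image, and the definition of the checksum as a plain sum of outputs are all assumptions made for illustration.

```python
def conv2d_direct(image, kernel, stride):
    """Reference: direct single-channel convolution, no padding, flat output."""
    k = len(kernel)
    out_dim = (len(image) - k) // stride + 1
    return [
        sum(image[oy * stride + ky][ox * stride + kx] * kernel[ky][kx]
            for ky in range(k) for kx in range(k))
        for oy in range(out_dim) for ox in range(out_dim)
    ]


def im2col(image, k, stride):
    """One row per output pixel, holding its k*k input patch in row-major order."""
    out_dim = (len(image) - k) // stride + 1
    return [
        [image[oy * stride + ky][ox * stride + kx]
         for ky in range(k) for kx in range(k)]
        for oy in range(out_dim) for ox in range(out_dim)
    ]


def conv2d_im2col(image, kernel, stride, transpose_weights=False):
    """Convolution as patch-matrix times flattened weights.

    transpose_weights=True flattens the kernel in the wrong order, a
    hypothetical stand-in for an incorrect weight-matrix layout.
    """
    k = len(kernel)
    weights = [kernel[kx][ky] if transpose_weights else kernel[ky][kx]
               for ky in range(k) for kx in range(k)]
    return [sum(p * w for p, w in zip(patch, weights))
            for patch in im2col(image, k, stride)]


# Illustrative 17x17 input and 3x3 kernel (values are made up, all positive).
image = [[2 * y + x + 1 for x in range(17)] for y in range(17)]
kernel = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Assumption: the checksum is simply the sum of all output elements.
reference = sum(conv2d_direct(image, kernel, stride=2))
good = sum(conv2d_im2col(image, kernel, stride=2))
bad_stride = sum(conv2d_im2col(image, kernel, stride=1))
bad_layout = sum(conv2d_im2col(image, kernel, stride=2, transpose_weights=True))

print("correct lowering matches:", good == reference)
print("wrong stride detected:   ", bad_stride != reference)
print("wrong layout detected:   ", bad_layout != reference)
```

A sum-based checksum is cheap but not collision-proof, so a deliberately broken case such as conv1-bad-buddy is a useful sanity check that the comparison can fail at all.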

Key Results

Matmul Workloads

Workload	Dataflow	Gemmini C cycles	Buddy cycles	Checksum	Speedup
MLP2 (64×832)	WS	2,528	409	✓ 252338	6.18×
MLP2 (64×832)	OS	207,782	96,076	✓ 252338	2.16×
MLP1 (6-layer)	WS	25,251	2,539	✓ 258664	9.95×
softmax matmul (31×30×66)	WS	335	145	✓ 3860	2.31×
iGELU matmul (30×30×30)	WS	133	133	✓ −23260	1.00×

Conv Workloads (post conv-encoding fix)

Workload	CPU cycles	Gemmini C cycles	Buddy cycles	Checksum	Buddy vs Gemmini C
conv (17×17, k=3, stride=2)	7,559,913	1,027	149	✓ 950	6.89×
conv+pool (17×17, k=3, pool=3)	7,714,291	1,605	172	✓ 30827	9.33×

ResNet50 Conv1 Layer

Layer	Gemmini C cycles	Buddy cycles	Checksum	Speedup
Conv1 (7×7, stride=2, 3×3 pool)	225,146	7,313	✓ 10206332	30.8×

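Note on cycle counts: rdcycle reads the host CPU's cycle counter, and because Spike is a functional rather than cycle-accurate simulator, that count effectively tracks instructions executed on the host. Buddy's compile-time loop unrolling reduces host-side orchestration overhead, so the speedups reflect less host-side work, not necessarily faster Gemmini hardware throughput.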
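The speedup columns can be re-derived from the raw cycle counts as Gemmini C cycles divided by Buddy cycles. The short sketch below does that using the numbers from the tables above. The short workload labels and the 0.02 rounding tolerance are illustrative choices, not part of the benchmark scripts.

```python
# (workload, Gemmini C cycles, Buddy cycles, reported speedup), copied from the tables.
RESULTS = [
    ("MLP2 WS", 2528, 409, 6.18),
    ("MLP2 OS", 207782, 96076, 2.16),
    ("MLP1 WS", 25251, 2539, 9.95),
    ("softmax matmul WS", 335, 145, 2.31),
    ("iGELU matmul WS", 133, 133, 1.00),
    ("conv", 1027, 149, 6.89),
    ("conv+pool", 1605, 172, 9.33),
    ("ResNet50 conv1", 225146, 7313, 30.8),
]


def speedup(reference_cycles, buddy_cycles):
    """Ratio of reference cycles to Buddy cycles; above 1.0 means Buddy used fewer."""
    return reference_cycles / buddy_cycles


def consistent(reference_cycles, buddy_cycles, reported, tol=0.02):
    """True if the reported figure agrees with the recomputed ratio within tol."""
    return abs(speedup(reference_cycles, buddy_cycles) - reported) < tol


for name, ref, buddy, reported in RESULTS:
    flag = "ok" if consistent(ref, buddy, reported) else "MISMATCH"
    print(f"{name:20s} {speedup(ref, buddy):6.2f}x  (reported {reported}x)  {flag}")
```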

What's included

experiments/buddy-benchmarks/kernels/ — 7 kernel benchmarks (.mlir + .c harnesses) with Makefile
experiments/buddy-benchmarks/resnet50/ — ResNet50 conv1 validation (Buddy vs Gemmini C + intentional bad case)
experiments/buddy-benchmarks/scripts/run_benchmark.sh — Single script to build and run everything
experiments/buddy-benchmarks/logs/ — Reference Spike output logs
experiments/buddy-benchmarks/README.md — Full results, methodology, reproduction instructions
experiments/buddy-benchmarks/WORKFLOW.md — Complete lowering pipeline documentation (MLIR → buddy-opt → buddy-translate → buddy-llc → gcc link → Spike)

Test plan

Build kernel benchmarks: make -C experiments/buddy-benchmarks/kernels all
Run on Spike and verify checksums match reference logs
Run experiments/buddy-benchmarks/scripts/run_benchmark.sh end-to-end
Build ResNet50 validation: make -C experiments/buddy-benchmarks/resnet50 validate
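The "verify checksums match reference logs" step could be automated along the following lines. This is only a sketch: the log line format (`<name> checksum: <value>`), the workload names, and the helper functions are hypothetical and would need to be adapted to what the harnesses actually print on Spike.

```python
import re

# Hypothetical log line format, e.g. "conv checksum: 950"; adjust to the real logs.
CHECKSUM_RE = re.compile(r"^(?P<name>\S+)\s+checksum:\s*(?P<value>-?\d+)$")


def parse_checksums(log_text):
    """Collect {workload name: checksum} from lines matching the assumed format."""
    found = {}
    for line in log_text.splitlines():
        match = CHECKSUM_RE.match(line.strip())
        if match:
            found[match.group("name")] = int(match.group("value"))
    return found


def diff_checksums(reference, actual):
    """Return {name: (expected, got)} for every missing or mismatching checksum."""
    return {name: (expected, actual.get(name))
            for name, expected in reference.items()
            if actual.get(name) != expected}


# Made-up log contents; only the checksum values 950 and 30827 come from the tables.
reference_log = "conv checksum: 950\nconv_pool checksum: 30827\n"
run_log = "booting...\nconv checksum: 950\nconv_pool checksum: 30826\n"

mismatches = diff_checksums(parse_checksums(reference_log), parse_checksums(run_log))
print(mismatches or "all checksums match")
```

Reporting missing workloads as mismatches (with `None` as the observed value) keeps a crashed or skipped benchmark from silently passing.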

Add reproducible benchmarks comparing Buddy-MLIR's Gemmini dialect backend against the Gemmini C reference on Spike simulator. Kernel benchmarks: conv, conv+pool, MLP2 (WS/OS), MLP1, softmax matmul, iGELU matmul. ResNet50 conv1 layer validation with intentional bad case for test methodology verification. Conv benchmarks require buddy-compiler/buddy-mlir#689 (conv encoding fix) for correct im2col lowering.


ashvin-verma added 2 commits February 8, 2026 21:46

[add] WORKFLOW.md documenting full lowering and execution pipeline

01f28db

Step-by-step guide from Gemmini dialect MLIR through buddy-opt, buddy-translate, buddy-llc, bare-metal linking, to Spike execution. Includes setup instructions for all prerequisites.

ashvin-verma wants to merge 2 commits into main from ashvin/buddy-gemmini-benchmarks
