This repository demonstrates and benchmarks different compilation methods for translating high-level languages to RISCV64IM (the target architecture for RISCV zkVMs) using WASM-WASI as an intermediate representation.
This experiment measures the performance impact of using WASM-WASI as an intermediate step compared to direct compilation from high-level languages to RISCV64IM.
Note: While any language that compiles to WASM with WASI 0.1 support can use these pipelines, this project focuses primarily on Go and Rust.
All pipelines share a common first step: compiling high-level source code to WASM-WASI. Most modern language compilers support WASM as a target.
The transition from WASM to the zkVM target can be achieved through multiple approaches. This experiment explores three compilation methods:
- w2c2 + GCC: Transpile WASM to C source code using the `w2c2` compiler, then compile the C code to the final target using `gcc` or a platform-specific compiler
- WAMR (LLVM): Compile WASM directly to the final target using WAMR's LLVM backend
- wasmtime/wasmer (Cranelift): Compile WASM to Linux (either host or RISCV64) using `wasmtime` or `wasmer`, both of which use Cranelift for code generation
For the third approach, we targeted Linux because it's supported out of the box—porting to bare-metal would require significant additional effort. For benchmarking the Ethereum state transition function, this difference shouldn't significantly affect results due to minimal OS interaction and the absence of floating-point operations in the benchmark code.
```mermaid
graph TD;
    source_code["Source Code<br/>Go, Rust, C, Zig, etc."]
    wasm["WebAssembly<br/>with WASI (wasip1)"]
    c_source_code["C Source Code"]
    subgraph targets[" "]
        zkvm_target_binary["Target Binary<br/>RISCV zkVM"]
        linux_target_binary["Target Binary<br/>RISC-V/AMD64 Linux"]
    end
    source_code -->|"Language-specific<br/>WASM compiler"| wasm
    wasm -->|"w2c2 transpiler"| c_source_code
    c_source_code -->|"GCC/platform-specific<br/>compiler"| zkvm_target_binary
    wasm -->|"WAMR<br/>LLVM backend"| zkvm_target_binary
    wasm -->|"wasmtime<br/>Cranelift backend"| linux_target_binary
    wasm -->|"wasmer<br/>Cranelift backend"| linux_target_binary
    classDef subgraphStyle fill:none,stroke:none;
```
The benchmark environment is dockerized and includes:
- RISC-V GNU Toolchain with newlib (rv64ima)
- w2c2 WebAssembly-to-C transpiler
- QEMU with the `libinsn` plugin
- WAMR
- wasmtime
- wasmer
Note: The first time you run the Docker script, it will take some time as it rebuilds the RISC-V GNU toolchain from source.
Run the `./docker-shell.sh run_all_benchmarks_with_report.sh` script to compare the different compilation methods for the Ethereum state transition function. The script will:
- Compile Rust and Go implementations using various methods
- Execute the compiled binaries under QEMU with the `libinsn` plugin to count instructions
- Save the instruction counts for each compilation method to the `report.md` file
See the scripts for implementation details.
The following benchmarks were performed:
- w2c2 -O0: WASM transpiled to C with `w2c2`, then compiled with GCC using `-O0` optimization for Linux `rv64imad`
- w2c2 optimized: WASM transpiled to C with `w2c2`, then compiled with GCC using higher optimization levels for Linux `rv64imad`
- directly:
  - Rust: `cargo build --target riscv64gc-unknown-linux-gnu --release`
  - Go: `GOOS=linux GOARCH=riscv64 go build`
- wasmtime: WASM compiled with `wasmtime` using the Cranelift backend to a `riscv64gc` precompiled ".cwasm" file, then executed using the `wasmtime` runtime on Linux
- wasmer (cranelift): WASM compiled with `wasmer` using the Cranelift backend to a `riscv64gc` precompiled ".wasmu" file, then executed using the `wasmer` runtime on Linux
- wamr -O0: WASM compiled with `wamr` using the LLVM backend with `-O0` optimization for bare-metal `riscv64ima`
The following critical benchmarks could not yet be performed due to issues in wasmer and wamr:
- wasmer (llvm): WASM compiled with `wasmer` using the LLVM backend to a `riscv64gc` precompiled ".wasmu" file, then executed using the `wasmer` runtime on Linux
- wamr -O3: WASM compiled with `wamr` using the LLVM backend with `-O3` optimization for bare-metal `riscv64ima`
Since these critical benchmarks could not be performed on RISC-V, they were performed on AArch64 with the expectation that those results would allow us to extrapolate potential RISC-V performance.
See the "Known Issues" section for details.
| Program | w2c2 -O0 | w2c2 optimized | wasmtime | wasmer (cranelift) | wasmer (llvm) | WAMR -O0 | WAMR -O3 | directly |
|---|---|---|---|---|---|---|---|---|
| reva-client-eth (Rust) | 7,887,190,279 | 1,419,050,123 (-O1) | 1,074,488,397 | doesn't work | ? | didn't check | ? | 388,564,723 |
| stateless (Go) | 12,866,052,519 | 2,118,257,727 (-O3) | 874,758,419 | 953,874,491 | ? | 5,427,433,654 | ? | 236,265,327 |
Important: The reva-client-eth and stateless numbers should not be compared directly against each other, as these implementations execute against different blocks using different block serialization frameworks.
Unfortunately, we were unable to benchmark the most promising approaches (wasmer (llvm) and wamr -O3) on RISC-V due to outstanding issues. The following analysis is based on the available RISC-V results only.
- Direct compilation is fastest: As expected, compiling directly to the target architecture provides the best performance
- Optimization level is critical for w2c2: Using GCC optimization flags provides a 6x speedup compared to unoptimized `-O0` builds
- Cranelift-based pipelines perform best: Among the WASM-based approaches, pipelines using Cranelift for code generation show the best performance
- Performance overhead of WASM intermediate step: The ratio of instructions required when compiling via `wasmtime` versus direct compilation is:
  - 2.8x for `reva-client-eth` (Rust)
  - 3.7x for `stateless` (Go)
- WASM quality comparison: The relatively similar overhead ratios suggest that Go's WASM compiler generates code quality comparable to Rust's WASM compiler
- WAMR -O0 performance: Currently falls between `w2c2` and `wasmtime` in terms of instruction count
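Several of the figures above can be re-derived from the instruction counts in the results table. The following sketch recomputes the w2c2 optimization speedup and the wasmtime-vs-direct overhead (all numbers are taken verbatim from the table):

```shell
# Recompute analysis figures from the RISC-V instruction counts above.
awk 'BEGIN {
  printf "w2c2 -O0 / optimized (stateless):       %.1fx\n", 12866052519 / 2118257727
  printf "w2c2 -O0 / optimized (reva-client-eth): %.1fx\n", 7887190279 / 1419050123
  printf "wasmtime / direct (reva-client-eth):    %.1fx\n", 1074488397 / 388564723
  printf "wasmtime / direct (stateless):          %.1fx\n", 874758419 / 236265327
}'
```

Note that the w2c2 speedup for reva-client-eth (~5.6x) is somewhat smaller than for stateless (~6.1x), likely because its optimized build is limited to `-O1` (see Known Issues).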
```
$ ls -lah build/bin/
827K fibonacci.riscv.O0.elf
686K fibonacci.riscv.O3.elf
823K hello_world.riscv.O0.elf
682K hello_world.riscv.O3.elf
23M reva-client-eth.riscv.O0.elf
19M reva-client-eth.riscv.O1.elf
74M stateless.amd64.O0.elf
28M stateless.amd64.O1.elf
29M stateless.amd64.O3.elf
67M stateless.riscv.O0.elf
58M stateless.riscv.O1.elf
64M stateless.riscv.O3.elf
```
The benchmark was performed for stateless (Go) only. WAMR -O3 targets dynamically linked AArch64 Linux MUSL. To reduce the impact of dynamic-linker overhead and WAMR runtime setup, multiple runs of the business logic were performed. In the following tables, WAMR rows annotated with `--b-c=0` were compiled with the `--bounds-checks=0` option; all other WAMR rows were compiled with `--bounds-checks=1`.
| pipeline / number of runs | wasmtime | wasmer (cranelift) | wasmer (llvm) | WAMR -O3 | directly |
|---|---|---|---|---|---|
| 1x | 659,241,636 | 663,867,152 | 626,137,758 | 990,892,303 | 166,611,730 |
| 1x | (same) | (same) | (same) | 699,810,538 --b-c=0 | (same) |
| 10x | 2,533,334,795 | 2,268,562,071 | 2,002,956,919 | 3,100,038,538 | 660,390,007 |
| 25x | 5,686,210,736 | 4,978,224,909 | 4,338,984,157 | 6,674,027,814 | 1,477,959,349 |
| 50x | 10,929,448,352 | - | - | 12,581,544,465 | 2,830,756,855 |
| 50x | (same) | - | - | 8,338,818,655 --b-c=0 | (same) |
The following table presents the ratio between the number of steps executed for a given compilation pipeline and the number of steps executed for a directly compiled program.
| pipeline / number of runs | wasmtime | wasmer (cranelift) | wasmer (llvm) | WAMR -O3 | directly |
|---|---|---|---|---|---|
| 1x | 3.95 | 3.98 | 3.75 | 5.94 | 1.0 |
| 1x | (same) | (same) | (same) | 4.19 --b-c=0 | (same) |
| 10x | 3.83 | 3.43 | 3.03 | 4.69 | 1.0 |
| 25x | 3.84 | 3.36 | 2.93 | 4.51 | 1.0 |
| 50x | 3.86 | - | - | 4.44 | 1.0 |
| 50x | (same) | - | - | 2.94 --b-c=0 | (same) |
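For example, the wasmer (cranelift) single-run cell can be reproduced by dividing its step count by the direct-compilation count from the previous table (some other cells may differ by 0.01 depending on whether values are rounded or truncated):

```shell
# Ratio for wasmer (cranelift), 1 run: 663,867,152 pipeline steps
# divided by 166,611,730 steps for the directly compiled program.
awk 'BEGIN { printf "%.2f\n", 663867152 / 166611730 }'
```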
The --bounds-checks=0 option appears to be critical for WAMR performance. Only with this option can WAMR outperform Cranelift-based frameworks in some scenarios. For a single run, wasmtime, wasmer (cranelift), and wasmer (llvm) show similar performance. Single-run results for WAMR are difficult to interpret due to overhead from dynamic linking and Linux WAMR runtime setup, which would not be present on a zkVM bare-metal platform. For compute-intensive programs (50x), WAMR appears to have an edge over other compilation pipelines. The best-performing WebAssembly approaches appear to be 3-4 times slower than direct compilation of the stateless Go program.
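The size of the bounds-check effect can be quantified from the two 50x WAMR rows in the step-count table:

```shell
# Instruction reduction from --bounds-checks=0 on the 50x WAMR -O3 run:
# 12,581,544,465 steps with bounds checks vs. 8,338,818,655 without.
awk 'BEGIN { printf "saved: %.0f%%\n", (1 - 8338818655 / 12581544465) * 100 }'
```

That is, disabling bounds checks removes roughly a third of the executed instructions in this workload.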
These results have not been taken into account in the "Analysis" section.
Running WAMR with non-zero optimization levels on RISC-V currently fails with a relocation error. Issue: bytecodealliance/wasm-micro-runtime#4765
The wasmer team is actively working on fixing RISC-V target support. Issues:
The w2c2 optimized pipeline for reva-client-eth uses the -O1 optimization level. Higher optimization levels cause GCC to hang during compilation. This has been confirmed as a GCC bug based on the following observations:
- Clang successfully compiles the same sources
- When w2c2 is invoked with the `-f 100` option (which splits the output into many source files), GCC hangs while compiling a single ~1000 LOC file
For reference, reva-client-eth compiled with Clang using -O3 requires 1.2×10⁹ instructions to execute—not significantly fewer than when compiled with GCC using -O1 (1.4×10⁹ instructions).
The w2c2 optimized pipeline for the stateless program fails to link when using non-zero optimization levels, producing the error:
```
guest.c:(.text.guestInitMemories+0x50): relocation truncated to fit: R_RISCV_JAL against `.L214'
collect2: error: ld returned 1 exit status
```
The issue stems from a single massive function, `guestInitMemories`, spanning over 100,000 lines of C code generated by w2c2 for stateless. GCC emits `R_RISCV_JAL` relocations for intra-function branches, which support only ±1 MB PC-relative jumps, and it lacks a fallback mechanism to automatically use `AUIPC`+`JALR` for out-of-range intra-function jumps when optimization creates this problem.
Workaround: Use the `-fno-reorder-blocks` flag to disable the optimization that creates large jumps. With this flag, stateless can be built with `-O3` optimization.
Note: This issue doesn't occur on x86 because that platform supports 32-bit relative jumps.
For higher optimization levels (e.g., -O3), expect compilation times of up to 60 minutes for reva-client-eth and stateless.
You can call platform-specific functions from your WASM code using custom imports.
In Go, use `//go:wasmimport`:
```go
// examples/go/with_import/example.go
package main

import "fmt"

//go:wasmimport testmodule testfunc
//go:noescape
func testfunc(a, b uint32) uint32

func main() {
	result := testfunc(1, 2)
	fmt.Printf("testfunc(1, 2) = %d\n", result)
}
```

Implement the import in `platform/*/custom_imports.c`:
```c
// platform/amd64/custom_imports.c
U32 testmodule__testfunc(void* p, U32 a, U32 b) {
    printf("testfunc called with %u, %u\n", a, b);
    return a + b;
}
```

Dotnet imports are more complex than Go's, partly because the build artifacts are wrapped as wasip2 components, and partly because there are multiple FFI mechanisms.
The currently implemented strategy revolves around unmanaged code marked as `UnmanagedCallersOnly`, because managed code is wrapped in a binary blob within the w2c2 output. To call the unmanaged code from managed code, a function pointer ("delegate") is used.
Alternatively, a fairly popular project is componentize-dotnet. However, it is still experimental, and upgrading its dependencies marked as alpha is not trivial. The glue mechanism used by the project comes from wit-bindgen. While it can easily map even complex interface structures, subtleties such as unsigned flags (uint) are unfortunately dropped during the process. It would still be useful to evaluate the underlying mechanism.
- https://github.com/bytecodealliance/componentize-dotnet
- https://github.com/bytecodealliance/wit-bindgen
For embedded targets with limited memory, use `debug.SetMemoryLimit()`:
```go
import "runtime/debug"

func main() {
	debug.SetMemoryLimit(400 * (1 << 20)) // 400 MB limit
	// ...
}
```

MIT + Apache