High-performance Pi computation, verification, and benchmarking in Rust — inspired by y-cruncher.
pi_crunch computes π with the Chudnovsky algorithm evaluated by binary
splitting (parallelised with rayon),
independently verifies the result with Machin's formula, and writes a
benchmark report comparing your machine to y-cruncher's published numbers.
It is fully self-contained: after git clone, the default build needs only
cargo build --release — no C libraries, no system dependencies.
- Compute π to a fixed digit count, or run a timed budget that doubles the digit count each round until either the time runs out or a digit ceiling is reached, whichever comes first (default: 30s or 1.2M digits).
- Verify the digits independently via π/4 = 4·arctan(1/5) − arctan(1/239) in exact scaled-integer arithmetic. This is a completely different algorithm from the one that produced the digits, so agreement is strong evidence of correctness.
- Benchmark: collects CPU/RAM/OS info and emits a per-round results table plus y-cruncher reference times.
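The Machin identity in the Verify bullet can be tried at small scale with machine integers. The following is a minimal sketch, not pi_crunch's verifier: it sums the arctan series term by term in `i128`, whereas the program works on big integers. The names `arctan_inv` and `machin_pi_digits` and the guard-digit count are illustrative assumptions, and `i128` limits the sketch to roughly 32 decimals.

```rust
/// arctan(1/x) multiplied by `scale`, summed term by term using only
/// integer division (each term is truncated toward zero).
fn arctan_inv(x: i128, scale: i128) -> i128 {
    let x2 = x * x;
    let mut power = scale / x; // scale / x^(2k+1), starting at k = 0
    let mut sum = 0i128;
    let mut k = 0i128;
    while power != 0 {
        let term = power / (2 * k + 1);
        if k % 2 == 0 {
            sum += term;
        } else {
            sum -= term;
        }
        power /= x2;
        k += 1;
    }
    sum
}

/// "3" followed by `digits` decimals of π, via π = 4·(4·arctan(1/5) − arctan(1/239)).
/// GUARD extra digits absorb the truncation error and are dropped at the end.
/// Assumption of this sketch: digits <= 32 so everything fits in i128.
fn machin_pi_digits(digits: u32) -> String {
    const GUARD: u32 = 5;
    let scale = 10i128.pow(digits + GUARD);
    let pi_scaled = 4 * (4 * arctan_inv(5, scale) - arctan_inv(239, scale));
    (pi_scaled / 10i128.pow(GUARD)).to_string()
}

fn main() {
    println!("{}", machin_pi_digits(25));
}
```

Because every step is integer arithmetic, the result is reproducible bit for bit, which is what makes a digit-by-digit comparison against the Chudnovsky output meaningful.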
Three files are written to the output directory:
| File | Contents |
|---|---|
| pi_digits.txt | The digits: 100 per line (10 groups of 10), with a counter |
| pi_verification.txt | Per-block PASS/FAIL report and overall verdict |
| pi_benchmark.txt | System info, round-results table, references |
Progress is printed to stderr, so stdout stays clean and the digit output remains redirectable.
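The digit layout described for pi_digits.txt (10 groups of 10 per line, followed by a running counter) could be produced along these lines. This is a sketch under assumptions: `format_digit_lines` is a hypothetical name, the counter is not padded or right-aligned here, and a short final line simply reports the digits seen so far.

```rust
/// Format a string of ASCII digits as lines of up to 100 digits,
/// split into space-separated groups of 10, each line ending in
/// " : <running digit count>".
fn format_digit_lines(digits: &str) -> Vec<String> {
    digits
        .as_bytes()
        .chunks(100)
        .enumerate()
        .map(|(i, line)| {
            let groups: Vec<&str> = line
                .chunks(10)
                .map(|g| std::str::from_utf8(g).expect("ASCII digits"))
                .collect();
            let count = i * 100 + line.len();
            format!("{} : {}", groups.join(" "), count)
        })
        .collect()
}

fn main() {
    // The digits go to stdout; anything progress-like would go to stderr.
    eprintln!("formatting 20 digits");
    for line in format_digit_lines("14159265358979323846") {
        println!("{line}");
    }
}
```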
git clone https://github.com/iampram/pi_crunch
cd pi_crunch
cargo build --release
./target/release/pi_crunch # timed run: 30s or 1.2M digits, whichever first (default)
On Windows, the binary is ./target/release/pi_crunch.exe.
You need a Rust toolchain with a working linker. Install Rust via rustup. The big-integer dependencies are pure Rust, so the default build needs no C compiler — but every Rust program still needs a linker to produce an executable.
On Linux:
- Rust via rustup.
- A linker/C toolchain (usually already present): sudo apt install build-essential (Debian/Ubuntu) or the equivalent gcc/binutils package.
- For --features gmp: sudo apt install libgmp-dev.
On macOS:
- Rust via rustup.
- Xcode Command Line Tools (provides the linker): xcode-select --install.
- For --features gmp: brew install gmp.
On Windows, pick one of the following so the toolchain can link:
- MSVC (default Rust on Windows): install the Build Tools for Visual Studio with the "Desktop development with C++" workload (this provides link.exe and the Windows SDK). VS Code alone is not sufficient.
- GNU (no Visual Studio needed): install the self-contained GNU toolchain and use it for this project only. This leaves your global default (and other projects) untouched:
    rustup toolchain install stable-x86_64-pc-windows-gnu
    # from the pi_crunch directory:
    rustup override set stable-x86_64-pc-windows-gnu
  After this, plain cargo build / cargo run work here with no +toolchain prefix. To make GNU your machine-wide default instead, use rustup default stable-x86_64-pc-windows-gnu.
  The default build links fully self-contained on this toolchain: no MSYS2 or MinGW install required. (The dependencies deliberately avoid raw-dylib crates, which would otherwise need MinGW's dlltool/as.)
  Heads-up: if you build from a Unix-style shell (Git Bash / MSYS2) on the MSVC toolchain without the VS Build Tools installed, the linker error will mention link: extra operand / Try 'link --help'. That is the GNU coreutils link (a hard-link utility) shadowing MSVC's link.exe on PATH. The real fix is to install the VS C++ Build Tools or switch to the GNU toolchain as above, not to touch PATH.
- For --features gmp on Windows: install MSYS2, then in the MSYS2 shell run pacman -S mingw-w64-x86_64-gmp, and point the build at it:
    $env:GMP_LIB_DIR = "C:\msys64\mingw64\lib"
    cargo build --release --features gmp
cargo build --release
With the gmp feature enabled, the big-integer backend switches from num-bigint to
rug (GMP), which is typically ~3–5× faster.
# Linux: sudo apt install libgmp-dev
# macOS: brew install gmp
# Windows: MSYS2 + pacman -S mingw-w64-x86_64-gmp, then set GMP_LIB_DIR
cargo build --release --features gmp
The program creates and frees a lot of big numbers while it works, and the
default system memory manager can become a bottleneck when many threads do this
at once. The fast-alloc feature swaps in mimalloc,
a memory manager built for exactly this kind of multi-threaded churn — usually a
small (single-digit-percent) speedup.
cargo build --release --features fast-alloc
Needs a C compiler. mimalloc is written in C, so this feature only builds if a C compiler is on your PATH (the MSVC Build Tools on Windows, or MSYS2's mingw-w64-gcc; gcc/clang on Linux/macOS). It's off by default so the standard build stays dependency-free. If you don't have a C compiler, just leave it off; the program runs fine without it.
All flags (defaults shown):
pi_crunch [OPTIONS]
--mode <timed|fixed> Run mode [default: timed]
--seconds <N> Wall-clock budget (timed mode) [default: 30]
--max-digits <N> Digit ceiling (timed mode) [default: 1200000]
--digits <N> Exact digit count (fixed mode) [required if fixed]
--threads <N> Worker threads [default: all logical cores]
--out-dir <PATH> Output directory [default: .]
--skip-verify Skip verification (benchmark only)
--verify-blocks <N> Digits per verification block [default: 1000]
--verify-max-digits <N> Cap on digits to verify (see note) [default: 1000000]
-h, --help Print help
-V, --version Print version
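The interaction between --seconds and --max-digits in timed mode can be pictured as a doubling loop. The sketch below is only an illustration of that rule: `timed_run`, its `start_digits` parameter, and the choice to clamp the last round to the ceiling are assumptions, not the program's actual code.

```rust
use std::time::{Duration, Instant};

/// Run `compute` on doubling digit counts until the wall-clock budget is
/// spent or the ceiling has been computed, whichever comes first.
/// Returns the digit count used in each round.
fn timed_run(
    budget: Duration,
    start_digits: u64, // hypothetical: the first round's size
    max_digits: u64,
    mut compute: impl FnMut(u64),
) -> Vec<u64> {
    let started = Instant::now();
    let mut rounds = Vec::new();
    let mut digits = start_digits.min(max_digits);
    loop {
        compute(digits);
        rounds.push(digits);
        if digits >= max_digits || started.elapsed() >= budget {
            break;
        }
        digits = (digits * 2).min(max_digits); // clamp the final round to the ceiling
    }
    rounds
}

fn main() {
    // A no-op workload: the ceiling, not the clock, ends this run.
    let rounds = timed_run(Duration::from_secs(30), 1_000, 10_000, |_| {});
    println!("{rounds:?}");
}
```

Note that a round already in progress is allowed to finish in this sketch, so the total run time can exceed the budget by up to one round.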
Run examples (use .\target\release\pi_crunch.exe on Windows):
# Default timed run: stops at 30s or 1.2M digits, whichever comes first
./target/release/pi_crunch
# Timed run, 30-second budget
./target/release/pi_crunch --mode timed --seconds 30
# Timed run, but stop at 500k digits even if the budget allows more
./target/release/pi_crunch --mode timed --seconds 90 --max-digits 500000
# Compute exactly 1,000 digits (quick sanity check)
./target/release/pi_crunch --mode fixed --digits 1000
# Compute exactly 1 million digits
./target/release/pi_crunch --mode fixed --digits 1000000
# Pin to 4 threads
./target/release/pi_crunch --mode fixed --digits 1000000 --threads 4
# Write the three reports to /tmp
./target/release/pi_crunch --seconds 30 --out-dir /tmp
# Pure compute benchmark — no verification
./target/release/pi_crunch --skip-verify
# Smaller verification blocks (more granular report)
./target/release/pi_crunch --mode fixed --digits 50000 --verify-blocks 250
# Verify fewer digits to keep very large runs fast (final division still costs)
./target/release/pi_crunch --mode fixed --digits 5000000 --verify-max-digits 100000
# Verify more digits (slower) on a smaller run
./target/release/pi_crunch --mode fixed --digits 2000000 --verify-max-digits 2000000
# Everything at once
./target/release/pi_crunch --mode timed --seconds 45 --threads 8 \
  --out-dir ./out --verify-blocks 500 --verify-max-digits 250000
Machin's formula is evaluated by binary splitting (the same technique as the
Chudnovsky path): the arctan series is summed via a divide-and-conquer tree of
large multiplications — sub-quadratic, O(M(n)·log n) with num-bigint's
Karatsuba/Toom multiplication — rather than the naive term-by-term sum (which
divided an n-digit number per term and was genuinely O(n²)). The remaining
cost is dominated by a single final scaled division, which num-bigint performs
in O(n²); that constant is small, so verifying ~512k digits is a few seconds and
~1,000,000 is well under a minute. To keep very large runs responsive,
verification is still capped at min(computed_digits, --verify-max-digits)
(default 1,000,000).
- Small and mid-size runs verify fully in well under a second.
- Lower the cap (e.g. --verify-max-digits 100000) to trim the final-division cost on multi-million-digit runs.
- Raise it (e.g. --verify-max-digits 5000000) to verify deeper, at the cost of time. Use --skip-verify to skip verification entirely.
When the cap truncates coverage, both the stderr log and pi_verification.txt
note it, e.g. "verified first 1,000,000 of 5,000,000 digits".
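To make the binary-splitting evaluation of the arctan series more concrete, here is a small sketch that uses `i128` in place of big integers. It follows the standard P/Q/B/T recurrences for a series of the form Σ (1/b(k)) · p(0)…p(k) / (q(0)…q(k)); the function names and term counts are assumptions chosen so that nothing overflows, and the real verifier is not necessarily organised this way. The only division is the single scaled one at the end.

```rust
/// Binary splitting for arctan(1/x) over terms [a, b).
/// Returns (P, Q, B, T) such that the partial sum equals T / (B * Q), with
/// b(k) = 2k + 1, p(0) = 1, p(k) = -1, q(0) = x, q(k) = x^2.
fn split(x: i128, a: i128, b: i128) -> (i128, i128, i128, i128) {
    if b == a + 1 {
        let p = if a == 0 { 1 } else { -1 };
        let q = if a == 0 { x } else { x * x };
        return (p, q, 2 * a + 1, p);
    }
    let m = (a + b) / 2;
    let (pl, ql, bl, tl) = split(x, a, m);
    let (pr, qr, br, tr) = split(x, m, b);
    // Merge the halves using multiplications and one addition only.
    (pl * pr, ql * qr, bl * br, br * qr * tl + bl * pl * tr)
}

/// arctan(1/x) * scale, truncated, from the first `terms` terms (terms >= 1).
fn arctan_inv_scaled(x: i128, terms: i128, scale: i128) -> i128 {
    let (_p, q, b, t) = split(x, 0, terms);
    scale * t / (b * q) // the single final scaled division
}

/// π * 10^15 via Machin's formula; term counts are sized for i128.
fn machin_pi_scaled() -> i128 {
    let scale = 10i128.pow(15);
    4 * (4 * arctan_inv_scaled(5, 10, scale) - arctan_inv_scaled(239, 4, scale))
}

fn main() {
    println!("{}", machin_pi_scaled());
}
```

With real big integers the same tree applies unchanged; the merges near the root become the large multiplications, and the final division is the quadratic step the cap is there to bound.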
Pi Computation Result
=====================
Timestamp : 2026-06-09T14:23:01Z
Digits : 1000000
Algorithm : Chudnovsky (binary splitting)
Compute : 4213 ms
Threads : 16
Backend : pure-rust
3.
1415926535 8979323846 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421170679 : 100
8214808651 3282306647 0938446095 5058223172 5359408128 4811174502 8410270193 8521105559 6446229489 5493038196 : 200
...
Pi Verification Report
======================
Timestamp : 2026-06-09T14:23:07Z
Digits verified : 1000000
Method : Machin formula (integer arithmetic)
Duration : 6841 ms
Result : PASS
Block 1 [digits 1–1000] : PASS
Block 2 [digits 1001–2000] : PASS
...
════════════════════════════════════════
Verification: PASSED 1000000 digits
════════════════════════════════════════
A failing block reads:
Block N [digits …] : FAIL — first mismatch at digit position XXXX,
and the final verdict becomes FAILED.
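The per-block PASS/FAIL report amounts to comparing two digit strings block by block and recording the first disagreement. A minimal sketch of that comparison follows; `verify_blocks` and its return shape are hypothetical, and the printed lines only approximate the report format shown above.

```rust
/// Compare `computed` against `reference` in blocks of `block` digits
/// (`block` must be non-zero). For each block, returns None on a match or
/// Some(position) with the 1-based position of the first mismatching digit.
fn verify_blocks(computed: &str, reference: &str, block: usize) -> Vec<Option<usize>> {
    let (c, r) = (computed.as_bytes(), reference.as_bytes());
    let n = c.len().min(r.len());
    (0..n)
        .step_by(block)
        .map(|start| {
            let end = (start + block).min(n);
            (start..end).find(|&i| c[i] != r[i]).map(|i| i + 1)
        })
        .collect()
}

fn main() {
    let reference = "1415926535";
    let corrupted = "1415926935"; // 8th digit altered
    for (i, result) in verify_blocks(corrupted, reference, 5).iter().enumerate() {
        let (lo, hi) = (i * 5 + 1, (i + 1) * 5);
        match result {
            None => println!("Block {} [digits {}-{}] : PASS", i + 1, lo, hi),
            Some(pos) => println!(
                "Block {} [digits {}-{}] : FAIL, first mismatch at digit position {}",
                i + 1, lo, hi, pos
            ),
        }
    }
}
```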
System header, a box-drawing round-results table (Digits, Compute (ms),
Digits/sec, Est. RAM MB), summary, and the y-cruncher reference numbers.
RAM estimate. The Est. RAM MB column uses digits × 3.32193 × 1.5 × 3 bytes: log2(10) bits/digit, ×1.5 for division scratch, ×3 for the three big-integer trees (P, Q, R). It is an estimate, not a measured peak; actual usage depends on the backend and allocator.
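As a worked instance of the RAM estimate, the sketch below evaluates the formula exactly as it is stated above, units included. The function names are hypothetical, and the bytes-to-MB divisor (1024 × 1024) is an assumption, since the text does not say which convention the report uses.

```rust
/// Estimated bytes, following the stated formula:
/// digits × 3.32193 × 1.5 (division scratch) × 3 (P, Q, R trees).
fn est_ram_bytes(digits: u64) -> f64 {
    digits as f64 * 3.32193 * 1.5 * 3.0
}

/// The same figure in MB, assuming 1 MB = 1024 * 1024 bytes.
fn est_ram_mb(digits: u64) -> f64 {
    est_ram_bytes(digits) / (1024.0 * 1024.0)
}

fn main() {
    let digits = 1_000_000;
    println!("{:.0} bytes, {:.2} MB", est_ram_bytes(digits), est_ram_mb(digits));
}
```

For 1,000,000 digits the formula gives about 14.9 million bytes, which shows how small the estimate is compared with typical system memory at the default digit ceiling.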
y-cruncher reference times (published at numberworld.org/y-cruncher):
| Digits | Fastest | Slowest |
|---|---|---|
| 25M | 0.21s (Ryzen AI Max+ 395) | 1.09s (Ryzen 7 1800X) |
| 100M | 0.76s (Ryzen AI Max+ 395) | 9.27s (Core i3 8121U) |
| 1B | 10.0s (Ryzen AI Max+ 395) | 142s (Core i3 8121U) |
y-cruncher uses GMP + AVX-512 + hand-tuned assembly. The pure-Rust backend
here will be slower; build with --features gmp for a closer comparison.
The heavy lifting is the Chudnovsky series: a long sum of terms that all get
combined into one fraction. Instead of adding the terms one-by-one, the program
splits the sum down the middle, then splits each half again, and again — like a
tournament bracket — until each piece is tiny. The tiny pieces are handed out to
all your CPU cores at once, and the results are combined back up the bracket.
Combining two pieces means multiplying very large numbers, and at the top of the
bracket those multiplications are themselves shared across cores. (--threads
lets you cap how many cores it uses; the default is all of them.)
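The bracket described above can be sketched with the usual Chudnovsky recurrences for (P, Q, R). Two substitutions are made to keep the example dependency-free, and both are assumptions of this sketch rather than the program's code: `std::thread::scope` stands in for rayon, and `i128` stands in for big integers, which limits it to three series terms (n ≤ 3) with a floating-point final step.

```rust
use std::thread;

/// (P, Q, R) for the term range [a, b), with 1 <= a < b.
fn split(a: i128, b: i128) -> (i128, i128, i128) {
    assert!(a >= 1 && a < b);
    if b == a + 1 {
        let p = -(6 * a - 5) * (2 * a - 1) * (6 * a - 1);
        let q = 10_939_058_860_032_000 * a * a * a; // (640320^3 / 24) * a^3
        let r = p * (545_140_134 * a + 13_591_409);
        return (p, q, r);
    }
    let m = (a + b) / 2;
    // Evaluate the two halves of the bracket concurrently.
    let ((p1, q1, r1), (p2, q2, r2)) = thread::scope(|s| {
        let left = s.spawn(move || split(a, m));
        let right = split(m, b);
        (left.join().expect("left half panicked"), right)
    });
    // Combine: only multiplications and one addition, no division.
    (p1 * p2, q1 * q2, q2 * r1 + p1 * r2)
}

/// π from terms 0..n of the series (2 <= n <= 3 with i128), finished in f64.
fn chudnovsky_f64(n: i128) -> f64 {
    let (_p, q, r) = split(1, n);
    let (q, r) = (q as f64, r as f64);
    426_880.0 * 10_005f64.sqrt() * q / (13_591_409.0 * q + r)
}

fn main() {
    println!("{:.15}", chudnovsky_f64(3));
}
```

Spawning a thread per split is fine for a toy tree like this one; a work-stealing pool such as rayon avoids that overhead when the tree has thousands of nodes.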
There's one catch worth knowing, because it explains the timings you'll see:
The final step doesn't split. After the bracket is combined, recovering the actual digits of π needs one giant long-division (and a square root). That step is inherently sequential (it can't be spread across cores) and its cost grows with the square of the digit count. For runs into the hundreds of thousands of digits and beyond, this single step dominates the total time, so adding more cores helps less than you might expect. This is the main reason the GMP backend (--features gmp) is much faster: it has a smarter, faster division built in. The fast-alloc feature shaves a little more off by reducing memory-management overhead, but doesn't change this fundamental limit.
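The shape of that final step, one integer square root followed by one scaled division, can be shown at toy scale. This sketch hard-codes the Q and R values for the first two Chudnovsky terms and works in `u128` at a scale of 10^14; the function names and the scale are illustrative assumptions, and the program's own big-integer routine will differ.

```rust
/// Floor of the square root, by Newton's iteration on integers.
fn isqrt(n: u128) -> u128 {
    if n < 2 {
        return n;
    }
    let mut x = n;
    let mut y = (x + 1) / 2;
    while y < x {
        x = y;
        y = (x + n / x) / 2;
    }
    x
}

/// π scaled by 10^14, from the first two Chudnovsky terms:
/// π ≈ 426880 · sqrt(10005) · Q / (13591409 · Q + R).
fn pi_scaled_two_terms() -> u128 {
    let scale: u128 = 10u128.pow(14);
    let q: u128 = 10_939_058_860_032_000; // Q(1,2) = 640320^3 / 24
    let r_abs: u128 = 2_793_657_715; // |R(1,2)|; R itself is negative
    let t = 13_591_409 * q - r_abs;
    let root = isqrt(10_005 * scale * scale); // floor(sqrt(10005) · 10^14)
    426_880 * root * q / t // the one big division
}

fn main() {
    println!("{}", pi_scaled_two_terms());
}
```

Both operations depend on the complete Q and T values, which is why neither can begin until the whole bracket has been combined.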
cargo test # default pure-Rust backend
cargo test --features gmp # GMP backend (requires libgmp)
The suite includes:
- chudnovsky: the first 100 digits exactly match the known value; 1000 digits are cross-checked against the independent Machin verifier.
- verify: Machin verification passes on correct input and pinpoints the first mismatch on corrupted input; the reference reproduces the known 100 digits.
- output: line formatting (groups, spacing, right-aligned counter).
- benchmark: thousands formatting, RAM estimate, table rendering.
cargo clippy -- -D warnings
cargo fmt
cargo fmt --check
- A linker is required to build. The pure-Rust default needs no C libraries but, like any Rust binary, still needs a linker. On Windows that means either the MSVC Build Tools (link.exe) or the GNU toolchain; see Prerequisites. A link: extra operand error means a Unix-shell link is shadowing MSVC's link.exe; the fix is the toolchain, not PATH (see the heads-up there).
- Verification uses binary splitting (sub-quadratic), but its final scaled division is still O(n²) in num-bigint, and it is capped by --verify-max-digits (default 1M). It is not meant to verify hundreds of millions of digits; use --skip-verify for pure compute benchmarks at that scale.
- Pure-Rust is slower than y-cruncher. No SIMD/assembly; the gmp feature narrows the gap but still won't match y-cruncher's hand-tuned kernels.
- RAM figures are estimates, not measured peaks (see the note above).
- GMP on Windows requires MSYS2 and a correctly set GMP_LIB_DIR; it is the most involved build path.
- The last computed digit may occasionally differ from a rounded reference by one ULP; guard digits make the reported digits stable, and verification covers the rest.
MIT. See Cargo.toml.