diff --git a/bench/README.md b/bench/README.md index f4b449c..38039ce 100644 --- a/bench/README.md +++ b/bench/README.md @@ -80,8 +80,13 @@ filter keeps the run from also executing the other (`#[ignore]`'d) tests. Each test writes `bench/results/.csv` with columns `size,time_s,peak_bytes,n_objects`. `plot_scaling.py` needs only the standard library to print the summary table; `matplotlib` is required to draw the figure. -The fast `stress_*_smoke` tests (which just check that each example still -compiles) run in the normal `cargo test` suite and are **not** ignored. +The figure is drawn for the ACM `acmart` `sigconf` camera-ready format: it is +sized to the full two-column text width (~7 in, i.e. a `figure*`), uses a +colorblind-safe palette with distinct markers (legible in grayscale), and +embeds its fonts as Type 42 — never Type 3, which ACM's TAPS pipeline rejects — +so the PDF drops into the paper at `width=\textwidth` with no font-shrinking +rescale. The fast `stress_*_smoke` tests (which just check that each example +still compiles) run in the normal `cargo test` suite and are **not** ignored. ### Configuring the sweeps @@ -96,8 +101,8 @@ how an axis scales — without editing any source. Pass a comma-separated list: | `ARGON_BENCH_SHAPES_LOOP` | shapes (`for` loop) | `500,1000,2000,4000,8000,16000,32000` | | `ARGON_BENCH_INSTANCES` | instances | `500,…,64000` | | `ARGON_BENCH_CONSTRAINTS` | coupled constraints | `32,64,128,256,512,1024,2048,4096,8192,16384` | -| `ARGON_BENCH_HIER_SINGLE` | hierarchy (1 ref) | `4,8,16,32,48,64,96,128` | -| `ARGON_BENCH_HIER_DOUBLE` | hierarchy (2 refs) | `4,8,16,32,48,64,96,128` | +| `ARGON_BENCH_HIER_SINGLE` | hierarchy (1 ref) | `4,8,16,32,64,128,256,512,1024,2048` | +| `ARGON_BENCH_HIER_DOUBLE` | hierarchy (2 refs) | `4,8,16,32,64,128,256,512,1024,2048` | ```bash # e.g. sweep the for-loop variant out to the same sizes as bench_shapes @@ -128,12 +133,12 @@ parameter; "peak" is peak heap allocated during compilation. 
| Axis | largest `n` | time @ largest | peak mem @ largest | empirical scaling | | ---- | ----------- | -------------- | ------------------ | ----------------- | -| Shapes (recursion) | 32 000 rects | 1.52 s | 0.89 GiB | **~linear** (time `∝ n^1.2`, mem `∝ n^1.0`) | +| Shapes (recursion) | 32 000 rects | 1.54 s | 0.89 GiB | **~linear** (time `∝ n^1.2`, mem `∝ n^1.0`) | | Instances | 64 000 insts | 3.08 s | 1.26 GiB | **~linear** (time `∝ n^1.2`, mem `∝ n^1.0`) | -| Hierarchy, 1 child ref | depth 128 | 0.005 s | 11 MiB | **linear** in depth | -| Coupled constraints | 16 384 rects | 1.76 s | 0.59 GiB | **~linear** (time `∝ n^1.04`, mem `∝ n^0.90`) | +| Hierarchy, 1 child ref | depth 2048 | 0.15 s | 160 MiB | **linear** in depth | +| Coupled constraints | 16 384 rects | 1.37 s | 0.59 GiB | **~linear** (time `∝ n^1.0`, mem `∝ n^0.90`) | | Shapes (`for`-loop) | 32 000 rects | 1.06 s | 0.85 GiB | **~linear** (time `∝ n^1.2`, mem `∝ n^1.0`) | -| Hierarchy, 2 child refs | depth 128 | 0.006 s | 11 MiB | **linear** in depth (was exponential before the shared-type fix) | +| Hierarchy, 2 child refs | depth 2048 | 0.14 s | 163 MiB | **linear** in depth (was exponential before the shared-type fix) | ### Interpretation @@ -157,8 +162,8 @@ parameter; "peak" is peak heap allocated during compilation. the dense SVD never runs. The general dense solver is retained only as a fallback for an *irreducible* coupled core (a block with no ≤2-variable pivot). This axis used to be **super-cubic** — `~22 s` at `n=1024`, steepening - toward the `O(n^3)` of dense factorization — and is now `~n^1.04` in time - (`1.76 s` at `n=16384`, 16× larger, in less memory than the old `n=1024` + toward the `O(n^3)` of dense factorization — and is now `~n^1.0` in time + (`1.37 s` at `n=16384`, 16× larger, in less memory than the old `n=1024` dense matrix used). 
The "general linear constraint solving (slow)" caveat in the top-level README now bites only for genuinely dense coupled blocks, not for the common sparse-but-coupled case. @@ -170,13 +175,15 @@ parameter; "peak" is peak heap allocated during compilation. tree: whether a cell references its child **once** (`let i = inst(child());`) or **twice** (the `let c = child(); let i = inst(c);` idiom from the tutorial), `h{k}` holds shared pointers to the single type of `h{k-1}`, and both variants - cost the same — linear in depth (≈11 MiB / 6 ms at depth 128, the two series - within ~15% of each other). Before this fix the type was deep-copied per + cost the same — linear in depth (≈160 MiB / 0.15 s at depth 2048, the two + series within ~2% of each other). Before this fix the type was deep-copied per reference, so the single-ref chain was quadratic (`~depth^1.4`) and the double-ref chain **doubled with every level** (`×1.9` measured), exhausting memory beyond ~depth 20 (depth 18 alone took ~3.6 GiB / 11.5 s). The remaining - hierarchy limit is unrelated to the type representation: very deep chains hit a - native-recursion stack limit in the compiler at a few hundred levels. + hierarchy limit is unrelated to the type representation: the compiler walks the + hierarchy by native recursion, so depth is bounded by the stack — the benchmark + runs this axis on a 512 MiB stack (reaching a few thousand levels), and lifting + the cap entirely would mean turning that recursion into an explicit work-stack. - **Recursion and iteration now scale identically.** `shapes` and `shapes_loop` emit identical geometry; the only difference is that `shapes_loop` builds and @@ -187,7 +194,7 @@ parameter; "peak" is peak heap allocated during compilation. 
persistent vector and lowering `range` to a native builtin made `cons` O(log n) and `range` O(n); the two series now coincide, both linear in time and memory out to 32 000 rectangles (`shapes_loop`: 1.06 s / 0.85 GiB; - `shapes`: 1.52 s / 0.89 GiB). The idiomatic `for i in std::range(n)` loop is + `shapes`: 1.54 s / 0.89 GiB). The idiomatic `for i in std::range(n)` loop is no longer a scaling hazard. The gap between the two series is now a small constant — `shapes_loop` is even marginally faster, as the native `range` avoids the per-element recursion overhead of `emit_shapes`. diff --git a/bench/argon_scaling.pdf b/bench/argon_scaling.pdf index 6b49e33..733d73f 100644 Binary files a/bench/argon_scaling.pdf and b/bench/argon_scaling.pdf differ diff --git a/bench/argon_scaling.png b/bench/argon_scaling.png index 7501131..2b6c620 100644 Binary files a/bench/argon_scaling.png and b/bench/argon_scaling.png differ diff --git a/bench/plot_scaling.py b/bench/plot_scaling.py index 91c9a19..3144a09 100644 --- a/bench/plot_scaling.py +++ b/bench/plot_scaling.py @@ -26,13 +26,61 @@ # exponentially; on the current build every axis is sub-exponential. SERIES = [ ("shapes", "Shapes (recursion)", "# rectangles", "poly"), - ("shapes_loop", "Shapes (for-loop / cons list)", "# rectangles", "poly"), + ("shapes_loop", "Shapes (for-loop)", "# rectangles", "poly"), ("instances", "Instances", "# instances", "poly"), ("constraints", "Coupled constraints", "# coupled rects", "poly"), - ("hierarchy_single_ref", "Hierarchy (1 child ref)", "depth", "poly"), - ("hierarchy_double_ref", "Hierarchy (2 child refs)", "depth", "poly"), + ("hierarchy_single_ref", "Hierarchy (1 ref)", "depth", "poly"), + ("hierarchy_double_ref", "Hierarchy (2 refs)", "depth", "poly"), ] +# Okabe-Ito colorblind-safe palette (black and yellow dropped for line +# contrast on white), ordered to match SERIES. 
The two "twin" pairs -- +# recursion/for-loop and 1-ref/2-ref -- get distinct hues so that their +# near-coincidence on the plot reads as two curves landing on top of each +# other rather than one. Markers are also distinct so the series remain +# separable in grayscale print. +PALETTE = ["#0072B2", "#56B4E9", "#009E73", "#D55E00", "#CC79A7", "#E69F00"] +MARKERS = ["o", "s", "^", "D", "v", "P"] + + +def apply_pub_style(matplotlib): + """ACM acmart (sigconf, camera-ready) publication style. + + The critical setting is ``fonttype = 42``: it embeds text as subsetted + TrueType (Type 42) glyphs instead of matplotlib's default Type 3 fonts, + which ACM's TAPS pipeline (and IEEE PDF eXpress) reject. The figure uses a + sans-serif (Arial) face; ``Arial`` is listed first for portability, then the + metric-identical open clones (Liberation Sans / Arimo), then ``Nimbus Sans`` + (a Helvetica/Arial-metric clone -- the fallback present on this machine when + Arial proper is not installed). Every text element is >= 8 pt, so the figure, + dropped into a full-width ``figure*`` at the sigconf text width (~7 in), + renders all text at 8-9 pt without any rescaling. 
+ """ + matplotlib.rcParams.update({ + "pdf.fonttype": 42, + "ps.fonttype": 42, + "font.family": "sans-serif", + "font.sans-serif": ["Arial", "Liberation Sans", "Arimo", + "Nimbus Sans", "Helvetica", "DejaVu Sans"], + "mathtext.fontset": "dejavusans", + "font.size": 8, + "axes.titlesize": 9, + "axes.labelsize": 8.5, + "xtick.labelsize": 8, + "ytick.labelsize": 8, + "legend.fontsize": 8, + "axes.linewidth": 0.7, + "lines.linewidth": 1.3, + "lines.markersize": 4.2, + "grid.linewidth": 0.5, + "xtick.major.width": 0.7, + "ytick.major.width": 0.7, + "xtick.minor.width": 0.5, + "ytick.minor.width": 0.5, + "savefig.dpi": 300, + "figure.dpi": 300, + }) + def load(path): xs, ts, ms = [], [], [] @@ -117,39 +165,47 @@ def main(): import matplotlib matplotlib.use("Agg") + apply_pub_style(matplotlib) import matplotlib.pyplot as plt except ImportError: sys.exit("\nmatplotlib not installed; printed summary only. `pip install matplotlib` to draw.") - fig, (ax_t, ax_m) = plt.subplots(1, 2, figsize=(13, 5.2)) - markers = ["o", "s", "^", "D", "v", "P"] - for (key, _, _, _), marker in zip(SERIES, markers): + # Full text width of an ACM acmart sigconf two-column figure* (~7 in). 
+ fig, (ax_t, ax_m) = plt.subplots(1, 2, figsize=(7.0, 2.9), layout="constrained") + for (key, _, _, _), marker, color in zip(SERIES, MARKERS, PALETTE): if key not in data: continue label, unit, model, xs, ts, ms = data[key] - t_suffix, _ = describe(model, xs, ts) - m_suffix, _ = describe(model, xs, ms) - ax_t.plot(xs, ts, marker=marker, label=f"{label} ({t_suffix})") - ax_m.plot(xs, [m / 2**20 for m in ms], marker=marker, - label=f"{label} ({m_suffix})") + style = dict(marker=marker, color=color, + markeredgecolor="white", markeredgewidth=0.5) + ax_t.plot(xs, ts, label=label, **style) + ax_m.plot(xs, [m / 2**20 for m in ms], label=label, **style) for ax in (ax_t, ax_m): ax.set_xscale("log") ax.set_yscale("log") - ax.set_xlabel("problem size $n$ (rectangles / constraints / instances / depth)") - ax.grid(True, which="both", ls=":", alpha=0.4) + ax.set_xlabel("problem size n") + ax.grid(True, which="major", ls=":", alpha=0.45) + ax.grid(True, which="minor", ls=":", alpha=0.2) + ax.tick_params(which="both", direction="in", top=True, right=True) ax_t.set_ylabel("compile time (s)") - ax_t.set_title("Argon compile-time scaling") + ax_t.set_title("(a) Compile time") ax_m.set_ylabel("peak heap allocated (MiB)") - ax_m.set_title("Argon memory scaling") - ax_t.legend(fontsize=8, loc="upper left") - ax_m.legend(fontsize=8, loc="upper left") - fig.tight_layout() - + ax_m.set_title("(b) Peak heap memory") + leg_kw = dict(loc="upper left", handlelength=1.6, labelspacing=0.3, + borderpad=0.4, handletextpad=0.5, framealpha=0.9, + edgecolor="0.7", fancybox=False) + ax_t.legend(**leg_kw) + ax_m.legend(**leg_kw) + + # Saved at the figure's native size (constrained layout already reserves + # room for labels/legend), so the PDF MediaBox stays at the sigconf text + # width and the figure drops into the paper at width=\textwidth with no + # font-shrinking rescale. 
for ext in ("png", "pdf"): out = f"{args.out}.{ext}" - fig.savefig(out, dpi=150, bbox_inches="tight") + fig.savefig(out) print(f"wrote {out}") diff --git a/bench/results/constraints.csv b/bench/results/constraints.csv index d44fc70..988829a 100644 --- a/bench/results/constraints.csv +++ b/bench/results/constraints.csv @@ -1,11 +1,11 @@ size,time_s,peak_bytes,n_objects -32,0.003404379,2558933,33 -64,0.004213333,3802613,65 -128,0.007537572,6289973,129 -256,0.014580103,11264693,257 -512,0.02941705,21214133,513 -1024,0.063108165,41113013,1025 -2048,0.137085263,80910773,2049 -4096,0.306775942,160506293,4097 -8192,0.819698386,319697317,8193 -16384,1.762328533,638079381,16385 +32,0.003732583,2558722,33 +64,0.00413684,3802338,65 +128,0.00750728,6289570,129 +256,0.014379716,11264034,257 +512,0.035677509,21212962,513 +1024,0.066225588,41110818,1025 +2048,0.140565116,80906530,2049 +4096,0.295335076,160497938,4097 +8192,0.676296905,319680802,8193 +16384,1.373314977,638046466,16385 diff --git a/bench/results/hierarchy_double_ref.csv b/bench/results/hierarchy_double_ref.csv index c6b50c0..b4c084d 100644 --- a/bench/results/hierarchy_double_ref.csv +++ b/bench/results/hierarchy_double_ref.csv @@ -1,9 +1,11 @@ size,time_s,peak_bytes,n_objects -4,0.000954884,1468649,9 -8,0.001127072,1737159,17 -16,0.00144585,2399391,33 -32,0.00209207,3721935,65 -48,0.002703978,5197287,97 -64,0.003269988,6368992,129 -96,0.004542949,9284271,193 -128,0.005712034,11664148,257 +4,0.000902074,1469041,9 +8,0.000923206,1737943,17 +16,0.001169244,2400959,33 +32,0.001765389,3725071,65 +64,0.002942927,6375264,129 +128,0.005334938,11676692,257 +256,0.01091318,22279032,513 +512,0.024664048,43483256,1025 +1024,0.056468784,85891825,2049 +2048,0.143552596,170708721,4097 diff --git a/bench/results/hierarchy_single_ref.csv b/bench/results/hierarchy_single_ref.csv index 0ebf503..a1bbacf 100644 --- a/bench/results/hierarchy_single_ref.csv +++ b/bench/results/hierarchy_single_ref.csv @@ -1,9 +1,11 @@ 
size,time_s,peak_bytes,n_objects -4,0.000731158,1457539,9 -8,0.000869485,1726044,17 -16,0.001138329,2377156,33 -32,0.001715575,3677460,65 -48,0.002269978,5100553,97 -64,0.002783338,6280001,129 -96,0.003942497,9087143,193 -128,0.005035909,11486197,257 +4,0.000732499,1457931,9 +8,0.000825389,1726828,17 +16,0.001202294,2378724,33 +32,0.001827103,3680596,65 +64,0.003106322,6286273,129 +128,0.005698433,11498741,257 +256,0.012153742,21923197,513 +512,0.028767297,42771581,1025 +1024,0.064693487,84468470,2049 +2048,0.146190335,167862006,4097 diff --git a/bench/results/instances.csv b/bench/results/instances.csv index 6851eef..bc11643 100644 --- a/bench/results/instances.csv +++ b/bench/results/instances.csv @@ -1,9 +1,9 @@ size,time_s,peak_bytes,n_objects -500,0.011955936,11779220,501 -1000,0.024739344,22363452,1001 -2000,0.055428518,43531916,2001 -4000,0.141616584,85868844,4001 -8000,0.306157668,170542684,8001 -16000,0.674421326,339890396,16001 -32000,1.439115681,678585772,32001 -64000,3.081957272,1355976556,64001 +500,0.011928107,11779388,501 +1000,0.024442593,22363620,1001 +2000,0.054984212,43532084,2001 +4000,0.14333945,85869012,4001 +8000,0.302164048,170542852,8001 +16000,0.673339455,339890580,16001 +32000,1.427961141,678585956,32001 +64000,3.077511752,1355976740,64001 diff --git a/bench/results/shapes.csv b/bench/results/shapes.csv index 42ad9bd..e413857 100644 --- a/bench/results/shapes.csv +++ b/bench/results/shapes.csv @@ -1,8 +1,8 @@ size,time_s,peak_bytes,n_objects -500,0.011793572,16132742,500 -1000,0.024660165,31089318,1000 -2000,0.070026669,61002470,2000 -4000,0.147623911,120828758,4000 -8000,0.319933075,240481350,8000 -16000,0.70996717,479786454,16000 -32000,1.516527735,958396838,32000 +500,0.011885933,16132910,500 +1000,0.025495754,31089486,1000 +2000,0.070905483,61002638,2000 +4000,0.147398374,120828942,4000 +8000,0.318729175,240481518,8000 +16000,0.706239265,479786702,16000 +32000,1.5363118980000001,958397006,32000 diff --git 
a/bench/results/shapes_loop.csv b/bench/results/shapes_loop.csv index d6f99a7..ed651bf 100644 --- a/bench/results/shapes_loop.csv +++ b/bench/results/shapes_loop.csv @@ -1,8 +1,8 @@ size,time_s,peak_bytes,n_objects -500,0.007458816,15417205,500 -1000,0.01518347,29612582,1000 -2000,0.053706302,58004998,2000 -4000,0.116600385,114745766,4000 -8000,0.231873919,228230982,8000 -16000,0.49902595,455266923,16000 -32000,1.058962607,909351467,32000 +500,0.008083826,15417373,500 +1000,0.016896656,29612750,1000 +2000,0.045732213,58005166,2000 +4000,0.098551455,114745934,4000 +8000,0.206445365,228231150,8000 +16000,0.463555498,455267091,16000 +32000,1.054191159,909351635,32000 diff --git a/core/compiler/src/lib.rs b/core/compiler/src/lib.rs index ef7030e..f4b00c9 100644 --- a/core/compiler/src/lib.rs +++ b/core/compiler/src/lib.rs @@ -450,12 +450,28 @@ mod tests { #[ignore = "scaling benchmark; run in release, serially: cargo test -p compiler --release -- --ignored --test-threads=1 bench_"] fn bench_hierarchy() { let _g = bench_guard(); + // Deep hierarchies are traversed by native recursion in the compiler, + // so compiling `h{depth}` needs ~O(depth) native stack frames. The + // default ~2 MiB test-thread stack overflows past ~150 levels, so run + // the whole axis on a thread with a 512 MiB stack: that reaches a few + // thousand levels (the sweep below goes to 2048) and leaves headroom + // to push further via `ARGON_BENCH_HIER_*`. The thread is spawned once, + // outside the timed `measure()` loop, so it does not perturb timings. 
+ std::thread::Builder::new() + .stack_size(512 * 1024 * 1024) + .spawn(bench_hierarchy_body) + .unwrap() + .join() + .unwrap(); + } + + fn bench_hierarchy_body() { let dir = PathBuf::from(env!("CARGO_MANIFEST_DIR")).join("build/bench_hier"); std::fs::create_dir_all(&dir).unwrap(); let lib = dir.join("lib.ar"); let mut rows = Vec::new(); - for depth in bench_sizes("ARGON_BENCH_HIER_SINGLE", &[4, 8, 16, 32, 48, 64, 96, 128]) + for depth in bench_sizes("ARGON_BENCH_HIER_SINGLE", &[4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048]) .into_iter() .map(|d| d as usize) { @@ -491,7 +507,7 @@ mod tests { // exponentially and had to be capped near depth 18.) Override // `ARGON_BENCH_HIER_DOUBLE` to push deeper. let mut rows = Vec::new(); - for depth in bench_sizes("ARGON_BENCH_HIER_DOUBLE", &[4, 8, 16, 32, 48, 64, 96, 128]) + for depth in bench_sizes("ARGON_BENCH_HIER_DOUBLE", &[4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048]) .into_iter() .map(|d| d as usize) {
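
Note on the "empirical scaling" column: the exponents the README quotes for the coupled-constraints axis (`n^1.0` in time, `n^0.90` in memory) can be sanity-checked against the updated `bench/results/constraints.csv` rows in this diff. The sketch below assumes the exponent is the least-squares slope of log(y) against log(x) over the whole sweep. The `describe()` function in `plot_scaling.py` is not shown in this diff, so it may fit differently, and `loglog_slope` is a hypothetical helper name used only for illustration.

```python
import math

# (size, time_s, peak_bytes) rows copied from the updated
# bench/results/constraints.csv in this diff.
CONSTRAINTS = [
    (32, 0.003732583, 2558722),
    (64, 0.00413684, 3802338),
    (128, 0.00750728, 6289570),
    (256, 0.014379716, 11264034),
    (512, 0.035677509, 21212962),
    (1024, 0.066225588, 41110818),
    (2048, 0.140565116, 80906530),
    (4096, 0.295335076, 160497938),
    (8192, 0.676296905, 319680802),
    (16384, 1.373314977, 638046466),
]


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x), i.e. k in y ~ c * x**k.

    Hypothetical helper: an assumption about how the exponent is fitted,
    not the code used by plot_scaling.py.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


sizes = [row[0] for row in CONSTRAINTS]
time_exp = loglog_slope(sizes, [row[1] for row in CONSTRAINTS])
mem_exp = loglog_slope(sizes, [row[2] for row in CONSTRAINTS])
print(f"time ~ n^{time_exp:.2f}, mem ~ n^{mem_exp:.2f}")
```

Under that assumption, the fitted slopes should land close to the `n^1.0` and `n^0.90` figures in the results table. The smallest sizes carry a fixed startup cost, so a fit restricted to the larger sizes would give somewhat different exponents.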