37 changes: 22 additions & 15 deletions bench/README.md
@@ -80,8 +80,13 @@ filter keeps the run from also executing the other (`#[ignore]`'d) tests. Each
test writes `bench/results/<axis>.csv` with columns
`size,time_s,peak_bytes,n_objects`. `plot_scaling.py` needs only the standard
library to print the summary table; `matplotlib` is required to draw the figure.
The fast `stress_*_smoke` tests (which just check that each example still
compiles) run in the normal `cargo test` suite and are **not** ignored.
The figure is drawn for the ACM `acmart` `sigconf` camera-ready format: it is
sized to the full two-column text width (~7 in, i.e. a `figure*`), uses a
colorblind-safe palette with distinct markers (legible in grayscale), and
embeds its fonts as Type 42 — never Type 3, which ACM's TAPS pipeline rejects —
so the PDF drops into the paper at `width=\textwidth` with no font-shrinking
rescale. The fast `stress_*_smoke` tests (which just check that each example
still compiles) run in the normal `cargo test` suite and are **not** ignored.

### Configuring the sweeps

@@ -96,8 +101,8 @@ how an axis scales — without editing any source. Pass a comma-separated list:
| `ARGON_BENCH_SHAPES_LOOP` | shapes (`for` loop) | `500,1000,2000,4000,8000,16000,32000` |
| `ARGON_BENCH_INSTANCES` | instances | `500,…,64000` |
| `ARGON_BENCH_CONSTRAINTS` | coupled constraints | `32,64,128,256,512,1024,2048,4096,8192,16384` |
| `ARGON_BENCH_HIER_SINGLE` | hierarchy (1 ref) | `4,8,16,32,48,64,96,128` |
| `ARGON_BENCH_HIER_DOUBLE` | hierarchy (2 refs) | `4,8,16,32,48,64,96,128` |
| `ARGON_BENCH_HIER_SINGLE` | hierarchy (1 ref) | `4,8,16,32,64,128,256,512,1024,2048` |
| `ARGON_BENCH_HIER_DOUBLE` | hierarchy (2 refs) | `4,8,16,32,64,128,256,512,1024,2048` |

```bash
# e.g. sweep the for-loop variant out to the same sizes as bench_shapes
@@ -128,12 +133,12 @@ parameter; "peak" is peak heap allocated during compilation.

| Axis | largest `n` | time @ largest | peak mem @ largest | empirical scaling |
| ---- | ----------- | -------------- | ------------------ | ----------------- |
| Shapes (recursion) | 32 000 rects | 1.52 s | 0.89 GiB | **~linear** (time `∝ n^1.2`, mem `∝ n^1.0`) |
| Shapes (recursion) | 32 000 rects | 1.54 s | 0.89 GiB | **~linear** (time `∝ n^1.2`, mem `∝ n^1.0`) |
| Instances | 64 000 insts | 3.08 s | 1.26 GiB | **~linear** (time `∝ n^1.2`, mem `∝ n^1.0`) |
| Hierarchy, 1 child ref | depth 128 | 0.005 s | 11 MiB | **linear** in depth |
| Coupled constraints | 16 384 rects | 1.76 s | 0.59 GiB | **~linear** (time `∝ n^1.04`, mem `∝ n^0.90`) |
| Hierarchy, 1 child ref | depth 2048 | 0.15 s | 160 MiB | **linear** in depth |
| Coupled constraints | 16 384 rects | 1.37 s | 0.59 GiB | **~linear** (time `∝ n^1.0`, mem `∝ n^0.90`) |
| Shapes (`for`-loop) | 32 000 rects | 1.06 s | 0.85 GiB | **~linear** (time `∝ n^1.2`, mem `∝ n^1.0`) |
| Hierarchy, 2 child refs | depth 128 | 0.006 s | 11 MiB | **linear** in depth (was exponential before the shared-type fix) |
| Hierarchy, 2 child refs | depth 2048 | 0.14 s | 163 MiB | **linear** in depth (was exponential before the shared-type fix) |
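
As a rough cross-check of the exponents in the table above, the sketch below fits a least-squares line in log-log space to the updated `constraints.csv` rows from this change. The helper name `loglog_slope` is hypothetical, and the fitting method is an assumption: it is one common way to estimate such an exponent, not necessarily what `describe` in `plot_scaling.py` does.

```python
import math

# (size, time_s) rows copied from the updated bench/results/constraints.csv.
CONSTRAINTS = [
    (32, 0.003732583),
    (64, 0.00413684),
    (128, 0.00750728),
    (256, 0.014379716),
    (512, 0.035677509),
    (1024, 0.066225588),
    (2048, 0.140565116),
    (4096, 0.295335076),
    (8192, 0.676296905),
    (16384, 1.373314977),
]


def loglog_slope(points):
    """Least-squares slope of log(y) against log(x), i.e. the k in y ~ x**k.

    Hypothetical helper for illustration; not part of plot_scaling.py.
    """
    xs = [math.log(x) for x, _ in points]
    ys = [math.log(y) for _, y in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    print(f"constraints: time ~ n^{loglog_slope(CONSTRAINTS):.2f}")
```

On these rows the fitted exponent comes out close to 1, in line with the `n^1.0` entry for the coupled-constraints axis. The same helper could be pointed at any of the other `bench/results/<axis>.csv` files.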

### Interpretation

@@ -157,8 +162,8 @@ parameter; "peak" is peak heap allocated during compilation.
the dense SVD never runs. The general dense solver is retained only as a
fallback for an *irreducible* coupled core (a block with no ≤2-variable
pivot). This axis used to be **super-cubic** — `~22 s` at `n=1024`, steepening
toward the `O(n^3)` of dense factorization — and is now `~n^1.04` in time
(`1.76 s` at `n=16384`, 16× larger, in less memory than the old `n=1024`
toward the `O(n^3)` of dense factorization — and is now `~n^1.0` in time
(`1.37 s` at `n=16384`, 16× larger, in less memory than the old `n=1024`
dense matrix used). The "general linear constraint solving (slow)" caveat in
the top-level README now bites only for genuinely dense coupled blocks, not
for the common sparse-but-coupled case.
@@ -170,13 +175,15 @@ parameter; "peak" is peak heap allocated during compilation.
tree: whether a cell references its child **once** (`let i = inst(child());`)
or **twice** (the `let c = child(); let i = inst(c);` idiom from the tutorial),
`h{k}` holds shared pointers to the single type of `h{k-1}`, and both variants
cost the same — linear in depth (≈11 MiB / 6 ms at depth 128, the two series
within ~15% of each other). Before this fix the type was deep-copied per
cost the same — linear in depth (≈160 MiB / 0.15 s at depth 2048, the two
series within ~2% of each other). Before this fix the type was deep-copied per
reference, so the single-ref chain was quadratic (`~depth^1.4`) and the
double-ref chain **doubled with every level** (`×1.9` measured), exhausting
memory beyond ~depth 20 (depth 18 alone took ~3.6 GiB / 11.5 s). The remaining
hierarchy limit is unrelated to the type representation: very deep chains hit a
native-recursion stack limit in the compiler at a few hundred levels.
hierarchy limit is unrelated to the type representation: the compiler walks the
hierarchy by native recursion, so depth is bounded by the stack — the benchmark
runs this axis on a 512 MiB stack (reaching a few thousand levels), and lifting
the cap entirely would mean turning that recursion into an explicit work-stack.

- **Recursion and iteration now scale identically.** `shapes` and `shapes_loop`
emit identical geometry; the only difference is that `shapes_loop` builds and
@@ -187,7 +194,7 @@ parameter; "peak" is peak heap allocated during compilation.
persistent vector and lowering `range` to a native builtin made `cons`
O(log n) and `range` O(n); the two series now coincide, both linear in time
and memory out to 32 000 rectangles (`shapes_loop`: 1.06 s / 0.85 GiB;
`shapes`: 1.52 s / 0.89 GiB). The idiomatic `for i in std::range(n)` loop is
`shapes`: 1.54 s / 0.89 GiB). The idiomatic `for i in std::range(n)` loop is
no longer a scaling hazard. The gap between the two series is now a small
constant — `shapes_loop` is even marginally faster, as the native `range`
avoids the per-element recursion overhead of `emit_shapes`.
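
The hierarchy bullet above notes that lifting the depth cap would mean replacing native recursion with an explicit work-stack. The sketch below illustrates that general transformation in Python rather than in the compiler's Rust. `Cell`, `build_chain`, and `count_cells` are hypothetical names for illustration only, not Argon's data structures or API. The `seen` set stands in for the shared-type idea: a child referenced twice is still visited once.

```python
class Cell:
    """Hypothetical hierarchy node: a cell that references child cells."""

    def __init__(self, children=()):
        self.children = list(children)


def build_chain(depth, refs=1):
    """Build a chain of `depth` cells; each parent holds `refs` references
    to the *same* child object (refs=2 mimics the double-ref idiom)."""
    cell = Cell()
    for _ in range(depth - 1):
        cell = Cell([cell] * refs)
    return cell


def count_cells(root):
    """Count distinct cells using an explicit stack instead of recursion.

    Depth is then bounded by heap memory rather than by the native call
    stack, and shared children are visited only once.
    """
    seen = set()
    stack = [root]
    while stack:
        cell = stack.pop()
        if id(cell) in seen:
            continue  # already visited through another reference
        seen.add(id(cell))
        stack.extend(cell.children)
    return len(seen)


if __name__ == "__main__":
    # Depth 5000 is well past Python's default recursion limit of 1000,
    # which a naive recursive walk would hit.
    print(count_cells(build_chain(5000)))
    print(count_cells(build_chain(5000, refs=2)))
```

Whether the same rewrite is practical inside the compiler depends on how much state each recursive frame carries, which this sketch does not model.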
Binary file modified bench/argon_scaling.pdf
Binary file not shown.
Binary file modified bench/argon_scaling.png
96 changes: 76 additions & 20 deletions bench/plot_scaling.py
@@ -26,13 +26,61 @@
# exponentially; on the current build every axis is sub-exponential.
SERIES = [
("shapes", "Shapes (recursion)", "# rectangles", "poly"),
("shapes_loop", "Shapes (for-loop / cons list)", "# rectangles", "poly"),
("shapes_loop", "Shapes (for-loop)", "# rectangles", "poly"),
("instances", "Instances", "# instances", "poly"),
("constraints", "Coupled constraints", "# coupled rects", "poly"),
("hierarchy_single_ref", "Hierarchy (1 child ref)", "depth", "poly"),
("hierarchy_double_ref", "Hierarchy (2 child refs)", "depth", "poly"),
("hierarchy_single_ref", "Hierarchy (1 ref)", "depth", "poly"),
("hierarchy_double_ref", "Hierarchy (2 refs)", "depth", "poly"),
]

# Okabe-Ito colorblind-safe palette (black and yellow dropped for line
# contrast on white), ordered to match SERIES. The two "twin" pairs --
# recursion/for-loop and 1-ref/2-ref -- get distinct hues so that their
# near-coincidence on the plot reads as two curves landing on top of each
# other rather than one. Markers are also distinct so the series remain
# separable in grayscale print.
PALETTE = ["#0072B2", "#56B4E9", "#009E73", "#D55E00", "#CC79A7", "#E69F00"]
MARKERS = ["o", "s", "^", "D", "v", "P"]


def apply_pub_style(matplotlib):
"""ACM acmart (sigconf, camera-ready) publication style.

The critical setting is ``fonttype = 42``: it embeds text as subsetted
TrueType (Type 42) glyphs instead of matplotlib's default Type 3 fonts,
which ACM's TAPS pipeline (and IEEE PDF eXpress) reject. The figure uses a
sans-serif (Arial) face; ``Arial`` is listed first for portability, then the
metric-identical open clones (Liberation Sans / Arimo), then ``Nimbus Sans``
(a Helvetica/Arial-metric clone -- the fallback present on this machine when
Arial proper is not installed). Every text element is >= 8 pt, so the figure,
dropped into a full-width ``figure*`` at the sigconf text width (~7 in),
renders all text at 8-9 pt without any rescaling.
"""
matplotlib.rcParams.update({
"pdf.fonttype": 42,
"ps.fonttype": 42,
"font.family": "sans-serif",
"font.sans-serif": ["Arial", "Liberation Sans", "Arimo",
"Nimbus Sans", "Helvetica", "DejaVu Sans"],
"mathtext.fontset": "dejavusans",
"font.size": 8,
"axes.titlesize": 9,
"axes.labelsize": 8.5,
"xtick.labelsize": 8,
"ytick.labelsize": 8,
"legend.fontsize": 8,
"axes.linewidth": 0.7,
"lines.linewidth": 1.3,
"lines.markersize": 4.2,
"grid.linewidth": 0.5,
"xtick.major.width": 0.7,
"ytick.major.width": 0.7,
"xtick.minor.width": 0.5,
"ytick.minor.width": 0.5,
"savefig.dpi": 300,
"figure.dpi": 300,
})


def load(path):
xs, ts, ms = [], [], []
@@ -117,39 +165,47 @@ def main():
import matplotlib

matplotlib.use("Agg")
apply_pub_style(matplotlib)
import matplotlib.pyplot as plt
except ImportError:
sys.exit("\nmatplotlib not installed; printed summary only. `pip install matplotlib` to draw.")

fig, (ax_t, ax_m) = plt.subplots(1, 2, figsize=(13, 5.2))
markers = ["o", "s", "^", "D", "v", "P"]
for (key, _, _, _), marker in zip(SERIES, markers):
# Full text width of an ACM acmart sigconf two-column figure* (~7 in).
fig, (ax_t, ax_m) = plt.subplots(1, 2, figsize=(7.0, 2.9), layout="constrained")
for (key, _, _, _), marker, color in zip(SERIES, MARKERS, PALETTE):
if key not in data:
continue
label, unit, model, xs, ts, ms = data[key]
t_suffix, _ = describe(model, xs, ts)
m_suffix, _ = describe(model, xs, ms)
ax_t.plot(xs, ts, marker=marker, label=f"{label} ({t_suffix})")
ax_m.plot(xs, [m / 2**20 for m in ms], marker=marker,
label=f"{label} ({m_suffix})")
style = dict(marker=marker, color=color,
markeredgecolor="white", markeredgewidth=0.5)
ax_t.plot(xs, ts, label=label, **style)
ax_m.plot(xs, [m / 2**20 for m in ms], label=label, **style)

for ax in (ax_t, ax_m):
ax.set_xscale("log")
ax.set_yscale("log")
ax.set_xlabel("problem size $n$ (rectangles / constraints / instances / depth)")
ax.grid(True, which="both", ls=":", alpha=0.4)
ax.set_xlabel("problem size n")
ax.grid(True, which="major", ls=":", alpha=0.45)
ax.grid(True, which="minor", ls=":", alpha=0.2)
ax.tick_params(which="both", direction="in", top=True, right=True)

ax_t.set_ylabel("compile time (s)")
ax_t.set_title("Argon compile-time scaling")
ax_t.set_title("(a) Compile time")
ax_m.set_ylabel("peak heap allocated (MiB)")
ax_m.set_title("Argon memory scaling")
ax_t.legend(fontsize=8, loc="upper left")
ax_m.legend(fontsize=8, loc="upper left")
fig.tight_layout()

ax_m.set_title("(b) Peak heap memory")
leg_kw = dict(loc="upper left", handlelength=1.6, labelspacing=0.3,
borderpad=0.4, handletextpad=0.5, framealpha=0.9,
edgecolor="0.7", fancybox=False)
ax_t.legend(**leg_kw)
ax_m.legend(**leg_kw)

# Saved at the figure's native size (constrained layout already reserves
# room for labels/legend), so the PDF MediaBox stays at the sigconf text
# width and the figure drops into the paper at width=\textwidth with no
# font-shrinking rescale.
for ext in ("png", "pdf"):
out = f"{args.out}.{ext}"
fig.savefig(out, dpi=150, bbox_inches="tight")
fig.savefig(out)
print(f"wrote {out}")


20 changes: 10 additions & 10 deletions bench/results/constraints.csv
@@ -1,11 +1,11 @@
size,time_s,peak_bytes,n_objects
32,0.003404379,2558933,33
64,0.004213333,3802613,65
128,0.007537572,6289973,129
256,0.014580103,11264693,257
512,0.02941705,21214133,513
1024,0.063108165,41113013,1025
2048,0.137085263,80910773,2049
4096,0.306775942,160506293,4097
8192,0.819698386,319697317,8193
16384,1.762328533,638079381,16385
32,0.003732583,2558722,33
64,0.00413684,3802338,65
128,0.00750728,6289570,129
256,0.014379716,11264034,257
512,0.035677509,21212962,513
1024,0.066225588,41110818,1025
2048,0.140565116,80906530,2049
4096,0.295335076,160497938,4097
8192,0.676296905,319680802,8193
16384,1.373314977,638046466,16385
18 changes: 10 additions & 8 deletions bench/results/hierarchy_double_ref.csv
@@ -1,9 +1,11 @@
size,time_s,peak_bytes,n_objects
4,0.000954884,1468649,9
8,0.001127072,1737159,17
16,0.00144585,2399391,33
32,0.00209207,3721935,65
48,0.002703978,5197287,97
64,0.003269988,6368992,129
96,0.004542949,9284271,193
128,0.005712034,11664148,257
4,0.000902074,1469041,9
8,0.000923206,1737943,17
16,0.001169244,2400959,33
32,0.001765389,3725071,65
64,0.002942927,6375264,129
128,0.005334938,11676692,257
256,0.01091318,22279032,513
512,0.024664048,43483256,1025
1024,0.056468784,85891825,2049
2048,0.143552596,170708721,4097
18 changes: 10 additions & 8 deletions bench/results/hierarchy_single_ref.csv
@@ -1,9 +1,11 @@
size,time_s,peak_bytes,n_objects
4,0.000731158,1457539,9
8,0.000869485,1726044,17
16,0.001138329,2377156,33
32,0.001715575,3677460,65
48,0.002269978,5100553,97
64,0.002783338,6280001,129
96,0.003942497,9087143,193
128,0.005035909,11486197,257
4,0.000732499,1457931,9
8,0.000825389,1726828,17
16,0.001202294,2378724,33
32,0.001827103,3680596,65
64,0.003106322,6286273,129
128,0.005698433,11498741,257
256,0.012153742,21923197,513
512,0.028767297,42771581,1025
1024,0.064693487,84468470,2049
2048,0.146190335,167862006,4097
16 changes: 8 additions & 8 deletions bench/results/instances.csv
@@ -1,9 +1,9 @@
size,time_s,peak_bytes,n_objects
500,0.011955936,11779220,501
1000,0.024739344,22363452,1001
2000,0.055428518,43531916,2001
4000,0.141616584,85868844,4001
8000,0.306157668,170542684,8001
16000,0.674421326,339890396,16001
32000,1.439115681,678585772,32001
64000,3.081957272,1355976556,64001
500,0.011928107,11779388,501
1000,0.024442593,22363620,1001
2000,0.054984212,43532084,2001
4000,0.14333945,85869012,4001
8000,0.302164048,170542852,8001
16000,0.673339455,339890580,16001
32000,1.427961141,678585956,32001
64000,3.077511752,1355976740,64001
14 changes: 7 additions & 7 deletions bench/results/shapes.csv
@@ -1,8 +1,8 @@
size,time_s,peak_bytes,n_objects
500,0.011793572,16132742,500
1000,0.024660165,31089318,1000
2000,0.070026669,61002470,2000
4000,0.147623911,120828758,4000
8000,0.319933075,240481350,8000
16000,0.70996717,479786454,16000
32000,1.516527735,958396838,32000
500,0.011885933,16132910,500
1000,0.025495754,31089486,1000
2000,0.070905483,61002638,2000
4000,0.147398374,120828942,4000
8000,0.318729175,240481518,8000
16000,0.706239265,479786702,16000
32000,1.5363118980000001,958397006,32000
14 changes: 7 additions & 7 deletions bench/results/shapes_loop.csv
@@ -1,8 +1,8 @@
size,time_s,peak_bytes,n_objects
500,0.007458816,15417205,500
1000,0.01518347,29612582,1000
2000,0.053706302,58004998,2000
4000,0.116600385,114745766,4000
8000,0.231873919,228230982,8000
16000,0.49902595,455266923,16000
32000,1.058962607,909351467,32000
500,0.008083826,15417373,500
1000,0.016896656,29612750,1000
2000,0.045732213,58005166,2000
4000,0.098551455,114745934,4000
8000,0.206445365,228231150,8000
16000,0.463555498,455267091,16000
32000,1.054191159,909351635,32000
20 changes: 18 additions & 2 deletions core/compiler/src/lib.rs
@@ -450,12 +450,28 @@ mod tests {
#[ignore = "scaling benchmark; run in release, serially: cargo test -p compiler --release -- --ignored --test-threads=1 bench_"]
fn bench_hierarchy() {
let _g = bench_guard();
// Deep hierarchies are traversed by native recursion in the compiler,
// so compiling `h{depth}` needs ~O(depth) native stack frames. The
// default ~2 MiB test-thread stack overflows past ~150 levels, so run
// the whole axis on a thread with a 512 MiB stack: that reaches a few
// thousand levels (the sweep below goes to 2048) and leaves headroom
// to push further via `ARGON_BENCH_HIER_*`. The thread is spawned once,
// outside the timed `measure()` loop, so it does not perturb timings.
std::thread::Builder::new()
.stack_size(512 * 1024 * 1024)
.spawn(bench_hierarchy_body)
.unwrap()
.join()
.unwrap();
}

fn bench_hierarchy_body() {
let dir = PathBuf::from(env!("CARGO_MANIFEST_DIR")).join("build/bench_hier");
std::fs::create_dir_all(&dir).unwrap();
let lib = dir.join("lib.ar");

let mut rows = Vec::new();
for depth in bench_sizes("ARGON_BENCH_HIER_SINGLE", &[4, 8, 16, 32, 48, 64, 96, 128])
for depth in bench_sizes("ARGON_BENCH_HIER_SINGLE", &[4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048])
.into_iter()
.map(|d| d as usize)
{
@@ -491,7 +507,7 @@ mod tests {
// exponentially and had to be capped near depth 18.) Override
// `ARGON_BENCH_HIER_DOUBLE` to push deeper.
let mut rows = Vec::new();
for depth in bench_sizes("ARGON_BENCH_HIER_DOUBLE", &[4, 8, 16, 32, 48, 64, 96, 128])
for depth in bench_sizes("ARGON_BENCH_HIER_DOUBLE", &[4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048])
.into_iter()
.map(|d| d as usize)
{