diff --git a/bench/README.md b/bench/README.md
index f4b449c..38039ce 100644
--- a/bench/README.md
+++ b/bench/README.md
@@ -80,8 +80,13 @@ filter keeps the run from also executing the other (`#[ignore]`'d) tests. Each
 test writes `bench/results/<axis>.csv` with columns
 `size,time_s,peak_bytes,n_objects`. `plot_scaling.py` needs only the standard
 library to print the summary table; `matplotlib` is required to draw the figure.
-The fast `stress_*_smoke` tests (which just check that each example still
-compiles) run in the normal `cargo test` suite and are **not** ignored.
+The figure targets the ACM `acmart` `sigconf` camera-ready format. It is sized
+to the full two-column text width (~7 in, i.e. a `figure*`), so the PDF drops
+into the paper at `width=\textwidth` with no font-shrinking rescale. It uses a
+colorblind-safe palette with distinct markers (legible in grayscale) and embeds
+its fonts as Type 42, never Type 3, which ACM's TAPS pipeline rejects.
+The fast `stress_*_smoke` tests (which just check that each example
+still compiles) run in the normal `cargo test` suite and are **not** ignored.
 
 ### Configuring the sweeps
 
@@ -96,8 +101,8 @@ how an axis scales — without editing any source. Pass a comma-separated list:
 | `ARGON_BENCH_SHAPES_LOOP`   | shapes (`for` loop)  | `500,1000,2000,4000,8000,16000,32000` |
 | `ARGON_BENCH_INSTANCES`     | instances            | `500,…,64000` |
 | `ARGON_BENCH_CONSTRAINTS`   | coupled constraints  | `32,64,128,256,512,1024,2048,4096,8192,16384` |
-| `ARGON_BENCH_HIER_SINGLE`   | hierarchy (1 ref)    | `4,8,16,32,48,64,96,128` |
-| `ARGON_BENCH_HIER_DOUBLE`   | hierarchy (2 refs)   | `4,8,16,32,48,64,96,128` |
+| `ARGON_BENCH_HIER_SINGLE`   | hierarchy (1 ref)    | `4,8,16,32,64,128,256,512,1024,2048` |
+| `ARGON_BENCH_HIER_DOUBLE`   | hierarchy (2 refs)   | `4,8,16,32,64,128,256,512,1024,2048` |
 
 ```bash
 # e.g. sweep the for-loop variant out to the same sizes as bench_shapes
@@ -128,12 +133,12 @@ parameter; "peak" is peak heap allocated during compilation.
 
 | Axis | largest `n` | time @ largest | peak mem @ largest | empirical scaling |
 | ---- | ----------- | -------------- | ------------------ | ----------------- |
-| Shapes (recursion)           | 32 000 rects   | 1.52 s  | 0.89 GiB | **~linear** (time `∝ n^1.2`, mem `∝ n^1.0`) |
+| Shapes (recursion)           | 32 000 rects   | 1.54 s  | 0.89 GiB | **~linear** (time `∝ n^1.2`, mem `∝ n^1.0`) |
 | Instances                    | 64 000 insts   | 3.08 s  | 1.26 GiB | **~linear** (time `∝ n^1.2`, mem `∝ n^1.0`) |
-| Hierarchy, 1 child ref       | depth 128      | 0.005 s | 11 MiB   | **linear** in depth |
-| Coupled constraints          | 16 384 rects   | 1.76 s  | 0.59 GiB | **~linear** (time `∝ n^1.04`, mem `∝ n^0.90`) |
+| Hierarchy, 1 child ref       | depth 2048     | 0.15 s  | 160 MiB  | **linear** in depth |
+| Coupled constraints          | 16 384 rects   | 1.37 s  | 0.59 GiB | **~linear** (time `∝ n^1.0`, mem `∝ n^0.90`) |
 | Shapes (`for`-loop)          | 32 000 rects   | 1.06 s  | 0.85 GiB | **~linear** (time `∝ n^1.2`, mem `∝ n^1.0`) |
-| Hierarchy, 2 child refs      | depth 128      | 0.006 s | 11 MiB   | **linear** in depth (was exponential before the shared-type fix) |
+| Hierarchy, 2 child refs      | depth 2048     | 0.14 s  | 163 MiB  | **linear** in depth (was exponential before the shared-type fix) |
 
 ### Interpretation
 
@@ -157,8 +162,8 @@ parameter; "peak" is peak heap allocated during compilation.
   the dense SVD never runs. The general dense solver is retained only as a
   fallback for an *irreducible* coupled core (a block with no ≤2-variable
   pivot). This axis used to be **super-cubic** — `~22 s` at `n=1024`, steepening
-  toward the `O(n^3)` of dense factorization — and is now `~n^1.04` in time
-  (`1.76 s` at `n=16384`, 16× larger, in less memory than the old `n=1024`
+  toward the `O(n^3)` of dense factorization — and is now `~n^1.0` in time
+  (`1.37 s` at `n=16384`, 16× larger, in less memory than the old `n=1024`
   dense matrix used). The "general linear constraint solving (slow)" caveat in
   the top-level README now bites only for genuinely dense coupled blocks, not
   for the common sparse-but-coupled case.
@@ -170,13 +175,15 @@ parameter; "peak" is peak heap allocated during compilation.
   tree: whether a cell references its child **once** (`let i = inst(child());`)
   or **twice** (the `let c = child(); let i = inst(c);` idiom from the tutorial),
   `h{k}` holds shared pointers to the single type of `h{k-1}`, and both variants
-  cost the same — linear in depth (≈11 MiB / 6 ms at depth 128, the two series
-  within ~15% of each other). Before this fix the type was deep-copied per
+  cost the same — linear in depth (≈160 MiB / 0.15 s at depth 2048, the two
+  series within ~2% of each other). Before this fix the type was deep-copied per
   reference, so the single-ref chain was quadratic (`~depth^1.4`) and the
   double-ref chain **doubled with every level** (`×1.9` measured), exhausting
   memory beyond ~depth 20 (depth 18 alone took ~3.6 GiB / 11.5 s). The remaining
-  hierarchy limit is unrelated to the type representation: very deep chains hit a
-  native-recursion stack limit in the compiler at a few hundred levels.
+  hierarchy limit is unrelated to the type representation: the compiler walks the
+  hierarchy by native recursion, so depth is bounded by the stack — the benchmark
+  runs this axis on a 512 MiB stack (reaching a few thousand levels), and lifting
+  the cap entirely would mean turning that recursion into an explicit work-stack.
 
 - **Recursion and iteration now scale identically.** `shapes` and `shapes_loop`
   emit identical geometry; the only difference is that `shapes_loop` builds and
@@ -187,7 +194,7 @@ parameter; "peak" is peak heap allocated during compilation.
   persistent vector and lowering `range` to a native builtin made `cons`
   O(log n) and `range` O(n); the two series now coincide, both linear in time
   and memory out to 32 000 rectangles (`shapes_loop`: 1.06 s / 0.85 GiB;
-  `shapes`: 1.52 s / 0.89 GiB). The idiomatic `for i in std::range(n)` loop is
+  `shapes`: 1.54 s / 0.89 GiB). The idiomatic `for i in std::range(n)` loop is
   no longer a scaling hazard. The gap between the two series is now a small
   constant — `shapes_loop` is even marginally faster, as the native `range`
   avoids the per-element recursion overhead of `emit_shapes`.
diff --git a/bench/argon_scaling.pdf b/bench/argon_scaling.pdf
index 6b49e33..733d73f 100644
Binary files a/bench/argon_scaling.pdf and b/bench/argon_scaling.pdf differ
diff --git a/bench/argon_scaling.png b/bench/argon_scaling.png
index 7501131..2b6c620 100644
Binary files a/bench/argon_scaling.png and b/bench/argon_scaling.png differ
diff --git a/bench/plot_scaling.py b/bench/plot_scaling.py
index 91c9a19..3144a09 100644
--- a/bench/plot_scaling.py
+++ b/bench/plot_scaling.py
@@ -26,13 +26,61 @@
 # exponentially; on the current build every axis is sub-exponential.
 SERIES = [
     ("shapes", "Shapes (recursion)", "# rectangles", "poly"),
-    ("shapes_loop", "Shapes (for-loop / cons list)", "# rectangles", "poly"),
+    ("shapes_loop", "Shapes (for-loop)", "# rectangles", "poly"),
     ("instances", "Instances", "# instances", "poly"),
     ("constraints", "Coupled constraints", "# coupled rects", "poly"),
-    ("hierarchy_single_ref", "Hierarchy (1 child ref)", "depth", "poly"),
-    ("hierarchy_double_ref", "Hierarchy (2 child refs)", "depth", "poly"),
+    ("hierarchy_single_ref", "Hierarchy (1 ref)", "depth", "poly"),
+    ("hierarchy_double_ref", "Hierarchy (2 refs)", "depth", "poly"),
 ]
 
+# Okabe-Ito colorblind-safe palette, ordered to match SERIES. Yellow is
+# dropped for poor contrast on white; black is left unused. The two "twin"
+# pairs -- recursion/for-loop and 1-ref/2-ref -- get distinct colors so that
+# their near-coincidence on the plot reads as two curves landing on top of
+# each other rather than one. Markers are also distinct so the series remain
+# separable in grayscale print.
+PALETTE = ["#0072B2", "#56B4E9", "#009E73", "#D55E00", "#CC79A7", "#E69F00"]
+MARKERS = ["o", "s", "^", "D", "v", "P"]
+
+
+def apply_pub_style(matplotlib):
+    """ACM acmart (sigconf, camera-ready) publication style.
+
+    The critical setting is ``fonttype = 42``: it embeds text as subsetted
+    TrueType (Type 42) glyphs instead of matplotlib's default Type 3 fonts,
+    which ACM's TAPS pipeline (and IEEE PDF eXpress) reject. The figure uses a
+    sans-serif (Arial) face; ``Arial`` is listed first for portability, then the
+    metric-identical open clones (Liberation Sans / Arimo), then ``Nimbus Sans``
+    (a Helvetica/Arial-metric clone -- the fallback present on this machine when
+    Arial proper is not installed). Every text element is >= 8 pt, so the figure,
+    dropped into a full-width ``figure*`` at the sigconf text width (~7 in),
+    renders all text at 8-9 pt without any rescaling.
+    """
+    matplotlib.rcParams.update({
+        "pdf.fonttype": 42,
+        "ps.fonttype": 42,
+        "font.family": "sans-serif",
+        "font.sans-serif": ["Arial", "Liberation Sans", "Arimo",
+                            "Nimbus Sans", "Helvetica", "DejaVu Sans"],
+        "mathtext.fontset": "dejavusans",
+        "font.size": 8,
+        "axes.titlesize": 9,
+        "axes.labelsize": 8.5,
+        "xtick.labelsize": 8,
+        "ytick.labelsize": 8,
+        "legend.fontsize": 8,
+        "axes.linewidth": 0.7,
+        "lines.linewidth": 1.3,
+        "lines.markersize": 4.2,
+        "grid.linewidth": 0.5,
+        "xtick.major.width": 0.7,
+        "ytick.major.width": 0.7,
+        "xtick.minor.width": 0.5,
+        "ytick.minor.width": 0.5,
+        "savefig.dpi": 300,
+        "figure.dpi": 300,
+    })
+
 
 def load(path):
     xs, ts, ms = [], [], []
@@ -117,39 +165,47 @@ def main():
         import matplotlib
 
         matplotlib.use("Agg")
+        apply_pub_style(matplotlib)
         import matplotlib.pyplot as plt
     except ImportError:
         sys.exit("\nmatplotlib not installed; printed summary only. `pip install matplotlib` to draw.")
 
-    fig, (ax_t, ax_m) = plt.subplots(1, 2, figsize=(13, 5.2))
-    markers = ["o", "s", "^", "D", "v", "P"]
-    for (key, _, _, _), marker in zip(SERIES, markers):
+    # Full text width of an ACM acmart sigconf two-column figure* (~7 in).
+    fig, (ax_t, ax_m) = plt.subplots(1, 2, figsize=(7.0, 2.9), layout="constrained")
+    for (key, _, _, _), marker, color in zip(SERIES, MARKERS, PALETTE):
         if key not in data:
             continue
         label, unit, model, xs, ts, ms = data[key]
-        t_suffix, _ = describe(model, xs, ts)
-        m_suffix, _ = describe(model, xs, ms)
-        ax_t.plot(xs, ts, marker=marker, label=f"{label}  ({t_suffix})")
-        ax_m.plot(xs, [m / 2**20 for m in ms], marker=marker,
-                  label=f"{label}  ({m_suffix})")
+        style = dict(marker=marker, color=color,
+                     markeredgecolor="white", markeredgewidth=0.5)
+        ax_t.plot(xs, ts, label=label, **style)
+        ax_m.plot(xs, [m / 2**20 for m in ms], label=label, **style)
 
     for ax in (ax_t, ax_m):
         ax.set_xscale("log")
         ax.set_yscale("log")
-        ax.set_xlabel("problem size $n$ (rectangles / constraints / instances / depth)")
-        ax.grid(True, which="both", ls=":", alpha=0.4)
+        ax.set_xlabel("problem size n")
+        ax.grid(True, which="major", ls=":", alpha=0.45)
+        ax.grid(True, which="minor", ls=":", alpha=0.2)
+        ax.tick_params(which="both", direction="in", top=True, right=True)
 
     ax_t.set_ylabel("compile time (s)")
-    ax_t.set_title("Argon compile-time scaling")
+    ax_t.set_title("(a) Compile time")
     ax_m.set_ylabel("peak heap allocated (MiB)")
-    ax_m.set_title("Argon memory scaling")
-    ax_t.legend(fontsize=8, loc="upper left")
-    ax_m.legend(fontsize=8, loc="upper left")
-    fig.tight_layout()
-
+    ax_m.set_title("(b) Peak heap memory")
+    leg_kw = dict(loc="upper left", handlelength=1.6, labelspacing=0.3,
+                  borderpad=0.4, handletextpad=0.5, framealpha=0.9,
+                  edgecolor="0.7", fancybox=False)
+    ax_t.legend(**leg_kw)
+    ax_m.legend(**leg_kw)
+
+    # Saved at the figure's native size (constrained layout already reserves
+    # room for labels/legend), so the PDF MediaBox stays at the sigconf text
+    # width and the figure drops into the paper at width=\textwidth with no
+    # font-shrinking rescale.
     for ext in ("png", "pdf"):
         out = f"{args.out}.{ext}"
-        fig.savefig(out, dpi=150, bbox_inches="tight")
+        fig.savefig(out)
         print(f"wrote {out}")
 
 
diff --git a/bench/results/constraints.csv b/bench/results/constraints.csv
index d44fc70..988829a 100644
--- a/bench/results/constraints.csv
+++ b/bench/results/constraints.csv
@@ -1,11 +1,11 @@
 size,time_s,peak_bytes,n_objects
-32,0.003404379,2558933,33
-64,0.004213333,3802613,65
-128,0.007537572,6289973,129
-256,0.014580103,11264693,257
-512,0.02941705,21214133,513
-1024,0.063108165,41113013,1025
-2048,0.137085263,80910773,2049
-4096,0.306775942,160506293,4097
-8192,0.819698386,319697317,8193
-16384,1.762328533,638079381,16385
+32,0.003732583,2558722,33
+64,0.00413684,3802338,65
+128,0.00750728,6289570,129
+256,0.014379716,11264034,257
+512,0.035677509,21212962,513
+1024,0.066225588,41110818,1025
+2048,0.140565116,80906530,2049
+4096,0.295335076,160497938,4097
+8192,0.676296905,319680802,8193
+16384,1.373314977,638046466,16385
diff --git a/bench/results/hierarchy_double_ref.csv b/bench/results/hierarchy_double_ref.csv
index c6b50c0..b4c084d 100644
--- a/bench/results/hierarchy_double_ref.csv
+++ b/bench/results/hierarchy_double_ref.csv
@@ -1,9 +1,11 @@
 size,time_s,peak_bytes,n_objects
-4,0.000954884,1468649,9
-8,0.001127072,1737159,17
-16,0.00144585,2399391,33
-32,0.00209207,3721935,65
-48,0.002703978,5197287,97
-64,0.003269988,6368992,129
-96,0.004542949,9284271,193
-128,0.005712034,11664148,257
+4,0.000902074,1469041,9
+8,0.000923206,1737943,17
+16,0.001169244,2400959,33
+32,0.001765389,3725071,65
+64,0.002942927,6375264,129
+128,0.005334938,11676692,257
+256,0.01091318,22279032,513
+512,0.024664048,43483256,1025
+1024,0.056468784,85891825,2049
+2048,0.143552596,170708721,4097
diff --git a/bench/results/hierarchy_single_ref.csv b/bench/results/hierarchy_single_ref.csv
index 0ebf503..a1bbacf 100644
--- a/bench/results/hierarchy_single_ref.csv
+++ b/bench/results/hierarchy_single_ref.csv
@@ -1,9 +1,11 @@
 size,time_s,peak_bytes,n_objects
-4,0.000731158,1457539,9
-8,0.000869485,1726044,17
-16,0.001138329,2377156,33
-32,0.001715575,3677460,65
-48,0.002269978,5100553,97
-64,0.002783338,6280001,129
-96,0.003942497,9087143,193
-128,0.005035909,11486197,257
+4,0.000732499,1457931,9
+8,0.000825389,1726828,17
+16,0.001202294,2378724,33
+32,0.001827103,3680596,65
+64,0.003106322,6286273,129
+128,0.005698433,11498741,257
+256,0.012153742,21923197,513
+512,0.028767297,42771581,1025
+1024,0.064693487,84468470,2049
+2048,0.146190335,167862006,4097
diff --git a/bench/results/instances.csv b/bench/results/instances.csv
index 6851eef..bc11643 100644
--- a/bench/results/instances.csv
+++ b/bench/results/instances.csv
@@ -1,9 +1,9 @@
 size,time_s,peak_bytes,n_objects
-500,0.011955936,11779220,501
-1000,0.024739344,22363452,1001
-2000,0.055428518,43531916,2001
-4000,0.141616584,85868844,4001
-8000,0.306157668,170542684,8001
-16000,0.674421326,339890396,16001
-32000,1.439115681,678585772,32001
-64000,3.081957272,1355976556,64001
+500,0.011928107,11779388,501
+1000,0.024442593,22363620,1001
+2000,0.054984212,43532084,2001
+4000,0.14333945,85869012,4001
+8000,0.302164048,170542852,8001
+16000,0.673339455,339890580,16001
+32000,1.427961141,678585956,32001
+64000,3.077511752,1355976740,64001
diff --git a/bench/results/shapes.csv b/bench/results/shapes.csv
index 42ad9bd..e413857 100644
--- a/bench/results/shapes.csv
+++ b/bench/results/shapes.csv
@@ -1,8 +1,8 @@
 size,time_s,peak_bytes,n_objects
-500,0.011793572,16132742,500
-1000,0.024660165,31089318,1000
-2000,0.070026669,61002470,2000
-4000,0.147623911,120828758,4000
-8000,0.319933075,240481350,8000
-16000,0.70996717,479786454,16000
-32000,1.516527735,958396838,32000
+500,0.011885933,16132910,500
+1000,0.025495754,31089486,1000
+2000,0.070905483,61002638,2000
+4000,0.147398374,120828942,4000
+8000,0.318729175,240481518,8000
+16000,0.706239265,479786702,16000
+32000,1.5363118980000001,958397006,32000
diff --git a/bench/results/shapes_loop.csv b/bench/results/shapes_loop.csv
index d6f99a7..ed651bf 100644
--- a/bench/results/shapes_loop.csv
+++ b/bench/results/shapes_loop.csv
@@ -1,8 +1,8 @@
 size,time_s,peak_bytes,n_objects
-500,0.007458816,15417205,500
-1000,0.01518347,29612582,1000
-2000,0.053706302,58004998,2000
-4000,0.116600385,114745766,4000
-8000,0.231873919,228230982,8000
-16000,0.49902595,455266923,16000
-32000,1.058962607,909351467,32000
+500,0.008083826,15417373,500
+1000,0.016896656,29612750,1000
+2000,0.045732213,58005166,2000
+4000,0.098551455,114745934,4000
+8000,0.206445365,228231150,8000
+16000,0.463555498,455267091,16000
+32000,1.054191159,909351635,32000
diff --git a/core/compiler/src/lib.rs b/core/compiler/src/lib.rs
index ef7030e..f4b00c9 100644
--- a/core/compiler/src/lib.rs
+++ b/core/compiler/src/lib.rs
@@ -450,12 +450,28 @@ mod tests {
     #[ignore = "scaling benchmark; run in release, serially: cargo test -p compiler --release -- --ignored --test-threads=1 bench_"]
     fn bench_hierarchy() {
         let _g = bench_guard();
+        // Deep hierarchies are traversed by native recursion in the compiler,
+        // so compiling `h{depth}` needs O(depth) native stack frames. The
+        // default ~2 MiB test-thread stack overflows past ~150 levels, so run
+        // the whole axis on a thread with a 512 MiB stack: that reaches a few
+        // thousand levels (the sweep below goes to 2048) and leaves headroom
+        // to push further via `ARGON_BENCH_HIER_*`. The thread is spawned once,
+        // outside the timed `measure()` loop, so it does not perturb timings.
+        std::thread::Builder::new()
+            .stack_size(512 * 1024 * 1024)
+            .spawn(bench_hierarchy_body)
+            .unwrap()
+            .join()
+            .unwrap();
+    }
+
+    fn bench_hierarchy_body() {
         let dir = PathBuf::from(env!("CARGO_MANIFEST_DIR")).join("build/bench_hier");
         std::fs::create_dir_all(&dir).unwrap();
         let lib = dir.join("lib.ar");
 
         let mut rows = Vec::new();
-        for depth in bench_sizes("ARGON_BENCH_HIER_SINGLE", &[4, 8, 16, 32, 48, 64, 96, 128])
+        for depth in bench_sizes("ARGON_BENCH_HIER_SINGLE", &[4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048])
             .into_iter()
             .map(|d| d as usize)
         {
@@ -491,7 +507,7 @@ mod tests {
         // exponentially and had to be capped near depth 18.) Override
         // `ARGON_BENCH_HIER_DOUBLE` to push deeper.
         let mut rows = Vec::new();
-        for depth in bench_sizes("ARGON_BENCH_HIER_DOUBLE", &[4, 8, 16, 32, 48, 64, 96, 128])
+        for depth in bench_sizes("ARGON_BENCH_HIER_DOUBLE", &[4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048])
             .into_iter()
             .map(|d| d as usize)
         {