Zero-shot Multi-modal Bayesian Inference with Hops — multi-modal Bayesian optimization over a probability simplex, designed for high-dimensional materials composition search.
Version: ellipsoid branch (Edit 1 / 2A / 2B applied May 2026)
Location:src/
Last updated: May 2026 — reflects current code state
- The Problem
- The Algorithm — High-Level Story
- Operational Flow Diagram
- All Hyperparameters & What They Control
- File-by-File Reference
- Key Mathematical Derivations
- Benefits, Drawbacks, and Known Failure Modes
- Recent Algorithm Edits (May 2026)
ZoMBI-Hop solves multi-modal Bayesian optimization over a probability simplex.
The canonical use case is materials composition search: given d components (e.g. Pb, Sn, Br, I in a perovskite), each composition is a point on the d-simplex Δ^d = { x ∈ R^d | Σ x_i = 1, x_i ≥ 0 }. The objective (e.g. solar cell efficiency, bandgap) may have multiple local optima — "needles in a haystack" — at very different compositions. Standard BO finds one optimum and parks there. ZoMBI-Hop systematically discovers all significant optima.
Why the simplex is hard:
- It is a curved, bounded manifold — Euclidean BO ignores both constraints.
- Components near zero cause log-ratio coordinates to diverge.
- Multiple optima may be separated by flat low-value regions, not just local barriers.
- Noise in robotic synthesis means the actually-deposited composition (X_actual) deviates from the requested one (X_expected).
ZoMBI-Hop operates in activations (outer loop) and zooms (inner loop).
Each activation attempts to find and declare one new local optimum ("needle"). Once declared, that region is permanently excluded via an ellipsoidal penalty so the next activation explores elsewhere. This continues until the user-specified number of activations is reached or the entire simplex is covered.
- Start at full simplex bounds (or updated bounds after a failure-retry, see below).
- Fit a Gaussian Process in ILR-transformed space. When the bounds are not global (i.e. we are inside a zoom), only points within the current bounds are used for GP training: first the pared (deduplicated) points within bounds fill the
max_gp_pointsbudget, then raw unpenalized points within bounds fill the remainder. This makes the GP posterior tight and locally accurate so that PI drops quickly on genuine convergence. - Run LineBO to propose a candidate: sample random chords through the current bounds, score each chord by integrating the acquisition function along it (mean, not max), pick the best chord, and have the physical experiment evaluate that line.
- After each evaluation, check convergence: PI (probability of improvement) is computed against the local best (best unpenalized point within the current zoom bounds, and
best_ffrom the local GP training set). If PI drops below threshold AND the Y improvement since the previous local best is within output noise, increment a consecutive-converged counter. - After
n_consecutive_convergedconsecutive converged iterations, declare a needle. - Before the next zoom: Jaccard-aware sliding-window bounds selection (
determine_new_bounds) slides over windows of top-M points ranked by Y, picking the first candidate AABB whose Jaccard overlap against the lastjaccard_windowentries inbounds_historyis ≤jaccard_threshold. If all windows exceed the threshold, the least-overlapping window is used. - Secondary Jaccard guard (MC): if the selected AABB still has Monte Carlo Jaccard >
zoom_jaccard_thresholdwith any entry inzoom_bounds_history, the algorithm force-declares a needle at the current best unpenalized point — repeated zooms to the same region are taken as evidence the optimum is there even if PI hasn't formally converged. - Otherwise, zoom in on the new AABB and continue.
On needle declaration (either via PI convergence or Jaccard force-declare):
- Fit a clean local GP using
fit_with_locality: selects only data withinmax_radiusof the needle first, then fills tomax_gp_pointswith remaining points. This gives a locally-accurate Hessian uncontaminated by far-away observations. - Compute the Hessian of the clean base acquisition (no repulsive penalties) at the needle in ILR space via
create_clean_acquisition. The penalty-free acquisition gives the true local curvature rather than one distorted by proximity to other needles. - Derive an ellipsoidal exclusion zone M from the Hessian: the region where the acquisition has dropped by a specified fraction from its peak.
- Record the needle, storing both the peak value and the median Y of all raw observations within
paring_spatial_halfnoise × input_noise_ilrof the needle in ILR space. The median is a noise-robust measure of the local optimum's true value. - Update the penalty mask, reset to full simplex for the next activation.
After every activation completes (needle found or failed), each entry in the pared dataset has its Y value replaced by the median of all Y_all values whose X_all falls within paring_spatial_halfnoise × input_noise_ilr in ILR space. This smooths noise spikes so the GP in subsequent activations trains on a clean, representative signal rather than individual noisy measurements.
The zoom loop is a while current_zoom < max_zooms loop rather than a fixed for zoom loop, which allows retrying the same zoom level on failure without advancing. Two per-activation flags track state: first_failure_handled and data_added_since_last_failure.
When an activation fails (candidate is None, all points penalized, or force-declare returned None):
| Condition | Action |
|---|---|
First failure (not first_failure_handled) |
Recompute all needle ellipsoids using the clean local GP (recompute_all_ellipsoids): for each needle, fit_with_locality → create_clean_acquisition → determine_penalty_ellipsoid. Update dh.needle_M_list. Retry same zoom. |
Subsequent failure, data was added (first_failure_handled AND data_added_since_last_failure) |
Recompute zoom bounds Jaccard-aware: dh.determine_new_bounds(add_to_history=False). The add_to_history=False prevents failure-retry bounds from contaminating the history used by success-path zooms. Retry same zoom. |
Subsequent failure, no new data (first_failure_handled AND NOT data_added_since_last_failure) |
Shrink all ellipsoid semi-axes: dh.shrink_all_needle_radii(bounds_shrink_factor). If max_needle_radius() < min_axis_noise_mult × input_noise_ilr, all exclusion zones are at noise scale → stop. Otherwise retry same zoom. |
data_added_since_last_failure is set True immediately after _objective_wrapper returns (regardless of whether points are penalized), and reset to False after each failure dispatch.
Before any point enters the GP training set, it is checked against the pared dataset — a noise-deduplicated subset. Two points are considered duplicates if their ILR-space distance is less than paring_spatial_halfnoise × σ_ILR AND their Y-values differ by less than paring_y_noise_multiplier × σ_Y. Duplicates are handled by a fair coin: keep the old or replace with the new. This prevents noise spikes from dominating the GP. After each activation, all pared Y values are further smoothed via local medians (see above).
╔══════════════════════════════════════════════════════════════════════════╗
║ ZoMBI-Hop: run() ║
╠══════════════════════════════════════════════════════════════════════════╣
║ ║
║ INIT: load X_init, Y_init → build pared set → set bounds = full Δ ║
║ ║
║ ┌─────────────────────────────────────────────────────────────────┐ ║
║ │ OUTER LOOP: activation = 0 … max_activations │ ║
║ │ │ ║
║ │ first_failure_handled = False │ ║
║ │ data_added_since_last_failure = False │ ║
║ │ zoom_bounds_history = [] │ ║
║ │ │ ║
║ │ ┌──────── ZOOM LOOP: while current_zoom < max_zooms ──────────┐ │ ║
║ │ │ zoom = current_zoom │ │ ║
║ │ │ │ │ ║
║ │ │ [1] if global bounds: get_gp_data() │ │ ║
║ │ │ else: get_zoom_gp_data(bounds) │ │ ║
║ │ │ → pared in bounds first, then raw fill │ │ ║
║ │ │ [2] GPSimplex.fit(X_ilr / ilr_std, Y) │ │ ║
║ │ │ best_f_local = Y.max() │ │ ║
║ │ │ │ │ ║
║ │ │ ┌──── ITERATION LOOP: iter = 0 … max_iterations ────────┐ │ │ ║
║ │ │ │ │ │ │ ║
║ │ │ │ [3] get_candidate(bounds, best_f=best_f_local) │ │ │ ║
║ │ │ │ • RepulsiveAcquisition(UCB/EI) on seeds │ │ │ ║
║ │ │ │ • nat-grad ascent on top-n_restarts seeds │ │ │ ║
║ │ │ │ → candidate (d,) on simplex │ │ │ ║
║ │ │ │ │ │ │ ║
║ │ │ │ if candidate is None → activation_failed, break │ │ │ ║
║ │ │ │ │ │ │ ║
║ │ │ │ prev_best ← get_best_in_bounds(bounds) │ │ │ ║
║ │ │ │ (or get_best_unpenalized if global bounds) │ │ │ ║
║ │ │ │ │ │ │ ║
║ │ │ │ [4] LineBO.sampler(candidate, bounds, acq_fn) │ │ │ ║
║ │ │ │ • generate num_lines random chords │ │ │ ║
║ │ │ │ • score by mean acq over 100 pts │ │ │ ║
║ │ │ │ → (X_expected, X_actual, Y) │ │ │ ║
║ │ │ │ data_added_since_last_failure = True ← always │ │ │ ║
║ │ │ │ │ │ │ ║
║ │ │ │ [5] add_all_points() → update pared set, │ │ │ ║
║ │ │ │ penalty mask, and snapshot │ │ │ ║
║ │ │ │ │ │ │ ║
║ │ │ │ [6] refit GP (local data if zoomed) │ │ │ ║
║ │ │ │ best_f_local = Y.max() │ │ │ ║
║ │ │ │ │ │ │ ║
║ │ │ │ [7] check_convergence(best_f_ref=best_f_local) │ │ │ ║
║ │ │ │ PI vs local best; ΔY vs prev_best in bounds │ │ │ ║
║ │ │ │ consecutive_converged++ │ │ │ ║
║ │ │ │ │ │ │ ║
║ │ │ │ if consecutive_converged >= n_consecutive: │ │ │ ║
║ │ │ │ [8] _declare_needle_at_best() │ │ │ ║
║ │ │ │ • fit_with_locality(needle) → local clean GP │ │ │ ║
║ │ │ │ • create_clean_acquisition() (no repulsion) │ │ │ ║
║ │ │ │ • determine_penalty_ellipsoid → M │ │ │ ║
║ │ │ │ • compute local median Y within σ_ILR │ │ │ ║
║ │ │ │ • add_needle(value, median_value, M) │ │ │ ║
║ │ │ │ • reset bounds to full Δ │ │ │ ║
║ │ │ │ → break zoom loop (needle found) │ │ │ ║
║ │ │ └────────────────────────────────────────────────────── ┘ │ │ ║
║ │ │ │ │ ║
║ │ │ ── AFTER ITERATION LOOP ────────────────────────────── │ │ ║
║ │ │ │ │ ║
║ │ │ if activation_failed: │ │ ║
║ │ │ → _handle_failure_retry() three-way dispatch: │ │ ║
║ │ │ Case 1 (first failure): │ │ ║
║ │ │ recompute_all_ellipsoids (clean local GP each) │ │ ║
║ │ │ update needle_M_list → retry same zoom │ │ ║
║ │ │ Case 2 (subsequent, data added): │ │ ║
║ │ │ determine_new_bounds(add_to_history=False) │ │ ║
║ │ │ update bounds → retry same zoom │ │ ║
║ │ │ Case 3 (subsequent, no data): │ │ ║
║ │ │ shrink_all_needle_radii(bounds_shrink_factor) │ │ ║
║ │ │ if max_radius < min_axis_noise_mult×σ → finished │ │ ║
║ │ │ else → retry same zoom │ │ ║
║ │ │ data_added_since_last_failure = False │ │ ║
║ │ │ consecutive_converged = 0 │ │ ║
║ │ │ continue ← same current_zoom, retry zoom body │ │ ║
║ │ │ │ │ ║
║ │ │ if no needle AND no failure AND zoom < max_zooms-1: │ │ ║
║ │ │ [9a] determine_new_bounds() ← Jaccard sliding window │ │ ║
║ │ │ slides over Y-sorted unpenalized; picks AABB with │ │ ║
║ │ │ Jaccard ≤ jaccard_threshold vs bounds_history │ │ ║
║ │ │ [9b] MC Jaccard guard vs zoom_bounds_history: │ │ ║
║ │ │ if Jaccard > zoom_jaccard_threshold: │ │ ║
║ │ │ → _declare_needle_at_best() ← force-declare │ │ ║
║ │ │ (if no unpenalized pts → activation_failed) │ │ ║
║ │ │ else: append to zoom_bounds_history; zoom in │ │ ║
║ │ │ current_zoom += 1 ← advance only on clean success │ │ ║
║ │ │ │ │ ║
║ │ └─────────────────────────────────────────────────────────── ┘ │ ║
║ │ │ ║
║ │ _relabel_pared_with_medians() ← smooth pared Y after activation│ ║
║ │ activation++ → next activation │ ║
║ └─────────────────────────────────────────────────────────────────┘ ║
║ ║
║ return (needle_results, needle_locations, needle_vals, X_all, Y_all) ║
╚══════════════════════════════════════════════════════════════════════════╝
Data flow between key objects:
ZoMBIHop
│
├─ DataHandler ── owns ALL persistent state
│ ├── X_all_actual, Y_all (raw observations)
│ ├── X_pared, Y_pared (deduped GP training set,
│ │ relabeled with medians
│ │ after each activation)
│ ├── needles, needle_M_list, needle_B (discovered optima + ellipsoids)
│ ├── needles_results (value + median_value per needle)
│ ├── _penalty_mask (True = not penalized)
│ ├── bounds, current_zoom_bounds (trust region)
│ └── bounds_history (recent AABB tensors for Jaccard)
│
├─ GPSimplex ── wraps BoTorch GP
│ ├── fit(X, Y) ILR + std-normalize → SingleTaskGP
│ ├── fit_with_locality() local-only GP around a needle
│ ├── create_acquisition() RepulsiveAcquisition (with needle penalties)
│ ├── create_clean_acquisition() penalty-free acquisition for Hessian
│ ├── recompute_all_ellipsoids() bulk ellipsoid refit (one local GP per needle)
│ ├── get_candidate() raw_samples → nat-grad ascent → best point
│ └── determine_penalty_ellipsoid(acq_fn=...) Hessian M in ILR space
│
└─ LineBO ── evaluates the physical objective
├── ranked_line_endpoints() → score chords by mean acquisition
└── sampler() → call objective_function with ranked endpoints
| Parameter | Default | Controls |
|---|---|---|
max_zooms |
3 | Number of zoom levels per activation. More zooms = finer trust regions but more iterations before declaring a needle. Also used as the Jaccard history window size. |
max_iterations |
10 | Max BO iterations per zoom level. Total per activation ≤ max_zooms × max_iterations. |
n_consecutive_converged |
2 | How many successive converged iterations trigger needle declaration. Higher = more conservative (requires sustained convergence). |
top_m_points |
auto (d+1) | How many top-Y unpenalized points define the AABB zoom bounds. Small = tight zoom; large = wide. |
| Parameter | Default | Controls |
|---|---|---|
convergence_pi_threshold |
0.01 | PI below this value satisfies the first convergence condition. PI is computed against the local best within the current zoom bounds (not global), so this threshold is meaningful relative to the active search region. Lower = harder to converge. |
input_noise_threshold_mult |
2.0 | Unused directly in convergence check (kept for backward compat); related to input noise scale. |
output_noise_threshold_mult |
2.0 | Y improvement (over the local prev_best within bounds) must be < this × GP output noise. Higher = easier to converge. |
| Parameter | Default | Controls |
|---|---|---|
max_gp_points |
3000 | Cap on GP training set size. When zoomed, this budget is filled first by pared points within bounds, then by raw unpenalized points within bounds. Prevents O(n³) GP cost blowing up. |
acquisition_type |
"ucb" | Base acquisition: "ucb" (Upper Confidence Bound) or "ei" (Log Expected Improvement). |
ucb_beta |
0.1 | Exploration weight for UCB: α(x) = μ + β½σ. Higher = more exploration. |
repulsion_lambda |
None (auto) | Strength of ellipsoid repulsion penalty. None = auto-computed as max(10×median|acq|, 100) each call. |
n_restarts |
30 | Number of nat-grad restarts for acquisition optimization (get_candidate). |
raw |
500 | Initial random samples before selecting restart seeds. |
nat_grad_step |
0.02 | Step size α for natural-gradient ascent: x ← normalize(x ⊙ exp(α(g − ḡ))). |
nat_grad_max_steps |
50 | Max gradient steps per restart. |
| Parameter | Default | Controls |
|---|---|---|
max_penalty_radius |
1.0 | Caps the maximum ellipsoid semi-axis length (ILR units). Prevents a single needle from excluding too much of the simplex. |
ellipsoid_drop_fraction |
0.25 | Fraction of peak acquisition value that defines the ellipsoid boundary. Larger = bigger exclusion zone. |
ellipsoid_eigenvalue_floor |
1e-6 | Minimum eigenvalue of –H before computing M. Prevents degenerate flat-Hessian ellipsoids. |
| Parameter | Default | Controls |
|---|---|---|
paring_spatial_halfnoise |
0.5 | Spatial duplicate threshold in ILR units: points within factor × input_noise_ilr are candidates for deduplication. Also controls the neighbourhood radius for median relabeling of pared Y values and the needle median value computation. |
paring_y_noise_multiplier |
1.0 | Y-value duplicate threshold: factor × GP_output_noise. |
input_noise_ilr |
0.03 | Known physical input noise σ in ILR space (instrument precision). Used for paring, median neighbourhood radius, needle shrink stop condition, and old-needle radius. |
| Parameter | Default | Controls |
|---|---|---|
needle_shrink_factor |
0.85 | Per-demotion-retry multiplicative shrink on all ellipsoid radii. M ← M / f² tightens the exclusion zone. |
needle_stop_noise_multiplier |
3.0 | Stop shrinking when max ellipsoid semi-axis < this × input_noise_ilr. Prevents shrinking below measurement resolution. |
| Parameter | Default | Controls |
|---|---|---|
jaccard_window |
3 | Number of recent bounding boxes to compare against when selecting new zoom bounds. determine_new_bounds slides over windows of top-M points to find a box with Jaccard ≤ jaccard_threshold vs all recent boxes in this window. |
jaccard_threshold |
0.9 | Maximum Jaccard overlap allowed between a candidate bounding box and any box in the recent history window. A higher value accepts more overlap before trying the next window. |
The secondary MC Jaccard guard (_bounds_jaccard_simplex, threshold 0.75) is still used at zoom transitions as a force-declare trigger when the sliding-window bounds selection nonetheless produces a box that overlaps with history.
| Parameter | Default | Controls |
|---|---|---|
bounds_shrink_factor |
0.8 | Radius shrink factor applied to all needle ellipsoids on a persistent-failure (no new data) retry. M ← M / factor². |
min_axis_noise_mult |
2.0 | Stop condition for shrinking: if max_needle_radius() < min_axis_noise_mult × input_noise_ilr, all ellipsoids are at noise scale and the run terminates. |
Role: Low-level geometry on the probability simplex. All other modules import from here.
Named triple (c, M, B):
c: (d,) centre on ΔM: (d−1, d−1) SPD precision matrix — a point x is inside when u⊤M u ≤ 1, u = B⊤(x−c)B: (d, d−1) orthonormal tangent-space columns (zero-sum hyperplane basis)
Forward ILR (Helmert contrasts):
For i = 0, …, d−2:
Maps compositions (d,) → ILR coordinates (d−1,). ILR is an isometry: Euclidean distances in ILR space equal Aitchison distances in simplex space.
Why it matters: The GP is trained in ILR space, where the simplex constraint is lifted to all of R^{d−1} and the geometry is Euclidean. Kernel lengthscales have consistent meaning across the simplex (unlike raw compositions, where corners are geometrically compressed).
Samples uniformly from the bounded probability simplex using Conditional Frechet Sampling (inclusion-exclusion polytope volume + Newton CDF inversion). Complexity O(2^d); maximum d = 20.
Projects arbitrary R^d points onto the probability simplex (Duchi et al., 2008). Differentiable — safe to use inside autograd.
Role: Configuration dataclasses. ZoMBIHopConfig is the single source of truth for all hyperparameters; Checkpoint is a legacy metadata struct for backward compatibility.
All hyperparameters with defaults (see §4). Key methods: to_dict() / from_dict(data) (JSON serialization, unknown keys silently dropped), get_torch_dtype(), __post_init__() (validates all fields on construction).
Role: The single owner of all mutable algorithm state. Every tensor, counter, and needle record lives here. ZoMBIHop and GPSimplex both hold a reference and call these methods.
Key methods:
| Method | Description |
|---|---|
save_init(...) |
Populates tensors from initial data, builds pared set, writes config.json, takes permanent "init" snapshot. |
take_snapshot(...) |
Saves complete state to run_dir/snapshots/{seq}_{label}/: tensors.pt, needles.json, summary.json. |
add_all_points(...) |
Appends new observations, updates pared set and penalty mask. |
add_needle(...) |
Records needle with peak value, median value, ellipsoid M. |
get_gp_data() |
Returns global GP training set (pared → raw unpenalized → init fallback). |
get_zoom_gp_data(bounds) |
Local variant restricted to bounds (pared within bounds first, then raw fill). |
get_best_in_bounds(bounds) |
Best unpenalized point within axis-aligned bounds. |
determine_new_bounds(...) |
Jaccard sliding-window AABB selection. |
_relabel_pared_with_medians() |
Post-activation smoothing: replaces each pared Y with local median. |
shrink_all_needle_radii(factor) |
Multiplies each needle M by 1/factor² (failure-retry Case 3). |
update_all_needle_radii(new_M_list) |
Replaces needle M matrices (failure-retry Case 1). |
max_needle_radius() |
Largest semi-axis across all needle ellipsoids. |
Role: Wraps a BoTorch SingleTaskGP with ILR normalization and provides acquisition creation, candidate generation, and Hessian ellipsoid computation.
Takes simplex inputs, transforms to ILR, optionally divides by ilr_std, then calls the underlying BoTorch acquisition. Necessary because BoTorch acquisitions expect unconstrained R^{d−1} inputs.
Combines base acquisition with smooth needle repulsion:
Zero outside the ellipsoid, rising quadratically toward the centre (soft boundary). λ auto-computed as max(10 × median|α|, 100) when not specified.
- X → ILR; compute per-dimension std; std-normalize.
- Fit BoTorch
SingleTaskGP+fit_gpytorch_mll. - Update DataHandler's stored output noise.
Why std-normalize ILR: ILR dimensions may have very different dynamic ranges, causing poorly-calibrated kernel lengthscales. Normalizing by observed std puts all dimensions on equal footing.
create_acquisition(best_f)→ RepulsiveAcquisition- Sample
raw_samplesrandom points, score by acquisition - Filter by penalty mask; sort; take top
n_restarts - Run natural-gradient ascent on seeds
- Final penalty check on winner → return or None
Natural-gradient ascent: x ← normalize(x ⊙ exp(α(g − ḡ))), the exponential map on the probability simplex respecting the Fisher information metric.
Computes Hessian of acquisition at needle in ILR space via torch.autograd.functional.hessian. Derives M = neg_H / (2Δ) where Δ is the acquisition drop criterion. Semi-axes = 1/√(eigenvalues of M). Always called with create_clean_acquisition() (no repulsion) during needle declaration.
Role: Physical experiment interface. Proposes a line to evaluate, calls the user-supplied objective, returns observations.
Scores each chord by mean acquisition over N uniformly-spaced points (not max). Mean integrates penalties along the full chord — a line spending most of its length inside a penalized region scores very negatively.
ranked_line_endpoints→ sorted (x_left, x_right)- Call
objective_function(endpoints_ranked) - Build
x_requestedfrom first principal direction ofx_actualvia SVD
Role: Top-level orchestrator. Owns the activation/zoom/iteration loops, needle declaration, failure handling, convergence checking, and Jaccard-based repeated-zoom detection.
Convergence requires both:
- PI(candidate) <
convergence_pi_threshold - (current batch best Y) − (prev_best_Y) < GP_output_noise ×
output_noise_threshold_mult
best_f_ref is Y.max() from get_zoom_gp_data(bounds) — PI is measured against the best observation within current bounds, not globally.
- Get current best unpenalized point
- Compute needle median value (median Y within
paring_spatial_halfnoise × input_noise_ilr) fit_with_locality→ clean local GPcreate_clean_acquisition→ penalty-free acquisitiondetermine_penalty_ellipsoid→ Mdh.add_needle(..., needle_median_value=median)- Reset bounds to full simplex
Called on PI convergence (reason="PI convergence") or Jaccard force-declare (reason="Jaccard convergence").
Role: Plotting utilities for ternary (d=3) and tetrahedral (d=4) problems plus progress tracking.
plot_optimization_progress— three-panel: best Y, needle count, total points over iterationsplot_simplex_2d— ternary diagram with colormap scatter + needle starsplot_simplex_3d— tetrahedral 3D plot projecting 4-simplex coordinates to 3Dplot_needles_summary— bar chart of needle values + discovery timeline
The ILR transform is an isometry from (Δ^d, d_A) to (R^{d−1}, ‖·‖₂) where d_A is the Aitchison distance:
The SE kernel in ILR space is therefore an Aitchison-stationary kernel on the simplex — exactly the right prior for composition-valued data.
The Fisher information metric on the probability simplex at point x is g_{ij}(x) = δ_{ij}/x_i. The exponential map gives the multiplicative update:
where
At the needle z* = ILR(x*), Taylor expansion of the acquisition gives:
Since z* is a maximum, H is negative semi-definite. Define neg_H = −H. The iso-acquisition surface where α drops by Δ from the peak is:
Semi-axes:
When zoomed into bounds B ⊂ Δ, PI is:
where f* = best_f_local = max Y from get_zoom_gp_data(B). Two locality effects: σ(x) is smaller within B (tighter posterior from local data), and f* is the local best (PI not trivially near-zero when a higher value exists in another region).
| Aspect | Benefit |
|---|---|
| Multi-modal discovery | Guaranteed-different optima via hard ellipsoidal exclusion zones — each activation finds a new basin. |
| Simplex fidelity | ILR + CFS + natural gradient ascent all respect the simplex geometry exactly. |
| Noise-robustness | Point paring prevents noise spikes from poisoning the GP. Pared Y values are relabeled with local medians after each activation. Needle records store both peak and median values. |
| Adaptive exclusion zones | Hessian ellipsoids adapt to local curvature — narrow along flat directions, tight along steep ones. |
| Failure recovery | Three-way in-place failure dispatch (clean ellipsoid refit → Jaccard-aware bounds update → radius shrink) handles both transient and persistent failures without restarting the zoom sequence. |
| Reproducibility | Full state snapshots at every iteration enable exact restart. |
| Local GP for zoom | Fitting only within-bounds data makes the posterior tight in the active region, causing PI to drop quickly on genuine convergence. |
| Clean local GP for ellipsoids | Needle ellipsoids are computed from a local, penalty-free GP fit, giving curvature estimates that reflect only the local landscape. |
| Jaccard sliding window | determine_new_bounds slides over Y-ranked windows to find a genuinely different AABB before the MC Jaccard guard runs. |
| Aspect | Drawback |
|---|---|
| GP cost | O(n³) GP fitting. Mitigated by max_gp_points truncation and local data restriction. |
| CFS dimension limit | random_simplex requires d ≤ 20 (2^d inclusion-exclusion). |
| Hessian accuracy | Hessian computed at one point; flat/curved acquisition tops may produce poorly-sized ellipsoids. |
| AABB zoom bounds | Axis-aligned bounding box ignores simplex geometry — can straddle inactive corners. |
| Natural gradient pathology | Multiplicative update can get stuck at simplex vertices (x_i = 0 freezes that coordinate). |
| Jaccard is approximate | Monte Carlo Jaccard (n=500) has sampling noise; the 0.75 threshold is a heuristic. |
Over-clustering at one corner: Fixed by mean line scoring (integrates penalties along the full chord rather than taking max).
Pared set contamination: _update_penalty_mask() purges the pared set after every needle addition/refit.
Zoom loop stagnation: Fixed by (1) determine_new_bounds sliding window to find dissimilar AABB, (2) MC Jaccard guard force-declares needle rather than triggering failure, (3) Edit 2B failure retry at same zoom level.
Spurious PI convergence: Fixed by local GP + local best_f_ref + local prev_best. PI is computed against the best observation within current zoom bounds.
Problem: Needle ellipsoid Hessian was computed from a GP trained on local (zoom-restricted) data but through RepulsiveAcquisition, which distorts curvature near other needles.
Fix: fit_with_locality + create_clean_acquisition (no repulsion penalties) are now used during needle declaration, so M reflects only true local curvature. recompute_all_ellipsoids applies this to all needles on failure-retry Case 1.
Problem: determine_new_bounds always returned the AABB of the top-M points by Y, producing nearly-identical boxes on successive zooms and triggering immediate force-declare.
Fix: Sliding-window search over Y-ranked unpenalized points finds the first AABB with Jaccard ≤ jaccard_threshold vs recent bounds_history. Falls back to the least-similar window if none qualifies.
New parameters: jaccard_window (default 3), jaccard_threshold (default 0.9).
_jaccard_box(a, b): volume-Jaccard of two axis-aligned boxes; zero-width dimensions in both boxes are projected out to avoid division by zero.
Problem: The old outer while True demotion-retry loop rewound to zoom 0 on failure, restarting the full zoom sequence — expensive and prone to infinite loops.
Fix: Zoom loop restructured to while current_zoom < max_zooms. On failure, _handle_failure_retry dispatches in-place and issues continue to retry the same zoom level.
Case 1 (first failure):
recompute_all_ellipsoids → update_all_needle_radii → retry same zoom
Case 2 (repeated failure, new data since last failure):
determine_new_bounds(add_to_history=False) → retry same zoom
Case 3 (repeated failure, no new data):
shrink_all_needle_radii(bounds_shrink_factor)
if max_radius < min_axis_noise_mult × σ_ILR → stop
else → retry same zoom
current_zoom is incremented only on clean success (no failure, no force-declare).
New parameters: bounds_shrink_factor (default 0.8), min_axis_noise_mult (default 2.0).
