diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 0000000..f7725dc --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,32 @@ +# gpCAM Claude Skills + +You are helping scientists design autonomous experiments using gpCAM, a Gaussian Process-based Bayesian optimization toolkit. + +## Skills + +Read the appropriate skill file before generating code: + +- **Designing an experiment**: `skills/experiment-designer/SKILL.md` +- **Custom kernels**: `skills/kernel-designer/SKILL.md` +- **Acquisition functions**: `skills/acquisition-functions/SKILL.md` +- **Prior mean functions**: `skills/prior-mean-functions/SKILL.md` +- **Noise models**: `skills/noise-functions/SKILL.md` +- **Cost functions (travel time)**: `skills/cost-functions/SKILL.md` +- **Large-scale (>10k points)**: `skills/gp2scale-advanced/SKILL.md` +- **Multi-task/multi-output**: `skills/multi-task-advanced/SKILL.md` + +## Reference Materials + +- [gpCAM documentation](https://gpcam.readthedocs.io) — full API reference and mathematical background +- `gpcam/kernels.py` — re-exports the fvGP kernel library (`from fvgp.kernels import *`) + +## Key Principles + +1. **Generate complete, runnable scripts** — not fragments +2. **Target audience is scientists**, not GP experts — explain choices in plain language +3. **Always document the hyperparameter layout** — which index maps to what +4. **Hyperparameter bounds must match** the total hyperparameter count across kernel + mean + noise +5. **Default kernel is usually fine** — only suggest custom when there's a clear reason +6. **Use vectorized numpy** — no Python loops over data points +7. **Return dict keys**: `posterior_mean()` returns `"m(x)"`, `posterior_covariance()` returns `"v(x)"` and `"S"`. NOT `"f(x)"`. +8. **`get_data()` keys use spaces**: `"x data"`, `"y data"`, `"hyperparameters"`, `"measurement variances"`. NOT underscores. diff --git a/README.md b/README.md index ecb335c..efa95a5 100644 --- a/README.md +++ b/README.md @@ -38,6 +38,26 @@ for i in range(100): ``` +## Designing experiments with Claude Code + +gpCAM ships with a set of [Claude Code](https://docs.anthropic.com/en/docs/claude-code) skills that guide an AI assistant through designing autonomous experiments — custom kernels, acquisition functions, noise models, and the full ask/tell/train loop. Experimentalists who want smart, autonomous data acquisition without deep knowledge of GP math or the gpCAM API can use these skills to design autonomous experiments. + +When you clone this repo, the root `CLAUDE.md` and `skills/` directory are picked up automatically by Claude Code. Available skills: + +| Skill | Description | +|-------|-------------| +| **experiment-designer** | End-to-end autonomous experiment design. Translates a scientist's description of their measurement into a complete gpCAM script. | +| **kernel-designer** | Design and compose custom kernel functions that encode domain knowledge (smoothness, periodicity, symmetry, anisotropy). | +| **acquisition-functions** | Write custom acquisition functions that encode experimental priorities (exploration vs exploitation, multi-objective, constraints). | +| **prior-mean-functions** | Encode known physics or expected trends as prior mean functions. | +| **noise-functions** | Model position-dependent or heteroscedastic noise from detector characteristics. | +| **cost-functions** | Account for motor travel time, settling, directional costs, and zone-based penalties. | +| **gp2scale-advanced** | Large-scale experiments (>10k points) using sparse kernels and Dask distributed computing. | +| **multi-task-advanced** | Multi-output / function-valued experiments with `fvGPOptimizer`. | + +These skills are also compatible with other agentic platforms (e.g. [OpenClaw](https://openclaw.ai), or any harness that can read `SKILL.md` files) — point your assistant at the `skills/` directory. + + ## Credits Main Developer: Marcus Noack ([MarcusNoack@lbl.gov](mailto:MarcusNoack@lbl.gov)) diff --git a/skills/acquisition-functions/SKILL.md b/skills/acquisition-functions/SKILL.md new file mode 100644 index 0000000..af2d031 --- /dev/null +++ b/skills/acquisition-functions/SKILL.md @@ -0,0 +1,205 @@ +# Skill: gpCAM Acquisition Functions + +Design custom acquisition functions that control where gpCAM measures next. + +## When to Use + +When a user needs acquisition behavior beyond the built-in options: +- Balancing exploration and exploitation +- Multi-objective optimization +- Constrained search (avoid forbidden regions) +- Cost-aware acquisition (expensive moves) +- Upper/lower confidence bounds +- Probability of improvement + +## Acquisition Function Contract + +```python +def my_acquisition(x, gp_optimizer): + """ + Parameters + ---------- + x : np.ndarray, shape (V, D) + Candidate points to evaluate. + gp_optimizer : GPOptimizer + The GP model. Use its posterior_mean() and posterior_covariance() methods. + + Returns + ------- + scores : np.ndarray, shape (V,) + Score for each candidate. HIGHER = more desirable (function is MAXIMIZED). + """ +``` + +**Key rule:** The acquisition function is **maximized**. Return higher values for points you want to measure. + +## Built-in Options + +Pass these as strings to `gpo.ask(acquisition_function=...)`: + +| Name | String key | Best for | +|------|-----------|----------| +| Variance | `"variance"` | Pure exploration / mapping | +| Expected Improvement | `"expected improvement"` | Optimization (find max) | +| Probability of Improvement | `"probability of improvement"` | Optimization, risk-averse | +| Upper Confidence Bound | `"ucb"` | Maximization with tunable exploration/exploitation | +| Lower Confidence Bound | `"lcb"` | Minimization with tunable exploration/exploitation | +| Predicted Maximum | `"maximum"` | Pure exploitation — mean only, no uncertainty | +| Predicted Minimum | `"minimum"` | Pure exploitation for minimization | +| Gradient | `"gradient"` | Seek steepest regions of the posterior mean | +| Target Probability | `"target probability"` | Find points with output near a target value | +| Relative Information Entropy | `"relative information entropy"` | Information-theoretic exploration | +| RIE Set | `"relative information entropy set"` | Batch acquisition | +| Total Correlation | `"total correlation"` | Batch acquisition | + +To sanity-check any built-in or custom acquisition on a grid of candidates without calling `ask()`: +```python +scores = gpo.evaluate_acquisition_function(x_grid, acquisition_function="ucb") +``` + +## Custom Acquisition Recipes + +### Upper Confidence Bound (UCB) +Available as the built-in string `"ucb"` — pass directly to `gpo.ask(acquisition_function="ucb")`. Write the callable form below only when you need to tune `beta` or otherwise customize the score: +```python +def ucb(x, gpo): + """ + beta controls exploration/exploitation tradeoff: + beta=0: pure exploitation (just go to predicted max) + beta=1: mild exploration + beta=3: strong exploration (~95% confidence) + """ + beta = 2.0 # TUNE THIS + mean = gpo.posterior_mean(x)["m(x)"] + var = gpo.posterior_covariance(x, variance_only=True)["v(x)"] + return mean + beta * np.sqrt(var) +``` + +### Lower Confidence Bound (for minimization) +gpCAM maximizes acquisition, so flip the sign for minimization: +```python +def lcb(x, gpo): + """Find the minimum of the function.""" + beta = 2.0 + mean = gpo.posterior_mean(x)["m(x)"] + var = gpo.posterior_covariance(x, variance_only=True)["v(x)"] + return -(mean - beta * np.sqrt(var)) # note the negation +``` + +### Expected Improvement (custom version with minimization) +```python +from scipy.stats import norm + +def expected_improvement_minimize(x, gpo): + """Expected improvement for finding the MINIMUM.""" + mean = gpo.posterior_mean(x)["m(x)"] + var = gpo.posterior_covariance(x, variance_only=True)["v(x)"] + std = np.sqrt(np.maximum(var, 1e-10)) + + y_best = np.min(gpo.y_data) # current best (minimum) + z = (y_best - mean) / std + ei = std * (z * norm.cdf(z) + norm.pdf(z)) + return ei +``` + +### Probability of Improvement +```python +from scipy.stats import norm + +def probability_of_improvement(x, gpo): + """Probability that measurement improves on current best.""" + mean = gpo.posterior_mean(x)["m(x)"] + var = gpo.posterior_covariance(x, variance_only=True)["v(x)"] + std = np.sqrt(np.maximum(var, 1e-10)) + + y_best = np.max(gpo.y_data) + z = (mean - y_best) / std + return norm.cdf(z) +``` + +### Constrained Acquisition (Avoid Regions) +```python +def constrained_variance(x, gpo): + """Explore but avoid a circular forbidden zone.""" + var = gpo.posterior_covariance(x, variance_only=True)["v(x)"] + + # Forbidden zone: circle at (5, 5) with radius 1 + center = np.array([5.0, 5.0]) + dist = np.linalg.norm(x - center, axis=1) + penalty = np.where(dist < 1.0, -1e6, 0.0) + + return var + penalty +``` + +### Multi-Objective (Weighted) +```python +def multi_objective(x, gpo): + """Balance finding the max with reducing uncertainty.""" + mean = gpo.posterior_mean(x)["m(x)"] + var = gpo.posterior_covariance(x, variance_only=True)["v(x)"] + + w_exploit = 0.7 # weight on exploitation + w_explore = 0.3 # weight on exploration + + # Normalize each component to [0, 1] + mean_norm = (mean - mean.min()) / (mean.max() - mean.min() + 1e-10) + var_norm = (var - var.min()) / (var.max() - var.min() + 1e-10) + + return w_exploit * mean_norm + w_explore * var_norm +``` + +### Threshold Finder (Find Boundary) +Useful when searching for where a signal crosses a threshold: +```python +def threshold_finder(x, gpo): + """Find the boundary where f(x) = threshold.""" + threshold = 0.5 # EDIT THIS + + mean = gpo.posterior_mean(x)["m(x)"] + var = gpo.posterior_covariance(x, variance_only=True)["v(x)"] + std = np.sqrt(np.maximum(var, 1e-10)) + + # Score is high near the threshold AND where uncertainty is high + distance_to_threshold = np.abs(mean - threshold) + return std / (distance_to_threshold + 0.01) +``` + +## Usage in the Experiment Loop + +```python +# Built-in string or a callable are both accepted: +result = gpo.ask( + input_set=parameter_bounds, + acquisition_function=ucb, # or "ucb", "expected improvement", ... +) +``` + +### Useful `ask()` options + +| Argument | Meaning | +|----------|---------| +| `n=N` | Request `N` points at once (batch). For vectorized single-task, use a batch-aware acquisition like `"relative information entropy set"` or `"total correlation"`. | +| `vectorized=True` (default) | The acquisition function is called once with all candidate points, shape `(V, D)` — required for custom callables written against the contract above. | +| `vectorized=False` | Candidates are evaluated one at a time (list of 1-D arrays). Used for non-vectorizable acquisition or non-Euclidean inputs. | +| `method="global"\|"local"\|"hgdl"\|"hgdlAsync"` | Inner optimizer that searches for the argmax of the acquisition over `input_set`. `hgdl` requires `dask_client=`; `hgdlAsync` starts a background search and returns an `opt_obj` you can poll or `kill_client()`. | +| `dask_client=client` | Distribute the inner optimization across Dask workers. | +| `batch_size=B` | When candidates are a list, evaluate them in chunks of `B` on the cluster. | +| `max_iter`, `pop_size`, `info=True` | Inner optimizer controls. | + +`input_set` can be continuous bounds (`np.array([[lo,hi], ...])`), a list of candidate points (discrete finite set), or a list of arbitrary objects (non-Euclidean — strings, graphs — provided your kernel handles them). + +## Hyperparameter Coordination + +Acquisition functions don't add hyperparameters — they read the GP state via `gpo.posterior_mean()` and `gpo.posterior_covariance()`. However: + +- If you access `gpo.y_data` directly (e.g., for `y_best`), make sure it's up to date after `tell()` +- The GP must be trained before acquisition makes sense — always call `train()` first +- For `variance_only=True`: faster, returns just diagonal variances (usually what you want) +- For full covariance: use `variance_only=False` but this is O(V²) memory + +## Common Pitfalls + +1. **Returning negative scores for points you want**: Remember, acquisition is MAXIMIZED. +2. **Division by zero in std**: Always use `np.maximum(var, 1e-10)` before taking sqrt. +3. **Not handling edge cases**: Early in the loop with few points, the GP posterior can be unreliable. +4. **Expensive acquisition functions**: They're evaluated many times during optimization. Keep them fast. diff --git a/skills/cost-functions/SKILL.md b/skills/cost-functions/SKILL.md new file mode 100644 index 0000000..c40bc0d --- /dev/null +++ b/skills/cost-functions/SKILL.md @@ -0,0 +1,195 @@ +# Skill: gpCAM Cost Functions + +Design cost functions that account for the real expense of moving between measurement points — motor travel time, sample damage, beam time, etc. + +## When to Use + +- Moving motors is slow and travel time matters (long-range stage moves) +- Some regions of the parameter space are more expensive to reach +- You want to avoid unnecessary large jumps between measurements +- Cost varies by direction (e.g., moving up is faster than moving down) +- Sample damage accumulates with exposure and should be minimized + +## How Cost Functions Work in gpCAM + +The cost function modifies the acquisition score: + +``` +effective_score(x) = acquisition_score(x) / cost(origin, x) +``` + +Points that are expensive to reach get penalized. The optimizer still picks high-value points, but prefers nearby high-value points over distant ones. + +## Cost Function Contract + +```python +def my_cost(origin, x, arguments=None): + """ + Parameters + ---------- + origin : np.ndarray, shape (V, D) + The current position (where we are now). + x : np.ndarray, shape (V, D) + The candidate destination positions. + arguments : dict or None + Optional extra parameters (passed via GPOptimizer constructor). + + Returns + ------- + cost : np.ndarray, shape (V,) + Cost of moving from origin to each point in x. + Must be > 0. Higher cost = less desirable to visit. + """ +``` + +**Key rules:** +- Cost must be **positive** (> 0) — zero cost causes division by zero +- Cost is used as a **divisor** — higher cost = lower effective acquisition score +- The function receives batches of points, not single points + +## Recipes + +### L2 (Euclidean) Distance Cost +Simple travel time proportional to straight-line distance: +```python +def l2_cost(origin, x, arguments=None): + """Cost proportional to Euclidean distance.""" + offset = 1.0 # minimum cost (prevents div-by-zero, represents measurement time) + speed = 1.0 # cost per unit distance + distance = np.linalg.norm(x - origin, axis=1) + return offset + speed * distance +``` + +### L1 (Manhattan) Distance Cost +For stage systems that move one axis at a time: +```python +def l1_cost(origin, x, arguments=None): + """Cost proportional to Manhattan distance (axis-by-axis motion).""" + offset = 1.0 + speed = 1.0 + distance = np.sum(np.abs(x - origin), axis=1) + return offset + speed * distance +``` + +### Anisotropic Cost +Different axes have different speeds: +```python +def anisotropic_cost(origin, x, arguments=None): + """ + Different cost per axis. + arguments["speeds"] = [speed_dim0, speed_dim1, ...] + """ + offset = 1.0 + speeds = arguments.get("speeds", np.ones(x.shape[1])) + weighted_dist = np.sum(np.abs(x - origin) * speeds, axis=1) + return offset + weighted_dist +``` + +### Directional Cost +Moving in one direction is cheaper (e.g., gravity-assisted, or always-increasing scans): +```python +def directional_cost(origin, x, arguments=None): + """Cheaper to move in +x direction than -x.""" + offset = 1.0 + diff = x - origin + # Forward motion (positive) is cheap, backward is expensive + forward_cost = np.maximum(diff[:, 0], 0) * 1.0 # cost going forward + backward_cost = np.maximum(-diff[:, 0], 0) * 5.0 # 5x cost going backward + lateral_cost = np.sum(np.abs(diff[:, 1:]), axis=1) * 1.0 + return offset + forward_cost + backward_cost + lateral_cost +``` + +### Settling Time Cost +Fast moves need more settling time: +```python +def settling_cost(origin, x, arguments=None): + """ + Short moves are fast; long moves need extra settling time. + Models: cost = base + travel + settling * (distance > threshold) + """ + base = 1.0 + travel_rate = 0.5 + settle_time = 3.0 + settle_threshold = 2.0 # distance above which settling kicks in + + distance = np.linalg.norm(x - origin, axis=1) + settling = np.where(distance > settle_threshold, settle_time, 0.0) + return base + travel_rate * distance + settling +``` + +### Zone-Based Cost +Some regions of the parameter space are more expensive: +```python +def zone_cost(origin, x, arguments=None): + """ + Higher cost to measure in certain zones. + E.g., cryogenic sample region requires cooldown. + """ + base = 1.0 + distance = np.linalg.norm(x - origin, axis=1) + + # Expensive zone: x[:, 0] > 8.0 + zone_penalty = np.where(x[:, 0] > 8.0, 10.0, 0.0) + + return base + distance + zone_penalty +``` + +## Usage + +```python +gpo = GPOptimizer( + x_data=x_data, + y_data=y_data, + cost_function=l2_cost, + # args={"speeds": np.array([1.0, 2.0])} # for anisotropic_cost +) +``` + +The cost function is automatically applied during `ask()` — no extra code needed in the experiment loop. + +## Parameterized Cost with `arguments` + +Pass parameters via the `args` dict at construction: + +```python +gpo = GPOptimizer( + x_data=x_data, + y_data=y_data, + cost_function=anisotropic_cost, + args={"speeds": np.array([1.0, 3.0, 0.5])}, +) +``` + +The `arguments` parameter in the cost function receives this dict. + +## Cost Functions Do NOT Add Hyperparameters + +Unlike kernel/mean/noise functions, cost functions have **fixed parameters** — they are not optimized during training. If you need to tune cost parameters, do it manually or via the `arguments` dict. + +## Calibrating a Cost Function From Observed Moves + +If you don't know the cost parameters ahead of time, record observed costs during the experiment and fit `offset` / `slope` by nonlinear least squares. gpCAM ships templates (`_update_cost_function`, `update_l1_cost_function`, `update_l2_cost_function`) that use SciPy's `differential_evolution`: + +```python +# Collect observations as a list of dicts during the experiment: +observations = [ + {"origin": np.array([0.0, 0.0]), "point": np.array([0.5, 0.2]), "cost": 2.7}, + {"origin": np.array([0.5, 0.2]), "point": np.array([0.8, 0.9]), "cost": 3.4}, + # ... +] + +# Then fit: +from gpcam.cost_functions import update_l2_cost_function # or update_l1_cost_function +new_args = update_l2_cost_function(observations, parameters={"offset": 1.0, "slope": 1.0}) +# Plug `new_args` back into the cost-function `arguments` dict. +``` + +The updater drops outliers beyond ±2σ of the per-move cost-per-distance before fitting, so a few anomalous moves (e.g. instrument glitches) won't distort the cost model. + +## Common Pitfalls + +1. **Zero cost**: Causes division by zero in acquisition. Always add a positive offset (minimum cost ≥ 1.0). +2. **Cost too high**: If cost dominates, the optimizer just measures nearby points and never explores. Balance cost magnitude with acquisition scores. +3. **Wrong shape**: Must return 1D array of length `V`, matching the number of candidate points. +4. **Forgetting `origin`**: The cost depends on where you currently are, not just where you're going. +5. **Not vectorized**: `origin` and `x` are batches — use numpy operations, not loops. diff --git a/skills/experiment-designer/SKILL.md b/skills/experiment-designer/SKILL.md new file mode 100644 index 0000000..b486415 --- /dev/null +++ b/skills/experiment-designer/SKILL.md @@ -0,0 +1,277 @@ +# Skill: gpCAM Experiment Designer + +Design complete autonomous experiment scripts using gpCAM. You translate a scientist's description of their measurement into a runnable Python script. + +## When to Use + +When a user wants to: +- Set up an autonomous/smart scan or optimization +- Replace a raster scan with adaptive sampling +- Find optimal experimental conditions (peak finding, parameter optimization) +- Explore a parameter space efficiently + +## Your Role + +You are helping **beamline scientists** who may not know GP math or the gpCAM API. Your job is to: +1. Understand their experiment (what they measure, what they control, what they want to find) +2. Generate a complete, well-commented Python script they can adapt +3. Explain the key choices you made in plain language + +## Conversation Flow + +### Step 1: Understand the Experiment +Ask about: +- **Input dimensions**: What parameters do they control? (motor positions, temperature, voltage, etc.) +- **Input bounds**: What range for each parameter? +- **Output**: What do they measure? (intensity, spectrum, image, scalar?) +- **Goal**: Exploration (map everything)? Optimization (find the peak)? Both? +- **Constraints**: Any forbidden regions? Cost of moving between points? +- **Prior knowledge**: Do they know roughly what to expect? (smooth? periodic? sharp features?) +- **Data size**: How many measurements can they afford? (determines if gp2Scale is needed) +- **Noise**: Is the measurement noisy? Does noise vary across the parameter space? + +### Step 2: Design Choices +Based on their answers, decide: + +| Choice | Guidance | +|--------|----------| +| **Kernel** | Default Matérn-3/2 ARD is good for most cases. Use periodic kernel if periodicity is known. Use Matérn-1/2 for rough/discontinuous data, Matérn-5/2 or SE for very smooth. See `kernel-designer` skill for custom kernels. | +| **Acquisition function** | `'variance'` for exploration/mapping. `'expected improvement'` or `'ucb'` for optimization (UCB exposes a tunable exploration/exploitation tradeoff via `beta`). Custom callable for multi-objective or constraints. See `acquisition-functions` skill. | +| **Prior mean** | Zero (default) unless they have a physical model. See `prior-mean-functions` skill. | +| **Noise model** | Use `noise_variances` if noise is known and uniform. Use `noise_function` if noise varies. See `noise-functions` skill. | +| **Training strategy** | `method='global'` for first training, `method='local'` for re-training during the loop. Other options: `"mcmc"` (Bayesian — returns posterior samples over hyperparameters), `"adam"` (stochastic-gradient, fast, works well for high-dimensional hyperparameter vectors like deep kernels), `"hgdl"` (distributed local+global hybrid — needs a `dask_client`). | +| **calc_inv** | `True` if <2000 points, `False` otherwise. | +| **Number of initial points** | Rule of thumb: 5-10× the input dimensionality for initial random sampling. | +| **Validation** | `gpo.rmse(x_test, y_true)` and `gpo.crps(x_test, y_true)` give RMSE and continuous ranked probability score on a held-out grid — call these after training to sanity-check the fit. | + +### Step 3: Generate the Script + +**Two paths:** If the scientist only needs a scalar black-box optimized and has no need to inspect or customize the ask/tell loop, use the one-shot `GPOptimizer.optimize()` shortcut (A). Otherwise use the full template (B). The full template is required when they want custom acquisition, mid-loop re-training with specific schedules, async training, checkpointing, validation plots during the run, or integration with a live instrument. + +#### A. One-shot optimize (simplest) + +```python +import numpy as np +from gpcam import GPOptimizer + +def f(x): + # x is shape (N, D); return (y, noise_variance) with matching shapes + y = np.sin(x[:, 0]) * np.cos(x[:, 1]) + return y, np.full(len(x), 0.01) + +gpo = GPOptimizer() +result = gpo.optimize( + func=f, + search_space=np.array([[0., 1.], [0., 1.]]), + max_iter=50, +) +``` + +`optimize()` handles initial sampling, training schedule, the ask/tell loop, and termination. Use it when the scientist has a simulator or instrument wrapper they can hand gpCAM as a Python function. For fvGP/multi-task, pass `x_out=np.array([...])`. + +#### B. Full template (adaptive loop) + +Output a complete Python script with this structure: + +```python +""" +Autonomous Experiment: [description] +Generated for gpCAM v8.3.x + +Input space: [dimensions and ranges] +Output: [what is measured] +Strategy: [exploration/optimization/hybrid] +""" + +import numpy as np +from gpcam import GPOptimizer + +# ============================================================ +# 1. EXPERIMENT PARAMETERS — EDIT THESE +# ============================================================ +# Define the parameter space bounds +# Each row: [min, max] for one dimension +parameter_bounds = np.array([ + [0.0, 10.0], # motor_x (mm) + [0.0, 5.0], # motor_y (mm) +]) +parameter_names = ["motor_x", "motor_y"] + +N_INITIAL = 10 # Initial random measurements +N_ITERATIONS = 50 # Adaptive measurements +RETRAIN_EVERY = 10 # Re-train hyperparameters every N iterations + +# ============================================================ +# 2. YOUR MEASUREMENT FUNCTION — REPLACE THIS +# ============================================================ +def measure(x): + """ + Replace this with your actual measurement. + + Parameters + ---------- + x : np.ndarray, shape (1, D) + The point to measure. x[0, 0] is motor_x, x[0, 1] is motor_y. + + Returns + ------- + y : float + The measured value (scalar). + noise_variance : float or None + The estimated variance of this measurement, or None if unknown. + """ + # EXAMPLE: replace with your instrument call + y = np.sin(x[0, 0]) * np.cos(x[0, 1]) + noise_variance = 0.01 # or None + return y, noise_variance + +# ============================================================ +# 3. KERNEL (optional customization) +# ============================================================ +# The default ARD Matérn-3/2 kernel is used if kernel_function=None. +# Uncomment and modify to use a custom kernel. +# from gpcam.kernels import matern_kernel_diff1, get_anisotropic_distance_matrix +# +# def my_kernel(x1, x2, hyperparameters): +# d = get_anisotropic_distance_matrix(x1, x2, hyperparameters[1:]) +# return hyperparameters[0] * matern_kernel_diff1(d, 1.0) + +kernel_function = None # None = default ARD Matérn-3/2 + +# ============================================================ +# 4. HYPERPARAMETER BOUNDS +# ============================================================ +# For the default kernel: [signal_variance, length_scale_dim1, ..., length_scale_dimD] +# Rule of thumb: +# signal_variance bounds: [0.01, 10 * std(y)] (estimated after initial data) +# length_scale bounds: [0.01, 10 * range(x_dim)] +D = parameter_bounds.shape[0] +hp_bounds = np.array( + [[0.001, 100.0]] + # signal variance + [[0.01, 10.0 * (b[1] - b[0])] for b in parameter_bounds] # length scales +) + +# ============================================================ +# 5. ACQUISITION FUNCTION +# ============================================================ +acquisition_function = "variance" # Options: "variance", "expected improvement", + # "ucb", "relative information entropy", + # or a callable + +# ============================================================ +# 6. RUN THE EXPERIMENT +# ============================================================ +def run(): + # --- Initial random sampling --- + x_init = np.random.uniform( + parameter_bounds[:, 0], parameter_bounds[:, 1], + size=(N_INITIAL, D) + ) + y_init = np.zeros(N_INITIAL) + noise_init = np.zeros(N_INITIAL) + + for i in range(N_INITIAL): + y_init[i], nv = measure(x_init[i:i+1]) + noise_init[i] = nv if nv is not None else 0.0 + + # --- Initialize GP --- + gpo = GPOptimizer( + x_data=x_init, + y_data=y_init, + noise_variances=noise_init if noise_init.any() else None, + kernel_function=kernel_function, + calc_inv=(N_INITIAL + N_ITERATIONS < 2000), + ) + + # --- Initial training --- + gpo.train(hyperparameter_bounds=hp_bounds, method="global", max_iter=200) + + # --- Adaptive loop --- + for i in range(N_ITERATIONS): + # Ask: where should we measure next? + result = gpo.ask( + input_set=parameter_bounds, + acquisition_function=acquisition_function, + ) + next_x = result["x"] + + # Measure + new_y, new_nv = measure(next_x) + + # Tell: update the GP + gpo.tell( + next_x, + np.array([new_y]), + noise_variances=np.array([new_nv]) if new_nv is not None else None, + ) + + # Re-train periodically + if (i + 1) % RETRAIN_EVERY == 0: + gpo.train(hyperparameter_bounds=hp_bounds, method="local", max_iter=100) + + print(f"Iteration {i+1}/{N_ITERATIONS}: " + f"measured at {next_x[0]} -> {new_y:.4f}") + + # --- Results --- + data = gpo.get_data() + print(f"\nDone! Collected {len(data['x data'])} points total.") + print(f"Final hyperparameters: {data['hyperparameters']}") + + return gpo + +if __name__ == "__main__": + gpo = run() +``` + +## Key Rules + +1. **Always generate a complete, runnable script** — not fragments. Scientists should be able to copy-paste and run it. +2. **The `measure()` function is a placeholder** — clearly mark it and explain what to replace. +3. **Comment heavily** — explain every choice for the non-expert. +4. **Hyperparameter bounds matter** — set sensible defaults based on the parameter ranges and expected signal scale. +5. **Default kernel is usually fine** — only suggest custom kernels when there's a clear reason (known periodicity, symmetry, etc.). +6. **Training schedule** — train globally once at the start, then locally every N iterations. Don't train every iteration (too slow). +7. **Initial points** — always start with random initial sampling before the adaptive loop. + +## Hyperparameter Coordination + +This is critical and often the source of bugs: + +- The hyperparameter vector is shared across kernel, mean, and noise functions +- For the default kernel with D input dimensions: `hps[0]` = signal variance, `hps[1:D+1]` = length scales +- If you add a custom noise function that uses hyperparameters, those come AFTER the kernel hyperparameters +- If you add a prior mean function that uses hyperparameters, document which indices it uses +- **Always document the hyperparameter layout** in a comment at the top of the script +- **Always set bounds for ALL hyperparameters** — the bounds array must match the total hyperparameter count + +Example with custom noise: +```python +# Hyperparameter layout: +# hps[0] = signal variance (kernel) +# hps[1:D+1] = length scales (kernel) +# hps[D+1] = noise amplitude (noise function) +# +# Total: D + 2 hyperparameters + +def my_noise(x, hps): + return np.full(len(x), hps[D+1]**2) + +hp_bounds = np.array( + [[0.001, 100.0]] + # signal variance + [[0.01, 10.0 * (b[1]-b[0])] for b in parameter_bounds] + # length scales + [[0.001, 10.0]] # noise amplitude +) +``` + +## Advanced Options (mention only if needed) + +- **Incremental data updates**: `gpo.tell(x_new, y_new, append=True)` adds points without overwriting; `append=False` replaces. Default is to replace — use `append=True` in a streaming instrument loop. +- **Async training**: `opt_obj = gpo.train(..., asynchronous=True, method="hgdl", dask_client=client)` returns immediately; later call `gpo.update_hyperparameters(opt_obj)` to pull current best hyperparameters, and `gpo.stop_training(opt_obj)` to finish. Useful when training is expensive and the loop shouldn't block. +- **Async ask**: `method="hgdlAsync"` starts a background search for the next point; the result dict contains an `opt_obj` you can `kill_client()` once you've used the suggestion. +- **Checkpointing**: `GPOptimizer` instances are picklable — `pickle.dumps(gpo)` before a long run lets you reload state later. +- **Info measures**: `gpo.gp_mutual_information(x_test)` and `gpo.gp_total_correlation(x_test)` report information content at a candidate set. +- **User-function arguments**: `GPOptimizer(..., args={"a": 1.5, "b": 2.0})` plumbs a dict through to custom kernel/mean/noise/cost functions (they receive it as an extra argument when they declare it). + +## Reference + +For detailed API docs, kernel math, and advanced options, see the [gpCAM documentation](https://gpcam.readthedocs.io). diff --git a/skills/gp2scale-advanced/SKILL.md b/skills/gp2scale-advanced/SKILL.md new file mode 100644 index 0000000..94222dc --- /dev/null +++ b/skills/gp2scale-advanced/SKILL.md @@ -0,0 +1,128 @@ +# Skill: gp2Scale — Large-Scale GPs + +Design experiments with tens of thousands to millions of data points using gpCAM's gp2Scale mode for exact GP computation at scale. + +## When to Use + +- Dataset will exceed ~10,000 points (and can go into the millions with enough cluster) +- Need exact GP inference — gp2Scale is **not** an approximation like Vecchia or inducing-point methods; it only exploits naturally-occurring sparsity induced by compact-support kernels, so the result is exact +- Have access to multiple CPU cores, GPUs, or a compute cluster +- Willing to use compactly-supported kernels (Wendland) + +## Key Concepts + +gp2Scale uses: +1. **Wendland kernels** with compact support → sparse covariance matrix (zero covariance beyond the support radius) +2. **Dask distributed** for parallel covariance computation — one covariance block per worker +3. **Sparse linear algebra** (LU, Cholesky) via SciPy / imate instead of dense +4. **Random linear algebra** for log-determinants at scale (install `imate`) + +## Basic Setup + +```python +from distributed import Client +from gpcam import GPOptimizer + +# Start a local Dask cluster +client = Client() # uses all available cores +client.wait_for_workers(4) # good practice: wait for workers before constructing + +gpo = GPOptimizer( + x_data=x_data, + y_data=y_data, + gp2Scale=True, + gp2Scale_dask_client=client, + gp2Scale_batch_size=500, # typical; tune up for large clusters + init_hyperparameters=np.array([0.73, 0.0014]), # signal var, length scale +) + +gpo.train(hyperparameter_bounds=hps_bounds, max_iter=25, info=True) +``` + +## Kernel Requirement + +When `gp2Scale=True`, the kernel MUST produce a sparse matrix. The default switches to an anisotropic Wendland kernel automatically if no custom kernel is provided. + +If providing a custom kernel, it must have compact support: +```python +from gpcam.kernels import wendland_anisotropic + +def my_gp2scale_kernel(x1, x2, hps): + """Custom kernel with compact support for gp2Scale.""" + return wendland_anisotropic(x1, x2, hps) +``` + +## Hyperparameters for Wendland Kernel + +``` +hps[0] = signal variance +hps[1:D+1] = per-dimension length scales (also control support radius) +``` + +The length scales in the Wendland kernel also determine the support radius — points further apart than the length scale have zero covariance. + +## Linear Algebra Modes + +| Mode | Description | +|------|-------------| +| `"Chol"` | Sparse Cholesky — fastest for moderate sparsity | +| `"sparseLU"` | Sparse LU decomposition | +| `"sparseCG"` | Conjugate gradient (iterative) | +| `"sparseMINRES"` | MINRES (iterative) | + +## Custom Block-MCMC Training + +For expensive gp2Scale likelihoods, standard `method="mcmc"` training may be too slow. gpCAM exposes a block Metropolis-Hastings sampler you can drive directly against the GP's log-likelihood: + +```python +import numpy as np +from gpcam import gpMCMC, ProposalDistribution + +def in_bounds(v, bounds): + return not (any(v < bounds[:, 0]) or any(v > bounds[:, 1])) + +def prior_function(theta, args): + return 0.0 if in_bounds(theta, args["bounds"]) else -np.inf + +def log_likelihood(hps, args): + return gpo.log_likelihood(hyperparameters=hps) # exposed on GPOptimizer + +pd = ProposalDistribution([0, 1], init_prop_Sigma=np.identity(2) * 0.01) + +mcmc = gpMCMC(log_likelihood, prior_function, [pd], + args={"bounds": hps_bounds}) +result = mcmc.run_mcmc(x0=np.array([1.0, 0.01]), n_updates=200, info=True) + +gpo.set_hyperparameters(result["mean(x)"]) +``` + +`ProposalDistribution` takes the list of hyperparameter indices in that block and an initial proposal covariance. Stack multiple `ProposalDistribution`s for block-wise updates of high-dimensional hyperparameter vectors (deep kernels, etc.). + +## HPC Setup (SLURM example) + +```python +from dask_jobqueue import SLURMCluster +from distributed import Client + +cluster = SLURMCluster( + cores=32, + memory="64GB", + walltime="01:00:00", +) +cluster.scale(jobs=4) # 4 nodes × 32 cores +client = Client(cluster) +``` + +## Estimating Compute Time + +```python +from gpcam.gp_optimizer import gp2Scale_time_estimate +gp2Scale_time_estimate(n_workers=8, worker_speed=500, n_data=100000) +``` + +## Common Pitfalls + +1. **Using a non-sparse kernel**: gp2Scale won't speed up dense kernels. +2. **Batch size too small**: More overhead. Start with 10000. +3. **Forgetting Dask client**: A local client is created by default but explicit is better. +4. **Length scales too large**: Reduces sparsity, defeating the purpose. Keep support radius reasonable. diff --git a/skills/kernel-designer/SKILL.md b/skills/kernel-designer/SKILL.md new file mode 100644 index 0000000..eda0ed0 --- /dev/null +++ b/skills/kernel-designer/SKILL.md @@ -0,0 +1,260 @@ +# Skill: gpCAM Kernel Designer + +Design custom kernel (covariance) functions for gpCAM that encode domain knowledge about the experiment. + +## When to Use + +When a user needs a kernel that goes beyond the default ARD Matérn-3/2: +- Known periodicity in the data +- Symmetry constraints (mirror, rotational) +- Different smoothness in different dimensions +- Combining multiple correlation structures (sum/product kernels) +- Non-Euclidean input spaces (strings, graphs, categorical) +- Non-stationary behavior (varying length scales) + +## Kernel Function Contract + +Every gpCAM kernel must satisfy: + +```python +def my_kernel(x1, x2, hyperparameters): + """ + Parameters + ---------- + x1 : np.ndarray, shape (N1, D) + x2 : np.ndarray, shape (N2, D) + hyperparameters : np.ndarray, 1D + + Returns + ------- + K : np.ndarray, shape (N1, N2) + Must be symmetric positive semi-definite. + """ +``` + +- `x1` and `x2` are 2D arrays even for 1D inputs +- The output must be an `(N1, N2)` matrix +- The kernel must be symmetric: `k(x1, x2) = k(x2, x1).T` +- The kernel must be positive semi-definite +- Use vectorized numpy operations — avoid Python loops over data points + +## Building Blocks + +Import from `gpcam.kernels` (or define locally): + +```python +from gpcam.kernels import ( + matern_kernel_diff1, # Matérn ν=3/2: once differentiable + matern_kernel_diff2, # Matérn ν=5/2: twice differentiable + squared_exponential_kernel, # RBF/SE: infinitely smooth + wendland_kernel, # Compact support (for gp2Scale) + get_distance_matrix, # Euclidean distance + get_anisotropic_distance_matrix, # ARD distance +) +``` + +These base kernels operate on **distance matrices**, not raw points. The pattern is: +1. Compute a distance matrix from points +2. Apply a base kernel to the distance matrix + +## Kernel Recipes + +### Standard Anisotropic (ARD) Kernel +```python +def anisotropic_matern(x1, x2, hps): + """ + hps[0]: signal variance + hps[1:D+1]: per-dimension length scales + """ + d = get_anisotropic_distance_matrix(x1, x2, hps[1:]) + return hps[0] * matern_kernel_diff1(d, 1.0) +``` + +### Periodic Kernel +For data with known periodicity (e.g., angular measurements, crystal lattice): +```python +def periodic_kernel(x1, x2, hps): + """ + hps[0]: signal variance + hps[1]: length scale + hps[2]: period + """ + d = np.abs(np.subtract.outer(x1[:, 0], x2[:, 0])) + return hps[0] * np.exp(-2.0 * np.sin(np.pi * d / hps[2])**2 / hps[1]**2) +``` + +### Periodic + Smooth (Product Kernel) +Periodic in one dimension, smooth Matérn in others: +```python +def periodic_plus_smooth(x1, x2, hps): + """ + hps[0]: signal variance + hps[1]: periodic length scale + hps[2]: period + hps[3:]: Matérn length scales for remaining dims + """ + # Periodic in dim 0 + d_periodic = np.abs(np.subtract.outer(x1[:, 0], x2[:, 0])) + k_periodic = np.exp(-2.0 * np.sin(np.pi * d_periodic / hps[2])**2 / hps[1]**2) + + # Matérn in remaining dims + d_other = get_anisotropic_distance_matrix(x1[:, 1:], x2[:, 1:], hps[3:]) + k_other = matern_kernel_diff1(d_other, 1.0) + + return hps[0] * k_periodic * k_other +``` + +### Sum Kernel (Multiple Scales) +Captures both coarse and fine structure: +```python +def multi_scale_kernel(x1, x2, hps): + """ + hps[0]: variance of coarse component + hps[1]: length scale of coarse component + hps[2]: variance of fine component + hps[3]: length scale of fine component + """ + d = get_distance_matrix(x1, x2) + k_coarse = hps[0] * matern_kernel_diff2(d, hps[1]) # smooth, long-range + k_fine = hps[2] * matern_kernel_diff1(d, hps[3]) # rougher, short-range + return k_coarse + k_fine +``` + +### Symmetry-Enforcing Kernel +For data known to be symmetric about an axis: +```python +def symmetric_kernel_x(x1, x2, hps): + """Mirror symmetry about x=0 in the first dimension.""" + x1_flip = x1.copy() + x1_flip[:, 0] = -x1_flip[:, 0] + x2_flip = x2.copy() + x2_flip[:, 0] = -x2_flip[:, 0] + + d = get_anisotropic_distance_matrix + k = lambda a, b: hps[0] * matern_kernel_diff1(d(a, b, hps[1:]), 1.0) + + return 0.25 * (k(x1, x2) + k(x1_flip, x2) + k(x1, x2_flip) + k(x1_flip, x2_flip)) +``` + +### L1 (Manhattan) Distance Kernel +Separable per dimension — each dimension contributes independently: +```python +def l1_kernel(x1, x2, hps): + """Product of per-dimension exponential kernels (L1 distance).""" + k = hps[0] * np.ones((len(x1), len(x2))) + for i in range(x1.shape[1]): + d_i = np.abs(np.subtract.outer(x1[:, i], x2[:, i])) + k *= np.exp(-d_i / hps[1 + i]) + return k +``` + +### Non-Euclidean Input Spaces (strings, graphs, categorical) + +gpCAM accepts arbitrary Python objects as inputs — `x_data` can be a list of strings, graphs, molecules, etc. The only requirement is that your kernel computes a valid covariance between any two objects. + +```python +from gpcam import GPOptimizer +from gpcam.kernels import matern_kernel_diff1 + +def string_distance(s1, s2): + diff = abs(len(s1) - len(s2)) + common = min(len(s1), len(s2)) + return diff + sum(a != b for a, b in zip(s1[:common], s2[:common])) + +def string_kernel(x1, x2, hps): + """x1, x2 are lists/sequences of strings; hps = [signal_var, length_scale].""" + d = np.array([[string_distance(a, b) for b in x2] for a in x1]) + return hps[0] * matern_kernel_diff1(d, hps[1]) + +x_data = ["hello", "world", "this", "is", "gpcam"] +y_data = np.array([2.0, 1.9, 1.8, 3.0, 5.0]) + +gp = GPOptimizer(x_data, y_data, + init_hyperparameters=np.ones(2), + kernel_function=string_kernel) +gp.train(hyperparameter_bounds=np.array([[1e-3, 100.], [1e-3, 100.]])) + +# Predict on new objects: +gp.posterior_mean(["full"])["m(x)"] + +# Ask which of a candidate set to measure next: +gp.ask(["who", "could", "it", "be"], n=4) +``` + +Notes: +- A Python `for` loop over objects is fine here (you're bottlenecked by the distance function, not numpy). +- If the distance function is symmetric, the resulting kernel is automatically symmetric; you still need the base kernel (Matérn/SE) to make it PSD. +- For multi-task on non-Euclidean inputs: `fvGPOptimizer(x_strings, y_multi, kernel_function=...)` works the same way. + +### Deep Kernel (NN-warped input space) + +For hard multi-task structure or learned metrics, warp the input through a small neural net and then apply a stationary kernel in the warped space. `gpcam.deep_kernel_network.Network` gives you an MLP whose weights you read out of `hps`: + +```python +from gpcam.deep_kernel_network import Network +from gpcam.kernels import get_distance_matrix, matern_kernel_diff1 + +iset_dim = 3 +layer_width = 5 +n = Network(iset_dim, layer_width) +# n.number_of_hps tells you how many NN hyperparameters to reserve. + +def deep_kernel(x1, x2, hps): + signal_var, length_scale = hps[0], hps[1] + # unpack the remaining hps into n.set_weights / n.set_biases (layout: see manual) + x1_nn = n.forward(x1) + x2_nn = n.forward(x2) + d = get_distance_matrix(x1_nn, x2_nn) + return signal_var * matern_kernel_diff1(d, length_scale) +``` + +Hyperparameter layout: `hps[0]=signal_var`, `hps[1]=length_scale`, `hps[2:]` = flattened NN weights+biases in the order `Network` expects. Use `method="mcmc"` or `method="adam"` for training; `global`/`local` don't scale well to NN-sized hyperparameter vectors. + +## Smoothness Guide + +| Kernel | Smoothness | When to Use | +|--------|-----------|-------------| +| Matérn-1/2 (exponential) | Rough, continuous but not differentiable | Sharp peaks, discontinuities | +| Matérn-3/2 | Once differentiable | **Default choice** — most physical data | +| Matérn-5/2 | Twice differentiable | Smoother physical signals | +| Squared Exponential (RBF) | Infinitely smooth | Very smooth data; tends to oversmooth | + +## Hyperparameter Coordination + +**Critical:** When designing a custom kernel, you must: + +1. **Document the hyperparameter layout** — which index maps to what +2. **Set matching bounds** — the `hyperparameter_bounds` array must have one row per hyperparameter +3. **Coordinate with noise/mean functions** — all three share the same hyperparameter vector + +```python +# ALWAYS include a comment block like this: +# +# Hyperparameter layout for my_kernel: +# hps[0] = signal variance bounds: [0.001, 100] +# hps[1] = length scale dim 0 bounds: [0.01, range_dim0 * 10] +# hps[2] = length scale dim 1 bounds: [0.01, range_dim1 * 10] +# hps[3] = period (if periodic) bounds: [expected_period * 0.5, expected_period * 2] +``` + +### Setting Initial Hyperparameters +- **Signal variance**: start near `np.var(y_data)` or `1.0` +- **Length scales**: start near `0.1 * range(x_dim)` — not too small (overfitting) or too large (underfitting) +- **Period**: start near the expected period if known + +### Setting Bounds +- **Signal variance**: `[0.001, 10 * np.std(y_data)]` +- **Length scales**: `[0.01, 10 * range(x_dim)]` — lower bound prevents overfitting +- **Period**: `[0.5 * expected, 2 * expected]` — tight if well-known + +## Common Pitfalls + +1. **Non-PSD kernel**: Sums and products of valid kernels are valid. Differences are NOT guaranteed PSD. +2. **Python loops over data**: Use `np.subtract.outer` and vectorized ops. A double for-loop over N points is O(N²) in Python — unusable for >100 points. +3. **Forgetting signal variance**: Always include a leading amplitude hyperparameter (`hps[0] * ...`). +4. **Length scale = 0**: Causes division by zero. Set lower bounds > 0 (e.g., 0.001). +5. **Mismatched hyperparameter count**: The bounds array rows must equal the total hyperparameter vector length. + +## Reference + +See `gpcam/kernels.py` (re-exports the fvGP kernel library) for the full library of kernel building blocks and the [gpCAM documentation](https://gpcam.readthedocs.io) for mathematical details. diff --git a/skills/multi-task-advanced/SKILL.md b/skills/multi-task-advanced/SKILL.md new file mode 100644 index 0000000..54a9520 --- /dev/null +++ b/skills/multi-task-advanced/SKILL.md @@ -0,0 +1,124 @@ +# Skill: Multi-Task GPs with fvGPOptimizer + +Design experiments with vector-valued or function-valued outputs using gpCAM's multi-task GP. + +## When to Use + +- Measuring a spectrum (many output channels per input point) +- Multiple correlated outputs (e.g., intensity at different energies) +- Exploiting correlations between tasks to improve predictions + +## Key Concept + +In fvGP, a multi-task GP is a single-task GP over the **Cartesian product** of input space × output space. The output dimension is appended as an extra column to the input. + +For example, with 2D input and 3 output channels, a point looks like: +``` +[x0, x1, task_id] where task_id ∈ {0, 1, 2} +``` + +This means your kernel must handle D+1 dimensional inputs, where the last dimension is the task index. + +## Basic Setup + +```python +import numpy as np +from gpcam import fvGPOptimizer + +# 100 input points, 5 output channels +x_data = np.random.uniform(0, 1, (100, 2)) # 2D input +y_data = np.random.randn(100, 5) # 5 outputs per point + +# Default path — uses a built-in deep kernel, no hyperparameter bounds required: +gpo = fvGPOptimizer(x_data, y_data) +gpo.train(max_iter=20) + +# Custom kernel path — supply init_hyperparameters and hp bounds as with GPOptimizer: +gpo = fvGPOptimizer( + x_data=x_data, + y_data=y_data, + init_hyperparameters=np.ones(4) / 10.0, + kernel_function=my_multi_task_kernel, +) +gpo.train(hyperparameter_bounds=np.array([ + [0.01, 10.0], # signal variance + [0.01, 10.0], # length scale dim 0 + [0.01, 10.0], # length scale dim 1 + [0.01, 10.0], # length scale for task dimension +])) +``` + +### Predictions and ask — the `x_out` argument + +Multi-task prediction methods take `x_out`, an array of task indices you want predictions for: + +```python +# Predict all 5 task outputs at a grid of input points: +mean = gpo.posterior_mean(x_grid, x_out=np.array([0, 1, 2, 3, 4]))["m(x)"] # shape (N, 5) +std = np.sqrt(gpo.posterior_covariance(x_grid, x_out=np.array([0, 1, 2, 3, 4]))["v(x)"]) + +# Ask for the next best input point across all tasks: +gpo.ask(parameter_bounds, x_out=np.array([0, 1, 2, 3, 4]), n=1) + +# Ask for a batch of 4 points using a batch-aware acquisition: +gpo.ask(parameter_bounds, x_out=np.array([0, 1]), n=4, + acquisition_function="relative information entropy set", vectorized=True) +``` + +### One-shot optimize + +For simple black-box vector-valued optimization, `optimize()` replaces the manual loop: +```python +def f(x): # x shape (N, D) + y = np.column_stack([...]) # shape (N, T) + noise = np.full(y.shape, 0.01) + return y, noise + +result = fvGPOptimizer().optimize( + func=f, + x_out=np.array([0, 1]), # which task indices to treat as outputs + search_space=np.array([[0., 1.]]), + max_iter=50, +) +``` + +## Multi-Task Kernel Design + +The kernel receives inputs with the task index as the last column. You need to model both within-task and between-task correlations: + +```python +from gpcam.kernels import matern_kernel_diff1, get_distance_matrix + +def multi_task_kernel(x1, x2, hps): + """ + x1, x2: shape (N, D+1) where last column is task index + hps[0]: signal variance + hps[1:D+1]: input space length scales + hps[D+1]: task correlation strength + """ + # Spatial kernel (input dimensions only) + d_spatial = get_distance_matrix(x1[:, :-1], x2[:, :-1]) + k_spatial = matern_kernel_diff1(d_spatial, hps[1]) + + # Task kernel (last dimension) + task1 = x1[:, -1] + task2 = x2[:, -1] + # Simple: same task = 1, different task = correlation + same_task = np.equal.outer(task1, task2).astype(float) + k_task = same_task + hps[2] * (1 - same_task) + + return hps[0] * k_spatial * k_task +``` + +## Important Notes + +1. **Multi-task acquisition**: Use `"relative information entropy set"` / `"relative information entropy"` / `"variance"` / `"total correlation"` for batch acquisition across tasks. Pass `x_out=...` to the `ask()` call. Custom callables are supported and often advisable when you care about a specific task or combination. +2. **Missing task observations**: `y_data` can have `np.nan` entries (e.g., task 1 wasn't measured at some x); the corresponding `noise_variances` entry **must also** be `np.nan`. The GP just ignores those entries — no imputation needed. +3. **Default kernel**: `fvGPOptimizer(x, y)` with no kernel uses a built-in deep kernel that learns its own hyperparameters and doesn't require bounds. If you supply a custom `kernel_function`, you become responsible for the full `init_hyperparameters` + `hyperparameter_bounds` layout. +4. **Deep kernel via NN warping**: For harder multi-task structure, `from gpcam.deep_kernel_network import Network` gives you an MLP you parametrize from `hps` — see the `kernel-designer` skill for the pattern. + +## Common Pitfalls + +1. **Forgetting the task dimension**: The kernel sees D+1 columns, not D. +2. **Noise shape**: `noise_variances` is shape `(N, No)` for multi-task, not `(N,)`. +3. **Scaling**: N data points × No outputs = N×No rows in the internal GP. Gets large fast. diff --git a/skills/noise-functions/SKILL.md b/skills/noise-functions/SKILL.md new file mode 100644 index 0000000..32115bc --- /dev/null +++ b/skills/noise-functions/SKILL.md @@ -0,0 +1,145 @@ +# Skill: gpCAM Noise Functions + +Design custom noise models for experiments with non-uniform or structured noise. + +## When to Use + +- Noise varies across the parameter space (e.g., edges of detector have more noise) +- Noise depends on signal intensity (Poisson/shot noise) +- You want the noise level to be learned as a hyperparameter +- Correlated noise between measurements + +## Noise Function Contract + +```python +def my_noise(x, hyperparameters): + """ + Parameters + ---------- + x : np.ndarray, shape (N, D) + Input positions. + hyperparameters : np.ndarray, 1D + The FULL hyperparameter vector (shared with kernel and mean). + + Returns + ------- + noise : np.ndarray + Either shape (N,) for diagonal noise (independent per point), + or shape (N, N) for full noise covariance matrix. + """ +``` + +## When to Use What + +| Scenario | Approach | +|----------|----------| +| Known, uniform noise | Use `noise_variances=np.full(N, sigma**2)` — no noise function needed | +| Known per-point noise | Use `noise_variances=my_array` — no noise function needed | +| Unknown uniform noise | Use a noise function with a learnable hyperparameter | +| Position-dependent noise | Use a noise function that depends on `x` | +| No noise info at all | Don't provide either — gpCAM defaults to `(0.01 * mean|y|)²` | + +## Recipes + +### Learnable Constant Noise +The noise level is a hyperparameter that gets optimized: +```python +def learnable_noise(x, hps): + """hps[K] = noise standard deviation (learned).""" + K = 3 # INDEX WHERE NOISE HP STARTS — adjust for your kernel + return np.full(len(x), hps[K]**2) # return VARIANCE, not std +``` + +### Position-Dependent Noise +More noise at edges of the measurement range: +```python +def edge_noise(x, hps): + """Higher noise near boundaries.""" + K = 3 + base_noise = hps[K]**2 + # Increase noise near edges (within 10% of range) + center = np.mean(parameter_bounds, axis=1) + half_range = (parameter_bounds[:, 1] - parameter_bounds[:, 0]) / 2 + dist_from_center = np.abs(x - center) / half_range # 0 at center, 1 at edge + edge_factor = 1.0 + 5.0 * np.max(dist_from_center, axis=1)**2 + return base_noise * edge_factor +``` + +### Poisson-Like (Signal-Dependent) Noise +Common in photon-counting detectors: +```python +def poisson_noise(x, hps): + """ + Noise proportional to sqrt of expected signal. + Uses the GP posterior mean as estimate of signal. + + NOTE: This creates a feedback loop — use carefully. + hps[K] = noise scale factor + """ + K = 3 + # Can't call posterior_mean here directly (circular dependency) + # Instead, use a fixed estimate or pass through args + # Simple version: just scale with position + return np.full(len(x), hps[K]**2) +``` + +### Two-Level Noise (Different Detectors/Modes) +```python +def two_detector_noise(x, hps): + """ + Different noise for two measurement modes. + Assumes last dimension of x encodes the mode (0 or 1). + """ + K = 3 + noise = np.empty(len(x)) + mode_0 = x[:, -1] < 0.5 + mode_1 = ~mode_0 + noise[mode_0] = hps[K]**2 # detector 1 + noise[mode_1] = hps[K+1]**2 # detector 2 + return noise +``` + +## Hyperparameter Coordination + +Noise function hyperparameters come from the **same vector** as kernel and mean hyperparameters. + +```python +# Example layout: +# hps[0] = signal variance (kernel) +# hps[1:3] = length scales (kernel) +# hps[3] = noise std dev (noise function) ← YOUR NOISE HP +# +# Total: 4 hyperparameters + +hp_bounds = np.array([ + [0.001, 100.0], # signal variance + [0.01, 50.0], # length scale dim 0 + [0.01, 50.0], # length scale dim 1 + [0.001, 10.0], # noise std dev +]) +``` + +### Choosing Noise Bounds +- **Lower bound**: Never 0 — use `0.001` minimum (prevents singular matrices) +- **Upper bound**: `10 * std(y_data)` — noise shouldn't be larger than the signal +- **Initial value**: `0.01 * mean(|y_data|)` (gpCAM's own default) + +## Important: noise_variances vs noise_function + +**Do not provide both.** Use one or the other: + +```python +# Option A: fixed known noise +gpo = GPOptimizer(x_data, y_data, noise_variances=np.full(N, 0.01)) + +# Option B: learnable noise function +gpo = GPOptimizer(x_data, y_data, noise_function=learnable_noise) +``` + +## Common Pitfalls + +1. **Returning std instead of variance**: The noise function must return **variance** (σ²), not standard deviation (σ). +2. **Zero noise**: Causes singular matrix errors. Always ensure noise > 0. +3. **Providing both**: Don't pass `noise_variances` AND `noise_function` — pick one. +4. **Forgetting bounds**: The noise hyperparameter needs bounds in the `hyperparameter_bounds` array. +5. **Wrong shape**: Return shape `(N,)` for independent noise, `(N, N)` for correlated. diff --git a/skills/prior-mean-functions/SKILL.md b/skills/prior-mean-functions/SKILL.md new file mode 100644 index 0000000..cb6bab7 --- /dev/null +++ b/skills/prior-mean-functions/SKILL.md @@ -0,0 +1,148 @@ +# Skill: gpCAM Prior Mean Functions + +Design prior mean functions that encode known physics or expected trends. + +## When to Use + +When the user has prior knowledge about the expected behavior: +- A known baseline or background (linear trend, polynomial) +- Physical model (Gaussian peak, Lorentzian, Bragg's law) +- Expected shape from previous experiments +- The signal is known to be non-zero on average + +## Prior Mean Function Contract + +```python +def my_mean(x, hyperparameters): + """ + Parameters + ---------- + x : np.ndarray, shape (N, D) + Input positions. + hyperparameters : np.ndarray, 1D + The FULL hyperparameter vector (shared with kernel and noise). + + Returns + ------- + m : np.ndarray, shape (N,) + Prior mean value at each input point. + """ +``` + +**If no prior mean is provided**, gpCAM uses the average of `y_data` as a constant mean — this is fine for most cases. + +## When NOT to Use a Prior Mean + +- You don't have a strong physical expectation → use default (mean of data) +- Your model might be wrong → a bad prior mean biases the GP and can hurt more than help +- You're purely exploring → the GP will learn the mean from data + +## Recipes + +### Constant Mean (explicit) +```python +def constant_mean(x, hps): + """Explicit constant mean. hps[-1] = mean value.""" + return np.full(len(x), hps[-1]) +``` + +### Linear Trend +```python +def linear_mean(x, hps): + """ + Linear prior mean: m(x) = a + b0*x0 + b1*x1 + ... + Uses hps[K], hps[K+1], ..., hps[K+D] where K is where mean hps start. + """ + K = 3 # INDEX WHERE MEAN HYPERPARAMETERS START — adjust for your kernel + D = x.shape[1] + intercept = hps[K] + slopes = hps[K+1:K+1+D] + return intercept + x @ slopes +``` + +### Gaussian Peak (known approximate location) +```python +def gaussian_peak_mean(x, hps): + """ + Prior: expect a Gaussian peak near a known location. + hps[K]: amplitude + hps[K+1]: center_x + hps[K+2]: center_y + hps[K+3]: width + """ + K = 3 # adjust + amp = hps[K] + cx, cy = hps[K+1], hps[K+2] + w = hps[K+3] + r2 = (x[:, 0] - cx)**2 + (x[:, 1] - cy)**2 + return amp * np.exp(-r2 / (2 * w**2)) +``` + +### Polynomial Background +```python +def quadratic_mean(x, hps): + """Quadratic background: a + b*x + c*x^2 (1D).""" + K = 2 # adjust + return hps[K] + hps[K+1] * x[:, 0] + hps[K+2] * x[:, 0]**2 +``` + +### Physics-Informed (Bragg's Law Example) +```python +def bragg_mean(x, hps): + """ + Prior mean based on Bragg's law: peak at 2*d*sin(theta) = n*lambda. + x[:, 0] = 2theta angle in degrees + hps[K] = amplitude + hps[K+1] = d-spacing estimate + """ + K = 3 + wavelength = 1.54 # Cu K-alpha, fixed + two_theta = np.radians(x[:, 0]) + # Expected peak positions + d = hps[K+1] + expected = hps[K] * np.exp(-(np.sin(two_theta/2) - wavelength/(2*d))**2 / 0.001) + return expected +``` + +## Hyperparameter Coordination + +**This is where most bugs happen.** The prior mean function receives the same hyperparameter vector as the kernel and noise functions. You must: + +1. Decide which indices the mean function uses +2. Add bounds for those hyperparameters +3. Document the full layout + +```python +# Example layout: +# hps[0] = signal variance (kernel) +# hps[1:3] = length scales (kernel, 2D input) +# hps[3] = intercept (mean function) +# hps[4:6] = slopes (mean function) +# hps[6] = noise amplitude (noise function) +# +# Total: 7 hyperparameters + +hp_bounds = np.array([ + [0.001, 100.0], # signal variance + [0.01, 50.0], # length scale dim 0 + [0.01, 50.0], # length scale dim 1 + [-10.0, 10.0], # intercept + [-5.0, 5.0], # slope dim 0 + [-5.0, 5.0], # slope dim 1 + [0.001, 10.0], # noise amplitude +]) +``` + +## Setting Mean Function Hyperparameter Bounds + +- **Intercept/amplitude**: `[min(y_data) * 2, max(y_data) * 2]` +- **Slopes**: `[-range(y)/range(x), +range(y)/range(x)]` +- **Peak position**: `[known_position - tolerance, known_position + tolerance]` +- **Width**: `[min_expected_width, max_expected_width]` + +## Common Pitfalls + +1. **Wrong hyperparameter indices**: Double-check which indices the mean function reads — they must not overlap with kernel/noise indices. +2. **Overconfident prior**: If the mean function is too specific and wrong, the GP will fight between data and prior. Keep it loose. +3. **Forgetting to add bounds**: Every hyperparameter used by the mean function needs a row in `hyperparameter_bounds`. +4. **Return shape**: Must return 1D array of length `len(x)`, not a scalar.