diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 0000000..f7725dc
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,32 @@
+# gpCAM Claude Skills
+
+You are helping scientists design autonomous experiments using gpCAM, a Gaussian Process-based Bayesian optimization toolkit.
+
+## Skills
+
+Read the appropriate skill file before generating code:
+
+- **Designing an experiment**: `skills/experiment-designer/SKILL.md`
+- **Custom kernels**: `skills/kernel-designer/SKILL.md`
+- **Acquisition functions**: `skills/acquisition-functions/SKILL.md`
+- **Prior mean functions**: `skills/prior-mean-functions/SKILL.md`
+- **Noise models**: `skills/noise-functions/SKILL.md`
+- **Cost functions (travel time)**: `skills/cost-functions/SKILL.md`
+- **Large-scale (>10k points)**: `skills/gp2scale-advanced/SKILL.md`
+- **Multi-task/multi-output**: `skills/multi-task-advanced/SKILL.md`
+
+## Reference Materials
+
+- [gpCAM documentation](https://gpcam.readthedocs.io) — full API reference and mathematical background
+- `gpcam/kernels.py` — re-exports the fvGP kernel library (`from fvgp.kernels import *`)
+
+## Key Principles
+
+1. **Generate complete, runnable scripts** — not fragments
+2. **Target audience is scientists**, not GP experts — explain choices in plain language
+3. **Always document the hyperparameter layout** — which index maps to what
+4. **Hyperparameter bounds must match** the total hyperparameter count across kernel + mean + noise
+5. **Default kernel is usually fine** — only suggest custom when there's a clear reason
+6. **Use vectorized numpy** — no Python loops over data points
+7. **Return dict keys**: `posterior_mean()` returns `"m(x)"`, `posterior_covariance()` returns `"v(x)"` and `"S"`. NOT `"f(x)"`.
+8. **`get_data()` keys use spaces**: `"x data"`, `"y data"`, `"hyperparameters"`, `"measurement variances"`. NOT underscores.
diff --git a/README.md b/README.md
index ecb335c..efa95a5 100644
--- a/README.md
+++ b/README.md
@@ -38,6 +38,26 @@ for i in range(100):
 ```
 
 
+## Designing experiments with Claude Code
+
+gpCAM ships with a set of [Claude Code](https://docs.anthropic.com/en/docs/claude-code) skills that guide an AI assistant through designing autonomous experiments — custom kernels, acquisition functions, noise models, and the full ask/tell/train loop. Experimentalists who want smart, autonomous data acquisition without deep knowledge of GP math or the gpCAM API can use these skills to design autonomous experiments.
+
+When you clone this repo, the root `CLAUDE.md` and `skills/` directory are picked up automatically by Claude Code. Available skills:
+
+| Skill | Description |
+|-------|-------------|
+| **experiment-designer** | End-to-end autonomous experiment design. Translates a scientist's description of their measurement into a complete gpCAM script. |
+| **kernel-designer** | Design and compose custom kernel functions that encode domain knowledge (smoothness, periodicity, symmetry, anisotropy). |
+| **acquisition-functions** | Write custom acquisition functions that encode experimental priorities (exploration vs exploitation, multi-objective, constraints). |
+| **prior-mean-functions** | Encode known physics or expected trends as prior mean functions. |
+| **noise-functions** | Model position-dependent or heteroscedastic noise from detector characteristics. |
+| **cost-functions** | Account for motor travel time, settling, directional costs, and zone-based penalties. |
+| **gp2scale-advanced** | Large-scale experiments (>10k points) using sparse kernels and Dask distributed computing. |
+| **multi-task-advanced** | Multi-output / function-valued experiments with `fvGPOptimizer`. |
+
+These skills are also compatible with other agentic platforms (e.g. [OpenClaw](https://openclaw.ai), or any harness that can read `SKILL.md` files) — point your assistant at the `skills/` directory.
+
+
 ## Credits
 
 Main Developer: Marcus Noack ([MarcusNoack@lbl.gov](mailto:MarcusNoack@lbl.gov))
diff --git a/skills/acquisition-functions/SKILL.md b/skills/acquisition-functions/SKILL.md
new file mode 100644
index 0000000..af2d031
--- /dev/null
+++ b/skills/acquisition-functions/SKILL.md
@@ -0,0 +1,205 @@
+# Skill: gpCAM Acquisition Functions
+
+Design custom acquisition functions that control where gpCAM measures next.
+
+## When to Use
+
+When a user needs acquisition behavior beyond the built-in options:
+- Balancing exploration and exploitation
+- Multi-objective optimization
+- Constrained search (avoid forbidden regions)
+- Cost-aware acquisition (expensive moves)
+- Upper/lower confidence bounds
+- Probability of improvement
+
+## Acquisition Function Contract
+
+```python
+def my_acquisition(x, gp_optimizer):
+    """
+    Parameters
+    ----------
+    x : np.ndarray, shape (V, D)
+        Candidate points to evaluate.
+    gp_optimizer : GPOptimizer
+        The GP model. Use its posterior_mean() and posterior_covariance() methods.
+    
+    Returns
+    -------
+    scores : np.ndarray, shape (V,)
+        Score for each candidate. HIGHER = more desirable (function is MAXIMIZED).
+    """
+```
+
+**Key rule:** The acquisition function is **maximized**. Return higher values for points you want to measure.
+
+## Built-in Options
+
+Pass these as strings to `gpo.ask(acquisition_function=...)`:
+
+| Name | String key | Best for |
+|------|-----------|----------|
+| Variance | `"variance"` | Pure exploration / mapping |
+| Expected Improvement | `"expected improvement"` | Optimization (find max) |
+| Probability of Improvement | `"probability of improvement"` | Optimization, risk-averse |
+| Upper Confidence Bound | `"ucb"` | Maximization with tunable exploration/exploitation |
+| Lower Confidence Bound | `"lcb"` | Minimization with tunable exploration/exploitation |
+| Predicted Maximum | `"maximum"` | Pure exploitation — mean only, no uncertainty |
+| Predicted Minimum | `"minimum"` | Pure exploitation for minimization |
+| Gradient | `"gradient"` | Seek steepest regions of the posterior mean |
+| Target Probability | `"target probability"` | Find points with output near a target value |
+| Relative Information Entropy | `"relative information entropy"` | Information-theoretic exploration |
+| RIE Set | `"relative information entropy set"` | Batch acquisition |
+| Total Correlation | `"total correlation"` | Batch acquisition |
+
+To sanity-check any built-in or custom acquisition on a grid of candidates without calling `ask()`:
+```python
+scores = gpo.evaluate_acquisition_function(x_grid, acquisition_function="ucb")
+```
+
+## Custom Acquisition Recipes
+
+### Upper Confidence Bound (UCB)
+Available as the built-in string `"ucb"` — pass directly to `gpo.ask(acquisition_function="ucb")`. Write the callable form below only when you need to tune `beta` or otherwise customize the score:
+```python
+def ucb(x, gpo):
+    """
+    beta controls exploration/exploitation tradeoff:
+      beta=0: pure exploitation (just go to predicted max)
+      beta=1: mild exploration
+      beta=3: strong exploration (~95% confidence)
+    """
+    beta = 2.0  # TUNE THIS
+    mean = gpo.posterior_mean(x)["m(x)"]
+    var = gpo.posterior_covariance(x, variance_only=True)["v(x)"]
+    return mean + beta * np.sqrt(var)
+```
+
+### Lower Confidence Bound (for minimization)
+gpCAM maximizes acquisition, so flip the sign for minimization:
+```python
+def lcb(x, gpo):
+    """Find the minimum of the function."""
+    beta = 2.0
+    mean = gpo.posterior_mean(x)["m(x)"]
+    var = gpo.posterior_covariance(x, variance_only=True)["v(x)"]
+    return -(mean - beta * np.sqrt(var))  # note the negation
+```
+
+### Expected Improvement (custom version with minimization)
+```python
+from scipy.stats import norm
+
+def expected_improvement_minimize(x, gpo):
+    """Expected improvement for finding the MINIMUM."""
+    mean = gpo.posterior_mean(x)["m(x)"]
+    var = gpo.posterior_covariance(x, variance_only=True)["v(x)"]
+    std = np.sqrt(np.maximum(var, 1e-10))
+    
+    y_best = np.min(gpo.y_data)  # current best (minimum)
+    z = (y_best - mean) / std
+    ei = std * (z * norm.cdf(z) + norm.pdf(z))
+    return ei
+```
+
+### Probability of Improvement
+```python
+from scipy.stats import norm
+
+def probability_of_improvement(x, gpo):
+    """Probability that measurement improves on current best."""
+    mean = gpo.posterior_mean(x)["m(x)"]
+    var = gpo.posterior_covariance(x, variance_only=True)["v(x)"]
+    std = np.sqrt(np.maximum(var, 1e-10))
+    
+    y_best = np.max(gpo.y_data)
+    z = (mean - y_best) / std
+    return norm.cdf(z)
+```
+
+### Constrained Acquisition (Avoid Regions)
+```python
+def constrained_variance(x, gpo):
+    """Explore but avoid a circular forbidden zone."""
+    var = gpo.posterior_covariance(x, variance_only=True)["v(x)"]
+    
+    # Forbidden zone: circle at (5, 5) with radius 1
+    center = np.array([5.0, 5.0])
+    dist = np.linalg.norm(x - center, axis=1)
+    penalty = np.where(dist < 1.0, -1e6, 0.0)
+    
+    return var + penalty
+```
+
+### Multi-Objective (Weighted)
+```python
+def multi_objective(x, gpo):
+    """Balance finding the max with reducing uncertainty."""
+    mean = gpo.posterior_mean(x)["m(x)"]
+    var = gpo.posterior_covariance(x, variance_only=True)["v(x)"]
+    
+    w_exploit = 0.7  # weight on exploitation
+    w_explore = 0.3  # weight on exploration
+    
+    # Normalize each component to [0, 1]
+    mean_norm = (mean - mean.min()) / (mean.max() - mean.min() + 1e-10)
+    var_norm = (var - var.min()) / (var.max() - var.min() + 1e-10)
+    
+    return w_exploit * mean_norm + w_explore * var_norm
+```
+
+### Threshold Finder (Find Boundary)
+Useful when searching for where a signal crosses a threshold:
+```python
+def threshold_finder(x, gpo):
+    """Find the boundary where f(x) = threshold."""
+    threshold = 0.5  # EDIT THIS
+    
+    mean = gpo.posterior_mean(x)["m(x)"]
+    var = gpo.posterior_covariance(x, variance_only=True)["v(x)"]
+    std = np.sqrt(np.maximum(var, 1e-10))
+    
+    # Score is high near the threshold AND where uncertainty is high
+    distance_to_threshold = np.abs(mean - threshold)
+    return std / (distance_to_threshold + 0.01)
+```
+
+## Usage in the Experiment Loop
+
+```python
+# Built-in string or a callable are both accepted:
+result = gpo.ask(
+    input_set=parameter_bounds,
+    acquisition_function=ucb,   # or "ucb", "expected improvement", ...
+)
+```
+
+### Useful `ask()` options
+
+| Argument | Meaning |
+|----------|---------|
+| `n=N` | Request `N` points at once (batch). For vectorized single-task, use a batch-aware acquisition like `"relative information entropy set"` or `"total correlation"`. |
+| `vectorized=True` (default) | The acquisition function is called once with all candidate points, shape `(V, D)` — required for custom callables written against the contract above. |
+| `vectorized=False` | Candidates are evaluated one at a time (list of 1-D arrays). Used for non-vectorizable acquisition or non-Euclidean inputs. |
+| `method="global"\|"local"\|"hgdl"\|"hgdlAsync"` | Inner optimizer that searches for the argmax of the acquisition over `input_set`. `hgdl` requires `dask_client=`; `hgdlAsync` starts a background search and returns an `opt_obj` you can poll or `kill_client()`. |
+| `dask_client=client` | Distribute the inner optimization across Dask workers. |
+| `batch_size=B` | When candidates are a list, evaluate them in chunks of `B` on the cluster. |
+| `max_iter`, `pop_size`, `info=True` | Inner optimizer controls. |
+
+`input_set` can be continuous bounds (`np.array([[lo,hi], ...])`), a list of candidate points (discrete finite set), or a list of arbitrary objects (non-Euclidean — strings, graphs — provided your kernel handles them).
+
+## Hyperparameter Coordination
+
+Acquisition functions don't add hyperparameters — they read the GP state via `gpo.posterior_mean()` and `gpo.posterior_covariance()`. However:
+
+- If you access `gpo.y_data` directly (e.g., for `y_best`), make sure it's up to date after `tell()`
+- The GP must be trained before acquisition makes sense — always call `train()` first
+- For `variance_only=True`: faster, returns just diagonal variances (usually what you want)
+- For full covariance: use `variance_only=False` but this is O(V²) memory
+
+## Common Pitfalls
+
+1. **Returning negative scores for points you want**: Remember, acquisition is MAXIMIZED.
+2. **Division by zero in std**: Always use `np.maximum(var, 1e-10)` before taking sqrt.
+3. **Not handling edge cases**: Early in the loop with few points, the GP posterior can be unreliable.
+4. **Expensive acquisition functions**: They're evaluated many times during optimization. Keep them fast.
diff --git a/skills/cost-functions/SKILL.md b/skills/cost-functions/SKILL.md
new file mode 100644
index 0000000..c40bc0d
--- /dev/null
+++ b/skills/cost-functions/SKILL.md
@@ -0,0 +1,195 @@
+# Skill: gpCAM Cost Functions
+
+Design cost functions that account for the real expense of moving between measurement points — motor travel time, sample damage, beam time, etc.
+
+## When to Use
+
+- Moving motors is slow and travel time matters (long-range stage moves)
+- Some regions of the parameter space are more expensive to reach
+- You want to avoid unnecessary large jumps between measurements
+- Cost varies by direction (e.g., moving up is faster than moving down)
+- Sample damage accumulates with exposure and should be minimized
+
+## How Cost Functions Work in gpCAM
+
+The cost function modifies the acquisition score:
+
+```
+effective_score(x) = acquisition_score(x) / cost(origin, x)
+```
+
+Points that are expensive to reach get penalized. The optimizer still picks high-value points, but prefers nearby high-value points over distant ones.
+
+## Cost Function Contract
+
+```python
+def my_cost(origin, x, arguments=None):
+    """
+    Parameters
+    ----------
+    origin : np.ndarray, shape (V, D)
+        The current position (where we are now).
+    x : np.ndarray, shape (V, D)
+        The candidate destination positions.
+    arguments : dict or None
+        Optional extra parameters (passed via GPOptimizer constructor).
+    
+    Returns
+    -------
+    cost : np.ndarray, shape (V,)
+        Cost of moving from origin to each point in x.
+        Must be > 0. Higher cost = less desirable to visit.
+    """
+```
+
+**Key rules:**
+- Cost must be **positive** (> 0) — zero cost causes division by zero
+- Cost is used as a **divisor** — higher cost = lower effective acquisition score
+- The function receives batches of points, not single points
+
+## Recipes
+
+### L2 (Euclidean) Distance Cost
+Simple travel time proportional to straight-line distance:
+```python
+def l2_cost(origin, x, arguments=None):
+    """Cost proportional to Euclidean distance."""
+    offset = 1.0   # minimum cost (prevents div-by-zero, represents measurement time)
+    speed = 1.0     # cost per unit distance
+    distance = np.linalg.norm(x - origin, axis=1)
+    return offset + speed * distance
+```
+
+### L1 (Manhattan) Distance Cost
+For stage systems that move one axis at a time:
+```python
+def l1_cost(origin, x, arguments=None):
+    """Cost proportional to Manhattan distance (axis-by-axis motion)."""
+    offset = 1.0
+    speed = 1.0
+    distance = np.sum(np.abs(x - origin), axis=1)
+    return offset + speed * distance
+```
+
+### Anisotropic Cost
+Different axes have different speeds:
+```python
+def anisotropic_cost(origin, x, arguments=None):
+    """
+    Different cost per axis.
+    arguments["speeds"] = [speed_dim0, speed_dim1, ...]
+    """
+    offset = 1.0
+    speeds = arguments.get("speeds", np.ones(x.shape[1]))
+    weighted_dist = np.sum(np.abs(x - origin) * speeds, axis=1)
+    return offset + weighted_dist
+```
+
+### Directional Cost
+Moving in one direction is cheaper (e.g., gravity-assisted, or always-increasing scans):
+```python
+def directional_cost(origin, x, arguments=None):
+    """Cheaper to move in +x direction than -x."""
+    offset = 1.0
+    diff = x - origin
+    # Forward motion (positive) is cheap, backward is expensive
+    forward_cost = np.maximum(diff[:, 0], 0) * 1.0   # cost going forward
+    backward_cost = np.maximum(-diff[:, 0], 0) * 5.0  # 5x cost going backward
+    lateral_cost = np.sum(np.abs(diff[:, 1:]), axis=1) * 1.0
+    return offset + forward_cost + backward_cost + lateral_cost
+```
+
+### Settling Time Cost
+Fast moves need more settling time:
+```python
+def settling_cost(origin, x, arguments=None):
+    """
+    Short moves are fast; long moves need extra settling time.
+    Models: cost = base + travel + settling * (distance > threshold)
+    """
+    base = 1.0
+    travel_rate = 0.5
+    settle_time = 3.0
+    settle_threshold = 2.0  # distance above which settling kicks in
+    
+    distance = np.linalg.norm(x - origin, axis=1)
+    settling = np.where(distance > settle_threshold, settle_time, 0.0)
+    return base + travel_rate * distance + settling
+```
+
+### Zone-Based Cost
+Some regions of the parameter space are more expensive:
+```python
+def zone_cost(origin, x, arguments=None):
+    """
+    Higher cost to measure in certain zones.
+    E.g., cryogenic sample region requires cooldown.
+    """
+    base = 1.0
+    distance = np.linalg.norm(x - origin, axis=1)
+    
+    # Expensive zone: x[:, 0] > 8.0
+    zone_penalty = np.where(x[:, 0] > 8.0, 10.0, 0.0)
+    
+    return base + distance + zone_penalty
+```
+
+## Usage
+
+```python
+gpo = GPOptimizer(
+    x_data=x_data,
+    y_data=y_data,
+    cost_function=l2_cost,
+    # args={"speeds": np.array([1.0, 2.0])}  # for anisotropic_cost
+)
+```
+
+The cost function is automatically applied during `ask()` — no extra code needed in the experiment loop.
+
+## Parameterized Cost with `arguments`
+
+Pass parameters via the `args` dict at construction:
+
+```python
+gpo = GPOptimizer(
+    x_data=x_data,
+    y_data=y_data,
+    cost_function=anisotropic_cost,
+    args={"speeds": np.array([1.0, 3.0, 0.5])},
+)
+```
+
+The `arguments` parameter in the cost function receives this dict.
+
+## Cost Functions Do NOT Add Hyperparameters
+
+Unlike kernel/mean/noise functions, cost functions have **fixed parameters** — they are not optimized during training. If you need to tune cost parameters, do it manually or via the `arguments` dict.
+
+## Calibrating a Cost Function From Observed Moves
+
+If you don't know the cost parameters ahead of time, record observed costs during the experiment and fit `offset` / `slope` by nonlinear least squares. gpCAM ships templates (`_update_cost_function`, `update_l1_cost_function`, `update_l2_cost_function`) that use SciPy's `differential_evolution`:
+
+```python
+# Collect observations as a list of dicts during the experiment:
+observations = [
+    {"origin": np.array([0.0, 0.0]), "point": np.array([0.5, 0.2]), "cost": 2.7},
+    {"origin": np.array([0.5, 0.2]), "point": np.array([0.8, 0.9]), "cost": 3.4},
+    # ...
+]
+
+# Then fit:
+from gpcam.cost_functions import update_l2_cost_function   # or update_l1_cost_function
+new_args = update_l2_cost_function(observations, parameters={"offset": 1.0, "slope": 1.0})
+# Plug `new_args` back into the cost-function `arguments` dict.
+```
+
+The updater drops outliers beyond ±2σ of the per-move cost-per-distance before fitting, so a few anomalous moves (e.g. instrument glitches) won't distort the cost model.
+
+## Common Pitfalls
+
+1. **Zero cost**: Causes division by zero in acquisition. Always add a positive offset (minimum cost ≥ 1.0).
+2. **Cost too high**: If cost dominates, the optimizer just measures nearby points and never explores. Balance cost magnitude with acquisition scores.
+3. **Wrong shape**: Must return 1D array of length `V`, matching the number of candidate points.
+4. **Forgetting `origin`**: The cost depends on where you currently are, not just where you're going.
+5. **Not vectorized**: `origin` and `x` are batches — use numpy operations, not loops.
diff --git a/skills/experiment-designer/SKILL.md b/skills/experiment-designer/SKILL.md
new file mode 100644
index 0000000..b486415
--- /dev/null
+++ b/skills/experiment-designer/SKILL.md
@@ -0,0 +1,277 @@
+# Skill: gpCAM Experiment Designer
+
+Design complete autonomous experiment scripts using gpCAM. You translate a scientist's description of their measurement into a runnable Python script.
+
+## When to Use
+
+When a user wants to:
+- Set up an autonomous/smart scan or optimization
+- Replace a raster scan with adaptive sampling
+- Find optimal experimental conditions (peak finding, parameter optimization)
+- Explore a parameter space efficiently
+
+## Your Role
+
+You are helping **beamline scientists** who may not know GP math or the gpCAM API. Your job is to:
+1. Understand their experiment (what they measure, what they control, what they want to find)
+2. Generate a complete, well-commented Python script they can adapt
+3. Explain the key choices you made in plain language
+
+## Conversation Flow
+
+### Step 1: Understand the Experiment
+Ask about:
+- **Input dimensions**: What parameters do they control? (motor positions, temperature, voltage, etc.)
+- **Input bounds**: What range for each parameter?
+- **Output**: What do they measure? (intensity, spectrum, image, scalar?)
+- **Goal**: Exploration (map everything)? Optimization (find the peak)? Both?
+- **Constraints**: Any forbidden regions? Cost of moving between points?
+- **Prior knowledge**: Do they know roughly what to expect? (smooth? periodic? sharp features?)
+- **Data size**: How many measurements can they afford? (determines if gp2Scale is needed)
+- **Noise**: Is the measurement noisy? Does noise vary across the parameter space?
+
+### Step 2: Design Choices
+Based on their answers, decide:
+
+| Choice | Guidance |
+|--------|----------|
+| **Kernel** | Default Matérn-3/2 ARD is good for most cases. Use periodic kernel if periodicity is known. Use Matérn-1/2 for rough/discontinuous data, Matérn-5/2 or SE for very smooth. See `kernel-designer` skill for custom kernels. |
+| **Acquisition function** | `'variance'` for exploration/mapping. `'expected improvement'` or `'ucb'` for optimization (UCB exposes a tunable exploration/exploitation tradeoff via `beta`). Custom callable for multi-objective or constraints. See `acquisition-functions` skill. |
+| **Prior mean** | Zero (default) unless they have a physical model. See `prior-mean-functions` skill. |
+| **Noise model** | Use `noise_variances` if noise is known and uniform. Use `noise_function` if noise varies. See `noise-functions` skill. |
+| **Training strategy** | `method='global'` for first training, `method='local'` for re-training during the loop. Other options: `"mcmc"` (Bayesian — returns posterior samples over hyperparameters), `"adam"` (stochastic-gradient, fast, works well for high-dimensional hyperparameter vectors like deep kernels), `"hgdl"` (distributed local+global hybrid — needs a `dask_client`). |
+| **calc_inv** | `True` if <2000 points, `False` otherwise. |
+| **Number of initial points** | Rule of thumb: 5-10× the input dimensionality for initial random sampling. |
+| **Validation** | `gpo.rmse(x_test, y_true)` and `gpo.crps(x_test, y_true)` give RMSE and continuous ranked probability score on a held-out grid — call these after training to sanity-check the fit. |
+
+### Step 3: Generate the Script
+
+**Two paths:** If the scientist only needs a scalar black-box optimized and has no need to inspect or customize the ask/tell loop, use the one-shot `GPOptimizer.optimize()` shortcut (A). Otherwise use the full template (B). The full template is required when they want custom acquisition, mid-loop re-training with specific schedules, async training, checkpointing, validation plots during the run, or integration with a live instrument.
+
+#### A. One-shot optimize (simplest)
+
+```python
+import numpy as np
+from gpcam import GPOptimizer
+
+def f(x):
+    # x is shape (N, D); return (y, noise_variance) with matching shapes
+    y = np.sin(x[:, 0]) * np.cos(x[:, 1])
+    return y, np.full(len(x), 0.01)
+
+gpo = GPOptimizer()
+result = gpo.optimize(
+    func=f,
+    search_space=np.array([[0., 1.], [0., 1.]]),
+    max_iter=50,
+)
+```
+
+`optimize()` handles initial sampling, training schedule, the ask/tell loop, and termination. Use it when the scientist has a simulator or instrument wrapper they can hand gpCAM as a Python function. For fvGP/multi-task, pass `x_out=np.array([...])`.
+
+#### B. Full template (adaptive loop)
+
+Output a complete Python script with this structure:
+
+```python
+"""
+Autonomous Experiment: [description]
+Generated for gpCAM v8.3.x
+
+Input space: [dimensions and ranges]
+Output: [what is measured]
+Strategy: [exploration/optimization/hybrid]
+"""
+
+import numpy as np
+from gpcam import GPOptimizer
+
+# ============================================================
+# 1. EXPERIMENT PARAMETERS — EDIT THESE
+# ============================================================
+# Define the parameter space bounds
+# Each row: [min, max] for one dimension
+parameter_bounds = np.array([
+    [0.0, 10.0],   # motor_x (mm)
+    [0.0, 5.0],    # motor_y (mm)
+])
+parameter_names = ["motor_x", "motor_y"]
+
+N_INITIAL = 10      # Initial random measurements
+N_ITERATIONS = 50   # Adaptive measurements
+RETRAIN_EVERY = 10  # Re-train hyperparameters every N iterations
+
+# ============================================================
+# 2. YOUR MEASUREMENT FUNCTION — REPLACE THIS
+# ============================================================
+def measure(x):
+    """
+    Replace this with your actual measurement.
+    
+    Parameters
+    ----------
+    x : np.ndarray, shape (1, D)
+        The point to measure. x[0, 0] is motor_x, x[0, 1] is motor_y.
+    
+    Returns
+    -------
+    y : float
+        The measured value (scalar).
+    noise_variance : float or None
+        The estimated variance of this measurement, or None if unknown.
+    """
+    # EXAMPLE: replace with your instrument call
+    y = np.sin(x[0, 0]) * np.cos(x[0, 1])
+    noise_variance = 0.01  # or None
+    return y, noise_variance
+
+# ============================================================
+# 3. KERNEL (optional customization)
+# ============================================================
+# The default ARD Matérn-3/2 kernel is used if kernel_function=None.
+# Uncomment and modify to use a custom kernel.
+# from gpcam.kernels import matern_kernel_diff1, get_anisotropic_distance_matrix
+# 
+# def my_kernel(x1, x2, hyperparameters):
+#     d = get_anisotropic_distance_matrix(x1, x2, hyperparameters[1:])
+#     return hyperparameters[0] * matern_kernel_diff1(d, 1.0)
+
+kernel_function = None  # None = default ARD Matérn-3/2
+
+# ============================================================
+# 4. HYPERPARAMETER BOUNDS
+# ============================================================
+# For the default kernel: [signal_variance, length_scale_dim1, ..., length_scale_dimD]
+# Rule of thumb:
+#   signal_variance bounds: [0.01, 10 * std(y)]  (estimated after initial data)
+#   length_scale bounds:    [0.01, 10 * range(x_dim)]
+D = parameter_bounds.shape[0]
+hp_bounds = np.array(
+    [[0.001, 100.0]] +                                          # signal variance
+    [[0.01, 10.0 * (b[1] - b[0])] for b in parameter_bounds]   # length scales
+)
+
+# ============================================================
+# 5. ACQUISITION FUNCTION
+# ============================================================
+acquisition_function = "variance"  # Options: "variance", "expected improvement",
+                                   #          "ucb", "relative information entropy",
+                                   #          or a callable
+
+# ============================================================
+# 6. RUN THE EXPERIMENT
+# ============================================================
+def run():
+    # --- Initial random sampling ---
+    x_init = np.random.uniform(
+        parameter_bounds[:, 0], parameter_bounds[:, 1],
+        size=(N_INITIAL, D)
+    )
+    y_init = np.zeros(N_INITIAL)
+    noise_init = np.zeros(N_INITIAL)
+    
+    for i in range(N_INITIAL):
+        y_init[i], nv = measure(x_init[i:i+1])
+        noise_init[i] = nv if nv is not None else 0.0
+    
+    # --- Initialize GP ---
+    gpo = GPOptimizer(
+        x_data=x_init,
+        y_data=y_init,
+        noise_variances=noise_init if noise_init.any() else None,
+        kernel_function=kernel_function,
+        calc_inv=(N_INITIAL + N_ITERATIONS < 2000),
+    )
+    
+    # --- Initial training ---
+    gpo.train(hyperparameter_bounds=hp_bounds, method="global", max_iter=200)
+    
+    # --- Adaptive loop ---
+    for i in range(N_ITERATIONS):
+        # Ask: where should we measure next?
+        result = gpo.ask(
+            input_set=parameter_bounds,
+            acquisition_function=acquisition_function,
+        )
+        next_x = result["x"]
+        
+        # Measure
+        new_y, new_nv = measure(next_x)
+        
+        # Tell: update the GP
+        gpo.tell(
+            next_x,
+            np.array([new_y]),
+            noise_variances=np.array([new_nv]) if new_nv is not None else None,
+        )
+        
+        # Re-train periodically
+        if (i + 1) % RETRAIN_EVERY == 0:
+            gpo.train(hyperparameter_bounds=hp_bounds, method="local", max_iter=100)
+        
+        print(f"Iteration {i+1}/{N_ITERATIONS}: "
+              f"measured at {next_x[0]} -> {new_y:.4f}")
+    
+    # --- Results ---
+    data = gpo.get_data()
+    print(f"\nDone! Collected {len(data['x data'])} points total.")
+    print(f"Final hyperparameters: {data['hyperparameters']}")
+    
+    return gpo
+
+if __name__ == "__main__":
+    gpo = run()
+```
+
+## Key Rules
+
+1. **Always generate a complete, runnable script** — not fragments. Scientists should be able to copy-paste and run it.
+2. **The `measure()` function is a placeholder** — clearly mark it and explain what to replace.
+3. **Comment heavily** — explain every choice for the non-expert.
+4. **Hyperparameter bounds matter** — set sensible defaults based on the parameter ranges and expected signal scale.
+5. **Default kernel is usually fine** — only suggest custom kernels when there's a clear reason (known periodicity, symmetry, etc.).
+6. **Training schedule** — train globally once at the start, then locally every N iterations. Don't train every iteration (too slow).
+7. **Initial points** — always start with random initial sampling before the adaptive loop.
+
+## Hyperparameter Coordination
+
+This is critical and often the source of bugs:
+
+- The hyperparameter vector is shared across kernel, mean, and noise functions
+- For the default kernel with D input dimensions: `hps[0]` = signal variance, `hps[1:D+1]` = length scales
+- If you add a custom noise function that uses hyperparameters, those come AFTER the kernel hyperparameters
+- If you add a prior mean function that uses hyperparameters, document which indices it uses
+- **Always document the hyperparameter layout** in a comment at the top of the script
+- **Always set bounds for ALL hyperparameters** — the bounds array must match the total hyperparameter count
+
+Example with custom noise:
+```python
+# Hyperparameter layout:
+# hps[0]     = signal variance (kernel)
+# hps[1:D+1] = length scales (kernel)  
+# hps[D+1]   = noise amplitude (noise function)
+#
+# Total: D + 2 hyperparameters
+
+def my_noise(x, hps):
+    return np.full(len(x), hps[D+1]**2)
+
+hp_bounds = np.array(
+    [[0.001, 100.0]] +                                        # signal variance
+    [[0.01, 10.0 * (b[1]-b[0])] for b in parameter_bounds] +  # length scales
+    [[0.001, 10.0]]                                            # noise amplitude
+)
+```
+
+## Advanced Options (mention only if needed)
+
+- **Incremental data updates**: `gpo.tell(x_new, y_new, append=True)` adds points without overwriting; `append=False` replaces. Default is to replace — use `append=True` in a streaming instrument loop.
+- **Async training**: `opt_obj = gpo.train(..., asynchronous=True, method="hgdl", dask_client=client)` returns immediately; later call `gpo.update_hyperparameters(opt_obj)` to pull current best hyperparameters, and `gpo.stop_training(opt_obj)` to finish. Useful when training is expensive and the loop shouldn't block.
+- **Async ask**: `method="hgdlAsync"` starts a background search for the next point; the result dict contains an `opt_obj` you can `kill_client()` once you've used the suggestion.
+- **Checkpointing**: `GPOptimizer` instances are picklable — `pickle.dumps(gpo)` before a long run lets you reload state later.
+- **Info measures**: `gpo.gp_mutual_information(x_test)` and `gpo.gp_total_correlation(x_test)` report information content at a candidate set.
+- **User-function arguments**: `GPOptimizer(..., args={"a": 1.5, "b": 2.0})` plumbs a dict through to custom kernel/mean/noise/cost functions (they receive it as an extra argument when they declare it).
+
+## Reference
+
+For detailed API docs, kernel math, and advanced options, see the [gpCAM documentation](https://gpcam.readthedocs.io).
diff --git a/skills/gp2scale-advanced/SKILL.md b/skills/gp2scale-advanced/SKILL.md
new file mode 100644
index 0000000..94222dc
--- /dev/null
+++ b/skills/gp2scale-advanced/SKILL.md
@@ -0,0 +1,128 @@
+# Skill: gp2Scale — Large-Scale GPs
+
+Design experiments with tens of thousands to millions of data points using gpCAM's gp2Scale mode for exact GP computation at scale.
+
+## When to Use
+
+- Dataset will exceed ~10,000 points (and can go into the millions with enough cluster)
+- Need exact GP inference — gp2Scale is **not** an approximation like Vecchia or inducing-point methods; it only exploits naturally-occurring sparsity induced by compact-support kernels, so the result is exact
+- Have access to multiple CPU cores, GPUs, or a compute cluster
+- Willing to use compactly-supported kernels (Wendland)
+
+## Key Concepts
+
+gp2Scale uses:
+1. **Wendland kernels** with compact support → sparse covariance matrix (zero covariance beyond the support radius)
+2. **Dask distributed** for parallel covariance computation — one covariance block per worker
+3. **Sparse linear algebra** (LU, Cholesky) via SciPy / imate instead of dense
+4. **Random linear algebra** for log-determinants at scale (install `imate`)
+
+## Basic Setup
+
+```python
+from distributed import Client
+from gpcam import GPOptimizer
+
+# Start a local Dask cluster
+client = Client()                  # uses all available cores
+client.wait_for_workers(4)         # good practice: wait for workers before constructing
+
+gpo = GPOptimizer(
+    x_data=x_data,
+    y_data=y_data,
+    gp2Scale=True,
+    gp2Scale_dask_client=client,
+    gp2Scale_batch_size=500,       # typical; tune up for large clusters
+    init_hyperparameters=np.array([0.73, 0.0014]),  # signal var, length scale
+)
+
+gpo.train(hyperparameter_bounds=hps_bounds, max_iter=25, info=True)
+```
+
+## Kernel Requirement
+
+When `gp2Scale=True`, the kernel MUST produce a sparse matrix. The default switches to an anisotropic Wendland kernel automatically if no custom kernel is provided.
+
+If providing a custom kernel, it must have compact support:
+```python
+from gpcam.kernels import wendland_anisotropic
+
+def my_gp2scale_kernel(x1, x2, hps):
+    """Custom kernel with compact support for gp2Scale."""
+    return wendland_anisotropic(x1, x2, hps)
+```
+
+## Hyperparameters for Wendland Kernel
+
+```
+hps[0]     = signal variance
+hps[1:D+1] = per-dimension length scales (also control support radius)
+```
+
+The length scales in the Wendland kernel also determine the support radius — points further apart than the length scale have zero covariance.
+
+## Linear Algebra Modes
+
+| Mode | Description |
+|------|-------------|
+| `"Chol"` | Sparse Cholesky — fastest for moderate sparsity |
+| `"sparseLU"` | Sparse LU decomposition |
+| `"sparseCG"` | Conjugate gradient (iterative) |
+| `"sparseMINRES"` | MINRES (iterative) |
+
+## Custom Block-MCMC Training
+
+For expensive gp2Scale likelihoods, standard `method="mcmc"` training may be too slow. gpCAM exposes a block Metropolis-Hastings sampler you can drive directly against the GP's log-likelihood:
+
+```python
+import numpy as np
+from gpcam import gpMCMC, ProposalDistribution
+
+def in_bounds(v, bounds):
+    return not (any(v < bounds[:, 0]) or any(v > bounds[:, 1]))
+
+def prior_function(theta, args):
+    return 0.0 if in_bounds(theta, args["bounds"]) else -np.inf
+
+def log_likelihood(hps, args):
+    return gpo.log_likelihood(hyperparameters=hps)   # exposed on GPOptimizer
+
+pd = ProposalDistribution([0, 1], init_prop_Sigma=np.identity(2) * 0.01)
+
+mcmc = gpMCMC(log_likelihood, prior_function, [pd],
+              args={"bounds": hps_bounds})
+result = mcmc.run_mcmc(x0=np.array([1.0, 0.01]), n_updates=200, info=True)
+
+gpo.set_hyperparameters(result["mean(x)"])
+```
+
+`ProposalDistribution` takes the list of hyperparameter indices in that block and an initial proposal covariance. Stack multiple `ProposalDistribution`s for block-wise updates of high-dimensional hyperparameter vectors (deep kernels, etc.).
+
+## HPC Setup (SLURM example)
+
+```python
+from dask_jobqueue import SLURMCluster
+from distributed import Client
+
+cluster = SLURMCluster(
+    cores=32,
+    memory="64GB",
+    walltime="01:00:00",
+)
+cluster.scale(jobs=4)  # 4 nodes × 32 cores
+client = Client(cluster)
+```
+
+## Estimating Compute Time
+
+```python
+from gpcam.gp_optimizer import gp2Scale_time_estimate
+gp2Scale_time_estimate(n_workers=8, worker_speed=500, n_data=100000)
+```
+
+## Common Pitfalls
+
+1. **Using a non-sparse kernel**: gp2Scale won't speed up dense kernels.
+2. **Batch size too small**: More overhead. Start with 10000.
+3. **Forgetting Dask client**: A local client is created by default but explicit is better.
+4. **Length scales too large**: Reduces sparsity, defeating the purpose. Keep support radius reasonable.
diff --git a/skills/kernel-designer/SKILL.md b/skills/kernel-designer/SKILL.md
new file mode 100644
index 0000000..eda0ed0
--- /dev/null
+++ b/skills/kernel-designer/SKILL.md
@@ -0,0 +1,260 @@
+# Skill: gpCAM Kernel Designer
+
+Design custom kernel (covariance) functions for gpCAM that encode domain knowledge about the experiment.
+
+## When to Use
+
+When a user needs a kernel that goes beyond the default ARD Matérn-3/2:
+- Known periodicity in the data
+- Symmetry constraints (mirror, rotational)
+- Different smoothness in different dimensions
+- Combining multiple correlation structures (sum/product kernels)
+- Non-Euclidean input spaces (strings, graphs, categorical)
+- Non-stationary behavior (varying length scales)
+
+## Kernel Function Contract
+
+Every gpCAM kernel must satisfy:
+
+```python
+def my_kernel(x1, x2, hyperparameters):
+    """
+    Parameters
+    ----------
+    x1 : np.ndarray, shape (N1, D)
+    x2 : np.ndarray, shape (N2, D)
+    hyperparameters : np.ndarray, 1D
+    
+    Returns
+    -------
+    K : np.ndarray, shape (N1, N2)
+        Must be symmetric positive semi-definite.
+    """
+```
+
+- `x1` and `x2` are 2D arrays even for 1D inputs
+- The output must be an `(N1, N2)` matrix
+- The kernel must be symmetric: `k(x1, x2) = k(x2, x1).T`
+- The kernel must be positive semi-definite
+- Use vectorized numpy operations — avoid Python loops over data points
+
+## Building Blocks
+
+Import from `gpcam.kernels` (or define locally):
+
+```python
+from gpcam.kernels import (
+    matern_kernel_diff1,      # Matérn ν=3/2: once differentiable
+    matern_kernel_diff2,      # Matérn ν=5/2: twice differentiable
+    squared_exponential_kernel,  # RBF/SE: infinitely smooth
+    wendland_kernel,           # Compact support (for gp2Scale)
+    get_distance_matrix,       # Euclidean distance
+    get_anisotropic_distance_matrix,  # ARD distance
+)
+```
+
+These base kernels operate on **distance matrices**, not raw points. The pattern is:
+1. Compute a distance matrix from points
+2. Apply a base kernel to the distance matrix
+
+## Kernel Recipes
+
+### Standard Anisotropic (ARD) Kernel
+```python
+def anisotropic_matern(x1, x2, hps):
+    """
+    hps[0]: signal variance
+    hps[1:D+1]: per-dimension length scales
+    """
+    d = get_anisotropic_distance_matrix(x1, x2, hps[1:])
+    return hps[0] * matern_kernel_diff1(d, 1.0)
+```
+
+### Periodic Kernel
+For data with known periodicity (e.g., angular measurements, crystal lattice):
+```python
+def periodic_kernel(x1, x2, hps):
+    """
+    hps[0]: signal variance
+    hps[1]: length scale
+    hps[2]: period
+    """
+    d = np.abs(np.subtract.outer(x1[:, 0], x2[:, 0]))
+    return hps[0] * np.exp(-2.0 * np.sin(np.pi * d / hps[2])**2 / hps[1]**2)
+```
+
+### Periodic + Smooth (Product Kernel)
+Periodic in one dimension, smooth Matérn in others:
+```python
+def periodic_plus_smooth(x1, x2, hps):
+    """
+    hps[0]: signal variance
+    hps[1]: periodic length scale
+    hps[2]: period
+    hps[3:]: Matérn length scales for remaining dims
+    """
+    # Periodic in dim 0
+    d_periodic = np.abs(np.subtract.outer(x1[:, 0], x2[:, 0]))
+    k_periodic = np.exp(-2.0 * np.sin(np.pi * d_periodic / hps[2])**2 / hps[1]**2)
+    
+    # Matérn in remaining dims
+    d_other = get_anisotropic_distance_matrix(x1[:, 1:], x2[:, 1:], hps[3:])
+    k_other = matern_kernel_diff1(d_other, 1.0)
+    
+    return hps[0] * k_periodic * k_other
+```
+
+### Sum Kernel (Multiple Scales)
+Captures both coarse and fine structure:
+```python
+def multi_scale_kernel(x1, x2, hps):
+    """
+    hps[0]: variance of coarse component
+    hps[1]: length scale of coarse component
+    hps[2]: variance of fine component
+    hps[3]: length scale of fine component
+    """
+    d = get_distance_matrix(x1, x2)
+    k_coarse = hps[0] * matern_kernel_diff2(d, hps[1])  # smooth, long-range
+    k_fine = hps[2] * matern_kernel_diff1(d, hps[3])     # rougher, short-range
+    return k_coarse + k_fine
+```
+
+### Symmetry-Enforcing Kernel
+For data known to be symmetric about an axis:
+```python
+def symmetric_kernel_x(x1, x2, hps):
+    """Mirror symmetry about x=0 in the first dimension."""
+    x1_flip = x1.copy()
+    x1_flip[:, 0] = -x1_flip[:, 0]
+    x2_flip = x2.copy()
+    x2_flip[:, 0] = -x2_flip[:, 0]
+    
+    d = get_anisotropic_distance_matrix
+    k = lambda a, b: hps[0] * matern_kernel_diff1(d(a, b, hps[1:]), 1.0)
+    
+    return 0.25 * (k(x1, x2) + k(x1_flip, x2) + k(x1, x2_flip) + k(x1_flip, x2_flip))
+```
+
+### L1 (Manhattan) Distance Kernel
+Separable per dimension — each dimension contributes independently:
+```python
+def l1_kernel(x1, x2, hps):
+    """Product of per-dimension exponential kernels (L1 distance)."""
+    k = hps[0] * np.ones((len(x1), len(x2)))
+    for i in range(x1.shape[1]):
+        d_i = np.abs(np.subtract.outer(x1[:, i], x2[:, i]))
+        k *= np.exp(-d_i / hps[1 + i])
+    return k
+```
+
+### Non-Euclidean Input Spaces (strings, graphs, categorical)
+
+gpCAM accepts arbitrary Python objects as inputs — `x_data` can be a list of strings, graphs, molecules, etc. The only requirement is that your kernel computes a valid covariance between any two objects.
+
+```python
+from gpcam import GPOptimizer
+from gpcam.kernels import matern_kernel_diff1
+
+def string_distance(s1, s2):
+    diff = abs(len(s1) - len(s2))
+    common = min(len(s1), len(s2))
+    return diff + sum(a != b for a, b in zip(s1[:common], s2[:common]))
+
+def string_kernel(x1, x2, hps):
+    """x1, x2 are lists/sequences of strings; hps = [signal_var, length_scale]."""
+    d = np.array([[string_distance(a, b) for b in x2] for a in x1])
+    return hps[0] * matern_kernel_diff1(d, hps[1])
+
+x_data = ["hello", "world", "this", "is", "gpcam"]
+y_data = np.array([2.0, 1.9, 1.8, 3.0, 5.0])
+
+gp = GPOptimizer(x_data, y_data,
+                 init_hyperparameters=np.ones(2),
+                 kernel_function=string_kernel)
+gp.train(hyperparameter_bounds=np.array([[1e-3, 100.], [1e-3, 100.]]))
+
+# Predict on new objects:
+gp.posterior_mean(["full"])["m(x)"]
+
+# Ask which of a candidate set to measure next:
+gp.ask(["who", "could", "it", "be"], n=4)
+```
+
+Notes:
+- A Python `for` loop over objects is fine here (you're bottlenecked by the distance function, not numpy).
+- If the distance function is symmetric, the resulting kernel is automatically symmetric; you still need the base kernel (Matérn/SE) to make it PSD.
+- For multi-task on non-Euclidean inputs: `fvGPOptimizer(x_strings, y_multi, kernel_function=...)` works the same way.
+
+### Deep Kernel (NN-warped input space)
+
+For hard multi-task structure or learned metrics, warp the input through a small neural net and then apply a stationary kernel in the warped space. `gpcam.deep_kernel_network.Network` gives you an MLP whose weights you read out of `hps`:
+
+```python
+from gpcam.deep_kernel_network import Network
+from gpcam.kernels import get_distance_matrix, matern_kernel_diff1
+
+iset_dim = 3
+layer_width = 5
+n = Network(iset_dim, layer_width)
+# n.number_of_hps tells you how many NN hyperparameters to reserve.
+
+def deep_kernel(x1, x2, hps):
+    signal_var, length_scale = hps[0], hps[1]
+    # unpack the remaining hps into n.set_weights / n.set_biases (layout: see manual)
+    x1_nn = n.forward(x1)
+    x2_nn = n.forward(x2)
+    d = get_distance_matrix(x1_nn, x2_nn)
+    return signal_var * matern_kernel_diff1(d, length_scale)
+```
+
+Hyperparameter layout: `hps[0]=signal_var`, `hps[1]=length_scale`, `hps[2:]` = flattened NN weights+biases in the order `Network` expects. Use `method="mcmc"` or `method="adam"` for training; `global`/`local` don't scale well to NN-sized hyperparameter vectors.
+
+## Smoothness Guide
+
+| Kernel | Smoothness | When to Use |
+|--------|-----------|-------------|
+| Matérn-1/2 (exponential) | Rough, continuous but not differentiable | Sharp peaks, discontinuities |
+| Matérn-3/2 | Once differentiable | **Default choice** — most physical data |
+| Matérn-5/2 | Twice differentiable | Smoother physical signals |
+| Squared Exponential (RBF) | Infinitely smooth | Very smooth data; tends to oversmooth |
+
+## Hyperparameter Coordination
+
+**Critical:** When designing a custom kernel, you must:
+
+1. **Document the hyperparameter layout** — which index maps to what
+2. **Set matching bounds** — the `hyperparameter_bounds` array must have one row per hyperparameter
+3. **Coordinate with noise/mean functions** — all three share the same hyperparameter vector
+
+```python
+# ALWAYS include a comment block like this:
+#
+# Hyperparameter layout for my_kernel:
+# hps[0]     = signal variance          bounds: [0.001, 100]
+# hps[1]     = length scale dim 0       bounds: [0.01, range_dim0 * 10]
+# hps[2]     = length scale dim 1       bounds: [0.01, range_dim1 * 10]
+# hps[3]     = period (if periodic)     bounds: [expected_period * 0.5, expected_period * 2]
+```
+
+### Setting Initial Hyperparameters
+- **Signal variance**: start near `np.var(y_data)` or `1.0`
+- **Length scales**: start near `0.1 * range(x_dim)` — not too small (overfitting) or too large (underfitting)
+- **Period**: start near the expected period if known
+
+### Setting Bounds
+- **Signal variance**: `[0.001, 10 * np.std(y_data)]`
+- **Length scales**: `[0.01, 10 * range(x_dim)]` — lower bound prevents overfitting
+- **Period**: `[0.5 * expected, 2 * expected]` — tight if well-known
+
+## Common Pitfalls
+
+1. **Non-PSD kernel**: Sums and products of valid kernels are valid. Differences are NOT guaranteed PSD.
+2. **Python loops over data**: Use `np.subtract.outer` and vectorized ops. A double for-loop over N points is O(N²) in Python — unusable for >100 points.
+3. **Forgetting signal variance**: Always include a leading amplitude hyperparameter (`hps[0] * ...`).
+4. **Length scale = 0**: Causes division by zero. Set lower bounds > 0 (e.g., 0.001).
+5. **Mismatched hyperparameter count**: The bounds array rows must equal the total hyperparameter vector length.
+
+## Reference
+
+See `gpcam/kernels.py` (re-exports the fvGP kernel library) for the full library of kernel building blocks and the [gpCAM documentation](https://gpcam.readthedocs.io) for mathematical details.
diff --git a/skills/multi-task-advanced/SKILL.md b/skills/multi-task-advanced/SKILL.md
new file mode 100644
index 0000000..54a9520
--- /dev/null
+++ b/skills/multi-task-advanced/SKILL.md
@@ -0,0 +1,124 @@
+# Skill: Multi-Task GPs with fvGPOptimizer
+
+Design experiments with vector-valued or function-valued outputs using gpCAM's multi-task GP.
+
+## When to Use
+
+- Measuring a spectrum (many output channels per input point)
+- Multiple correlated outputs (e.g., intensity at different energies)
+- Exploiting correlations between tasks to improve predictions
+
+## Key Concept
+
+In fvGP, a multi-task GP is a single-task GP over the **Cartesian product** of input space × output space. The output dimension is appended as an extra column to the input.
+
+For example, with 2D input and 3 output channels, a point looks like:
+```
+[x0, x1, task_id]  where task_id ∈ {0, 1, 2}
+```
+
+This means your kernel must handle D+1 dimensional inputs, where the last dimension is the task index.
+
+## Basic Setup
+
+```python
+import numpy as np
+from gpcam import fvGPOptimizer
+
+# 100 input points, 5 output channels
+x_data = np.random.uniform(0, 1, (100, 2))  # 2D input
+y_data = np.random.randn(100, 5)             # 5 outputs per point
+
+# Default path — uses a built-in deep kernel, no hyperparameter bounds required:
+gpo = fvGPOptimizer(x_data, y_data)
+gpo.train(max_iter=20)
+
+# Custom kernel path — supply init_hyperparameters and hp bounds as with GPOptimizer:
+gpo = fvGPOptimizer(
+    x_data=x_data,
+    y_data=y_data,
+    init_hyperparameters=np.ones(4) / 10.0,
+    kernel_function=my_multi_task_kernel,
+)
+gpo.train(hyperparameter_bounds=np.array([
+    [0.01, 10.0],  # signal variance
+    [0.01, 10.0],  # length scale dim 0
+    [0.01, 10.0],  # length scale dim 1
+    [0.01, 10.0],  # length scale for task dimension
+]))
+```
+
+### Predictions and ask — the `x_out` argument
+
+Multi-task prediction methods take `x_out`, an array of task indices you want predictions for:
+
+```python
+# Predict all 5 task outputs at a grid of input points:
+mean = gpo.posterior_mean(x_grid, x_out=np.array([0, 1, 2, 3, 4]))["m(x)"]  # shape (N, 5)
+std  = np.sqrt(gpo.posterior_covariance(x_grid, x_out=np.array([0, 1, 2, 3, 4]))["v(x)"])
+
+# Ask for the next best input point across all tasks:
+gpo.ask(parameter_bounds, x_out=np.array([0, 1, 2, 3, 4]), n=1)
+
+# Ask for a batch of 4 points using a batch-aware acquisition:
+gpo.ask(parameter_bounds, x_out=np.array([0, 1]), n=4,
+        acquisition_function="relative information entropy set", vectorized=True)
+```
+
+### One-shot optimize
+
+For simple black-box vector-valued optimization, `optimize()` replaces the manual loop:
+```python
+def f(x):                      # x shape (N, D)
+    y = np.column_stack([...])  # shape (N, T)
+    noise = np.full(y.shape, 0.01)
+    return y, noise
+
+result = fvGPOptimizer().optimize(
+    func=f,
+    x_out=np.array([0, 1]),     # which task indices to treat as outputs
+    search_space=np.array([[0., 1.]]),
+    max_iter=50,
+)
+```
+
+## Multi-Task Kernel Design
+
+The kernel receives inputs with the task index as the last column. You need to model both within-task and between-task correlations:
+
+```python
+from gpcam.kernels import matern_kernel_diff1, get_distance_matrix
+
+def multi_task_kernel(x1, x2, hps):
+    """
+    x1, x2: shape (N, D+1) where last column is task index
+    hps[0]: signal variance
+    hps[1:D+1]: input space length scales
+    hps[D+1]: task correlation strength
+    """
+    # Spatial kernel (input dimensions only)
+    d_spatial = get_distance_matrix(x1[:, :-1], x2[:, :-1])
+    k_spatial = matern_kernel_diff1(d_spatial, hps[1])
+    
+    # Task kernel (last dimension)
+    task1 = x1[:, -1]
+    task2 = x2[:, -1]
+    # Simple: same task = 1, different task = correlation
+    same_task = np.equal.outer(task1, task2).astype(float)
+    k_task = same_task + hps[2] * (1 - same_task)
+    
+    return hps[0] * k_spatial * k_task
+```
+
+## Important Notes
+
+1. **Multi-task acquisition**: Use `"relative information entropy set"` / `"relative information entropy"` / `"variance"` / `"total correlation"` for batch acquisition across tasks. Pass `x_out=...` to the `ask()` call. Custom callables are supported and often advisable when you care about a specific task or combination.
+2. **Missing task observations**: `y_data` can have `np.nan` entries (e.g., task 1 wasn't measured at some x); the corresponding `noise_variances` entry **must also** be `np.nan`. The GP just ignores those entries — no imputation needed.
+3. **Default kernel**: `fvGPOptimizer(x, y)` with no kernel uses a built-in deep kernel that learns its own hyperparameters and doesn't require bounds. If you supply a custom `kernel_function`, you become responsible for the full `init_hyperparameters` + `hyperparameter_bounds` layout.
+4. **Deep kernel via NN warping**: For harder multi-task structure, `from gpcam.deep_kernel_network import Network` gives you an MLP you parametrize from `hps` — see the `kernel-designer` skill for the pattern.
+
+## Common Pitfalls
+
+1. **Forgetting the task dimension**: The kernel sees D+1 columns, not D.
+2. **Noise shape**: `noise_variances` is shape `(N, No)` for multi-task, not `(N,)`.
+3. **Scaling**: N data points × No outputs = N×No rows in the internal GP. Gets large fast.
diff --git a/skills/noise-functions/SKILL.md b/skills/noise-functions/SKILL.md
new file mode 100644
index 0000000..32115bc
--- /dev/null
+++ b/skills/noise-functions/SKILL.md
@@ -0,0 +1,145 @@
+# Skill: gpCAM Noise Functions
+
+Design custom noise models for experiments with non-uniform or structured noise.
+
+## When to Use
+
+- Noise varies across the parameter space (e.g., edges of detector have more noise)
+- Noise depends on signal intensity (Poisson/shot noise)
+- You want the noise level to be learned as a hyperparameter
+- Correlated noise between measurements
+
+## Noise Function Contract
+
+```python
+def my_noise(x, hyperparameters):
+    """
+    Parameters
+    ----------
+    x : np.ndarray, shape (N, D)
+        Input positions.
+    hyperparameters : np.ndarray, 1D
+        The FULL hyperparameter vector (shared with kernel and mean).
+    
+    Returns
+    -------
+    noise : np.ndarray
+        Either shape (N,) for diagonal noise (independent per point),
+        or shape (N, N) for full noise covariance matrix.
+    """
+```
+
+## When to Use What
+
+| Scenario | Approach |
+|----------|----------|
+| Known, uniform noise | Use `noise_variances=np.full(N, sigma**2)` — no noise function needed |
+| Known per-point noise | Use `noise_variances=my_array` — no noise function needed |
+| Unknown uniform noise | Use a noise function with a learnable hyperparameter |
+| Position-dependent noise | Use a noise function that depends on `x` |
+| No noise info at all | Don't provide either — gpCAM defaults to `(0.01 * mean|y|)²` |
+
+## Recipes
+
+### Learnable Constant Noise
+The noise level is a hyperparameter that gets optimized:
+```python
+def learnable_noise(x, hps):
+    """hps[K] = noise standard deviation (learned)."""
+    K = 3  # INDEX WHERE NOISE HP STARTS — adjust for your kernel
+    return np.full(len(x), hps[K]**2)  # return VARIANCE, not std
+```
+
+### Position-Dependent Noise
+More noise at edges of the measurement range:
+```python
+def edge_noise(x, hps):
+    """Higher noise near boundaries."""
+    K = 3
+    base_noise = hps[K]**2
+    # Increase noise near edges (within 10% of range)
+    center = np.mean(parameter_bounds, axis=1)
+    half_range = (parameter_bounds[:, 1] - parameter_bounds[:, 0]) / 2
+    dist_from_center = np.abs(x - center) / half_range  # 0 at center, 1 at edge
+    edge_factor = 1.0 + 5.0 * np.max(dist_from_center, axis=1)**2
+    return base_noise * edge_factor
+```
+
+### Poisson-Like (Signal-Dependent) Noise
+Common in photon-counting detectors:
+```python
+def poisson_noise(x, hps):
+    """
+    Noise proportional to sqrt of expected signal.
+    Uses the GP posterior mean as estimate of signal.
+    
+    NOTE: This creates a feedback loop — use carefully.
+    hps[K] = noise scale factor
+    """
+    K = 3
+    # Can't call posterior_mean here directly (circular dependency)
+    # Instead, use a fixed estimate or pass through args
+    # Simple version: just scale with position
+    return np.full(len(x), hps[K]**2)
+```
+
+### Two-Level Noise (Different Detectors/Modes)
+```python
+def two_detector_noise(x, hps):
+    """
+    Different noise for two measurement modes.
+    Assumes last dimension of x encodes the mode (0 or 1).
+    """
+    K = 3
+    noise = np.empty(len(x))
+    mode_0 = x[:, -1] < 0.5
+    mode_1 = ~mode_0
+    noise[mode_0] = hps[K]**2      # detector 1
+    noise[mode_1] = hps[K+1]**2    # detector 2
+    return noise
+```
+
+## Hyperparameter Coordination
+
+Noise function hyperparameters come from the **same vector** as kernel and mean hyperparameters.
+
+```python
+# Example layout:
+# hps[0]     = signal variance     (kernel)
+# hps[1:3]   = length scales       (kernel)
+# hps[3]     = noise std dev       (noise function) ← YOUR NOISE HP
+#
+# Total: 4 hyperparameters
+
+hp_bounds = np.array([
+    [0.001, 100.0],  # signal variance
+    [0.01, 50.0],    # length scale dim 0
+    [0.01, 50.0],    # length scale dim 1
+    [0.001, 10.0],   # noise std dev
+])
+```
+
+### Choosing Noise Bounds
+- **Lower bound**: Never 0 — use `0.001` minimum (prevents singular matrices)
+- **Upper bound**: `10 * std(y_data)` — noise shouldn't be larger than the signal
+- **Initial value**: `0.01 * mean(|y_data|)` (gpCAM's own default)
+
+## Important: noise_variances vs noise_function
+
+**Do not provide both.** Use one or the other:
+
+```python
+# Option A: fixed known noise
+gpo = GPOptimizer(x_data, y_data, noise_variances=np.full(N, 0.01))
+
+# Option B: learnable noise function
+gpo = GPOptimizer(x_data, y_data, noise_function=learnable_noise)
+```
+
+## Common Pitfalls
+
+1. **Returning std instead of variance**: The noise function must return **variance** (σ²), not standard deviation (σ).
+2. **Zero noise**: Causes singular matrix errors. Always ensure noise > 0.
+3. **Providing both**: Don't pass `noise_variances` AND `noise_function` — pick one.
+4. **Forgetting bounds**: The noise hyperparameter needs bounds in the `hyperparameter_bounds` array.
+5. **Wrong shape**: Return shape `(N,)` for independent noise, `(N, N)` for correlated.
diff --git a/skills/prior-mean-functions/SKILL.md b/skills/prior-mean-functions/SKILL.md
new file mode 100644
index 0000000..cb6bab7
--- /dev/null
+++ b/skills/prior-mean-functions/SKILL.md
@@ -0,0 +1,148 @@
+# Skill: gpCAM Prior Mean Functions
+
+Design prior mean functions that encode known physics or expected trends.
+
+## When to Use
+
+When the user has prior knowledge about the expected behavior:
+- A known baseline or background (linear trend, polynomial)
+- Physical model (Gaussian peak, Lorentzian, Bragg's law)
+- Expected shape from previous experiments
+- The signal is known to be non-zero on average
+
+## Prior Mean Function Contract
+
+```python
+def my_mean(x, hyperparameters):
+    """
+    Parameters
+    ----------
+    x : np.ndarray, shape (N, D)
+        Input positions.
+    hyperparameters : np.ndarray, 1D
+        The FULL hyperparameter vector (shared with kernel and noise).
+    
+    Returns
+    -------
+    m : np.ndarray, shape (N,)
+        Prior mean value at each input point.
+    """
+```
+
+**If no prior mean is provided**, gpCAM uses the average of `y_data` as a constant mean — this is fine for most cases.
+
+## When NOT to Use a Prior Mean
+
+- You don't have a strong physical expectation → use default (mean of data)
+- Your model might be wrong → a bad prior mean biases the GP and can hurt more than help
+- You're purely exploring → the GP will learn the mean from data
+
+## Recipes
+
+### Constant Mean (explicit)
+```python
+def constant_mean(x, hps):
+    """Explicit constant mean. hps[-1] = mean value."""
+    return np.full(len(x), hps[-1])
+```
+
+### Linear Trend
+```python
+def linear_mean(x, hps):
+    """
+    Linear prior mean: m(x) = a + b0*x0 + b1*x1 + ...
+    Uses hps[K], hps[K+1], ..., hps[K+D] where K is where mean hps start.
+    """
+    K = 3  # INDEX WHERE MEAN HYPERPARAMETERS START — adjust for your kernel
+    D = x.shape[1]
+    intercept = hps[K]
+    slopes = hps[K+1:K+1+D]
+    return intercept + x @ slopes
+```
+
+### Gaussian Peak (known approximate location)
+```python
+def gaussian_peak_mean(x, hps):
+    """
+    Prior: expect a Gaussian peak near a known location.
+    hps[K]: amplitude
+    hps[K+1]: center_x
+    hps[K+2]: center_y
+    hps[K+3]: width
+    """
+    K = 3  # adjust
+    amp = hps[K]
+    cx, cy = hps[K+1], hps[K+2]
+    w = hps[K+3]
+    r2 = (x[:, 0] - cx)**2 + (x[:, 1] - cy)**2
+    return amp * np.exp(-r2 / (2 * w**2))
+```
+
+### Polynomial Background
+```python
+def quadratic_mean(x, hps):
+    """Quadratic background: a + b*x + c*x^2 (1D)."""
+    K = 2  # adjust
+    return hps[K] + hps[K+1] * x[:, 0] + hps[K+2] * x[:, 0]**2
+```
+
+### Physics-Informed (Bragg's Law Example)
+```python
+def bragg_mean(x, hps):
+    """
+    Prior mean based on Bragg's law: peak at 2*d*sin(theta) = n*lambda.
+    x[:, 0] = 2theta angle in degrees
+    hps[K] = amplitude
+    hps[K+1] = d-spacing estimate
+    """
+    K = 3
+    wavelength = 1.54  # Cu K-alpha, fixed
+    two_theta = np.radians(x[:, 0])
+    # Expected peak positions
+    d = hps[K+1]
+    expected = hps[K] * np.exp(-(np.sin(two_theta/2) - wavelength/(2*d))**2 / 0.001)
+    return expected
+```
+
+## Hyperparameter Coordination
+
+**This is where most bugs happen.** The prior mean function receives the same hyperparameter vector as the kernel and noise functions. You must:
+
+1. Decide which indices the mean function uses
+2. Add bounds for those hyperparameters
+3. Document the full layout
+
+```python
+# Example layout:
+# hps[0]     = signal variance     (kernel)
+# hps[1:3]   = length scales       (kernel, 2D input)
+# hps[3]     = intercept           (mean function)
+# hps[4:6]   = slopes              (mean function)
+# hps[6]     = noise amplitude     (noise function)
+#
+# Total: 7 hyperparameters
+
+hp_bounds = np.array([
+    [0.001, 100.0],   # signal variance
+    [0.01, 50.0],     # length scale dim 0
+    [0.01, 50.0],     # length scale dim 1
+    [-10.0, 10.0],    # intercept
+    [-5.0, 5.0],      # slope dim 0
+    [-5.0, 5.0],      # slope dim 1
+    [0.001, 10.0],    # noise amplitude
+])
+```
+
+## Setting Mean Function Hyperparameter Bounds
+
+- **Intercept/amplitude**: `[min(y_data) * 2, max(y_data) * 2]`
+- **Slopes**: `[-range(y)/range(x), +range(y)/range(x)]` 
+- **Peak position**: `[known_position - tolerance, known_position + tolerance]`
+- **Width**: `[min_expected_width, max_expected_width]`
+
+## Common Pitfalls
+
+1. **Wrong hyperparameter indices**: Double-check which indices the mean function reads — they must not overlap with kernel/noise indices.
+2. **Overconfident prior**: If the mean function is too specific and wrong, the GP will fight between data and prior. Keep it loose.
+3. **Forgetting to add bounds**: Every hyperparameter used by the mean function needs a row in `hyperparameter_bounds`.
+4. **Return shape**: Must return 1D array of length `len(x)`, not a scalar.