Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 12 additions & 0 deletions docs/examples.rst
Original file line number Diff line number Diff line change
Expand Up @@ -17,3 +17,15 @@ The gallery below presents examples that demonstrate the use of Simuk.

+++
SBC

.. grid-item-card::
:link: ./examples/gallery/posterior_sbc.html
:text-align: center
:shadow: none
:class-card: example-gallery

.. image:: examples/img/posterior_sbc.png
:alt: Posterior SBC

+++
Posterior SBC
157 changes: 157 additions & 0 deletions docs/examples/gallery/posterior_sbc.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,157 @@
---
jupytext:
text_representation:
extension: .md
format_name: myst
kernelspec:
display_name: Python 3
language: python
name: python3
---

# Posterior simulation based calibration

```{jupyter-execute}

from arviz_plots import plot_ecdf_pit, style
import numpy as np
import simuk
style.use("arviz-variat")
```

This example demonstrates how to use the `SBC` class for posterior simulation-based calibration (SBC), supporting PyMC, Bambi and Numpyro models. In this version of SBC, we aim to validate the inference conditioned on the observed data and we restrict the analysis to space of parameters supported by the posterior distribution.

::::::{tab-set}
:class: full-width

:::::{tab-item} PyMC
:sync: pymc_default

First, define a PyMC model and sample from the posterior distribution. In this example, we will use the centered eight schools model.

```{jupyter-execute}

import pymc as pm

data = np.array([28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0])
sigma = np.array([15.0, 10.0, 16.0, 11.0, 9.0, 11.0, 10.0, 18.0])

with pm.Model() as centered_eight:
mu = pm.Normal('mu', mu=0, sigma=5)
tau = pm.HalfCauchy('tau', beta=5)
theta = pm.Normal('theta', mu=mu, sigma=tau, shape=8)
y_obs = pm.Normal('y', mu=theta, sigma=sigma, observed=data)
trace = pm.sample(1000)
```

Pass the model and the trace to the `SBC` class, set the number of simulations to 100, and run the simulations. Parameters will be drawn from the provided trace (which are from the posterior distribution). If the trace is not provided, the model will be sampled from the prior distribution (prior SBC). This process may take some time since the model runs multiple times (100 in this example).

```{jupyter-execute}

sbc = simuk.SBC(centered_eight,
num_simulations=100,
trace=trace,
sample_kwargs={'draws': 25, 'tune': 50})

sbc.run_simulations();
```

We compare the posterior distribution (conditional on our observed data) and the posterior distribution conditional on the data and the simulated data using the ArviZ function `plot_ecdf_pit`. We expect a uniform distribution; the gray envelope corresponds to the 94% credible interval.

```{jupyter-execute}

plot_ecdf_pit(sbc.simulations,
visuals={"xlabel":False},
);
```

:::::

:::::{tab-item} Bambi
:sync: bambi_default

Now, we define a Bambi Model and sample from the posterior distribution.

```{jupyter-execute}

import bambi as bmb
import pandas as pd

x = np.random.normal(0, 1, 200)
y = 2 + np.random.normal(x, 1)
df = pd.DataFrame({"x": x, "y": y})
bmb_model = bmb.Model("y ~ x", df)
trace = bmb_model.fit(num_samples=25, tune=50)
```

Pass the model and the trace to the `SBC` class, set the number of simulations to 100, and run the simulations.
Parameters will be drawn from the provided trace (which are from the posterior distribution). If the trace is not provided, the model will be sampled from the prior distribution (prior SBC). This process may take some time, as the model runs multiple times.

```{jupyter-execute}

sbc = simuk.SBC(bmb_model,
num_simulations=100,
trace=trace,
sample_kwargs={'draws': 25, 'tune': 50})

sbc.run_simulations();
```

We compare the posterior distribution (conditional on our observed data) and the posterior distribution conditional on the data and the simulated data using the ArviZ function `plot_ecdf_pit`. We expect a uniform distribution; the gray envelope corresponds to the 94% credible interval.


```{jupyter-execute}
plot_ecdf_pit(sbc.simulations)
```

:::::

:::::{tab-item} Numpyro
:sync: numpyro_default

We define a Numpyro Model, we use the centered eight schools model.

```{jupyter-execute}
import numpyro
import numpyro.distributions as dist
from jax import random
from numpyro.infer import MCMC, NUTS

y = np.array([28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0])
sigma = np.array([15.0, 10.0, 16.0, 11.0, 9.0, 11.0, 10.0, 18.0])

def eight_schools_cauchy_prior(J, sigma, y=None):
mu = numpyro.sample("mu", dist.Normal(0, 5))
tau = numpyro.sample("tau", dist.HalfCauchy(5))
with numpyro.plate("J", J):
theta = numpyro.sample("theta", dist.Normal(mu, tau))
numpyro.sample("y", dist.Normal(theta, sigma), obs=y)
```

We obtain samples from the posterior by running MCMC.
```{jupyter-execute}
# We use the NUTS sampler
nuts_kernel = NUTS(eight_schools_cauchy_prior)
mcmc = MCMC(nuts_kernel, num_warmup=500, num_samples=1000)
mcmc.run(random.PRNGKey(0), J=8, sigma=sigma, y=y)
```

Pass the model and the `mcmc` to the `SBC` class, set the number of simulations to 100, and run the simulations. For numpyro model, we pass in the ``data_dir`` parameter.

```{jupyter-execute}
sbc = simuk.SBC(nuts_kernel,
sample_kwargs={"num_warmup": 50, "num_samples": 75},
trace=mcmc,
num_simulations=100,
data_dir={"J": 8, "sigma": sigma, "y": y},
)
sbc.run_simulations()
```

We compare the posterior distribution (conditional on our observed data) and the posterior distribution conditional on the data and the simulated data using the ArviZ function `plot_ecdf_pit`. We expect a uniform distribution; the gray envelope corresponds to the 94% credible interval.

```{jupyter-execute}
plot_ecdf_pit(sbc.simulations,
visuals={"xlabel":False},
);
```
98 changes: 93 additions & 5 deletions docs/examples/gallery/sbc.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,8 +11,6 @@ kernelspec:

# Simulation based calibration

This example demonstrates how to use the `SBC` class for simulation-based calibration, supporting both PyMC and Bambi models.

```{jupyter-execute}

from arviz_plots import plot_ecdf_pit, style
Expand All @@ -21,11 +19,15 @@ import simuk
style.use("arviz-variat")
```

## Out-of-the-box SBC
This example demonstrates how to use the `SBC` class for simulation-based calibration, supporting PyMC, Bambi and Numpyro models. By default, the generative model implied by the probabilistic model is used.


::::::{tab-set}
:class: full-width

:::::{tab-item} PyMC
:sync: pymc
:sync: pymc_default

First, define a PyMC model. In this example, we will use the centered eight schools model.

Expand Down Expand Up @@ -69,7 +71,7 @@ plot_ecdf_pit(sbc.simulations,
:::::

:::::{tab-item} Bambi
:sync: bambi
:sync: bambi_default

Now, we define a Bambi Model.

Expand Down Expand Up @@ -106,7 +108,7 @@ plot_ecdf_pit(sbc.simulations)
:::::

:::::{tab-item} Numpyro
:sync: numpyro
:sync: numpyro_default

We define a Numpyro Model, we use the centered eight schools model.

Expand Down Expand Up @@ -150,3 +152,89 @@ plot_ecdf_pit(sbc.simulations,
visuals={"xlabel":False},
);
```

:::::

::::::

## Custom simulator SBC

::::::{tab-set}
:class: full-width

:::::{tab-item} PyMC
:sync: pymc_custom

In certain scenarios, you might want to pass a custom function to the `SBC` class to generate the data. For instance, if you aim to evaluate the effect of model misspecification by generating data from a different model than the one used for model fitting.

Next, we determine the impact of occasional large deviations (outliers) by drawing from a Laplace distribution instead of a normal distribution (which we use to fit the model).

```{jupyter-execute}
def simulator(theta, seed, **kwargs):
rng = np.random.default_rng(seed)
# Here we use a Laplace distribution, but it could also be some mechanistic simulator
scale = sigma / np.sqrt(2)
return {"y": rng.laplace(theta, scale)}

sbc = simuk.SBC(centered_eight,
num_simulations=100,
simulator=simulator,
sample_kwargs={'draws': 25, 'tune': 50})

sbc.run_simulations();
```

:::::

:::::{tab-item} Bambi
:sync: bambi_custom

In certain scenarios, you might want to pass a custom function to the `SBC` class to generate the data. For instance, if you aim to evaluate the effect of model misspecification by generating data from a different model than the one used for model fitting.

Next, we determine the impact of occasional large deviations (outliers) by drawing from a Laplace distribution instead of a normal distribution (which we use to fit the model).

```{jupyter-execute}
def simulator(mu, seed, sigma, **kwargs):
rng = np.random.default_rng(seed)
# Here we use a Laplace distribution, but it could also be some mechanistic simulator
scale = sigma / np.sqrt(2)
return {"y": rng.laplace(mu, scale)}

sbc = simuk.SBC(bmb_model,
num_simulations=100,
simulator=simulator,
sample_kwargs={'draws': 25, 'tune': 50})

sbc.run_simulations();
```

:::::


:::::{tab-item} Numpyro
:sync: numpyro_custom

In certain scenarios, you might want to pass a custom function to the `SBC` class to generate the data. For instance, if you aim to evaluate the effect of model misspecification by generating data from a different model than the one used for model fitting.

Next, we determine the impact of occasional large deviations (outliers) by drawing from a Laplace distribution instead of a normal distribution (which we use to fit the model).

```{jupyter-execute}
def simulator(theta, seed, **kwargs):
rng = np.random.default_rng(seed)
# Here we use a Laplace distribution, but it could also be some mechanistic simulator
scale = sigma / np.sqrt(2)
return {"y": rng.laplace(theta, scale)}

sbc = simuk.SBC(nuts_kernel,
sample_kwargs={"num_warmup": 50, "num_samples": 75},
num_simulations=100,
simulator=simulator,
data_dir={"J": 8, "sigma": sigma, "y": y}
)

sbc.run_simulations();
```

:::::

::::::
Binary file added docs/examples/img/posterior_sbc.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Loading