Commits
28 commits
7d5249e
docs: add Tier 1 extensions design for Impulso
thomaspinder Mar 8, 2026
c0a854b
docs: add Tier 1 extensions implementation plan
thomaspinder Mar 8, 2026
97b2939
test: add tests for VARData.with_dummy_observations()
thomaspinder Mar 8, 2026
70cd5d5
feat: implement VARData.with_dummy_observations()
thomaspinder Mar 8, 2026
4134851
fix: add exog test coverage and n_lags docstring for dummy obs
thomaspinder Mar 8, 2026
72a6e4c
test: add tests for ConjugateVAR
thomaspinder Mar 8, 2026
4e47925
feat: add ConjugateVAR with direct NIW posterior sampling
thomaspinder Mar 8, 2026
dc657c8
refactor: extract shared NIW param computation, use conftest fixture
thomaspinder Mar 8, 2026
7de236f
test: add tests for GLP hierarchical prior selection
thomaspinder Mar 8, 2026
da26f5d
test: add tests for long-run Blanchard-Quah identification
thomaspinder Mar 8, 2026
a8390ef
feat: add Blanchard-Quah long-run identification
thomaspinder Mar 8, 2026
36f6802
test: add ForecastCondition and conditional forecast tests
thomaspinder Mar 9, 2026
aedae13
feat: add conditional forecasting on FittedVAR
thomaspinder Mar 9, 2026
c681655
feat: add conditional forecasting on IdentifiedVAR
thomaspinder Mar 9, 2026
7cfc667
fix: add input validation to LongRunRestriction.identify()
thomaspinder Mar 9, 2026
8f5572a
fix: add type annotations and exog_future guard for conditional forecast
thomaspinder Mar 9, 2026
5449ac1
fix: address final review findings
thomaspinder Mar 9, 2026
17c1674
docs: add Tier 1 documentation design
thomaspinder Mar 9, 2026
73a8a7d
docs: add Tier 1 documentation implementation plan
thomaspinder Mar 9, 2026
e285fdd
docs: add reference pages for conjugate and conditions modules
thomaspinder Mar 9, 2026
9c085fe
docs: update landing page with new features and conjugate example
thomaspinder Mar 9, 2026
7023d11
docs: add how-to guide for conjugate estimation
thomaspinder Mar 9, 2026
51137e6
docs: add how-to guide for long-run restrictions
thomaspinder Mar 9, 2026
5f49835
docs: add how-to guide for conditional forecasting
thomaspinder Mar 9, 2026
008f042
docs: extend Minnesota prior explanation with conjugacy, GLP, and dum…
thomaspinder Mar 9, 2026
a7d8dc1
docs: add long-run restrictions theory to identification explanation
thomaspinder Mar 9, 2026
787fba4
docs: add conditional forecasting theory explanation
thomaspinder Mar 9, 2026
2bf00ae
Drop plans from docs
thomaspinder Mar 9, 2026
53 changes: 53 additions & 0 deletions docs/explanation/conditional-forecasting.md
@@ -0,0 +1,53 @@
# Conditional Forecasting

## The problem

A standard VAR forecast projects all variables forward simultaneously, with no external constraints. But many questions in macroeconomics and finance require **conditional** projections:

- Central banks publish forecasts conditioned on assumed interest rate paths
- Financial institutions stress-test portfolios under assumed macroeconomic scenarios
- Policy analysts ask "what if" questions about specific variable trajectories

Conditional forecasting provides the mathematical framework for these exercises.

## The idea

A VAR forecast can be decomposed into two parts:

$$y_{t+h} = \underbrace{y_{t+h}^{u}}_{\text{unconditional}} + \underbrace{\sum_{s=0}^{h} \Phi_{h-s} \, P \, \varepsilon_{s}}_{\text{shock-driven deviation}}$$

where $y^{u}_{t+h}$ is the unconditional forecast (no future shocks), $\Phi_j$ are the moving-average (MA) coefficient matrices, $P$ is the impact matrix (Cholesky factor of $\Sigma$ or a structural matrix), and $\varepsilon_s$ are the future structural shocks.

The unconditional forecast is deterministic given the posterior draw. The future shocks are unknown. Conditional forecasting amounts to **choosing the shock paths** that make the forecast satisfy the desired constraints.

## The Waggoner-Zha algorithm

Waggoner & Zha (1999) showed that this can be formulated as a linear system. Stack all future shocks into a vector $\varepsilon = [\varepsilon_0, \varepsilon_1, \ldots, \varepsilon_{H-1}]$ of length $H \times n$, where $H$ is the number of forecast steps and $n$ is the number of variables.

Each constraint (e.g., "variable $i$ equals value $v$ at period $h$") translates into a linear equation:

$$R \, \varepsilon = c$$

where $R$ is constructed from the MA coefficients and the impact matrix, and $c$ contains the differences between target values and unconditional forecasts.

The minimum-norm solution $\varepsilon^* = R^\top (R R^\top)^{-1} c$ gives the **smallest set of shocks** (in a least-squares sense) that satisfies all constraints. This is computed via `numpy.linalg.lstsq`.

The conditional forecast is then:

$$y_{t+h}^{c} = y_{t+h}^{u} + \sum_{s=0}^{h} \Phi_{h-s} \, P \, \varepsilon^*_s$$
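
The two equations above can be sketched end-to-end for a toy bivariate VAR(1). All values here (coefficients, horizon, the single constraint) are illustrative; the point is the minimum-norm solve and the check that the constrained variable hits its target:

```python
import numpy as np

# Toy stationary VAR(1): y_{t+1} = A @ y_t + P @ eps_t  (values illustrative)
A = np.array([[0.5, 0.1],
              [0.2, 0.4]])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 0.5]])
P = np.linalg.cholesky(Sigma)      # impact matrix
y_last = np.array([1.0, 2.0])      # last observed value
H, n = 3, 2                        # forecast steps, number of variables

# MA coefficients of a VAR(1): Phi_j = A^j
Phi = [np.linalg.matrix_power(A, j) for j in range(H)]

# Unconditional forecast (no future shocks), 0-indexed periods 0..H-1
y_unc = np.empty((H, n))
y = y_last
for h in range(H):
    y = A @ y
    y_unc[h] = y

# One constraint: variable 1 equals 2.5 at period 1
i_var, h_star, target = 1, 1, 2.5
R = np.zeros((1, H * n))
for s in range(h_star + 1):
    R[0, s * n:(s + 1) * n] = (Phi[h_star - s] @ P)[i_var]
c = np.array([target - y_unc[h_star, i_var]])

# Minimum-norm shock path satisfying R @ eps = c
eps = np.linalg.lstsq(R, c, rcond=None)[0].reshape(H, n)

# Conditional forecast = unconditional + shock-driven deviation
y_cond = y_unc.copy()
for h in range(H):
    for s in range(h + 1):
        y_cond[h] += Phi[h - s] @ P @ eps[s]

print(y_cond[h_star, i_var])   # the target, 2.5, up to floating point
```

Note that `lstsq` returns the minimum-norm solution for this underdetermined system, so shocks after the constrained period come out exactly zero.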

## Structural extensions

When the model is identified (you have a structural impact matrix $P$ rather than just the Cholesky factor of $\Sigma$), you can also condition on **structural shock paths**. For example, "assume the supply shock is zero for the next 4 periods" translates into direct constraints on elements of $\varepsilon$.

Observable constraints and shock constraints can be combined in a single system. Observable constraints use the full MA representation (involving $\Phi$ and $P$), while shock constraints are simpler — they directly pin individual elements of $\varepsilon$.

## Bayesian uncertainty

The algorithm is applied independently to each posterior draw of $(B, \Sigma)$ or $(B, P)$. This means:

- The unconditional forecast differs across draws (parameter uncertainty)
- The MA coefficients differ across draws
- The shock paths satisfying the constraints differ across draws

The result is a full posterior distribution of conditional forecasts, from which you can compute medians, HDIs, and other summaries. The constrained variables will hit their targets exactly in every draw, but the unconstrained variables will show genuine posterior uncertainty about the conditional projection.
22 changes: 22 additions & 0 deletions docs/explanation/identification.md
@@ -37,3 +37,25 @@ scheme = SignRestriction(
```

Sign restrictions are weaker than Cholesky (they don't uniquely identify the model), but they require fewer assumptions.

## Long-run restrictions (Blanchard-Quah)

Cholesky and sign restrictions both constrain the **contemporaneous** (impact) response to structural shocks. Long-run restrictions take a different approach: they constrain the **cumulative** response as the horizon goes to infinity.

The key object is the **long-run multiplier matrix**:

$$C(1) = (I - A_1 - A_2 - \cdots - A_p)^{-1}$$

This matrix captures the total cumulative effect of a one-time shock. If the VAR is stationary, all shocks are transitory and $C(1)$ is finite. The long-run impact of structural shocks is $C(1) P$, where $P$ is the structural impact matrix.

Blanchard & Quah (1989) proposed forcing $C(1) P$ to be **lower triangular**. This is achieved by applying the Cholesky decomposition not to the residual covariance $\Sigma$ (as in short-run Cholesky) but to the long-run covariance:

$$C(1) \, \Sigma \, C(1)^\top = L \, L^\top$$

The structural impact matrix is then $P = C(1)^{-1} L$.
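
The construction can be verified in a few lines of numpy. The coefficient and covariance values are illustrative:

```python
import numpy as np

# Toy stationary VAR(2) coefficients (illustrative values)
A1 = np.array([[0.4, 0.1],
               [0.0, 0.3]])
A2 = np.array([[0.1, 0.0],
               [0.1, 0.1]])
Sigma = np.array([[1.0, 0.4],
                  [0.4, 0.8]])

# Long-run multiplier C(1) = (I - A1 - A2)^{-1}
C1 = np.linalg.inv(np.eye(2) - A1 - A2)

# Cholesky of the long-run covariance C(1) Sigma C(1)'
L = np.linalg.cholesky(C1 @ Sigma @ C1.T)

# Structural impact matrix
P = np.linalg.inv(C1) @ L

# P still reproduces the residual covariance: P P' = Sigma
print(np.allclose(P @ P.T, Sigma))              # True
# and the long-run impact C(1) P is lower triangular
print(np.allclose(np.triu(C1 @ P, k=1), 0.0))   # True
```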

The interpretation depends on the variable ordering:
- Shocks later in the ordering have **zero long-run cumulative effect** on variables earlier in the ordering
- The first shock is unrestricted in its long-run effects

This is exactly the same Cholesky math, applied to a different matrix. The economics, however, are very different: you're restricting permanent effects rather than contemporaneous effects. In the classic Blanchard-Quah example, ordering output before prices means the second shock (interpreted as a demand shock) has no permanent effect on output — only supply shocks do.
69 changes: 68 additions & 1 deletion docs/explanation/minnesota-prior.md
@@ -1,6 +1,6 @@
# The Minnesota Prior

The **Minnesota prior** (Litterman, 1986) is the most widely used prior for Bayesian VARs. It encodes the belief that each variable follows a random walk, with coefficients on other variables' lags shrunk toward zero.

## Key hyperparameters

@@ -27,3 +27,70 @@ spec = VAR(lags=4, prior="minnesota")
prior = MinnesotaPrior(tightness=0.2, decay="geometric", cross_shrinkage=0.3)
spec = VAR(lags=4, prior=prior)
```

## Conjugate estimation

When the Minnesota prior is paired with a Normal-Inverse-Wishart (NIW) likelihood, the posterior belongs to the same family — this is called **conjugacy**. The practical consequence is dramatic: instead of running an iterative MCMC sampler, we can compute the posterior parameters in closed form and draw from it directly.

The prior specifies:

$$B \mid \Sigma \sim \mathcal{MN}(B_0, \Sigma, V_0), \qquad \Sigma \sim \mathcal{IW}(S_0, \nu_0)$$

where $\mathcal{MN}$ is the matrix normal distribution and $\mathcal{IW}$ is the inverse Wishart. Given data matrices $Y$ (observations) and $X$ (lagged regressors), the posterior parameters are:

$$V_{\text{post}} = (V_0^{-1} + X^\top X)^{-1}$$

$$B_{\text{post}} = V_{\text{post}}(V_0^{-1} B_0 + X^\top Y)$$

$$\nu_{\text{post}} = \nu_0 + T$$

$$S_{\text{post}} = S_0 + Y^\top Y + B_0^\top V_0^{-1} B_0 - B_{\text{post}}^\top V_{\text{post}}^{-1} B_{\text{post}}$$

The posterior mean for $B$ is a precision-weighted average of the prior mean $B_0$ and the OLS estimate $(X^\top X)^{-1} X^\top Y$. When the prior is tight (small $V_0$), the posterior stays close to the random walk prior. When data is abundant (large $X^\top X$), the posterior approaches OLS. The posterior for $\Sigma$ is an inverse Wishart with updated scale and degrees of freedom.

Sampling is straightforward: draw $\Sigma \sim \mathcal{IW}(S_{\text{post}}, \nu_{\text{post}})$, then $B \mid \Sigma \sim \mathcal{MN}(B_{\text{post}}, \Sigma, V_{\text{post}})$. Each draw is independent — no burn-in, no autocorrelation, no convergence diagnostics. `ConjugateVAR` implements this approach.
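
A minimal numpy/scipy sketch of the update and sampling steps, using the common convention that $B$ is $k \times n$ with row covariance $V$ and column covariance $\Sigma$. The toy sizes and flat prior values are illustrative, not Impulso's defaults:

```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(0)

# Toy data: T observations, n variables, k regressors (illustrative sizes)
T, n, k = 200, 2, 5
X = rng.standard_normal((T, k))
Y = rng.standard_normal((T, n))

# Prior: B | Sigma ~ MN(B0, V0, Sigma),  Sigma ~ IW(S0, nu0)
B0 = np.zeros((k, n))
V0 = np.eye(k)
S0 = np.eye(n)
nu0 = n + 2

# Closed-form posterior parameters
V0_inv = np.linalg.inv(V0)
V_post = np.linalg.inv(V0_inv + X.T @ X)
B_post = V_post @ (V0_inv @ B0 + X.T @ Y)
nu_post = nu0 + T
S_post = (S0 + Y.T @ Y + B0.T @ V0_inv @ B0
          - B_post.T @ np.linalg.inv(V_post) @ B_post)

# Direct, independent posterior draws: no burn-in, no convergence checks
Sigma_draw = invwishart.rvs(df=nu_post, scale=S_post, random_state=rng)
Z = rng.standard_normal((k, n))
B_draw = B_post + np.linalg.cholesky(V_post) @ Z @ np.linalg.cholesky(Sigma_draw).T
```

Repeating the last three lines yields as many i.i.d. posterior draws as needed.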

## Data-driven prior selection

The Minnesota prior's hyperparameters — `tightness`, `cross_shrinkage`, and `decay` — are often chosen by convention or trial-and-error. Giannone, Lenza & Primiceri (2015) proposed an empirical Bayes approach: choose hyperparameters by maximising the **marginal likelihood**.

The marginal likelihood is the probability of the observed data $Y$ given hyperparameters $\lambda$, after integrating out all model parameters:

$$p(Y \mid \lambda) = \int p(Y \mid B, \Sigma) \, p(B, \Sigma \mid \lambda) \, dB \, d\Sigma$$

Thanks to conjugacy, this integral has a closed-form solution involving determinants and multivariate gamma functions. Optimising $\log p(Y \mid \lambda)$ over $\lambda = (\text{tightness}, \text{cross\_shrinkage})$ using standard numerical optimisation (L-BFGS-B) is fast and reliable.
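
A sketch of the closed form, as it is usually written in the conjugate VAR literature. The helper name and demo values are illustrative, and additive constants that do not depend on the hyperparameters could be dropped when optimising:

```python
import numpy as np
from scipy.special import multigammaln

def log_marginal_likelihood(Y, X, B0, V0, S0, nu0):
    """Closed-form log p(Y | hyperparameters) for the conjugate NIW prior
    (a sketch; not Impulso's internal implementation)."""
    T, n = Y.shape
    V0_inv = np.linalg.inv(V0)
    V_post = np.linalg.inv(V0_inv + X.T @ X)
    B_post = V_post @ (V0_inv @ B0 + X.T @ Y)
    nu_post = nu0 + T
    S_post = (S0 + Y.T @ Y + B0.T @ V0_inv @ B0
              - B_post.T @ np.linalg.inv(V_post) @ B_post)

    _, ld_V0 = np.linalg.slogdet(V0)
    _, ld_Vp = np.linalg.slogdet(V_post)
    _, ld_S0 = np.linalg.slogdet(S0)
    _, ld_Sp = np.linalg.slogdet(S_post)

    return (-0.5 * T * n * np.log(np.pi)
            + 0.5 * n * (ld_Vp - ld_V0)
            + 0.5 * nu0 * ld_S0 - 0.5 * nu_post * ld_Sp
            + multigammaln(0.5 * nu_post, n) - multigammaln(0.5 * nu0, n))

rng = np.random.default_rng(1)
T, n, k = 120, 2, 3
X = rng.standard_normal((T, k))
Y = rng.standard_normal((T, n))
B0, S0, nu0 = np.zeros((k, n)), np.eye(n), n + 2

# Compare a tight and a loose prior (V0 = lambda^2 * I); the optimum over
# lambda would be found numerically, e.g. with scipy.optimize.minimize
lml_tight = log_marginal_likelihood(Y, X, B0, 0.01 * np.eye(k), S0, nu0)
lml_loose = log_marginal_likelihood(Y, X, B0, 10.0 * np.eye(k), S0, nu0)
```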

The marginal likelihood naturally balances two forces:

- **Fit**: a loose prior (high tightness) lets the model fit the data closely
- **Parsimony**: a loose prior also spreads probability mass over implausible parameter values, reducing the marginal likelihood

The optimal hyperparameters sit at the sweet spot where the prior is just flexible enough to capture the data's patterns without wasting probability on noise.

This approach is sometimes called **GLP** after the authors' initials. In Impulso, use `ConjugateVAR.optimize_prior()` or the shorthand `prior="minnesota_optimized"`.

## Dummy observation priors

An elegant trick from the classical VAR literature encodes prior beliefs not by modifying the prior distribution directly, but by **appending synthetic observations** to the dataset. These "dummy observations" push the posterior in the desired direction while preserving conjugacy.

### Sum-of-coefficients prior

The sum-of-coefficients prior (Doan, Litterman & Sims, 1984) encodes the belief that if all variables have been at their initial sample values forever, they should persist at those values. Formally, it adds observations implying:

$$\sum_{j=1}^{p} A_j \approx I$$

where $A_j$ are the lag coefficient matrices. This discourages the model from predicting rapid mean-reversion when it isn't supported by the data. The hyperparameter $\mu$ controls the prior's strength: smaller $\mu$ imposes the belief more tightly.
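
The dummy rows are short enough to sketch directly. The regressor layout assumed here (lag blocks first, then a constant column) is an assumption of this sketch, not necessarily Impulso's internal layout:

```python
import numpy as np

def sum_of_coefficients_dummies(y0_mean, p, mu):
    """Dummy observations for the sum-of-coefficients prior (a sketch).

    y0_mean : (n,) means of the initial observations
    p       : number of lags
    mu      : tightness; smaller mu imposes sum_j A_j = I more strongly
    Returns (Y_dummy, X_dummy), one row per variable, assuming the
    regressor layout [y_{t-1}, ..., y_{t-p}, constant].
    """
    n = len(y0_mean)
    D = np.diag(y0_mean) / mu
    Y_dummy = D                                        # (n, n)
    X_dummy = np.hstack([D] * p + [np.zeros((n, 1))])  # (n, n*p + 1)
    return Y_dummy, X_dummy

Y_d, X_d = sum_of_coefficients_dummies(np.array([1.2, 0.8]), p=2, mu=1.0)

# Any coefficient matrix whose lag blocks sum to I fits the dummies exactly
B = np.vstack([0.5 * np.eye(2), 0.5 * np.eye(2), np.zeros((1, 2))])
print(np.allclose(X_d @ B, Y_d))   # True
```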

### Single-unit-root prior

The single-unit-root prior (Sims, 1993) encodes the belief that the variables may share a common stochastic trend. It adds a single dummy observation in which every variable sits at its initial sample mean, making the model reluctant to move away from that configuration unless the data demand it; unlike the sum-of-coefficients prior, it remains consistent with cointegration. The hyperparameter $\delta$ controls its strength: smaller $\delta$ imposes the belief more tightly.

### Practical usage

In Impulso, dummy observations are added via `VARData.with_dummy_observations()`:

```python
data_augmented = data.with_dummy_observations(n_lags=4, mu=1.0, delta=1.0)
```

The augmented data can then be passed to any estimation method — `ConjugateVAR`, `VAR`, or used with `optimize_prior()`. Because the dummy observations are simply extra rows in the data matrices, they preserve conjugacy and integrate seamlessly with marginal likelihood optimisation.
113 changes: 113 additions & 0 deletions docs/how-to/conditional-forecasting.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,113 @@
# Conditional Forecasting

Standard forecasts let the model speak freely — given the historical data, where do the variables go next? But policy analysis often needs to answer a different question: **what happens if we assume a specific path for one variable?**

For example:
- "What happens to GDP and inflation if the central bank raises the policy rate by 25bp at each of the next 4 meetings?"
- "What is the inflation outlook if oil prices stay at \$80/barrel for the next year?"

Conditional forecasting answers these questions. It finds the **smallest set of shocks** consistent with the assumed path, then traces out the implications for all other variables. This implements the algorithm of Waggoner & Zha (1999).

## Defining conditions

A `ForecastCondition` specifies the variable, the periods (0-indexed forecast steps), and the target values:

```python
from impulso import ForecastCondition

# "The policy rate rises from 5.25 to 6.00 over steps 0 through 3"
rate_path = ForecastCondition(
variable="rate",
periods=[0, 1, 2, 3],
values=[5.25, 5.50, 5.75, 6.00],
)
```

You can specify multiple conditions on different variables:

```python
# Also condition on oil prices
oil_path = ForecastCondition(
variable="oil",
periods=[0, 1, 2, 3],
values=[80.0, 80.0, 80.0, 80.0],
)
```

!!! note "Periods are 0-indexed"
Period 0 is the first forecast step (one step ahead of the last observation). Period 3 is four steps ahead.

## Reduced-form conditional forecasts

On a `FittedVAR` (before identification), conditional forecasts use the Cholesky factor of the residual covariance to define the shock space:

```python
from impulso import VAR, VARData, ForecastCondition

data = VARData.from_df(df, endog=["gdp", "inflation", "rate"])
fitted = VAR(lags=4, prior="minnesota").fit(data)

# Condition on a rising rate path
rate_path = ForecastCondition(
variable="rate",
periods=[0, 1, 2, 3],
values=[5.25, 5.50, 5.75, 6.00],
)

result = fitted.conditional_forecast(steps=8, conditions=[rate_path])
```

The result is a `ConditionalForecastResult` — a subclass of `ForecastResult` that also stores the conditions. You can inspect it the same way:

```python
# Posterior median conditional forecast
result.median()

# HDI credible intervals
result.hdi(prob=0.89)

# Plot
result.plot()
```

The constrained variable ("rate") will hit its target values exactly at the specified periods. The unconstrained variables ("gdp", "inflation") show the model's best prediction given those constraints — how the economy would evolve under the assumed rate path.

!!! tip "Interpreting the results"
The conditional forecast answers: "What is the most likely path for all variables, given that `rate` follows the specified path?" The uncertainty bands for unconstrained variables reflect both parameter uncertainty (different posterior draws give different answers) and the uncertainty about which combination of shocks would produce the assumed path.

## Structural conditional forecasts

If you have an identified model, you can also condition on **structural shock paths**. This is useful for scenario analysis: "What happens if there are no supply shocks for the next 4 quarters?"

```python
from impulso.identification import Cholesky

identified = fitted.set_identification_strategy(
Cholesky(ordering=["gdp", "inflation", "rate"])
)

# Condition on zero supply shocks
no_supply_shocks = ForecastCondition(
variable="gdp", # shock named after the first variable in ordering
periods=[0, 1, 2, 3],
values=[0.0, 0.0, 0.0, 0.0],
)

result = identified.conditional_forecast(
steps=8,
conditions=[rate_path],
shock_conditions=[no_supply_shocks],
)
```

Here, `conditions` constrain observable variable paths (as before), and `shock_conditions` constrain structural shock paths. You can use either or both.

!!! warning "Shock naming"
Shock names correspond to the variable names in the identification scheme's ordering. With `Cholesky(ordering=["gdp", "inflation", "rate"])`, the shocks are named "gdp", "inflation", and "rate".

## Degrees of freedom

Each condition uses up one degree of freedom per constrained period. The total number of constraints cannot exceed `steps * n_variables` (the total number of shock values to be determined). In practice, keeping constraints well below this limit gives more stable results — the system becomes increasingly sensitive as you approach the maximum.
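
This bound is easy to check before fitting. The sketch below uses plain `(variable, periods)` pairs rather than Impulso's `ForecastCondition` objects:

```python
def check_degrees_of_freedom(conditions, steps, n_variables):
    """Count constraints and compare against the number of free shock values.

    `conditions` is a list of (variable_name, periods) pairs (illustrative
    helper, not part of the Impulso API).
    """
    n_constraints = sum(len(periods) for _, periods in conditions)
    n_free = steps * n_variables
    if n_constraints > n_free:
        raise ValueError(
            f"{n_constraints} constraints exceed {n_free} free shock values"
        )
    return n_constraints, n_free

print(check_degrees_of_freedom([("rate", [0, 1, 2, 3])], steps=8, n_variables=3))
# (4, 24)
```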

!!! tip "Start simple"
Begin with conditions on one variable at a few periods. Add more constraints incrementally and check that results remain sensible. Over-constraining can produce large, implausible shocks.