[FEATURE] Add BART prior sampling convenience functions and vignette (RFC 0003) #330

@andrewherren

Summary

Add sampleBARTPrior() (R) and sample_bart_prior() (Python) as convenience wrappers for sampling from the BART prior, along with a vignette demonstrating their use.

Depends on

#329 — the observation weights argument must be available first.

Motivation

Sampling from the BART prior is a standard Bayesian workflow step useful for:

  1. Prior calibration — understanding what functions BART places non-negligible probability on before seeing data, to guide hyperparameter choice (num_trees, alpha, beta, leaf scale)
  2. Prior predictive checks — generating data from p(y | X, prior) to verify the prior isn't ruling out plausible outcomes
  3. Pedagogy — showing new users what BART "believes" before data is observed

Approach

Running the BART sampler with all-zero observation weights removes the likelihood's contribution to every update, so each draw is an exact draw from the prior:

  • Split evaluation reduces to the tree structure prior ratio (data have no effect)
  • Leaf parameters are drawn from their prior N(0, σ²_μ)
  • σ² is drawn from its inverse-gamma (IG) prior
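
To make the first two bullets concrete, here is a minimal sketch of the conjugate update for a single constant leaf. It assumes the weighted model y_i | μ ~ N(μ, σ²/w_i) with μ ~ N(0, τ), where τ plays the role of σ²_μ; the function names are illustrative and are not stochtree internals. With all weights at zero, the leaf posterior collapses to the prior and the leaf's marginal likelihood term vanishes, so a split proposal would be accepted or rejected on the tree structure prior alone.

```python
import math


def leaf_posterior(y, w, sigma2, tau):
    """Posterior (mean, variance) of a constant leaf parameter mu.

    Assumed model: mu ~ N(0, tau) and y_i | mu ~ N(mu, sigma2 / w_i),
    so an observation with w_i = 0 carries no information about mu.
    """
    s_w = sum(w)
    s_wy = sum(wi * yi for wi, yi in zip(w, y))
    post_var = 1.0 / (1.0 / tau + s_w / sigma2)
    post_mean = post_var * s_wy / sigma2
    return post_mean, post_var


def leaf_log_marginal(y, w, sigma2, tau):
    """Log marginal likelihood of one leaf, keeping only the terms that
    depend on how observations are partitioned into leaves."""
    s_w = sum(w)
    s_wy = sum(wi * yi for wi, yi in zip(w, y))
    shrink = -0.5 * math.log(1.0 + tau * s_w / sigma2)
    fit = 0.5 * tau * s_wy ** 2 / (sigma2 * (sigma2 + tau * s_w))
    return shrink + fit


y = [1.3, -0.4, 2.1, 0.7]
sigma2, tau = 1.0, 0.25

zero_w = [0.0] * len(y)
unit_w = [1.0] * len(y)

prior_mean, prior_var = leaf_posterior(y, zero_w, sigma2, tau)
data_mean, data_var = leaf_posterior(y, unit_w, sigma2, tau)
zero_weight_loglik = leaf_log_marginal(y, zero_w, sigma2, tau)

print(prior_mean, prior_var)  # matches the N(0, tau) prior
print(data_mean, data_var)    # pulled toward the data when weights are 1
print(zero_weight_loglik)     # no likelihood contribution under zero weights
```

The same argument applies leaf by leaf and tree by tree, which is why no special-case prior sampler is needed once observation weights are supported.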

The convenience functions wrap this pattern so users don't need to understand the mechanics:

# R
sampleBARTPrior(X, num_samples = 100, mean_forest_params = list(num_trees = 50))
# Python
sample_bart_prior(X, num_samples=100, mean_forest_params={"num_trees": 50})
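
As a rough illustration of what "wrapping the pattern" could look like, the sketch below builds the zero-weight call on top of a generic fitting function. Everything here is hypothetical: `fit_fn`, the keyword names `observation_weights`, `num_gfr`, and `num_mcmc`, and the placeholder outcome vector are assumptions made for the example, not the stochtree signature (the weights argument is still pending in #329).

```python
import warnings


def check_gfr_with_zero_weights(observation_weights, num_gfr):
    """Hypothetical guard for users who pass zero weights manually."""
    all_zero = all(w == 0 for w in observation_weights)
    if all_zero and num_gfr > 0:
        warnings.warn(
            "num_gfr > 0 with all-zero observation weights: GFR "
            "initialization is data-dependent and ill-defined here."
        )
        return False
    return True


def sample_prior_via_zero_weights(fit_fn, X, num_samples=100, **model_kwargs):
    """Hypothetical wrapper: run `fit_fn` with all-zero weights and no GFR."""
    n = len(X)
    return fit_fn(
        X=X,
        y=[0.0] * n,                    # placeholder; ignored under zero weights
        observation_weights=[0.0] * n,  # assumed argument name
        num_gfr=0,                      # hard-coded, as proposed in this issue
        num_mcmc=num_samples,
        **model_kwargs,
    )


def record_call(**kwargs):
    """Stand-in for a real sampler: returns the arguments it received."""
    return kwargs


X = [[0.1, 1.0], [0.4, 0.0], [0.9, 1.0]]
call = sample_prior_via_zero_weights(
    record_call, X, num_samples=5, mean_forest_params={"num_trees": 50}
)
print(call["observation_weights"], call["num_gfr"], call["num_mcmc"])
```

Passing the sampler in as `fit_fn` keeps the sketch runnable without stochtree installed; a real implementation would call the package's BART fitting routine directly.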

Scope

  • sampleBARTPrior() in R/bart.R, exported from NAMESPACE
  • sample_bart_prior() in stochtree/bart.py, exported from __init__.py
  • Both hard-code num_gfr = 0, because GFR (grow-from-root) initialization is data-dependent and ill-defined with zero weights; a warning is emitted if the user manually passes num_gfr > 0 together with all-zero weights
  • Vignette demonstrating prior draws, prior predictive distributions, and a simple calibration example
  • Unit test verifying that the marginal variance of prior draws matches num_trees * leaf_scale² for a flat leaf prior
  • sample_sigma2_global should default to TRUE/True in the convenience wrappers so users can easily construct the full prior predictive y ~ N(f(x), σ²)
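
The variance check and the prior predictive construction from the last two bullets can be prototyped without the package. The sketch below is a standalone simulation, not the proposed unit test: it assumes leaf_scale is the prior standard deviation of each leaf value (as the squared term suggests), and the IG(3, 2) hyperparameters for σ² are placeholders chosen only for illustration. At any fixed x, each tree contributes exactly one leaf value, so f(x) is a sum of num_trees independent N(0, leaf_scale²) draws regardless of the tree structures.

```python
import math
import random
import statistics


def draw_forest_value(num_trees, leaf_sd, rng):
    """One prior draw of f(x) at a fixed x: one N(0, leaf_sd^2) leaf per tree."""
    return sum(rng.gauss(0.0, leaf_sd) for _ in range(num_trees))


def draw_inverse_gamma(shape, scale, rng):
    """Inverse-gamma draw as the reciprocal of a gamma(shape, rate=scale) draw."""
    return 1.0 / rng.gammavariate(shape, 1.0 / scale)


rng = random.Random(1234)
num_trees, leaf_sd, num_draws = 50, 0.2, 5000

# Marginal variance of f(x) under the prior.
f_draws = [draw_forest_value(num_trees, leaf_sd, rng) for _ in range(num_draws)]
empirical_var = statistics.variance(f_draws)
expected_var = num_trees * leaf_sd ** 2
relative_error = abs(empirical_var - expected_var) / expected_var

# Prior predictive: y = f(x) + sqrt(sigma2) * noise, with sigma2 drawn
# from an illustrative IG(3, 2) prior (hypothetical hyperparameters).
sigma2_draws = [draw_inverse_gamma(3.0, 2.0, rng) for _ in range(num_draws)]
y_draws = [
    f + math.sqrt(s2) * rng.gauss(0.0, 1.0)
    for f, s2 in zip(f_draws, sigma2_draws)
]

print(f"expected {expected_var:.3f}, empirical {empirical_var:.3f}")
print(f"prior predictive variance {statistics.variance(y_draws):.3f}")
```

A test against the real sampler would replace `draw_forest_value` with predictions from the zero-weight BART run and compare the per-observation variance across draws against the same target, with a tolerance sized to the number of draws.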

Future work

  • After completing and validating this prior sampler, add the same functionality for BCF models

Reference

RFC 0003: https://github.com/StochasticTree/rfcs/blob/main/0003-bart-prior-sampling.md

Metadata

Labels: enhancement (New feature or request)
