R/`cmldiffvar`

Causal Machine Learning Methods for Differential Variance Inference

Authors: Philippe Boileau, Hani Zaki, Mireille Schnitzer

What’s `cmldiffvar`?

cmldiffvar implements causal machine learning methods for differential variance inference. These methods combine semiparametric efficiency theory with flexible machine learning, namely Super Learner ensembles, to avoid convenience assumptions about the data-generating process (van der Laan and Rose 2011; van der Laan, Polley, and Hubbard 2007). Hypothesis tests about differential variance can uncover heterogeneous treatment effects, even when the effect modifiers are excluded from the data. Details on the methodology are provided in Boileau et al. (In preparation).

Installation

The development version of the package may be installed from GitHub using remotes:

remotes::install_github("PhilBoileau/cmldiffvar")

Example

We estimate the absolute differential variance, defined as the difference of the potential outcomes’ standard deviations, on a random sample of the toy_population_tbl data included with the cmldiffvar package. This dataset represents an observational study in which the treatment variable is binary, the outcome is continuous, and a single confounder was measured. The true absolute differential variance in this population is $2$. Because the absolute differential variance is non-zero, the treatment effect is heterogeneous.
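
To see why a non-zero differential variance signals heterogeneity, consider the following minimal simulation in base R. It does not use cmldiffvar or toy_population_tbl: the data-generating process, the variable names, and the treated-minus-control sign convention are all assumptions made for illustration. When every unit receives the same additive effect, the potential outcomes have identical standard deviations. When the effect depends on an unrecorded modifier, the standard deviations differ.

```r
# hypothetical simulation: none of these objects come from cmldiffvar
set.seed(1)
n <- 100000

# baseline noise shared by both potential outcomes
baseline <- rnorm(n)

# hypothetical binary effect modifier that is never recorded in the data
modifier <- rbinom(n, size = 1, prob = 0.5)

# potential outcome under control
y0 <- baseline

# homogeneous effect: every unit's outcome shifts by the same amount
y1_homogeneous <- baseline + 4

# heterogeneous effect: the shift is 2 or 6, depending on the modifier
y1_heterogeneous <- baseline + 2 + 4 * modifier

# difference of the potential outcomes' standard deviations
# (treated minus control is an assumed sign convention)
abs_diff_var_homogeneous <- sd(y1_homogeneous) - sd(y0)
abs_diff_var_heterogeneous <- sd(y1_heterogeneous) - sd(y0)

print(c(
  homogeneous = abs_diff_var_homogeneous,
  heterogeneous = abs_diff_var_heterogeneous
))
```

Under these assumptions, the first difference is zero up to floating-point error, while the second approaches sqrt(5) - 1, or about 1.24, in large samples. In practice only one potential outcome is observed per unit, which is why estimation requires methods like those implemented in cmldiffvar.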

We use a targeted maximum likelihood estimator (van der Laan and Rubin 2006; van der Laan and Rose 2011, 2018), the default estimator of the cmldiffvar() function, to infer the differential variance of this population. By default, the function outputs a point estimate and a $95\%$ confidence interval. It also provides a p-value for the test of whether the differential variance differs from zero. Rejecting this null hypothesis is evidence that the treatment effect is heterogeneous.

# load the required packages
library(cmldiffvar)
library(dplyr)
library(SuperLearner)

# set the seed for reproducibility
set.seed(510)

# random sample from population data
sample_tbl <- slice_sample(toy_population_tbl, n = 250)

# estimate absolute differential variance
dif_var_result_tbl <- sample_tbl |>
  cmldiffvar(
    propensity_score_adj_var_names = "confounder",
    cond_exp_outcome_adj_var_names = "confounder",
    treatment_var_name = "treatment",
    outcome_var_name = "outcome"
  )

| estimand | estimate | se | ci_low | ci_high | p_value |
|---|---|---|---|---|---|
| absolute differential variance | 2.11 | 0.29 | 1.53 | 2.69 | 0.00 |

The point estimate of the absolute differential variance, 2.11, is close to the true value of 2. Additionally, the test correctly rejects the null hypothesis of a homogeneous treatment effect at the $5\%$ significance level.
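
As a rough sanity check on the output, the interval and p-value can be recomputed from the rounded point estimate and standard error. This sketch assumes a Wald-type (normal approximation) confidence interval and a two-sided z-test. Both are customary for targeted maximum likelihood estimators, but they are assumptions here rather than a description of cmldiffvar's internals. The variable names are hypothetical.

```r
# rounded values copied from the results table
estimate <- 2.11
se <- 0.29

# assumed: Wald-type 95% confidence interval
z <- qnorm(0.975)
ci_low <- estimate - z * se
ci_high <- estimate + z * se

# assumed: two-sided z-test of the null that the differential variance is zero
p_value <- 2 * pnorm(-abs(estimate / se))

print(round(c(ci_low = ci_low, ci_high = ci_high), 2))
print(p_value)
```

The recomputed bounds, roughly 1.54 and 2.68, agree with the reported interval up to rounding of the inputs, and the p-value is far below 0.05.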

Issues

If you encounter any bugs or have any specific feature requests, please file an issue.

Contributions

Contributions are very welcome. Interested contributors should consult our contribution guidelines prior to submitting a pull request.

Citation

Please cite the following paper when using the cmldiffvar R software package.

@unpublished{boileau2025,
 author = {Philippe A Boileau and Hani Zaki and Gabriele Lileikyte and Niklas
           Nielsen and Patrick R Lawler and Mireille E Schnitzer},
 title = {Assumption-Lean Differential Variance Inference for Heterogeneous
          Treatment Effect Detection},
 year = {In preparation}
}

Licence

The contents of this repository are distributed under the MIT license. See file LICENSE.md for details.

References

Boileau, Philippe A, Hani Zaki, Gabriele Lileikyte, Niklas Nielsen, Patrick R Lawler, and Mireille E Schnitzer. In preparation. “Assumption-Lean Differential Variance Inference for Heterogeneous Treatment Effect Detection.”

van der Laan, Mark J., Eric C. Polley, and Alan E. Hubbard. 2007. “Super Learner.” Statistical Applications in Genetics and Molecular Biology 6 (1). https://doi.org/10.2202/1544-6115.1309.

van der Laan, Mark J., and Sherri Rose. 2011. Targeted Learning: Causal Inference for Observational and Experimental Data. Springer Series in Statistics. New York, NY: Springer. https://doi.org/10.1007/978-1-4419-9782-1.

———. 2018. Targeted Learning in Data Science: Causal Inference for Complex Longitudinal Studies. Springer Series in Statistics. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-65304-4.

van der Laan, Mark J., and Daniel Rubin. 2006. “Targeted Maximum Likelihood Learning.” The International Journal of Biostatistics 2 (1). https://doi.org/10.2202/1557-4679.1043.
