CausalPretraining

This repository holds the official code for: Embracing the black box: Heading towards foundation models for causal discovery from time series data

❗ As the paper is still a preprint, we have not fully polished this repo. Most of it works just fine, however. We are going to revisit this in the future. ❗

If you are simply interested in using one of our pretrained networks: Usage

If you are interested in the data: Here

If you are interested in reproducing the experimental results: All scripts are included and largely documented below. We provide the full code base here, and most of it is directly executable (with the exception of our synthetic experiments, which we ran on a slurm cluster).

Installation:

The main environment can be installed with

conda env create -f env_droplet.yml

For the PCMCI experiments, a separate environment can be installed via:

conda env create -f env_tigramite_droplet.yml

Usage

The project is loosely built upon https://github.com/ashleve/lightning-hydra-template

DATA

To generate synthetic data samples run

cd data
python generate_synthetic_ds.py --scale_up --synthetic_six --joint

To prepare other data sources used in the paper, download them here:

wget -P data/ "https://github.com/anndvision/data/raw/main/jasmin/four_outputs_liqcf_pacific.csv"
wget -P data/ "https://raw.githubusercontent.com/wasimahmadpk/cdmi/refs/heads/main/datasets/river_discharge_data/data_dillingen.csv"
wget -P data/ "https://raw.githubusercontent.com/wasimahmadpk/cdmi/refs/heads/main/datasets/river_discharge_data/data_kempten.csv"
wget -P data/ "https://raw.githubusercontent.com/wasimahmadpk/cdmi/refs/heads/main/datasets/river_discharge_data/data_lenggries.csv"

For the Kuramoto data we use the generator of: https://github.com/loeweX/AmortizedCausalDiscovery

Then run:

python generate_other_ds.py --kuramoto --aerosols --kuramoto_path path/to/download --aerosols_path path/to/download

to convert them into the expected format. The river dataset needs no formatting.
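Since the river discharge files are plain CSV and need no conversion, they can be inspected directly. The following is a minimal sketch, assuming that each file has a header row and that at least some columns are numeric. The column layout is not documented in this README, so columns that do not parse as numbers (for example a timestamp) are simply skipped. The helper name `load_numeric_columns` and the column names in the toy example are hypothetical and not part of the repository.

```python
import csv
import io


def load_numeric_columns(handle):
    """Read a CSV with a header row and return {column_name: [floats]}.

    Columns containing any value that cannot be parsed as a float
    (for example a timestamp column) are dropped entirely.
    """
    reader = csv.DictReader(handle)
    columns = {name: [] for name in (reader.fieldnames or [])}
    non_numeric = set()
    for row in reader:
        for name, value in row.items():
            if name in non_numeric:
                continue
            try:
                parsed = float(value)
            except (TypeError, ValueError):
                non_numeric.add(name)
                continue
            columns[name].append(parsed)
    return {k: v for k, v in columns.items() if k not in non_numeric}


# Illustrative in-memory stand-in; the real column names may differ.
example = io.StringIO(
    "time,station_a,station_b\n"
    "2020-01-01,1.0,2.5\n"
    "2020-01-02,1.5,3.0\n"
)
series = load_numeric_columns(example)
print(sorted(series))
```

With the downloads above in place, the same helper could be called on a real file, e.g. `open("data/data_dillingen.csv", newline="")`.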

Training

To train a default model on a small set of synthetic data samples, run e.g.:

python train.py model.model_type=transformer data.ds_name=SNL model=medium.yaml
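The arguments in the command above are Hydra-style overrides of the form `key=value`, where dots in the key address nested configuration entries. The sketch below only illustrates that idea with a simplified, hypothetical parser (`parse_overrides`). It is not how Hydra is implemented: Hydra's real override grammar is richer, and the rule used here (an undotted key such as `model` is a config-group choice, a dotted key is a nested value) is an assumption made for illustration.

```python
def parse_overrides(args):
    """Split Hydra-style `key=value` strings into group choices and nested values.

    Simplification (assumption): an undotted key is treated as a config-group
    choice, a dotted key as a value inside a nested config.
    """
    groups, values = {}, {}
    for arg in args:
        key, sep, value = arg.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {arg!r}")
        parts = key.split(".")
        if len(parts) == 1:
            groups[key] = value
            continue
        # Walk (and create) the nested dictionaries named by the dotted key.
        node = values
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return groups, values


groups, values = parse_overrides(
    ["model.model_type=transformer", "data.ds_name=SNL", "model=medium.yaml"]
)
print(groups)
print(values)
```

Read this way, the example command picks the `medium.yaml` option for the `model` group and then sets `model_type` and `ds_name` inside the `model` and `data` configs.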

Pretrained weights (the best runs from the joint experiments, model size "big") are included in /pretrained_weights and can be used directly, e.g. as in Make Graph or Usage. Be careful: you need correlation injection inputs.
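The note on correlation injection suggests that the pretrained networks expect correlation information alongside the raw time series. The exact input format is not described in this README. As a rough illustration only, assuming that the injected quantity is a matrix of lagged cross-correlations between all pairs of variables (an assumption, not confirmed here), such a matrix could be computed as follows. The function names `pearson` and `lagged_cross_correlation` and the choice of `max_lag` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long sequences (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    denom = math.sqrt(var_x * var_y)
    return cov / denom if denom > 0 else 0.0


def lagged_cross_correlation(series, max_lag):
    """Return corr[i][j][k]: correlation of series[j] at t-(k+1) with series[i] at t.

    `series` is a list of equally long lists, one per variable.
    """
    n_vars = len(series)
    length = len(series[0])
    if max_lag >= length - 1:
        raise ValueError("max_lag must leave at least two overlapping samples")
    corr = [[[0.0] * max_lag for _ in range(n_vars)] for _ in range(n_vars)]
    for i in range(n_vars):
        for j in range(n_vars):
            for k in range(max_lag):
                lag = k + 1
                effect = series[i][lag:]           # values at time t
                cause = series[j][: length - lag]  # values at time t - lag
                corr[i][j][k] = pearson(cause, effect)
    return corr


# Toy data: y is x delayed by one step, so the lag-1 correlation
# from x to y equals 1 up to floating-point rounding.
x = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
y = [0.0] + x[:-1]
corr = lagged_cross_correlation([x, y], max_lag=2)
print(round(corr[1][0][0], 3))
```

For the actual input construction expected by the pretrained weights, the usage examples referenced above remain the authoritative source.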

Reproduce

All baselines included in the paper can be recreated by running e.g.:

python calc_baselines.py --corr --synth --var

A summary of the results can be found Here. The results of the grid searches are provided Here. Furthermore, zero-shot results as well as inference speeds for Causally Pretrained Neural Networks can be reproduced by running:

python calc_cp_performance.py --rivers --aerosols --speed

Finally, the slurm scripts used for the grid search are included in slurm. This can be used to calculate a distribution_over_outputs (see the appendix of the paper).

Feel free to contact me if you would like to have any additional content/information/code. 😎

Maintainers

@GideonStein.
