CODA: Temporal Domain Generalization via Concept Drift Simulator

1. About this work

Abstract

In real-world applications, machine learning models are notorious for degrading in performance under data distribution shifts. Temporal domain generalization aims to learn models that can adapt to "concept drift" over time and perform well in the near future. To the best of our knowledge, existing works rely on model extrapolation enhancement or models with dynamic parameters to achieve temporal generalization. However, these model-centric training strategies require unnecessarily extensive interaction between data and model in order to train the model for distribution shift. We therefore tackle the concept drift problem from a data-centric perspective, which naturally bypasses this cumbersome interaction between data and model. Developing the data-centric framework involves two challenges: (i) existing generative models struggle to generate future data with natural evolution, and (ii) directly capturing the temporal trends of data with high precision is daunting. To tackle these challenges, we propose the COncept Drift simulAtor (CODA) framework, which incorporates a predicted feature correlation matrix to simulate future data for model training. Specifically, the feature correlation matrix serves as a proxy that represents data characteristics at each time point and as the trigger for future data generation. Experimental results demonstrate that using CODA-generated data as training input effectively achieves temporal domain generalization across different model architectures, with strong transferability.

CODA Framework

An overview of the proposed CODA framework, which consists of a Correlation Predictor (see Section 3.2) and a Data Simulator (see Section 3.3). In the first stage, the Correlation Predictor is trained on the feature correlation matrices $\mathcal{C}_{i}$ computed at each time point. In the second stage, the predicted correlation matrix $\hat{\mathcal{C}}_{T+1}$ serves as prior knowledge for the Data Simulator $\boldsymbol{G}$, which is updated by Equation 5.
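To make the first stage more concrete, the following is a minimal sketch of how one feature correlation matrix per time point could be computed and then extrapolated one step ahead. It is not the repository's implementation: the Pearson correlation via `np.corrcoef`, the toy drifting data, and the helper names `correlation_matrices` and `extrapolate_next` are all assumptions made for illustration, and the linear extrapolation is only a naive stand-in for the trained Correlation Predictor.

```python
import numpy as np


def correlation_matrices(domains):
    """Return one feature correlation matrix per time domain.

    domains: list of arrays shaped (n_samples, n_features), ordered by time.
    Assumption: Pearson correlation; the actual CODA code may differ.
    """
    return [np.corrcoef(X, rowvar=False) for X in domains]


def extrapolate_next(mats):
    """Naive stand-in for a learned predictor (hypothetical helper).

    Linearly extrapolates the last two matrices, then clips the result to
    the valid correlation range and restores the unit diagonal.
    """
    nxt = 2.0 * mats[-1] - mats[-2]
    nxt = np.clip(nxt, -1.0, 1.0)
    np.fill_diagonal(nxt, 1.0)
    return nxt


# Toy data: the correlation between features 0 and 1 grows over time.
rng = np.random.default_rng(0)
domains = []
for t in range(4):
    rho = 0.2 * t
    cov = np.array([[1.0, rho, 0.0],
                    [rho, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])
    domains.append(rng.multivariate_normal(np.zeros(3), cov, size=500))

mats = correlation_matrices(domains)   # C_1, ..., C_T
c_next = extrapolate_next(mats)        # rough guess at C_{T+1}
print(c_next.shape)
```

In CODA itself, the matrix for time point T+1 comes from the trained Correlation Predictor and is then passed to the Data Simulator as prior knowledge; the extrapolation here only illustrates the shape of the data flowing between the two stages.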

2. Development Environment and Dependencies

Please install the dependencies listed in requirements.txt.

3. Dataset and Trained Models Preparation

The Elec2 and Moons datasets can be found in the data folder.
The CODA-generated datasets can also be found in the data folder.

4. Experiment Reproducibility

To reproduce the experimental results of CODA's Correlation Predictor component on the Elec2 dataset, run the commands below:

$ cd ./classification/
$ bash run_train_correlation_predictor.sh

To generate data via CODA for the Elec2 dataset, run the commands below:

$ cd ./classification/
$ bash run_gen_coda_data.sh

To test a model trained on the CODA-generated data for the Elec2 dataset, run the commands below:

$ cd ./classification/
$ bash run_test_coda_data.sh
