Multi-Constraint Time Series Imputation

A framework for imputing missing values in time series data by applying algorithms in a sequence chosen from the dataset's constraints

Published in IEEE Conference 2024 | Read the Paper

🎯 Overview

Traditional imputation methods apply single algorithms without considering dataset-specific characteristics. This work introduces a multi-constraint sequential imputation framework that identifies optimal algorithm sequences based on:

Missing value percentage - How much data is missing
Dataset length - Temporal span of the time series
Number of variables - Univariate vs. multivariate complexity

By applying algorithms iteratively in the right order, the framework reduces RMSE by 60-65% relative to the individual-algorithm baseline on the three datasets reported under Key Results.

📊 Key Results

Dataset	Individual Algorithm RMSE	Sequential Framework RMSE	Improvement
BAFU (water discharge)	0.371	0.148	60% reduction
Climate	0.717	0.275	62% reduction
Drift10 (gas sensors)	0.785	0.275	65% reduction
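The Improvement column can be checked directly from the two RMSE columns. The snippet below is a small arithmetic check, not part of the framework; the helper name `rmse_reduction` is made up for illustration.

```python
def rmse_reduction(baseline: float, improved: float) -> float:
    """Relative RMSE reduction in percent: 100 * (baseline - improved) / baseline."""
    return 100.0 * (baseline - improved) / baseline


# (individual algorithm RMSE, sequential framework RMSE), copied from the table
results = {
    "BAFU": (0.371, 0.148),
    "Climate": (0.717, 0.275),
    "Drift10": (0.785, 0.275),
}

for name, (single, sequential) in results.items():
    print(f"{name}: {rmse_reduction(single, sequential):.1f}% reduction")
```

Rounded to whole percentages, the three values match the 60%, 62%, and 65% listed in the table.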

Example Sequential Pipeline:
SPIRIT (miss_perc) → CDRec (ts_length) → SoftImpute (ts_nbr) = Better accuracy than any single algorithm

🔬 Methodology

1. Constraint Identification

Analyze dataset characteristics:

Percentage of missing blocks
Time series length (1K - 50K observations)
Number of columns (univariate vs. multivariate)
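The three characteristics above can be read off a NaN-marked array in a few lines. This is a minimal sketch, assuming the data is a NumPy array of shape (time steps, series) with gaps stored as NaN. The function name `profile_constraints` is hypothetical, the keys reuse the `miss_perc`, `ts_length`, and `ts_nbr` labels from the example pipeline, and the missing rate is computed per cell rather than per missing block, which is a simplification.

```python
import numpy as np


def profile_constraints(data: np.ndarray) -> dict:
    """Summarize the three constraints for a (time steps x series) array with NaN gaps."""
    if data.ndim == 1:
        data = data[:, None]  # treat a 1-D input as a single-column series
    n_obs, n_series = data.shape
    return {
        "miss_perc": 100.0 * float(np.isnan(data).mean()),  # share of missing cells
        "ts_length": n_obs,                                  # observations per series
        "ts_nbr": n_series,                                  # number of series (columns)
        "multivariate": n_series > 1,
    }


# Toy example: 10 time steps, 2 series, one gap of 3 values in the first series
data = np.arange(20, dtype=float).reshape(10, 2)
data[2:5, 0] = np.nan
profile = profile_constraints(data)
print(profile)
```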

2. Algorithm Selection

Evaluate 12 state-of-the-art algorithms:

CDRec - Centroid decomposition
DynaMMo - Latent variable evolution
SVT - Singular value thresholding
GROUSE - Grassmannian gradient projection
SoftImpute - Soft-thresholded SVD
ROSL - Robust orthonormal subspace learning
STMVL - Spatio-temporal multi-view learning
SPIRIT - Streaming pattern discovery
TeNMF - Temporal non-negative matrix factorization
TRMF - Temporal regularized matrix factorization
TKCM - Top-k case matching
SVDImpute - SVD-based imputation
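To give a feel for what one of these algorithms does, here is a minimal NumPy sketch of the soft-thresholded SVD idea behind SoftImpute. It is an illustration written for this README, not the implementation shipped with ImputeBench; the function name and the `shrinkage`, `max_iter`, and `tol` parameters are assumptions.

```python
import numpy as np


def soft_impute(data, shrinkage=1.0, max_iter=100, tol=1e-5):
    """Fill NaNs by repeatedly applying a soft-thresholded SVD."""
    missing = np.isnan(data)
    filled = np.where(missing, 0.0, data)  # start with zeros in the gaps
    for _ in range(max_iter):
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        s = np.maximum(s - shrinkage, 0.0)            # soft-threshold singular values
        low_rank = (u * s) @ vt                       # shrunken low-rank reconstruction
        updated = np.where(missing, low_rank, data)   # observed cells stay fixed
        change = np.linalg.norm(updated - filled) / max(np.linalg.norm(filled), 1e-12)
        filled = updated
        if change < tol:
            break
    return filled


# Toy example: a rank-1 matrix with one hidden cell
truth = np.outer(np.arange(1.0, 7.0), np.arange(1.0, 5.0))
matrix = truth.copy()
matrix[2, 1] = np.nan
completed = soft_impute(matrix, shrinkage=0.1)
print("imputed:", completed[2, 1], "true:", truth[2, 1])
```

Only the missing cells are rewritten on each pass, so the observed values are returned unchanged.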

3. Sequential Application

Apply algorithms iteratively, refining imputation at each step based on constraint-specific performance.
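The chaining idea can be sketched as a loop in which each stage receives the previous stage's output and re-estimates only the originally missing cells. Everything below is an assumption made for illustration: the three stages are simple stand-ins (column mean, truncated SVD, temporal smoothing), not SPIRIT, CDRec, or SoftImpute, and `run_sequence` is a hypothetical name rather than the framework's API.

```python
import numpy as np


def mean_fill(current, missing):
    """Stand-in stage: initialize gaps with the column mean of observed values."""
    observed = np.where(missing, np.nan, current)
    return np.where(missing, np.nanmean(observed, axis=0), current)


def low_rank_fill(current, missing, rank=2):
    """Stand-in stage: refine gaps from a truncated SVD of the current fill.

    Needs a NaN-free input, so it cannot be the first stage.
    """
    u, s, vt = np.linalg.svd(current, full_matrices=False)
    approx = (u[:, :rank] * s[:rank]) @ vt[:rank]
    return np.where(missing, approx, current)


def smooth_fill(current, missing):
    """Stand-in stage: refine gaps with the average of their temporal neighbors."""
    padded = np.pad(current, ((1, 1), (0, 0)), mode="edge")
    neighbor_avg = (padded[:-2] + padded[2:]) / 2.0
    return np.where(missing, neighbor_avg, current)


def run_sequence(data, stages):
    """Apply stages in order; observed cells are never modified."""
    missing = np.isnan(data)
    current = data.copy()
    for stage in stages:
        current = stage(current, missing)
    return current


# Toy example: three correlated series with two gaps
t = np.linspace(0.0, 2.0 * np.pi, 50)
data = np.column_stack([np.sin(t), 2.0 * np.sin(t), np.cos(t)])
data[10:15, 0] = np.nan
data[30:33, 2] = np.nan
filled = run_sequence(data, [mean_fill, low_rank_fill, smooth_fill])
print("remaining NaNs:", int(np.isnan(filled).sum()))
```

Passing the missing mask to every stage keeps the observed values fixed, so later stages can only revise earlier estimates, never the original data.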

🚀 Getting Started

This work builds upon the excellent ImputeBench framework by eXascale Infolab.

Prerequisites

# Clone the benchmark repository
git clone https://github.com/eXascaleInfolab/bench-vldb20
cd bench-vldb20

# Install dependencies
pip install -r requirements.txt

Running the Framework

Prepare your dataset in the required format
Analyze constraints using the dataset profiler
Select algorithm sequence based on constraint analysis
Run iterative imputation with the chosen sequence

See the benchmark documentation for detailed usage instructions.

📚 Citation

If you use this work in your research, please cite:

@inproceedings{gondi2024multiconstraint,
  title={Multi-Constraint Time Series Imputation},
  author={Gondi, Nisarga and N., Kayarvizhy},
  booktitle={2024 IEEE International Conference on Computer Science and Engineering (UBMK)},
  year={2024},
  doi={10.1109/...11115894}
}

📖 Publication

Multi-Constraint Time Series Imputation
Nisarga Gondi, Kayarvizhy N.
2024 IEEE International Conference on Computer Science and Engineering (UBMK)

Abstract: Imputation of missing values in real-time univariate and multivariate time series is critical for maintaining data quality and model accuracy. While many algorithms exist, their performance is typically evaluated under single-constraint scenarios. This work addresses real-world datasets that exhibit multiple simultaneous constraints by introducing a sequential framework that dynamically selects and orders imputation algorithms based on dataset characteristics, achieving substantial accuracy improvements over traditional single-algorithm approaches.

🛠️ Technical Details

Datasets Evaluated:

Air Quality (China monitoring stations)
Water Discharge (BAFU - Swiss rivers)
Climate Data (NOAA stations)
Gas Sensors (UCI repository)
Household Energy (France)
Temperature (China meteorological)
Weather (MeteoSwiss)

Evaluation Metrics:

Root Mean Squared Error (RMSE)
Mean Absolute Error (MAE)
Mean Squared Error (MSE)
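The three metrics are related by simple formulas. The sketch below computes them over the cells that were hidden before imputation; scoring only on those cells is an assumption here, and the function name `imputation_errors` is hypothetical.

```python
import numpy as np


def imputation_errors(truth, imputed, hidden):
    """RMSE, MAE, and MSE over the cells selected by the boolean mask `hidden`."""
    diff = imputed[hidden] - truth[hidden]
    mse = float(np.mean(diff ** 2))
    return {
        "RMSE": float(np.sqrt(mse)),
        "MAE": float(np.mean(np.abs(diff))),
        "MSE": mse,
    }


# Toy example: two hidden cells with errors of +0.5 and -1.0
truth = np.array([1.0, 2.0, 3.0, 4.0])
imputed = np.array([1.0, 2.5, 3.0, 3.0])
hidden = np.array([False, True, False, True])
scores = imputation_errors(truth, imputed, hidden)
print(scores)
```

For this toy input, MSE is (0.25 + 1.0) / 2 = 0.625, MAE is 0.75, and RMSE is the square root of the MSE.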

Framework Implementation: Built using Python with the ImputeBench benchmarking framework as the foundation.

🙏 Acknowledgments

This work builds upon the ImputeBench benchmark suite developed by the eXascale Infolab research group. We thank them for providing the comprehensive framework that enabled this research.

📧 Contact

Nisarga Gondi
Carnegie Mellon University
ngondi@andrew.cmu.edu

Research Advisor: Dr. Kayarvizhy N.
B.M.S. College of Engineering

🔗 Links

Building reliable ML systems through robust data quality frameworks
