Novel framework for handling missing values in time series data through sequential algorithm application based on dataset constraints
Published in the 2024 IEEE International Conference on Computer Science and Engineering (UBMK)
Traditional imputation methods apply single algorithms without considering dataset-specific characteristics. This work introduces a multi-constraint sequential imputation framework that identifies optimal algorithm sequences based on:
- Missing value percentage - How much data is missing
- Dataset length - Temporal span of the time series
- Number of variables - Univariate vs. multivariate complexity
By applying algorithms iteratively in an order chosen from these constraints, the framework reduces RMSE by roughly 60-65% relative to individual algorithms on the datasets below.
| Dataset | Individual Algorithm RMSE | Sequential Framework RMSE | Improvement |
|---|---|---|---|
| BAFU (water discharge) | 0.371 | 0.148 | 60% reduction |
| Climate | 0.717 | 0.275 | 62% reduction |
| Drift10 (gas sensors) | 0.785 | 0.275 | 65% reduction |
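The "Improvement" column can be reproduced from the two RMSE columns. The snippet below is a small arithmetic check, assuming the improvement is the relative RMSE reduction (baseline minus framework, divided by baseline); the function name is illustrative.

```python
def rmse_reduction_pct(baseline_rmse: float, framework_rmse: float) -> float:
    """Relative RMSE reduction of the framework over the baseline, in percent."""
    return 100.0 * (baseline_rmse - framework_rmse) / baseline_rmse


# (individual algorithm RMSE, sequential framework RMSE), copied from the table above
results = {
    "BAFU": (0.371, 0.148),
    "Climate": (0.717, 0.275),
    "Drift10": (0.785, 0.275),
}

reductions = {name: rmse_reduction_pct(b, f) for name, (b, f) in results.items()}
for name, pct in reductions.items():
    print(f"{name}: {pct:.1f}% reduction")
```

Rounded to whole percentages, these values match the 60%, 62%, and 65% figures reported in the table.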
Example Sequential Pipeline:
SPIRIT (miss_perc) → CDRec (ts_length) → SoftImpute (ts_nbr) = Better accuracy than any single algorithm
Analyze dataset characteristics:
- Percentage of missing blocks
- Time series length (1K - 50K observations)
- Number of columns (univariate vs. multivariate)
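The three characteristics above can be read directly off a data matrix. The following is a minimal sketch, not the benchmark's own profiler: it assumes the series are stored as a NumPy array with one row per timestamp, one column per series, and `NaN` marking missing values. The function names are hypothetical; the keys `miss_perc`, `ts_length`, and `ts_nbr` mirror the labels used in the example pipeline.

```python
import numpy as np


def count_missing_blocks(col_missing: np.ndarray) -> int:
    """Count contiguous runs of missing values in one boolean column."""
    padded = np.concatenate(([False], col_missing))
    # A block starts where a value is missing and the previous one is not.
    starts = padded[1:] & ~padded[:-1]
    return int(starts.sum())


def profile_dataset(X) -> dict:
    """Summarise the constraints of a (time x series) matrix with NaN gaps."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:  # treat a 1-D input as a single univariate series
        X = X[:, None]
    missing = np.isnan(X)
    n_rows, n_cols = X.shape
    return {
        # Assumption: percentage of missing cells, used here as a simple
        # stand-in for the "percentage of missing blocks".
        "miss_perc": 100.0 * missing.sum() / missing.size,
        "missing_blocks": sum(count_missing_blocks(missing[:, j]) for j in range(n_cols)),
        "ts_length": n_rows,
        "ts_nbr": n_cols,
    }


X_demo = np.array([
    [1.0, 10.0],
    [np.nan, 20.0],
    [np.nan, 30.0],
    [4.0, np.nan],
    [5.0, 50.0],
])
profile = profile_dataset(X_demo)
print(profile)
```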
Evaluate 12 state-of-the-art algorithms:
- CDRec - Centroid decomposition
- DynaMMo - Latent variable evolution
- SVT - Singular value thresholding
- GROUSE - Grassmannian gradient projection
- SoftImpute - Soft-thresholded SVD
- ROSL - Robust orthonormal subspace learning
- STMVL - Spatio-temporal multi-view learning
- SPIRIT - Streaming pattern identification
- TeNMF - Temporal non-negative matrix factorization
- TRMF - Temporal regularized matrix factorization
- TKCM - Top-k case matching
- SVDImpute - SVD-based imputation
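To make one entry in this list concrete, here is a minimal NumPy sketch of the soft-thresholded SVD idea behind SoftImpute. It is an illustrative implementation, not the one shipped with the benchmark, and the parameter names (`lam`, `max_iter`, `tol`) and their defaults are assumptions chosen for the example.

```python
import numpy as np


def soft_impute(X, lam: float = 0.5, max_iter: int = 100, tol: float = 1e-5) -> np.ndarray:
    """Fill NaN entries by iterating a soft-thresholded SVD (sketch only)."""
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)              # True where a value is missing
    Z = np.where(mask, 0.0, X)      # initialise missing entries with zeros
    for _ in range(max_iter):
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        s_shrunk = np.maximum(s - lam, 0.0)      # soft-threshold singular values
        low_rank = (U * s_shrunk) @ Vt           # low-rank reconstruction
        Z_new = np.where(mask, low_rank, X)      # keep observed values untouched
        change = np.linalg.norm(Z_new - Z) / max(np.linalg.norm(Z), 1e-12)
        Z = Z_new
        if change < tol:
            break
    return Z


# Rank-1 demo matrix with two entries removed
truth = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
X_demo = truth.copy()
X_demo[1, 1] = np.nan
X_demo[3, 0] = np.nan
completed = soft_impute(X_demo)
print(completed)
```

The larger `lam` is, the more aggressively small singular values are discarded, which trades fidelity for a lower-rank estimate.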
Apply algorithms iteratively, refining imputation at each step based on constraint-specific performance.
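One possible reading of this step is sketched below: each stage receives the current estimate plus the mask of originally missing cells, returns a refined estimate, and observed values are restored after every stage. This chaining rule is an assumption, not the paper's documented procedure, and the two stages are deliberately simple stand-ins (column-mean fill and neighbour averaging), not SPIRIT, CDRec, or SoftImpute.

```python
from typing import Callable, Sequence

import numpy as np

# A stage maps (current estimate, missing mask) to a refined estimate.
Stage = Callable[[np.ndarray, np.ndarray], np.ndarray]


def sequential_impute(X, stages: Sequence[Stage]) -> np.ndarray:
    """Apply imputation stages in order, refining only the missing cells."""
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)
    current = X.copy()
    for stage in stages:
        refined = stage(current, mask)
        current = np.where(mask, refined, X)  # observed values never change
    return current


def column_mean_fill(current: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Stand-in stage 1: fill gaps with the mean of each column's observed values."""
    observed = np.where(mask, np.nan, current)
    means = np.nanmean(observed, axis=0)  # assumes no column is entirely missing
    return np.where(mask, means, current)


def neighbour_average(current: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Stand-in stage 2: replace gaps with the mean of the previous and next rows."""
    padded = np.pad(current, ((1, 1), (0, 0)), mode="edge")
    smoothed = (padded[:-2] + padded[2:]) / 2.0
    return np.where(mask, smoothed, current)


X_demo = np.array([
    [1.0, 10.0],
    [np.nan, 20.0],
    [3.0, np.nan],
    [4.0, 40.0],
])
result = sequential_impute(X_demo, [column_mean_fill, neighbour_average])
print(result)
```

In this toy case the second stage corrects the coarse column means from the first stage, which is the kind of refinement a later algorithm in the sequence is meant to provide.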
This work builds upon the excellent ImputeBench framework by eXascale Infolab.
# Clone the benchmark repository
git clone https://github.com/eXascaleInfolab/bench-vldb20
cd bench-vldb20
# Install dependencies
pip install -r requirements.txt
- Prepare your dataset in the required format
- Analyze constraints using the dataset profiler
- Select algorithm sequence based on constraint analysis
- Run iterative imputation with the chosen sequence
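The selection step could be sketched as follows, assuming you have already measured an error score for each candidate algorithm under each constraint. The function, the constraint order, and the score table are all hypothetical: the algorithm names are generic and the numbers are placeholders, not results from the paper.

```python
from typing import Dict, List, Sequence, Tuple


def select_sequence(
    scores: Dict[str, Dict[str, float]],
    constraint_order: Sequence[str] = ("miss_perc", "ts_length", "ts_nbr"),
) -> List[Tuple[str, str]]:
    """Pick the lowest-error algorithm for each constraint, in the given order."""
    sequence = []
    for constraint in constraint_order:
        per_algorithm = scores[constraint]
        best = min(per_algorithm, key=per_algorithm.get)  # lower error is better
        sequence.append((best, constraint))
    return sequence


# Placeholder error scores (illustrative only, not measured values)
placeholder_scores = {
    "miss_perc": {"algo_a": 0.30, "algo_b": 0.45, "algo_c": 0.50},
    "ts_length": {"algo_a": 0.40, "algo_b": 0.25, "algo_c": 0.35},
    "ts_nbr": {"algo_a": 0.55, "algo_b": 0.60, "algo_c": 0.20},
}

sequence = select_sequence(placeholder_scores)
print(" -> ".join(f"{algo} ({constraint})" for algo, constraint in sequence))
```

The resulting list has the same shape as the example pipeline: one algorithm per constraint, applied in order.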
See the benchmark documentation for detailed usage instructions.
If you use this work in your research, please cite:
@inproceedings{gondi2024multiconstraint,
title={Multi-Constraint Time Series Imputation},
author={Gondi, Nisarga and Kayarvizhy N.},
booktitle={2024 IEEE International Conference on Computer Science and Engineering (UBMK)},
year={2024},
doi={10.1109/...11115894}
}
Multi-Constraint Time Series Imputation
Nisarga Gondi, Kayarvizhy N.
IEEE Conference 2024
Abstract: Imputation of missing values in real-time univariate and multivariate time series is critical for maintaining data quality and model accuracy. While many algorithms exist, their performance is typically evaluated under single-constraint scenarios. This work addresses real-world datasets that exhibit multiple simultaneous constraints by introducing a sequential framework that dynamically selects and orders imputation algorithms based on dataset characteristics, achieving substantial accuracy improvements over traditional single-algorithm approaches.
Datasets Evaluated:
- Air Quality (China monitoring stations)
- Water Discharge (BAFU - Swiss rivers)
- Climate Data (NOAA stations)
- Gas Sensors (UCI repository)
- Household Energy (France)
- Temperature (China meteorological)
- Weather (MeteoSwiss)
Evaluation Metrics:
- Root Mean Squared Error (RMSE)
- Mean Absolute Error (MAE)
- Mean Squared Error (MSE)
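For reference, the three metrics can be computed as in the sketch below. It assumes errors are measured only on the cells that were originally missing, given a ground-truth matrix and a boolean mask; the function name and the demo values are illustrative.

```python
import numpy as np


def imputation_errors(truth, imputed, mask) -> dict:
    """RMSE, MAE, and MSE restricted to the originally missing cells."""
    truth = np.asarray(truth, dtype=float)
    imputed = np.asarray(imputed, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    diff = (imputed - truth)[mask]          # errors on imputed cells only
    mse = float(np.mean(diff ** 2))
    return {
        "RMSE": float(np.sqrt(mse)),
        "MAE": float(np.mean(np.abs(diff))),
        "MSE": mse,
    }


truth_demo = [[1.0, 2.0], [3.0, 4.0]]
imputed_demo = [[1.0, 2.5], [2.0, 4.0]]
mask_demo = [[False, True], [True, False]]   # True marks an imputed cell
errors = imputation_errors(truth_demo, imputed_demo, mask_demo)
print(errors)
```

RMSE is the square root of MSE, so it penalises large individual errors more heavily than MAE while staying in the units of the data.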
Framework Implementation: Built using Python with the ImputeBench benchmarking framework as the foundation.
This work builds upon the ImputeBench benchmark suite developed by the eXascale Infolab research group. We thank them for providing the comprehensive framework that enabled this research.
Nisarga Gondi
Carnegie Mellon University
ngondi@andrew.cmu.edu
Research Advisor: Dr. Kayarvizhy N.
B.M.S. College of Engineering