(k, P)-Anonymity for Time-Series Data

This project implements privacy-preserving techniques for time-series data, specifically focusing on the (k, P)-anonymity model proposed by Shou et al. in the paper:

"Supporting Pattern-Preserving Anonymization for Time-Series Data"
IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 4, 2013.
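
As a rough illustration of what the model asks for, the sketch below checks the two requirements on an already-anonymized table: every value envelope (k-group) must be shared by at least k records, and inside each k-group every published pattern must be shared by at least P records. The function name is_kp_anonymous, the (group_id, pattern) record layout, and the sample data are assumptions made for this example; they are not taken from the repository.

```python
from collections import Counter, defaultdict


def is_kp_anonymous(records, k, p):
    """Check the k- and P-requirements on (group_id, pattern) pairs.

    group_id identifies the k-group (records sharing one value envelope);
    pattern is the published pattern representation, e.g. a SAX word.
    Both names are hypothetical and used only for this sketch.
    """
    groups = defaultdict(list)
    for group_id, pattern in records:
        groups[group_id].append(pattern)

    for patterns in groups.values():
        # k-requirement: each envelope is shared by at least k records.
        if len(patterns) < k:
            return False
        # P-requirement: each pattern in the group is shared by at least p records.
        if any(count < p for count in Counter(patterns).values()):
            return False
    return True


# Made-up example: two groups of four records each.
published = [
    ("g1", "aab"), ("g1", "aab"), ("g1", "bba"), ("g1", "bba"),
    ("g2", "abc"), ("g2", "abc"), ("g2", "abc"), ("g2", "abc"),
]
print(is_kp_anonymous(published, k=4, p=2))  # True
print(is_kp_anonymous(published, k=4, p=3))  # False
```

The second call fails because group g1 contains patterns that are shared by only two records.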

Repository Overview

The src directory is organized into four modules:

Analyzer: Scripts for performing analysis, including deanonymization attacks, query utility testing, and statistical calculations.
Util: Primitive functions for time-series handling, normalization, and verification tests.
KAPRA: The implementation of the bottom-up KAPRA algorithm.
Naive: The implementation of the top-down Naive algorithm.

The primary tool for anonymizing a dataset is kp-anonymity.py, which provides a command-line interface (CLI). Usage instructions can be accessed via:

python3 kp-anonymity.py {kapra/naive} -h

Both subcommands, kapra and naive, currently accept the same options.

While kp-anonymity.py serves as the main entry point for users, several auxiliary scripts facilitated the analysis:

analysis_scalability_utility.py: Used to generate data for scalability (execution time) and utility metrics, including Value Loss (VL), Pattern Loss (PL), and SAX level.
Several scripts throughout the repository include a main block to perform specific experiments and data evaluations.
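
For readers unfamiliar with SAX, the sketch below shows one standard way to turn a series into a SAX word: z-normalize, average over equal-width segments (PAA), then map each segment mean to a symbol using breakpoints that split the standard normal distribution into equiprobable regions. It assumes that "SAX level" means the alphabet size. The function names and parameters are hypothetical, and the repository's Util module may implement these steps differently.

```python
from statistics import NormalDist, mean, pstdev


def z_normalize(series):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0] * len(series)  # a flat series carries no shape information
    return [(x - mu) / sigma for x in series]


def paa(series, segments):
    """Piecewise aggregate approximation; assumes len(series) is divisible by segments."""
    width = len(series) // segments
    return [mean(series[i * width:(i + 1) * width]) for i in range(segments)]


def sax_word(series, segments, level):
    """Map a series to a SAX word over an alphabet of `level` symbols."""
    # Breakpoints split the standard normal distribution into `level` equiprobable regions.
    breakpoints = [NormalDist().inv_cdf(i / level) for i in range(1, level)]
    word = []
    for value in paa(z_normalize(series), segments):
        index = sum(value > b for b in breakpoints)  # number of breakpoints below the value
        word.append(chr(ord("a") + index))
    return "".join(word)


rising = [1, 2, 3, 4, 5, 6, 7, 8]
print(sax_word(rising, segments=4, level=4))  # abcd
```

A higher level keeps more detail of the pattern, at the cost of making identical patterns rarer.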

Datasets

The algorithms are evaluated using two distinct data sources: a synthetic generator for establishing performance baselines and a real-world medical database to assess practical utility.

Synthetic Data

Synthetic datasets are generated with the dataset_generation notebook, primarily using a random walk model. The model produces highly diverse time series with large fluctuations, resembling financial volatility or physiological measurements. These datasets suit controlled scalability tests, but they often lack the complex correlations found in natural signals.
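
A minimal sketch of a random walk generator, using only the standard library. The Gaussian step distribution, the function names, and the default parameters are assumptions for illustration; the notebook's actual generator may use different step distributions or scaling.

```python
import random


def random_walk(length, start=0.0, step_std=1.0, rng=None):
    """Return one series where each value is the previous one plus a Gaussian step."""
    rng = rng or random.Random()
    values = [start]
    for _ in range(length - 1):
        values.append(values[-1] + rng.gauss(0.0, step_std))
    return values


def generate_dataset(n_series, length, seed=0):
    """Return n_series independent random walks; the seed makes runs reproducible."""
    rng = random.Random(seed)
    return [random_walk(length, rng=rng) for _ in range(n_series)]


dataset = generate_dataset(n_series=5, length=100)
print(len(dataset), len(dataset[0]))  # 5 100
```

Fixing the seed is useful for scalability runs, since timings can then be compared on identical inputs.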

Real-world Data: MIT-BIH Arrhythmia Database

To evaluate high-dimensional data with natural similarities, the MIT-BIH Arrhythmia Database is used.

Description: Approximately 109,000 ECG heartbeat signals.
Quasi-Identifiers (QI): Amplitude measurements at consecutive timestamps.
Sensitive Attribute (SA): Clinical classification (Normal, Supraventricular, Ventricular, Fusion, Unknown).
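
The split between quasi-identifiers and the sensitive attribute can be sketched as follows. The row layout (amplitudes first, class label in the last column), the integer label codes, and the names Heartbeat and parse_row are assumptions for this example; check the files under Data for the actual format.

```python
from dataclasses import dataclass

# Assumed label encoding: the class names come from the list above,
# but the numeric codes are a guess made for this sketch.
CLASS_NAMES = {
    0: "Normal",
    1: "Supraventricular",
    2: "Ventricular",
    3: "Fusion",
    4: "Unknown",
}


@dataclass
class Heartbeat:
    quasi_identifiers: list  # amplitudes at consecutive timestamps
    sensitive_attribute: str  # clinical classification


def parse_row(row):
    """Split one row into QI amplitudes and the SA label (assumed to be the last column)."""
    *amplitudes, label = row
    return Heartbeat(
        quasi_identifiers=[float(a) for a in amplitudes],
        sensitive_attribute=CLASS_NAMES[int(float(label))],
    )


# Made-up row: four amplitudes followed by a label code.
beat = parse_row(["0.97", "0.61", "0.12", "0.0", "2.0"])
print(beat.sensitive_attribute)  # Ventricular
```

Only the quasi-identifiers are generalized by the anonymization step; the sensitive attribute is published as-is.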

Analysis and Documentation

Detailed theoretical background is provided in the original paper (KP Anonymity.pdf), while the implementation analysis is documented in Analysis_23-01-2026.pdf.

The analysis report covers:

Definitions of the utility and privacy metrics.
Scalability and utility assessments using synthetic data.
Performance comparisons between the Naive and KAPRA approaches.
A case study on the MIT-BIH database covering query utility and statistical preservation.
Privacy assessments, including homogeneity risk and deanonymization analysis.
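
One simple way to quantify homogeneity risk is the fraction of anonymized groups in which every record carries the same sensitive value, since membership in such a group reveals that value. The sketch below computes it under that assumption; the report may define the metric differently, and the function name and record layout are hypothetical.

```python
from collections import defaultdict


def homogeneous_fraction(records):
    """Return the share of groups whose records all have one sensitive value.

    records: iterable of (group_id, sensitive_value) pairs.
    """
    groups = defaultdict(set)
    for group_id, sensitive_value in records:
        groups[group_id].add(sensitive_value)
    if not groups:
        return 0.0
    homogeneous = sum(1 for values in groups.values() if len(values) == 1)
    return homogeneous / len(groups)


# Made-up release: group g1 is homogeneous, group g2 is not.
released = [
    ("g1", "Normal"), ("g1", "Normal"),
    ("g2", "Normal"), ("g2", "Fusion"),
]
print(homogeneous_fraction(released))  # 0.5
```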

Citations

L. Shou, X. Shang, K. Chen, G. Chen and C. Zhang, "Supporting Pattern-Preserving Anonymization for Time-Series Data," in IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 4, pp. 877-892, April 2013, doi: 10.1109/TKDE.2011.249.

BibTeX

@ARTICLE{6095556,
  author={Shou, Lidan and Shang, Xuan and Chen, Ke and Chen, Gang and Zhang, Chao},
  journal={IEEE Transactions on Knowledge and Data Engineering}, 
  title={Supporting Pattern-Preserving Anonymization for Time-Series Data}, 
  year={2013},
  volume={25},
  number={4},
  pages={877-892},
  keywords={Couplings;Databases;Publishing;Pattern matching;Data models;Data privacy;Correlation;Privacy;anonymity;pattern;time series},
  doi={10.1109/TKDE.2011.249}}
