AndyVale/KP-Anonymity

(k, P)-Anonymity for Time-Series Data

This project implements privacy-preserving techniques for time-series data, specifically focusing on the (k, P)-anonymity model proposed by Shou et al. in the paper:

"Supporting Pattern-Preserving Anonymization for Time-Series Data"
IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 4, 2013.

Repository Overview

The src directory is organized into four modules:

  1. Analyzer: Scripts for performing analysis, including deanonymization attacks, query utility testing, and statistical calculations.
  2. Util: Primitive functions for time-series handling, normalization, and verification tests.
  3. KAPRA: The implementation of the bottom-up KAPRA algorithm.
  4. Naive: The implementation of the top-down Naive algorithm.
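As a rough illustration of what a verification test for the model might check, the sketch below assumes an anonymized table is given as a list of k-groups, where each k-group maps a pattern string to the ids of the records sharing that pattern. Under the (k, P) requirement, every k-group holds at least k records and every pattern subgroup holds at least P records. The function name and the data layout are hypothetical and are not taken from the Util module.

```python
from typing import Dict, List


def satisfies_kp(groups: List[Dict[str, List[int]]], k: int, p: int) -> bool:
    """Check the (k, P) requirement on a hypothetical grouped representation.

    groups: one dict per k-group, mapping a pattern string to record ids.
    """
    for group in groups:
        # Every k-group must contain at least k records in total.
        group_size = sum(len(ids) for ids in group.values())
        if group_size < k:
            return False
        # Every pattern subgroup inside the k-group must contain at least p records.
        if any(len(ids) < p for ids in group.values()):
            return False
    return True


# Illustrative data only: two k-groups of four records each.
example = [
    {"aabc": [0, 1], "abcc": [2, 3]},
    {"ccba": [4, 5, 6, 7]},
]
print(satisfies_kp(example, k=4, p=2))
```

With this example the check passes for k=4, P=2 and fails once P is raised to 3, because the first group contains pattern subgroups of size 2.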

The primary tool for anonymizing a dataset is kp-anonymity.py, which provides a command-line interface (CLI). Usage instructions can be accessed via:

python3 kp-anonymity.py {kapra/naive} -h

Both subcommands currently accept the same options.

While kp-anonymity.py serves as the main entry point for users, several auxiliary scripts facilitated the analysis:

  • analysis_scalability_utility.py: Used to generate data for scalability (execution time) and utility metrics, including Value Loss (VL), Pattern Loss (PL), and SAX level.
  • Several scripts throughout the repository include a main block to perform specific experiments and data evaluations.
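To make the "SAX level" mentioned above more concrete, here is a minimal sketch of the standard SAX transform: z-normalize the series, average it into equal-length segments (PAA), then map each segment mean to a letter using breakpoints that split the standard normal distribution into `level` equiprobable regions. The function name and signature are assumptions made for illustration, and the repository's own implementation may differ. The sketch also assumes the series length is a multiple of `word_length`; trailing points are ignored otherwise.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def sax(series, word_length, level):
    """Return the SAX word of `series` (hypothetical helper, not the repo's API)."""
    values = list(series)
    mu, sigma = mean(values), pstdev(values)
    # z-normalization; a constant series is mapped to all zeros.
    if sigma == 0:
        normalized = [0.0] * len(values)
    else:
        normalized = [(x - mu) / sigma for x in values]

    # Piecewise Aggregate Approximation: mean of each equal-length segment.
    seg = len(normalized) // word_length
    paa = [mean(normalized[i * seg:(i + 1) * seg]) for i in range(word_length)]

    # Breakpoints cut N(0, 1) into `level` regions of equal probability.
    breakpoints = [NormalDist().inv_cdf(i / level) for i in range(1, level)]

    # Each PAA value becomes the letter of the region it falls into.
    return "".join(chr(ord("a") + bisect_right(breakpoints, v)) for v in paa)


print(sax([1, 2, 3, 4, 5, 6, 7, 8], word_length=4, level=4))
```

A higher level gives a finer alphabet and therefore a more detailed pattern, at the cost of making identical patterns rarer.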

Datasets

The algorithms are evaluated using two distinct data sources: a synthetic generator for establishing performance baselines and a real-world medical database to assess practical utility.

Synthetic Data

Synthetic datasets are generated with the dataset_generation notebook, primarily using a random walk model. The model produces highly diverse time series with large fluctuations, resembling financial price movements or physiological measurements. These datasets suit controlled scalability tests, but they often lack the complex correlations found in natural signals.
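A random walk generator can be sketched in a few lines. The version below is an assumption about the general technique, not a copy of the notebook: it uses Gaussian steps, and the function names, the `step_std` parameter, and the seeding scheme are all hypothetical.

```python
import random


def random_walk(length, step_std=1.0, start=0.0, rng=None):
    """Generate one random-walk series of `length` points (length >= 1 assumed)."""
    rng = rng or random.Random()
    values = [start]
    for _ in range(length - 1):
        # Each point is the previous point plus an independent Gaussian step.
        values.append(values[-1] + rng.gauss(0.0, step_std))
    return values


def generate_dataset(n_series, length, seed=0):
    """Generate `n_series` independent walks from one seeded generator."""
    rng = random.Random(seed)
    return [random_walk(length, rng=rng) for _ in range(n_series)]


dataset = generate_dataset(n_series=5, length=20, seed=42)
print(len(dataset), len(dataset[0]))
```

Fixing the seed keeps scalability runs reproducible, since the same series are regenerated on every execution.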

Real-world Data: MIT-BIH Arrhythmia Database

To evaluate high-dimensional data with natural similarities, the MIT-BIH Arrhythmia Database is used.

  • Description: Approximately 109,000 ECG heartbeat signals.
  • Quasi-Identifiers (QI): Amplitude measurements at consecutive timestamps.
  • Sensitive Attribute (SA): Clinical classification (Normal, Supraventricular, Ventricular, Fusion, Unknown).
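Because the quasi-identifiers are amplitude values at consecutive timestamps, a group of records can be generalized into a per-timestamp value range, often called an envelope. The sketch below builds such an envelope and computes a value loss as the root mean square of the envelope width. This formula is an assumption based on a common reading of the (k, P)-anonymity paper, and the analysis report should be consulted for the exact definition used in this project. The function names are hypothetical.

```python
import math


def envelope(group):
    """Return (lower, upper) bounds per timestamp for equal-length series."""
    lower = [min(column) for column in zip(*group)]
    upper = [max(column) for column in zip(*group)]
    return lower, upper


def value_loss(lower, upper):
    """Root mean square of the envelope width (assumed VL definition)."""
    n = len(lower)
    return math.sqrt(sum((hi - lo) ** 2 for lo, hi in zip(lower, upper)) / n)


# Illustrative group of three short series.
group = [[0, 1, 2], [2, 1, 0], [1, 1, 1]]
lower, upper = envelope(group)
print(lower, upper, value_loss(lower, upper))
```

A group of identical series has zero value loss, and the loss grows as the grouped series diverge from one another.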

Analysis and Documentation

Detailed theoretical background is provided in the original paper (KP Anonymity.pdf), while the implementation analysis is documented in Analysis_23-01-2026.pdf.

The analysis report covers:

  • Definitions of the utility and privacy metrics.
  • Scalability and utility assessments using synthetic data.
  • Performance comparisons between the Naive and KAPRA approaches.
  • A case study on the MIT-BIH database covering query utility and statistical preservation.
  • Privacy assessments, including homogeneity risk and deanonymization analysis.
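One simple way to quantify homogeneity risk is the share of records that fall into a group whose sensitive attribute takes a single value, since membership in such a group reveals the attribute. The sketch below uses that definition as an assumption. The report may define the metric differently, and the function name and input layout are hypothetical.

```python
def homogeneity_risk(groups):
    """Fraction of records in groups where every sensitive value is identical.

    groups: list of lists, each holding the sensitive values of one group.
    """
    total = sum(len(group) for group in groups)
    if total == 0:
        return 0.0
    # A group is homogeneous when it contains exactly one distinct value.
    exposed = sum(len(group) for group in groups if len(set(group)) == 1)
    return exposed / total


# Illustrative labels drawn from the classes listed for the MIT-BIH data.
groups = [
    ["Normal", "Normal"],
    ["Normal", "Ventricular", "Fusion"],
]
print(homogeneity_risk(groups))
```

Here two of the five records sit in a homogeneous group, so the computed risk is 0.4.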

Citations

L. Shou, X. Shang, K. Chen, G. Chen and C. Zhang, "Supporting Pattern-Preserving Anonymization for Time-Series Data," in IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 4, pp. 877-892, April 2013, doi: 10.1109/TKDE.2011.249.

BibTeX

@ARTICLE{6095556,
  author={Shou, Lidan and Shang, Xuan and Chen, Ke and Chen, Gang and Zhang, Chao},
  journal={IEEE Transactions on Knowledge and Data Engineering}, 
  title={Supporting Pattern-Preserving Anonymization for Time-Series Data}, 
  year={2013},
  volume={25},
  number={4},
  pages={877-892},
  keywords={Couplings;Databases;Publishing;Pattern matching;Data models;Data privacy;Correlation;Privacy;anonymity;pattern;time series},
  doi={10.1109/TKDE.2011.249}}
