Anomaly Detection in Smart City IoT Data

This repository implements a modular machine learning pipeline for detecting, ranking, and interpreting anomalies in IoT sensor streams, specifically tailored for smart city infrastructure. The pipeline is designed to integrate multi-stage filtering, interpretable supervised learning, and SHAP-based feature attribution for explainable results.

Objective

The aim is to surface actionable anomalies from heterogeneous, irregularly-sampled time series originating from weather and infrastructure sensors. Our system supports scalable ingest of real-world IoT datasets (e.g., EcoNET), identifies significant deviations from learned baselines, and enables interpretation through model-based explanations.

Pipeline Architecture

1. Data Ingestion & Preprocessing

Converts long-form EcoNET sensor logs to wide-form matrices with per-sensor columns
Resamples to uniform 5-minute intervals
Imputes missing values using KNN with spatial-aware neighbors
Applies z-score normalization using rolling 1-day window

2. Feature Engineering

Rolling temporal statistics (mean, std)
Spectral power features from FFT
Change point detection via Augmented Dickey-Fuller (ADF) test
Temporal context: hour-of-day, weekday/weekend, etc.

3. Unsupervised Outlier Filtering

Uses DBSCAN with dynamic ( \varepsilon ) and log-scaled min_samples
Filters extreme sparse outliers before supervised learning
Adaptively chooses threshold from distance distributions

4. Supervised Anomaly Classifiers

Random Forest: interpretable tree-based voting model with class weights
SVM (RBF kernel): margin-based classifier with scaling and balancing
Bidirectional LSTM: sequential model for long-range temporal dependencies

5. Ensemble Decision

Weighted soft-voting ensemble using calibrated outputs
Emphasizes models with sharper confidence under high class imbalance

6. Interpretability Layer

Applies SHAP (TreeExplainer) to Random Forest classifier
Outputs both global and local feature importance via bar/dot plots
Stores model artifacts + SHAP values in structured output folders

Evaluation Protocol

Binary classification metrics:
- F1 Score (macro)
- ROC-AUC (if available)
- Precision at 90% recall
- Overall Accuracy
Output summary saved to terminal and logs

Usage Instructions

1. Setup Environment

pip install -r requirements.txt

2. Convert EcoNET Data (One-time)

python3 -c "from utils.econet_converter import convert_econet_long_to_wide; convert_econet_long_to_wide('data/raw/train.csv', 'data/raw/smart_city_iot.csv')"

3. Run Full Pipeline

python main.py

4. Results Location

SHAP: outputs/shap_summary_*.png
Models: models/*.joblib
Metrics: printed to stdout

Notes

Developed as part of CSC 522 (NCSU) — Spring 2025
Dataset source: EcoNET (NC State Climate Office)
Wide-form input must include timestamp, sensor_id, and anomaly columns
To accelerate DBSCAN: limit rows in main.py (e.g., .iloc[:5000])

Contributors

Kagwe Muchane
Jonah Gloss
Dian Rajic

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
data/raw		data/raw
notebooks		notebooks
src		src
utils		utils
.gitignore		.gitignore
README.md		README.md
main.py		main.py
requirements.txt		requirements.txt
tests.py		tests.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Anomaly Detection in Smart City IoT Data

Objective

Pipeline Architecture

1. Data Ingestion & Preprocessing

2. Feature Engineering

3. Unsupervised Outlier Filtering

4. Supervised Anomaly Classifiers

5. Ensemble Decision

6. Interpretability Layer

Evaluation Protocol

Usage Instructions

1. Setup Environment

2. Convert EcoNET Data (One-time)

3. Run Full Pipeline

4. Results Location

Notes

Contributors

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Anomaly Detection in Smart City IoT Data

Objective

Pipeline Architecture

1. Data Ingestion & Preprocessing

2. Feature Engineering

3. Unsupervised Outlier Filtering

4. Supervised Anomaly Classifiers

5. Ensemble Decision

6. Interpretability Layer

Evaluation Protocol

Usage Instructions

1. Setup Environment

2. Convert EcoNET Data (One-time)

3. Run Full Pipeline

4. Results Location

Notes

Contributors

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages