This repository implements a modular machine learning pipeline for detecting, ranking, and interpreting anomalies in IoT sensor streams, specifically tailored for smart city infrastructure. The pipeline is designed to integrate multi-stage filtering, interpretable supervised learning, and SHAP-based feature attribution for explainable results.
The aim is to surface actionable anomalies from heterogeneous, irregularly sampled time series originating from weather and infrastructure sensors. Our system supports scalable ingestion of real-world IoT datasets (e.g., EcoNET), identifies significant deviations from learned baselines, and enables interpretation through model-based explanations.
- Converts long-form EcoNET sensor logs to wide-form matrices with per-sensor columns
- Resamples to uniform 5-minute intervals
- Imputes missing values using KNN with spatially aware neighbors
- Applies z-score normalization using a rolling 1-day window
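
A minimal pandas sketch of the reshaping, resampling, and rolling z-score steps above follows. It is not the repository's `convert_econet_long_to_wide` implementation. The long-form column names `variable` and `value`, and the helpers `long_to_wide` and `resample_and_normalize`, are hypothetical. The spatially aware KNN imputation is omitted; in the order listed above, it would sit between resampling and normalization.

```python
import pandas as pd


def long_to_wide(df_long: pd.DataFrame) -> pd.DataFrame:
    """Pivot long-form rows into one column per measured variable.

    Assumes hypothetical columns: timestamp (datetime), sensor_id, variable, value.
    """
    wide = df_long.pivot_table(
        index=["sensor_id", "timestamp"], columns="variable", values="value"
    )
    wide.columns.name = None
    return wide.reset_index()


def resample_and_normalize(wide: pd.DataFrame, freq: str = "5min") -> pd.DataFrame:
    """Resample each sensor to a uniform grid, then apply a rolling 1-day z-score."""
    out = []
    for sensor_id, group in wide.groupby("sensor_id"):
        g = group.drop(columns="sensor_id").set_index("timestamp").sort_index()
        g = g.resample(freq).mean()  # uniform 5-minute bins; empty bins become NaN
        rolling = g.rolling("1D", min_periods=2)  # trailing 1-day window per column
        z = (g - rolling.mean()) / rolling.std()
        z["sensor_id"] = sensor_id
        out.append(z)
    return pd.concat(out).reset_index()
```

Raw timestamps should be parsed with `pd.to_datetime` first, because time-based `resample` and `rolling("1D")` require a `DatetimeIndex`.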
- Rolling temporal statistics (mean, std)
- Spectral power features from FFT
- Change point detection via Augmented Dickey-Fuller (ADF) test
- Temporal context: hour-of-day, weekday/weekend, etc.
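
The rolling, spectral, and temporal-context features above can be sketched as follows, assuming one 5-minute `pd.Series` per sensor variable. The window length and all function and column names are hypothetical. The ADF step is left out. `statsmodels.tsa.stattools.adfuller` is one standard implementation of that test, but the source does not say which library the pipeline uses.

```python
import numpy as np
import pandas as pd


def rolling_features(series: pd.Series, window: int = 12) -> pd.DataFrame:
    """Rolling mean/std over `window` samples (12 x 5 min = 1 hour, an assumed size)."""
    roll = series.rolling(window, min_periods=1)
    return pd.DataFrame({"roll_mean": roll.mean(), "roll_std": roll.std()})


def spectral_power(window_values) -> np.ndarray:
    """Power spectrum of one window via the real FFT, with the mean removed first."""
    x = np.asarray(window_values, dtype=float)
    x = x - x.mean()  # drop the DC component so it does not dominate bin 0
    return np.abs(np.fft.rfft(x)) ** 2


def temporal_context(index: pd.DatetimeIndex) -> pd.DataFrame:
    """Hour-of-day, day-of-week, and weekday/weekend indicator features."""
    dow = np.asarray(index.dayofweek)  # Monday = 0 ... Sunday = 6
    return pd.DataFrame(
        {
            "hour": np.asarray(index.hour),
            "weekday": dow,
            "is_weekend": (dow >= 5).astype(int),
        },
        index=index,
    )
```

The spectral features can then be summarized per window, for example as total power in a low-frequency band, so that a slow drift and a high-frequency burst produce different signatures.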
- Uses DBSCAN with a dynamic ε and a log-scaled `min_samples`
- Filters extreme sparse outliers before supervised learning
- Adaptively chooses the threshold from the distance distribution
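
One common way to choose DBSCAN's ε from the data is the k-distance heuristic. Each point's distance to its k-th nearest neighbor is computed, and ε is taken from the upper part of that distribution. The sketch below uses scikit-learn. The specific rules, `min_samples = max(2, round(ln n))` and ε at the 95th percentile, are illustrative assumptions, not the repository's exact settings. `adaptive_dbscan` is a hypothetical name.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors


def adaptive_dbscan(X: np.ndarray, percentile: float = 95.0) -> np.ndarray:
    """Return DBSCAN labels (-1 = noise) with data-driven eps and min_samples.

    Assumed heuristics (illustrative only):
      * min_samples grows with log(n): max(2, round(ln n))
      * eps is a high percentile of each point's distance to its
        min_samples-th nearest neighbor (the "k-distance" curve)
    """
    n = len(X)
    min_samples = max(2, int(round(np.log(n))))
    # Querying the training set returns each point as its own first neighbor,
    # matching DBSCAN, which counts the point itself toward min_samples.
    nn = NearestNeighbors(n_neighbors=min_samples).fit(X)
    distances, _ = nn.kneighbors(X)
    k_dist = distances[:, -1]
    eps = max(float(np.percentile(k_dist, percentile)), 1e-12)  # DBSCAN needs eps > 0
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
```

Points labeled `-1` are the sparse outliers that can be filtered out before the supervised stage. Because the neighbor search grows with the number of rows, limiting rows is a practical way to speed up this step.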
- Random Forest: interpretable tree-based voting model with class weights
- SVM (RBF kernel): margin-based classifier with feature scaling and class balancing
- Bidirectional LSTM: sequential model for long-range temporal dependencies
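
The two classical models above can be sketched in scikit-learn as follows. A synthetic imbalanced dataset stands in for labeled sensor windows, and the hyperparameters are illustrative, not the repository's. Class balancing is done through `class_weight="balanced"`. With `probability=True`, `SVC` produces Platt-scaled probabilities. `CalibratedClassifierCV` is an alternative when explicit calibration is wanted. The Bidirectional LSTM is omitted because it needs a deep-learning framework, and the source does not name one.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_models(random_state: int = 0) -> dict:
    """Class-weighted Random Forest and scaled RBF-SVM (illustrative settings)."""
    rf = RandomForestClassifier(
        n_estimators=200, class_weight="balanced", random_state=random_state
    )
    # RBF kernels are sensitive to feature scale, so standardize inside the pipeline
    svm = make_pipeline(
        StandardScaler(),
        SVC(kernel="rbf", class_weight="balanced", probability=True,
            random_state=random_state),
    )
    return {"random_forest": rf, "svm_rbf": svm}


# Toy imbalanced data: about 5% positives stand in for labeled anomalies
X, y = make_classification(n_samples=600, n_features=8, weights=[0.95],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

probas = {}
for name, model in build_models().items():
    model.fit(X_train, y_train)
    probas[name] = model.predict_proba(X_test)[:, 1]  # P(anomaly) per test row
```

The per-model `probas` dictionary is the kind of input a soft-voting ensemble combines.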
- Weighted soft-voting ensemble using calibrated outputs
- Emphasizes models with sharper confidence under high class imbalance
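
The source does not state the weighting formula, so the rule below is an assumption that illustrates weighted soft voting. Each model is weighted by the mean distance of its P(anomaly) from 0.5, measured as mean |2p − 1|, and the weights are normalized to sum to 1. This is only meaningful when the inputs are calibrated probabilities. `sharpness_weights` and `soft_vote` are hypothetical names.

```python
import numpy as np


def sharpness_weights(probas: dict) -> dict:
    """Weight each model by how far its P(anomaly) sits from 0.5 on average.

    Assumed sharpness measure: mean |2p - 1|, which is 0 for a model that always
    outputs 0.5 and 1 for a model that always outputs exactly 0 or 1.
    """
    raw = {name: float(np.mean(np.abs(2 * np.asarray(p, dtype=float) - 1)))
           for name, p in probas.items()}
    total = sum(raw.values())
    if total == 0:  # every model is maximally uncertain: fall back to equal weights
        return {name: 1 / len(raw) for name in raw}
    return {name: w / total for name, w in raw.items()}


def soft_vote(probas: dict, weights: dict) -> np.ndarray:
    """Weighted average of per-model calibrated probabilities."""
    return sum(weights[name] * np.asarray(p, dtype=float)
               for name, p in probas.items())
```

A practical variant computes the weights on a held-out validation set rather than on the predictions being combined.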
- Applies SHAP (TreeExplainer) to Random Forest classifier
- Outputs both global and local feature importance via bar/dot plots
- Stores model artifacts and SHAP values in structured output folders
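
TreeExplainer computes Shapley values efficiently for tree ensembles. The brute-force computation below shows what a single attribution means on a tiny model. It is a simplification: absent features are replaced by a fixed baseline, whereas TreeExplainer handles absent features differently depending on its settings. The scoring function `toy_score` and its features are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def exact_shapley(f: Callable[[Sequence[float]], float],
                  x: Sequence[float], baseline: Sequence[float]) -> list:
    """Exact Shapley values of f at x, with absent features set to baseline values.

    Enumerates all 2^n coalitions, so it is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        # Features in the coalition keep their observed value; others use the baseline
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_score(z):
    """Hypothetical anomaly score: linear in temp plus a temp x humidity interaction."""
    temp, humidity, hour = z
    return 2.0 * temp + 1.0 * temp * humidity + 0.0 * hour
```

For `x = [1.0, 1.0, 5.0]` against an all-zero baseline, the attributions come out as 2.5 for temperature, 0.5 for humidity, and 0 for hour. The interaction term is split evenly between its two features. The values sum to `toy_score(x) - toy_score(baseline)`, which is the additivity property that SHAP summary plots rely on.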
- Binary classification metrics:
  - F1 Score (macro)
  - ROC-AUC (if available)
  - Precision at 90% recall
  - Overall Accuracy
- Summary printed to the terminal and saved to logs
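
"Precision at 90% recall" is read here as the best precision over all score thresholds whose recall is at least 0.9. That reading is an assumption, since the source does not define the metric precisely. The dependency-free sketch below implements it, and `precision_at_recall` is a hypothetical name. For the other metrics, scikit-learn provides `f1_score(y_true, y_pred, average="macro")`, `roc_auc_score`, and `accuracy_score`. ROC-AUC is undefined when only one class is present in `y_true`.

```python
def precision_at_recall(y_true, scores, min_recall: float = 0.9) -> float:
    """Best precision over score thresholds whose recall is at least `min_recall`.

    Sweeps thresholds from the highest score down; each distinct score value is
    a candidate cut-off (predict anomaly when score >= threshold).
    """
    pairs = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    n_pos = sum(1 for y in y_true if y == 1)
    if n_pos == 0:
        raise ValueError("precision at recall is undefined without positive labels")
    best = 0.0
    tp = fp = 0
    for k, (score, label) in enumerate(pairs):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Only evaluate at the last item of a run of tied scores
        is_cutoff = k + 1 == len(pairs) or pairs[k + 1][0] != score
        if is_cutoff and tp / n_pos >= min_recall:
            best = max(best, tp / (tp + fp))
    return best
```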
- Install dependencies: `pip install -r requirements.txt`
- Convert raw EcoNET logs to wide form: `python3 -c "from utils.econet_converter import convert_econet_long_to_wide; convert_econet_long_to_wide('data/raw/train.csv', 'data/raw/smart_city_iot.csv')"`
- Run the pipeline: `python main.py`
- SHAP: `outputs/shap_summary_*.png`
- Models: `models/*.joblib`
- Metrics: printed to stdout
- Developed as part of CSC 522 (NCSU), Spring 2025
- Dataset source: EcoNET (NC State Climate Office)
- Wide-form input must include `timestamp`, `sensor_id`, and `anomaly` columns
- To accelerate DBSCAN, limit the number of rows in `main.py` (e.g., `.iloc[:5000]`)
- Kagwe Muchane
- Jonah Gloss
- Dian Rajic