
atinyshrimp/predictive-maintenance-engine


🏭 Predictive Maintenance Engine

Python License Code style: black

A production-style machine learning system for predicting industrial equipment failures using the NASA Turbofan Jet Engine dataset.

This project demonstrates an end-to-end ML pipeline: from time-series feature engineering and imbalanced learning to decision optimization and deployment via API and interactive dashboard.

🎯 Key Result

The model detects ~98% of failures before they occur (high recall) with ROC-AUC ≈ 0.95, demonstrating strong ability to rank high-risk engines ahead of failure.

In predictive maintenance, missing a failure is far more costly than triggering a false alarm; therefore the system is optimized for failure detection rather than raw accuracy.

📌 Why This Project Matters

Industrial predictive maintenance is a real-world ML problem where:

  • Data is highly imbalanced
  • Accuracy can be misleading
  • Decision thresholds matter more than model choice
  • Cost of false negatives >> false positives

This project focuses on engineering a reliable decision system, not just training a classifier.

🏗️ Architecture

predictive-maintenance-engine/
├── api/                          # REST API for model inference
│   └── app.py                    # FastAPI application
├── app/                          # Streamlit web application
│   ├── app.py                    # Main app entry point
│   ├── home.py                   # Home page
│   ├── predictions.py            # Interactive predictions
│   ├── performance.py            # Model performance dashboard
│   ├── about.py                  # Project documentation
│   └── utils.py                  # App utility functions
├── src/                          # Source code modules
│   ├── __init__.py
│   ├── config.py                 # Configuration management
│   ├── data_loader.py            # Data loading and preprocessing
│   ├── feature_engineering.py    # Feature creation
│   ├── models.py                 # ML model implementations
│   ├── reinforcement_learning.py # RL-based scheduler
│   ├── evaluation.py             # Model evaluation and visualization
│   ├── train.py                  # Training pipeline
│   ├── predict.py                # Prediction pipeline
│   └── utils.py                  # Utility functions
├── data/                         # Data directory
│   ├── CMaps/                    # Raw NASA dataset
│   └── processed/                # Processed datasets
├── models/                       # Trained models
├── notebooks/                    # Jupyter notebooks for exploration
├── reports/                      # Generated reports and results
├── assets/                       # Generated visualizations
├── logs/                         # Application logs
├── tests/                        # Unit tests
├── requirements.txt              # Python dependencies
├── setup.py                      # Package installation
├── .gitignore                    # Git ignore rules
└── README.md                     # This file

📊 Dataset

NASA Turbofan Jet Engine Dataset (C-MAPSS)

  • Source: NASA PCoE Datasets
  • Description: Run-to-failure simulation data from turbofan engines
  • Features: 21 sensor measurements + 3 operational settings
  • Target: Remaining Useful Life (RUL) → converted to binary failure classification using a failure horizon threshold
  • Splits: FD001, FD002, FD003, FD004 (different operating conditions)
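
As a rough illustration of the target construction described above, the sketch below computes RUL for run-to-failure records and converts it to a binary label. The function names and the 2-cycle horizon on the toy data are illustrative assumptions, not values taken from src/data_loader.py.

```python
from collections import defaultdict


def add_rul(rows):
    """rows: list of (unit_id, cycle) tuples from run-to-failure data.

    Returns (unit_id, cycle, rul) tuples, where rul is the number of
    cycles left until the last recorded cycle of that unit.
    """
    last_cycle = defaultdict(int)
    for unit_id, cycle in rows:
        last_cycle[unit_id] = max(last_cycle[unit_id], cycle)
    return [(u, c, last_cycle[u] - c) for u, c in rows]


def to_failure_label(rul, horizon):
    """Return 1 if the engine fails within `horizon` cycles, else 0."""
    return 1 if rul <= horizon else 0


# Toy data: unit 1 runs for 5 cycles, unit 2 runs for 3 cycles.
rows = [(1, c) for c in range(1, 6)] + [(2, c) for c in range(1, 4)]
labelled = [
    (u, c, rul, to_failure_label(rul, horizon=2))  # horizon is hypothetical
    for u, c, rul in add_rul(rows)
]
```

A larger horizon labels more cycles as "failure", which raises the positive rate and generally makes the classification task easier.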

🚀 Quick Start

Prerequisites

  • Python 3.9+
  • pip or conda package manager

Installation

  1. Clone the repository
git clone https://github.com/atinyshrimp/predictive-maintenance-engine.git
cd predictive-maintenance-engine
  2. Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies
pip install -r requirements.txt
  4. Install package in development mode
pip install -e .

Training Models

Train the model with cost-sensitive learning:

python src/train.py --dataset FD001 --imbalance cost_sensitive

Training Options:

# --dataset:   FD001, FD002, FD003, or FD004
# --imbalance: none, smote, undersample, or cost_sensitive
# --no-save:   optional; do not save trained models
python src/train.py \
    --dataset FD001 \
    --imbalance cost_sensitive \
    --no-save

Making Predictions

Generate predictions and maintenance schedules:

python src/predict.py \
    --model "models/random_forest_(balanced).pkl" \
    --dataset FD001 \
    --output data/predictions.csv

Running the Web App

Launch the interactive Streamlit dashboard:

streamlit run app/app.py

The web app will open at http://localhost:8501 with:

  • 🏠 Home: Project overview and system status
  • 🔮 Predictions: Interactive failure prediction with real-time sensor input
  • 📊 Performance: Model metrics, confusion matrix, ROC curves
  • ℹ️ About: Detailed project documentation

Live Demo: View on Streamlit Cloud (after deployment)

Running the API

Start the FastAPI server:

cd api
python app.py

The API will be available at http://localhost:8000 (the host and port used in the example request below).

Example API Request:

import requests

response = requests.post(
    "http://localhost:8000/predict",
    json={
        "unit_id": 1,
        "sensor_values": [0.5, 0.3, -0.2, 0.8, ...]  # 21+ sensor values; replace "..." with the remaining readings before running
    }
)

print(response.json())
# {
#   "unit_id": 1,
#   "failure_probability": 0.75,
#   "failure_prediction": true,
#   "risk_level": "HIGH",
#   "recommendation": "Schedule immediate maintenance..."
# }

📈 Features

Web Application (Streamlit)

  • Interactive Dashboard: Real-time predictions with intuitive UI
  • Unit Selection: Choose from 100 test engines
  • Timeline Analysis: See failure probability evolution over engine lifecycle
  • Risk Assessment: Color-coded alerts (Low/Medium/High/Critical)
  • Gauge Charts: Visual failure probability indicators
  • Performance Metrics: Live model performance tracking
  • Responsive Design: Mobile-friendly interface

Data Processing

  • Automated data loading and preprocessing
  • RUL (Remaining Useful Life) computation
  • Feature scaling with MinMaxScaler
  • Low-variance feature removal
  • Class imbalance handling (SMOTE, RUS, cost-sensitive)

Feature Engineering

  • Rolling Statistics: Mean, standard deviation, and EMA computed for all sensors with window sizes [3, 5]
  • Degradation Features:
    • Cycle position normalization (0-1 scale)
    • Rate of change for key sensors (deterioration velocity)
    • Cumulative sum (total degradation accumulation)
  • 120+ engineered features from 20 base sensors
  • Time-series aware feature generation for predictive patterns
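
The feature families listed above can be sketched in plain Python for a single sensor of a single engine. This is a minimal illustration, not the code in src/feature_engineering.py: the function names are hypothetical, and the cycle position is normalized by the length of the series passed in, which is an assumption.

```python
from statistics import mean, pstdev


def rolling_features(values, window):
    """Trailing-window mean and std, using only current and past values."""
    means, stds = [], []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        means.append(mean(chunk))
        stds.append(pstdev(chunk))  # population std; 0.0 for a single value
    return means, stds


def ema(values, span):
    """Exponential moving average with alpha = 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def degradation_features(values):
    """Cycle position in [0, 1], first difference, and cumulative sum."""
    n = len(values)
    cycle_norm = [i / (n - 1) if n > 1 else 0.0 for i in range(n)]
    rate = [0.0] + [values[i] - values[i - 1] for i in range(1, n)]
    cumsum, total = [], 0.0
    for v in values:
        total += v
        cumsum.append(total)
    return cycle_norm, rate, cumsum


sensor = [1.0, 2.0, 4.0, 7.0]  # toy readings for one engine
means, stds = rolling_features(sensor, window=3)
smoothed = ema(sensor, span=3)
cycle_norm, rate, cumsum = degradation_features(sensor)
```

Because every window looks backwards only, a feature at cycle t never depends on readings after t.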

Model

  • Random Forest: Ensemble method with balanced class weights
  • Decision-threshold optimization to maximize recall under class imbalance
  • Comprehensive hyperparameter configurations
  • Pipeline-based training for reproducibility

Evaluation Metrics

  • Accuracy, Precision, Recall, F1-Score
  • ROC-AUC and ROC curves
  • Precision-Recall curves
  • Confusion matrices
  • Feature importance analysis
  • Cost-benefit analysis

Reinforcement Learning

Reinforcement learning is used to optimize maintenance decisions, not to replace the predictive model.

The Q-learning agent uses predicted failure probability and engine health state to learn when maintenance should be performed, balancing:

  • Failure risk
  • Maintenance cost
  • Downtime penalties

This demonstrates how ML predictions can be integrated into a decision-making system rather than used in isolation.
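
To make the idea concrete, here is a toy sketch of the tabular Q-learning update on a deterministic three-state environment. Everything in it is an assumption for illustration (the risk buckets, the costs, the dynamics, and the choice to sweep every state-action pair instead of sampling with exploration); it is not the agent in src/reinforcement_learning.py.

```python
WAIT, MAINTAIN = 0, 1
N_STATES = 3               # risk buckets: 0 = low, 1 = medium, 2 = high (hypothetical)
FAILURE_COST = -100.0      # assumed penalty for running a high-risk engine to failure
MAINTENANCE_COST = -10.0   # assumed cost of one preventive intervention


def step(state, action):
    """Deterministic toy dynamics: returns (reward, next_state)."""
    if action == MAINTAIN:
        return MAINTENANCE_COST, 0      # maintenance resets the engine to low risk
    if state == N_STATES - 1:
        return FAILURE_COST, 0          # waiting at high risk ends in failure
    return 0.0, state + 1               # otherwise risk grows by one bucket


def train(sweeps=500, alpha=0.5, gamma=0.9):
    """Apply the Q-learning update to every (state, action) pair, repeatedly."""
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(sweeps):
        for s in range(N_STATES):
            for a in (WAIT, MAINTAIN):
                reward, s_next = step(s, a)
                target = reward + gamma * max(q[s_next])
                q[s][a] += alpha * (target - q[s][a])
    return q


q_table = train()
policy = [max((WAIT, MAINTAIN), key=lambda a: q_table[s][a]) for s in range(N_STATES)]
```

With these assumed costs, the learned policy waits while risk is low or medium and maintains once risk is high. Changing the ratio between FAILURE_COST and MAINTENANCE_COST shifts where that switch happens.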

📊 Results

From an operational perspective, high recall significantly reduces catastrophic failures, which are typically far more expensive than preventive inspections triggered by false positives.

In real industrial settings, preventing a single catastrophic engine failure can outweigh the cost of dozens of preventive inspections, making recall the dominant optimization objective.

Model Performance (FD001 Dataset)

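| Model                    | Accuracy | Precision | Recall | F1-Score | ROC-AUC |
|--------------------------|----------|-----------|--------|----------|---------|
| Random Forest (Balanced) | 74.9%    | 43.8%     | 98.1%  | 60.6%    | 0.945   |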

Key Achievement: ~98% recall means catching virtually all failures before they occur.

Note: Low precision is expected and acceptable for maintenance systems where false negatives (missed failures) are far more costly than false positives (unnecessary inspections).
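
As a quick consistency check on the reported numbers, F1 is the harmonic mean of precision and recall, and the share of alerts that are false alarms is one minus precision. The snippet uses only the precision and recall values reported for the model above.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


precision, recall = 0.438, 0.981           # reported values for FD001
f1 = f1_score(precision, recall)           # about 0.606, matching the reported F1
false_alarm_share = 1 - precision          # about 0.562 of alerts are false alarms
```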

Key Insights

  • Recall optimization crucial: Achieved 98% recall through cost-sensitive learning (balanced class weights) and degradation features
  • Feature engineering impact: Rolling std, EMA, and degradation patterns (cycle position, rate of change) improved ROC-AUC from 0.47 to 0.95
  • Precision-recall trade-off: Acceptable to have ~44% precision when recall is ~98% in safety-critical maintenance
  • Hyperparameter tuning: Deeper trees (depth 30), more estimators (500), and balanced class weights enabled better minority class detection
  • Random Forest selected: Best precision-recall balance for safety-critical predictive maintenance

📊 Project Showcase

Performance Visualizations

Confusion Matrix

Figure: Random Forest: 98.1% recall with balanced confusion matrix

ROC Curve

Figure: ROC-AUC 0.945 demonstrates excellent discrimination

Precision-Recall Curve

Figure: Optimized for high recall in safety-critical maintenance

Feature Importance

Figure: Top features: degradation patterns and rolling statistics

Key Takeaways from Visualizations

  • Confusion Matrix: Shows 98.1% of failures correctly identified (high recall)
  • ROC Curve: 0.945 AUC indicates excellent model discrimination
  • Precision-Recall: Trade-off optimized for safety (prefer false alarms over missed failures)
  • Feature Importance: Degradation features (cycle_norm, rate_of_change) are top predictors

⚠️ Experimental Limitations

While results are strong, several factors make the task easier than a real industrial deployment:

  • Failure defined at 90 cycles (earlier warning makes detection easier)
  • Evaluation performed primarily on FD001 dataset (single operating condition)
  • Precision remains moderate (~40-45%), meaning false positives still occur
  • Model performance may vary across engines and operating regimes

These limitations reflect realistic trade-offs in predictive maintenance, where maximizing failure detection is typically prioritized over minimizing false alarms.

🧠 Key ML Lesson

During early experiments, models achieved >93% accuracy yet detected zero failures, a classic failure mode in imbalanced classification.

The issue was the default probability threshold (0.5), which prevented the model from predicting the rare failure class.

By analyzing score distributions and optimizing the decision threshold for recall instead of accuracy:

  • Failure detection improved from 0% → ~98% recall
  • ROC-AUC remained high, confirming real predictive signal
  • This demonstrated that evaluation strategy and thresholding matter more than model choice in rare-event detection

This mirrors real predictive maintenance systems, where decision thresholds are tuned according to risk and cost rather than generic metrics.
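
The thresholding step described above can be sketched as follows. The scores and labels are made-up toy values chosen so that every true failure scores below 0.5, and pick_threshold is a hypothetical helper, not a function from this repository.

```python
def recall_at(threshold, probs, labels):
    """Fraction of true failures whose score reaches the threshold."""
    positives = sum(labels)
    caught = sum(1 for p, y in zip(probs, labels) if y == 1 and p >= threshold)
    return caught / positives if positives else 0.0


def pick_threshold(probs, labels, target_recall):
    """Highest observed score that still meets the recall target."""
    for t in sorted(set(probs), reverse=True):
        if recall_at(t, probs, labels) >= target_recall:
            return t
    return 0.0


# Toy validation scores: all three failures score below 0.5.
probs = [0.05, 0.10, 0.20, 0.30, 0.35, 0.45, 0.40, 0.48]
labels = [0, 0, 0, 1, 0, 1, 0, 1]

default_recall = recall_at(0.5, probs, labels)          # no failure is flagged
threshold = pick_threshold(probs, labels, target_recall=1.0)
tuned_recall = recall_at(threshold, probs, labels)
```

Choosing the highest threshold that meets the target keeps the number of alerts as small as the recall constraint allows. The threshold should be selected on validation data, not on the test set.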

🔮 Potential Improvements

While the current system achieves ~98% recall (catching virtually all failures), here are potential enhancements for production deployment:

1. Precision Optimization

  • Current Challenge: 40-47% precision means that roughly 55-60% of alerts are false alarms (a false discovery rate, which is not the same as the false positive rate)
  • Approach: Multi-threshold strategy with different alert levels
  • Expected Impact: Reduce false alarms by 20-30% while maintaining 95%+ recall
  • Implementation:
    • LOW risk threshold: 0.3 (high precision, catches severe cases)
    • MEDIUM risk threshold: 0.4-0.5 (balanced)
    • HIGH risk threshold: optimized for recall (current approach)
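
One possible shape for a multi-threshold alerting layer is a simple mapping from failure probability to an alert tier. The cut-offs and tier names below are hypothetical placeholders that would need tuning on validation data; they are not the values used by the API or the dashboard.

```python
# Hypothetical cut-offs, checked from the most severe tier downwards.
RISK_TIERS = [(0.8, "CRITICAL"), (0.5, "HIGH"), (0.3, "MEDIUM")]


def risk_level(failure_probability):
    """Map a failure probability in [0, 1] to an alert tier."""
    if not 0.0 <= failure_probability <= 1.0:
        raise ValueError("failure_probability must be in [0, 1]")
    for cutoff, name in RISK_TIERS:
        if failure_probability >= cutoff:
            return name
    return "LOW"


levels = [risk_level(p) for p in (0.10, 0.30, 0.75, 0.95)]
```

Lower tiers can trigger cheap actions such as closer monitoring, while only the top tiers trigger an inspection, which is one way to reduce the cost of false alarms without lowering recall.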

2. Failure Threshold Tuning

  • Previous: failure_threshold = 100 cycles created a 49% failure rate (easier problem)
  • Now: reduced to 90 cycles (as of Feb 2026, per the roadmap below); a lower threshold gives a more challenging, realistic prediction task
  • Trade-off: Higher difficulty but more actionable predictions (imminent failures only)
  • Expected Impact: Precision improves to 55-65%, recall drops to 85-90%

3. Advanced Ensemble Methods

  • Soft Voting: Combine multiple Random Forest models with weighted averaging
  • Stacking: Use meta-learner (Logistic Regression) on top of base models
  • Expected Impact: +1-2% ROC-AUC, +2-3% F1-score
  • Implementation: VotingClassifier with voting='soft' and optimized weights

4. Cross-Validation for Robustness

  • Current: Single train/val/test split may have variance
  • Improvement: 5-fold time-series cross-validation
  • Benefit: More reliable performance estimates, detect overfitting
  • Tool: TimeSeriesSplit from scikit-learn

Note on Data Leakage Prevention: The training pipeline uses unit-level splitting (assigning entire turbofan units to either train or validation) before computing rolling/EMA features. This prevents time-series information leakage that would occur if rolling windows were computed before splitting, ensuring the validation set provides an honest estimate of model performance.
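
A minimal sketch of unit-level splitting is shown below, assuming rows are (unit_id, cycle) tuples. The helper name, validation fraction, and seed are illustrative and do not come from src/train.py.

```python
import random


def split_units(unit_ids, val_fraction=0.2, seed=42):
    """Assign whole engines to train or validation; no unit appears in both."""
    units = sorted(set(unit_ids))
    rng = random.Random(seed)
    rng.shuffle(units)
    n_val = max(1, round(len(units) * val_fraction))
    return set(units[n_val:]), set(units[:n_val])


# Toy data: 10 engines with 3 cycles each.
rows = [(unit, cycle) for unit in range(1, 11) for cycle in range(1, 4)]
train_units, val_units = split_units([u for u, _ in rows])
train_rows = [r for r in rows if r[0] in train_units]
val_rows = [r for r in rows if r[0] in val_units]
# Rolling and EMA features would be computed separately on train_rows and val_rows.
```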

5. Extended Feature Engineering

  • Polynomial Features: Interaction terms between correlated sensors
  • Lag Features: Previous cycle values (t-1, t-2, t-3)
  • Sensor Correlations: Cross-sensor relationships
  • Domain Features: Temperature gradients, pressure ratios
  • Expected Impact: +2-4% ROC-AUC for complex patterns

6. Model Interpretability

  • SHAP Values: Explain individual predictions for maintenance teams
  • LIME: Local explanations for high-risk predictions
  • Feature Contribution: Show which sensors triggered the alert
  • Benefit: Trust and adoption by maintenance personnel

7. Hyperparameter Optimization

  • Current: Manual tuning based on domain knowledge
  • Approach: Bayesian optimization with Optuna or Hyperopt
  • Search Space: 50-100 combinations
  • Expected Impact: +1-3% F1-score, better generalization

8. Multi-Dataset Generalization

  • Current: Optimized for FD001 (single operating condition)
  • Extension: Train on FD001-FD004 combined
  • Challenge: Different operating conditions (altitude, mach number)
  • Benefit: Generalized model for diverse environments

9. Real-Time Monitoring Pipeline

  • Stream Processing: Apache Kafka + Spark Streaming
  • Incremental Updates: Online learning for concept drift
  • Alerting: Integration with maintenance management systems
  • Dashboard: Real-time monitoring with Grafana/Tableau

10. Cost-Benefit Analysis

  • Quantify: Maintenance cost vs. failure cost
  • Optimize: Threshold selection based on business metrics
  • ROI: Calculate expected savings from predictive maintenance
  • Reporting: Executive dashboard with financial impact
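
A cost-based threshold choice could look like the sketch below: each candidate threshold is scored by the total cost of missed failures and false alarms on a labelled validation set. The scores, labels, and cost figures are made-up assumptions for illustration.

```python
def total_cost(threshold, probs, labels, fn_cost, fp_cost):
    """Cost of missed failures plus cost of false alarms at one threshold."""
    cost = 0.0
    for p, y in zip(probs, labels):
        alert = p >= threshold
        if y == 1 and not alert:
            cost += fn_cost      # missed failure
        elif y == 0 and alert:
            cost += fp_cost      # unnecessary inspection
    return cost


def cheapest_threshold(probs, labels, fn_cost, fp_cost):
    """Candidate threshold with the lowest total cost."""
    candidates = sorted(set(probs)) + [1.01]   # 1.01 means "never alert"
    return min(candidates, key=lambda t: total_cost(t, probs, labels, fn_cost, fp_cost))


probs = [0.05, 0.10, 0.20, 0.30, 0.35, 0.45, 0.40, 0.48]
labels = [0, 0, 0, 1, 0, 1, 0, 1]

# Assumed costs: a missed failure is 50 times as expensive as a false alarm.
best = cheapest_threshold(probs, labels, fn_cost=50.0, fp_cost=1.0)
# With equal costs the preferred threshold moves up and tolerates a missed failure.
best_equal = cheapest_threshold(probs, labels, fn_cost=1.0, fp_cost=1.0)
```

The more expensive a missed failure is relative to an inspection, the lower the cost-optimal threshold becomes, which is the same trade-off the recall-first design above relies on.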

Priority Roadmap

High Priority (Production-Ready):

  1. Multi-threshold alerting system (precision improvement)
  2. Model interpretability with SHAP (trust & adoption)
  3. Cross-validation (robustness validation)

Medium Priority (Enhanced Performance):

  1. Failure threshold tuning to 50 cycles (already decreased from 100 to 90 as of Feb 2026)
  2. Ensemble methods (stacking/soft voting)
  3. Extended feature engineering

Long-Term (Scalability):

  1. Multi-dataset training (FD001-FD004)
  2. Real-time streaming pipeline
  3. Automated hyperparameter optimization
  4. Cost-benefit optimization framework

🧪 Testing

Run unit tests:

pytest tests/

Run with coverage:

pytest tests/ --cov=src --cov-report=html

📝 Documentation

Code Documentation

All modules include comprehensive docstrings following Google style. Generate HTML docs:

pdoc --html src -o docs/

Configuration

Modify src/config.py to adjust:

  • Model hyperparameters
  • Feature engineering settings
  • RL configuration
  • File paths

🛠️ Development

Code Style

This project uses:

  • Black for code formatting
  • Flake8 for linting
  • isort for import sorting

Format code:

black src/ api/ tests/
isort src/ api/ tests/
flake8 src/ api/ tests/

Git Workflow

  1. Create feature branch: git checkout -b feature/your-feature
  2. Make changes and commit: git commit -m "Description"
  3. Push branch: git push origin feature/your-feature
  4. Create Pull Request

🚀 Deployment

Docker (Recommended)

Build and run with Docker:

docker build -t predictive-maintenance .
docker run -p 8000:8000 predictive-maintenance

Not Implemented Yet

Streamlit Cloud (Web App)

Deploy the Streamlit app for free:

  1. Push code to GitHub
  2. Go to share.streamlit.io
  3. Connect your repository
  4. Set Main file path: app/app.py
  5. Deploy! Auto-updates on every push to main

Cloud Deployment (API)

  • AWS: Deploy with EC2 + ECS or Lambda
  • GCP: Use Cloud Run or App Engine
  • Azure: Deploy with App Service or Container Instances

Not Implemented Yet

📚 References

🤝 Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes with tests
  4. Submit a pull request

📄 License

This project is licensed under the MIT License - see LICENSE file for details.