A production-style machine learning system for predicting industrial equipment failures using the NASA Turbofan Jet Engine dataset.
This project demonstrates an end-to-end ML pipeline: from time-series feature engineering and imbalanced learning to decision optimization and deployment via API and interactive dashboard.
The model detects ~98% of failures before they occur (high recall) with ROC-AUC ≈ 0.95, demonstrating strong ability to rank high-risk engines ahead of failure.
In predictive maintenance, missing a failure is far more costly than triggering a false alarm; therefore the system is optimized for failure detection rather than raw accuracy.
Industrial predictive maintenance is a real-world ML problem where:
- Data is highly imbalanced
- Accuracy can be misleading
- Decision thresholds matter more than model choice
- Cost of false negatives >> false positives
This project focuses on engineering a reliable decision system, not just training a classifier.
```
predictive-maintenance-engine/
├── api/ # REST API for model inference
│   └── app.py # FastAPI application
├── app/ # Streamlit web application
│   ├── app.py # Main app entry point
│   ├── home.py # Home page
│   ├── predictions.py # Interactive predictions
│   ├── performance.py # Model performance dashboard
│   ├── about.py # Project documentation
│   └── utils.py # App utility functions
├── src/ # Source code modules
│   ├── __init__.py
│   ├── config.py # Configuration management
│   ├── data_loader.py # Data loading and preprocessing
│   ├── feature_engineering.py # Feature creation
│   ├── models.py # ML model implementations
│   ├── reinforcement_learning.py # RL-based scheduler
│   ├── evaluation.py # Model evaluation and visualization
│   ├── train.py # Training pipeline
│   ├── predict.py # Prediction pipeline
│   └── utils.py # Utility functions
├── data/ # Data directory
│   ├── CMaps/ # Raw NASA dataset
│   └── processed/ # Processed datasets
├── models/ # Trained models
├── notebooks/ # Jupyter notebooks for exploration
├── reports/ # Generated reports and results
├── assets/ # Generated visualizations
├── logs/ # Application logs
├── tests/ # Unit tests
├── requirements.txt # Python dependencies
├── setup.py # Package installation
├── .gitignore # Git ignore rules
└── README.md # This file
```
NASA Turbofan Jet Engine Dataset (C-MAPSS)
- Source: NASA PCoE Datasets
- Description: Run-to-failure simulation data from turbofan engines
- Features: 21 sensor measurements + 3 operational settings
- Target: Remaining Useful Life (RUL) → converted to binary failure classification using a failure horizon threshold
- Splits: FD001, FD002, FD003, FD004 (different operating conditions)
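The RUL target and its conversion to a binary label can be sketched in pandas. This assumes hypothetical column names `unit` and `cycle` and the run-to-failure layout of the C-MAPSS training files, where each unit's last recorded cycle is its failure point; the 90-cycle horizon and the `<=` comparison are illustrative choices, not necessarily what `data_loader.py` does.

```python
import pandas as pd

FAILURE_HORIZON = 90  # hypothetical: label a cycle "failure" when RUL <= 90


def add_rul_and_label(df: pd.DataFrame, horizon: int = FAILURE_HORIZON) -> pd.DataFrame:
    """Add RUL and a binary failure label to run-to-failure training data."""
    out = df.copy()
    # In the training files every unit runs to failure, so its last cycle has RUL 0.
    last_cycle = out.groupby("unit")["cycle"].transform("max")
    out["RUL"] = last_cycle - out["cycle"]
    out["failure"] = (out["RUL"] <= horizon).astype(int)
    return out


# Toy example: a single unit that fails at cycle 200.
toy = pd.DataFrame({"unit": [1] * 200, "cycle": range(1, 201)})
labelled = add_rul_and_label(toy)
print(labelled.head(3))
```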
- Python 3.9+
- pip or conda package manager
- Clone the repository
```bash
git clone https://github.com/atinyshrimp/predictive-maintenance-engine.git
cd predictive-maintenance-engine
```
- Create virtual environment
```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```
- Install dependencies
```bash
pip install -r requirements.txt
```
- Install package in development mode
```bash
pip install -e .
```
Train the model with cost-sensitive learning:
```bash
python src/train.py --dataset FD001 --imbalance cost_sensitive
```
Training Options:
```bash
# --dataset    FD001, FD002, FD003, FD004
# --imbalance  none, smote, undersample, cost_sensitive
# --no-save    don't save trained models (optional)
python src/train.py \
  --dataset FD001 \
  --imbalance cost_sensitive \
  --no-save
```
Generate predictions and maintenance schedules:
```bash
python src/predict.py \
  --model "models/random_forest_(balanced).pkl" \
  --dataset FD001 \
  --output data/predictions.csv
```
Launch the interactive Streamlit dashboard:
```bash
streamlit run app/app.py
```
The web app will open at http://localhost:8501 with:
- Home: Project overview and system status
- Predictions: Interactive failure prediction with real-time sensor input
- Performance: Model metrics, confusion matrix, ROC curves
- About: Detailed project documentation
Live Demo: View on Streamlit Cloud (after deployment)
Start the FastAPI server:
```bash
cd api
python app.py
```
API will be available at:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
Example API Request:
```python
import requests

response = requests.post(
    "http://localhost:8000/predict",
    json={
        "unit_id": 1,
        "sensor_values": [0.5, 0.3, -0.2, 0.8, ...],  # 21+ sensor values; replace ... with the full vector
    },
    timeout=10,
)
response.raise_for_status()  # surface HTTP errors instead of parsing an error body
print(response.json())
# Example response body (JSON):
# {
#   "unit_id": 1,
#   "failure_probability": 0.75,
#   "failure_prediction": true,
#   "risk_level": "HIGH",
#   "recommendation": "Schedule immediate maintenance..."
# }
```
- Interactive Dashboard: Real-time predictions with intuitive UI
- Unit Selection: Choose from 100 test engines
- Timeline Analysis: See failure probability evolution over engine lifecycle
- Risk Assessment: Color-coded alerts (Low/Medium/High/Critical)
- Gauge Charts: Visual failure probability indicators
- Performance Metrics: Live model performance tracking
- Responsive Design: Mobile-friendly interface
- Automated data loading and preprocessing
- RUL (Remaining Useful Life) computation
- Feature scaling with MinMaxScaler
- Low-variance feature removal
- Class imbalance handling (SMOTE, RUS, cost-sensitive)
- Rolling Statistics: Mean, standard deviation, and EMA computed for all sensors with window sizes [3, 5]
- Degradation Features:
- Cycle position normalization (0-1 scale)
- Rate of change for key sensors (deterioration velocity)
- Cumulative sum (total degradation accumulation)
- 120+ engineered features from 20 base sensors
- Time-series aware feature generation for predictive patterns
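A minimal pandas sketch of the rolling, EMA and degradation features listed above, assuming hypothetical `unit`/`cycle` columns and sensor names; it is not the project's `feature_engineering.py`. All statistics are computed within each engine so windows never cross unit boundaries. `cycle_norm` divides by one global maximum (an assumption) rather than each unit's own final cycle, since in training data a per-unit maximum is the failure cycle and would leak the label.

```python
import pandas as pd


def add_degradation_features(df, sensors, windows=(3, 5), max_cycle=None):
    """Per-unit rolling stats, EMA and degradation features (illustrative sketch)."""
    out = df.sort_values(["unit", "cycle"]).copy()
    for s in sensors:
        by_unit = out.groupby("unit")[s]
        for w in windows:
            # Rolling windows are computed within each unit, never across engines.
            out[f"{s}_mean_{w}"] = by_unit.transform(lambda x: x.rolling(w, min_periods=1).mean())
            # A one-value window has undefined std; treat it as 0.
            out[f"{s}_std_{w}"] = by_unit.transform(lambda x: x.rolling(w, min_periods=1).std()).fillna(0.0)
            out[f"{s}_ema_{w}"] = by_unit.transform(lambda x: x.ewm(span=w, adjust=False).mean())
        out[f"{s}_roc"] = by_unit.diff().fillna(0.0)  # rate of change (deterioration velocity)
        out[f"{s}_cumsum"] = by_unit.cumsum()         # accumulated degradation
    # Cycle position on a 0-1 scale using one global maximum (hypothetical choice).
    if max_cycle is None:
        max_cycle = out["cycle"].max()
    out["cycle_norm"] = out["cycle"] / max_cycle
    return out


toy = pd.DataFrame({"unit": [1, 1, 1, 2, 2], "cycle": [1, 2, 3, 1, 2], "s2": [1.0, 2.0, 4.0, 10.0, 12.0]})
feats = add_degradation_features(toy, sensors=["s2"])
print(feats.round(2))
```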
- Random Forest: Ensemble method with balanced class weights
- Decision-threshold optimization to maximize recall under class imbalance
- Comprehensive hyperparameter configurations
- Pipeline-based training for reproducibility
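The preprocessing and model steps above can be chained in a scikit-learn `Pipeline`. This sketch uses the hyperparameters quoted in the lessons learned (500 trees, depth 30, balanced class weights); the variance cut-off, step names and synthetic data are placeholders, and the project's `models.py` may be organized differently.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import VarianceThreshold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

pipeline = Pipeline([
    ("scale", MinMaxScaler()),                                 # scale every feature to [0, 1]
    ("drop_low_variance", VarianceThreshold(threshold=1e-4)),  # hypothetical cut-off
    ("model", RandomForestClassifier(
        n_estimators=500,
        max_depth=30,
        class_weight="balanced",  # cost-sensitive: up-weights the rare failure class
        random_state=42,
    )),
])

# Synthetic, imbalanced stand-in for the engineered features (~15% positives).
X, y = make_classification(n_samples=400, n_features=20, weights=[0.85, 0.15], random_state=0)
pipeline.fit(X, y)
proba = pipeline.predict_proba(X)[:, 1]  # failure probability for each row
```

Because scaling and feature selection live inside the pipeline, they are fitted only on whatever data `fit` receives, which keeps preprocessing consistent between training and inference.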
- Accuracy, Precision, Recall, F1-Score
- ROC-AUC and ROC curves
- Precision-Recall curves
- Confusion matrices
- Feature importance analysis
- Cost-benefit analysis
Reinforcement learning is used to optimize maintenance decisions, not to replace the predictive model.
The Q-learning agent uses predicted failure probability and engine health state to learn when maintenance should be performed, balancing:
- Failure risk
- Maintenance cost
- Downtime penalties
This demonstrates how ML predictions can be integrated into a decision-making system rather than used in isolation.
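The sketch below is a toy tabular Q-learning loop illustrating the idea, not the project's `reinforcement_learning.py`. States are hypothetical failure-probability buckets, the two actions are continue and maintain, and every cost, transition probability and hyperparameter is a made-up placeholder.

```python
import numpy as np

# Hypothetical discretisation: predicted failure probability bucketed into 5 health states.
N_STATES = 5
CONTINUE, MAINTAIN = 0, 1
FAIL_PROB = np.array([0.0, 0.02, 0.1, 0.3, 0.8])  # per-cycle failure risk in each state
OPERATE_REWARD = 1.0     # value of one more cycle in service
MAINTENANCE_COST = 10.0  # preventive maintenance plus planned downtime
FAILURE_COST = 100.0     # unplanned failure plus unplanned downtime


def step(state, action, rng):
    """Toy environment transition: returns (next_state, reward)."""
    if action == MAINTAIN:
        return 0, -MAINTENANCE_COST  # maintenance restores full health
    if rng.random() < FAIL_PROB[state]:
        return 0, -FAILURE_COST      # failure, then repair to full health
    # Engine keeps running and degrades one bucket with probability 0.7.
    if rng.random() < 0.7:
        state = min(state + 1, N_STATES - 1)
    return state, OPERATE_REWARD


def train_q_table(n_steps=50_000, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    rng = np.random.default_rng(seed)
    q = np.zeros((N_STATES, 2))
    state = 0
    for _ in range(n_steps):
        # Epsilon-greedy action selection.
        if rng.random() < epsilon:
            action = int(rng.integers(2))
        else:
            action = int(np.argmax(q[state]))
        next_state, reward = step(state, action, rng)
        # Q-learning update toward the bootstrapped TD target.
        td_target = reward + gamma * q[next_state].max()
        q[state, action] += alpha * (td_target - q[state, action])
        state = next_state
    return q


q_table = train_q_table()
policy = ["maintain" if a == MAINTAIN else "continue" for a in q_table.argmax(axis=1)]
print(policy)
```

With these placeholder costs the learned policy keeps the engine running in the lowest-risk bucket and maintains in the highest; changing the cost ratio moves the point where it switches.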
From an operational perspective, high recall significantly reduces catastrophic failures, which are typically far more expensive than preventive inspections triggered by false positives.
In real industrial settings, preventing a single catastrophic engine failure can outweigh the cost of dozens of preventive inspections, making recall the dominant optimization objective.
| Model | Accuracy | Precision | Recall | F1-Score | ROC-AUC |
|---|---|---|---|---|---|
| Random Forest (Balanced) | 74.9% | 43.8% | 98.1% | 60.6% | 0.945 |
Key Achievement: ~98% recall means catching virtually all failures before they occur.
Note: Low precision is expected and acceptable for maintenance systems where false negatives (missed failures) are far more costly than false positives (unnecessary inspections).
- Recall optimization crucial: Achieved 98% recall through cost-sensitive learning (balanced class weights) and degradation features
- Feature engineering impact: Rolling std, EMA, and degradation patterns (cycle position, rate of change) improved ROC-AUC from 0.47 to 0.95
- Precision-recall trade-off: Acceptable to have ~43% precision when recall is 98%+ in safety-critical maintenance
- Hyperparameter tuning: Deeper trees (depth 30), more estimators (500), and balanced class weights enabled better minority class detection
- Random Forest selected: Best precision-recall balance for safety-critical predictive maintenance
- Confusion Matrix: Shows 98.1% of failures correctly identified (high recall)
- ROC Curve: 0.945 AUC indicates excellent model discrimination
- Precision-Recall: Trade-off optimized for safety (prefer false alarms over missed failures)
- Feature Importance: Degradation features (cycle_norm, rate_of_change) are top predictors
While results are strong, several factors make the task easier than a real industrial deployment:
- Failure defined at 90 cycles (earlier warning makes detection easier)
- Evaluation performed primarily on FD001 dataset (single operating condition)
- Precision remains moderate (~40-45%), meaning false positives still occur
- Model performance may vary across engines and operating regimes
These limitations reflect realistic trade-offs in predictive maintenance, where maximizing failure detection is typically prioritized over minimizing false alarms.
During early experiments, models achieved >93% accuracy yet detected zero failures: a classic failure mode in imbalanced classification.
The issue was the default probability threshold (0.5), which prevented the model from predicting the rare failure class.
By analyzing score distributions and optimizing the decision threshold for recall instead of accuracy:
- Failure detection improved from 0% → ~98% recall
- ROC-AUC remained high, confirming real predictive signal
- This demonstrated that evaluation strategy and thresholding matter more than model choice in rare-event detection
This mirrors real predictive maintenance systems, where decision thresholds are tuned according to risk and cost rather than generic metrics.
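Threshold tuning for a recall target can be sketched in NumPy. This assumes higher scores mean higher failure risk and that a unit is flagged when its score is at or above the threshold; the function names and toy scores are hypothetical, and in practice the threshold should be chosen on validation data rather than the test set.

```python
import numpy as np


def recall_at(y_true, scores, threshold):
    """Fraction of true failures flagged when units with score >= threshold are alerted."""
    y_true, scores = np.asarray(y_true), np.asarray(scores, dtype=float)
    flagged = scores >= threshold
    return float((flagged & (y_true == 1)).sum() / (y_true == 1).sum())


def threshold_for_recall(y_true, scores, target_recall=0.98):
    """Highest threshold whose recall is at least `target_recall`."""
    y_true, scores = np.asarray(y_true), np.asarray(scores, dtype=float)
    positive_scores = np.sort(scores[y_true == 1])[::-1]  # descending
    if positive_scores.size == 0:
        raise ValueError("need at least one positive example")
    # Number of failures that must be flagged; the epsilon guards against float round-up.
    k = max(1, int(np.ceil(target_recall * positive_scores.size - 1e-9)))
    # Flagging scores >= the k-th highest failure score catches at least k failures.
    return float(positive_scores[k - 1])


# Toy validation scores: 6 healthy units, 4 failing units.
y_val = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
scores = np.array([0.05, 0.1, 0.2, 0.3, 0.35, 0.4, 0.25, 0.3, 0.45, 0.6])

print(recall_at(y_val, scores, 0.5))   # recall at the default 0.5 threshold: 0.25
t = threshold_for_recall(y_val, scores, target_recall=1.0)
print(t, recall_at(y_val, scores, t))  # 0.25 1.0
```

Lowering the threshold from 0.5 to 0.25 recovers every failure in this toy set at the price of three false alarms, the same trade-off described above.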
While the current system achieves 98-99% recall (catching virtually all failures), here are potential enhancements for production deployment:
- Current Challenge: 40-47% precision means roughly 53-60% of alerts are false alarms
- Approach: Multi-threshold strategy with different alert levels
- Expected Impact: Reduce false alarms by 20-30% while maintaining 95%+ recall
- Implementation:
- LOW risk threshold: 0.3 (most sensitive tier; flags the most units, so precision is lowest)
- MEDIUM risk threshold: 0.4-0.5 (balanced)
- HIGH risk threshold: optimized for recall (current approach)
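One way to express tiered alerting is a sorted list of cut-offs, with each unit assigned the highest tier its probability reaches. The cut-off values below are hypothetical placeholders; the tier names mirror the Low/Medium/High/Critical levels used in the dashboard.

```python
from bisect import bisect_right

# Hypothetical probability cut-offs (ascending); tune them on validation data.
CUTOFFS = [0.3, 0.5, 0.8]
LEVELS = ["LOW", "MEDIUM", "HIGH", "CRITICAL"]


def risk_level(prob: float) -> str:
    """Return the highest tier whose cut-off `prob` reaches (cut-offs are inclusive)."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError(f"probability must be in [0, 1], got {prob}")
    # bisect_right counts how many cut-offs are <= prob, which indexes LEVELS.
    return LEVELS[bisect_right(CUTOFFS, prob)]


for p in (0.1, 0.3, 0.6, 0.95):
    print(p, risk_level(p))
```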
- Current: `failure_threshold = 100` cycles creates a 49% failure rate (easier problem)
- Production: Reduce to 90 cycles for more challenging, realistic prediction
- Trade-off: Higher difficulty but more actionable predictions (imminent failures only)
- Expected Impact: Precision improves to 55-65%, recall drops to 85-90%
- Soft Voting: Combine multiple Random Forest models with weighted averaging
- Stacking: Use meta-learner (Logistic Regression) on top of base models
- Expected Impact: +1-2% ROC-AUC, +2-3% F1-score
- Implementation: `VotingClassifier` with `voting='soft'` and optimized weights
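A minimal soft-voting sketch with scikit-learn, assuming three Random Forest variants with hypothetical settings and weights; soft voting averages each model's predicted probabilities using the given weights. The synthetic data stands in for the engineered features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier

# Hypothetical base models: the same algorithm with different capacity settings.
estimators = [
    ("rf_deep", RandomForestClassifier(n_estimators=200, max_depth=30, class_weight="balanced", random_state=0)),
    ("rf_shallow", RandomForestClassifier(n_estimators=200, max_depth=8, class_weight="balanced", random_state=1)),
    ("rf_subsample", RandomForestClassifier(n_estimators=200, class_weight="balanced_subsample", random_state=2)),
]
# Soft voting: weighted average of predict_proba across the base models.
ensemble = VotingClassifier(estimators=estimators, voting="soft", weights=[2, 1, 1])

X, y = make_classification(n_samples=300, n_features=20, weights=[0.85, 0.15], random_state=0)
ensemble.fit(X, y)
proba = ensemble.predict_proba(X)[:, 1]
print(np.round(proba[:5], 3))
```

For the stacking variant, `sklearn.ensemble.StackingClassifier(estimators=estimators, final_estimator=LogisticRegression())` accepts the same estimator list.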
- Current: Single train/val/test split may have variance
- Improvement: 5-fold time-series cross-validation
- Benefit: More reliable performance estimates, detect overfitting
- Tool: `TimeSeriesSplit` from scikit-learn
Note on Data Leakage Prevention: The training pipeline uses unit-level splitting (assigning entire turbofan units to either train or validation) before computing rolling/EMA features. This prevents time-series information leakage that would occur if rolling windows were computed before splitting, ensuring the validation set provides an honest estimate of model performance.
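The split-then-featurize order described in the note can be sketched with scikit-learn's `GroupShuffleSplit`, using the unit ID as the group key. This is an assumed implementation, not necessarily the project's; the column names and toy data are hypothetical, and `GroupKFold` applies the same unit-level grouping when a k-fold scheme is wanted.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit


def split_by_unit(df: pd.DataFrame, test_size: float = 0.2, seed: int = 42):
    """Assign whole units to either train or validation (no unit appears in both)."""
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    train_idx, val_idx = next(splitter.split(df, groups=df["unit"]))
    return df.iloc[train_idx].copy(), df.iloc[val_idx].copy()


# Hypothetical toy data: 10 units with 5 cycles each.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "unit": np.repeat(np.arange(1, 11), 5),
    "cycle": np.tile(np.arange(1, 6), 10),
    "s2": rng.normal(size=50),
})
train_df, val_df = split_by_unit(data)

# Rolling features are computed only after the split, and within each unit.
for part in (train_df, val_df):
    part["s2_mean_3"] = part.groupby("unit")["s2"].transform(
        lambda x: x.rolling(3, min_periods=1).mean()
    )
```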
- Polynomial Features: Interaction terms between correlated sensors
- Lag Features: Previous cycle values (t-1, t-2, t-3)
- Sensor Correlations: Cross-sensor relationships
- Domain Features: Temperature gradients, pressure ratios
- Expected Impact: +2-4% ROC-AUC for complex patterns
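Lag features can be sketched with a per-unit `shift`, so that a lag never reads a value from a different engine. Column names and lags are hypothetical; the first cycles of each unit have no history and are left as NaN for the caller to drop or impute.

```python
import pandas as pd


def add_lag_features(df: pd.DataFrame, sensors, lags=(1, 2, 3)) -> pd.DataFrame:
    """Add per-unit lagged copies of each sensor (t-1, t-2, ...)."""
    out = df.sort_values(["unit", "cycle"]).copy()
    for s in sensors:
        for lag in lags:
            # groupby(...).shift keeps lags from crossing engine boundaries.
            out[f"{s}_lag{lag}"] = out.groupby("unit")[s].shift(lag)
    return out


toy = pd.DataFrame({"unit": [1, 1, 1, 2, 2], "cycle": [1, 2, 3, 1, 2], "s3": [10, 11, 12, 20, 21]})
lagged = add_lag_features(toy, ["s3"], lags=(1, 2))
print(lagged)
```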
- SHAP Values: Explain individual predictions for maintenance teams
- LIME: Local explanations for high-risk predictions
- Feature Contribution: Show which sensors triggered the alert
- Benefit: Trust and adoption by maintenance personnel
- Current: Manual tuning based on domain knowledge
- Approach: Bayesian optimization with Optuna or Hyperopt
- Search Space: 50-100 combinations
- Expected Impact: +1-3% F1-score, better generalization
- Current: Optimized for FD001 (single operating condition)
- Extension: Train on FD001-FD004 combined
- Challenge: Different operating conditions (altitude, mach number)
- Benefit: Generalized model for diverse environments
- Stream Processing: Apache Kafka + Spark Streaming
- Incremental Updates: Online learning for concept drift
- Alerting: Integration with maintenance management systems
- Dashboard: Real-time monitoring with Grafana/Tableau
- Quantify: Maintenance cost vs. failure cost
- Optimize: Threshold selection based on business metrics
- ROI: Calculate expected savings from predictive maintenance
- Reporting: Executive dashboard with financial impact
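Threshold selection by business cost can be sketched by scoring each candidate threshold with `cost_fn * FN + cost_fp * FP` and keeping the cheapest. The cost ratios and toy scores below are hypothetical placeholders, not figures from this project.

```python
import numpy as np


def total_cost(y_true, scores, threshold, cost_fn, cost_fp):
    """Cost of missed failures plus cost of false alarms at one threshold."""
    flagged = scores >= threshold
    fn = np.sum(~flagged & (y_true == 1))  # failures that were not flagged
    fp = np.sum(flagged & (y_true == 0))   # healthy units that were flagged
    return cost_fn * fn + cost_fp * fp


def cost_optimal_threshold(y_true, scores, cost_fn, cost_fp):
    """Try every observed score (plus 'flag nothing') and keep the cheapest threshold."""
    y_true, scores = np.asarray(y_true), np.asarray(scores, dtype=float)
    candidates = np.append(np.unique(scores), np.inf)
    costs = [total_cost(y_true, scores, t, cost_fn, cost_fp) for t in candidates]
    best = int(np.argmin(costs))
    return float(candidates[best]), float(costs[best])


y_val = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
scores = np.array([0.05, 0.1, 0.2, 0.3, 0.35, 0.4, 0.25, 0.3, 0.45, 0.6])

print(cost_optimal_threshold(y_val, scores, cost_fn=50, cost_fp=1))  # (0.25, 3.0)
print(cost_optimal_threshold(y_val, scores, cost_fn=1, cost_fp=1))   # (0.45, 2.0)
```

When a missed failure is far more expensive than an inspection, the optimum moves to a lower threshold that catches every failure; with equal costs it moves up and accepts misses.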
High Priority (Production-Ready):
- Multi-threshold alerting system (precision improvement)
- Model interpretability with SHAP (trust & adoption)
- Cross-validation (robustness validation)
Medium Priority (Enhanced Performance):
- Failure threshold tuning to 50 cycles (decreased to 90 as of Feb 2026)
- Ensemble methods (stacking/soft voting)
- Extended feature engineering
Long-Term (Scalability):
- Multi-dataset training (FD001-FD004)
- Real-time streaming pipeline
- Automated hyperparameter optimization
- Cost-benefit optimization framework
Run unit tests:
```bash
pytest tests/
```
Run with coverage:
```bash
pytest tests/ --cov=src --cov-report=html
```
All modules include comprehensive docstrings following Google style. Generate HTML docs:
```bash
pdoc --html src -o docs/
```
Modify `src/config.py` to adjust:
- Model hyperparameters
- Feature engineering settings
- RL configuration
- File paths
This project uses:
- Black for code formatting
- Flake8 for linting
- isort for import sorting
Format and lint code:
```bash
black src/ api/ tests/
isort src/ api/ tests/
flake8 src/ api/ tests/
```
- Create feature branch:
```bash
git checkout -b feature/your-feature
```
- Make changes and commit:
```bash
git commit -m "Description"
```
- Push branch:
```bash
git push origin feature/your-feature
```
- Create Pull Request
Build and run with Docker:
```bash
docker build -t predictive-maintenance .
docker run -p 8000:8000 predictive-maintenance
```
Not Implemented Yet
Deploy the Streamlit app for free:
- Push code to GitHub
- Go to share.streamlit.io
- Connect your repository
- Set Main file path: `app/app.py`
- Deploy! Auto-updates on every push to main
- AWS: Deploy with EC2 + ECS or Lambda
- GCP: Use Cloud Run or App Engine
- Azure: Deploy with App Service or Container Instances
Not Implemented Yet
- NASA Turbofan Engine Dataset
- A. Saxena, K. Goebel, D. Simon, and N. Eklund, “Damage Propagation Modeling for Aircraft Engine Run-to-Failure Simulation,” https://data.nasa.gov/Aerospace/CMAPSS-Jet-Engine-Simulated-Data/ff5v-kuh6/about_data, Oct. 2008. Accessed: 2024-12-31.
- Imbalanced Learning: imbalanced-learn documentation
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes with tests
- Submit a pull request
This project is licensed under the MIT License - see LICENSE file for details.