This project implements a production-oriented machine learning system for customer churn prediction, with a strong focus on reliability, monitoring, and data quality.
Instead of stopping at model training, the system is designed to reflect the full machine learning lifecycle, including:
- Reusable feature engineering
- Standardized model training and evaluation
- Reference data management
- Statistical data drift detection
- Foundations for monitoring and retraining decisions
The goal of the project is to demonstrate how machine learning models behave after deployment, how data changes over time, and how such changes can be detected and acted upon in a controlled, production-ready manner.
- Python 3.12
- pip
pip install -r requirements.txt
python -m pytest tests/ -v
python models/train_and_log.py
docker build -t churn-mlops . && docker run --rm churn-mlops
uvicorn api.app:app --reload
A Prefect pipeline orchestrates daily drift monitoring automatically.
python orchestration/pipeline.py
- Runs data drift checks
- Runs prediction drift checks
- Triggers alerts if drift is detected
This project includes a dedicated feature engineering layer that transforms raw customer data into model-ready features and produces a reference feature dataset representing the training-time data distribution.
A reusable preprocessing pipeline is implemented in features/build_features.py with the following transformations:
- Categorical features are encoded using One-Hot Encoding
- Numerical features are standardized using Standard Scaling
- Feature names are explicitly generated to preserve schema clarity and support monitoring
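To make the three transformations above concrete, here is a minimal, dependency-free sketch. It is not the code in features/build_features.py: the function name `fit_transform`, the column names, and the sample rows are all hypothetical, and a real pipeline would more likely rely on a preprocessing library.

```python
from statistics import fmean, pstdev


def fit_transform(rows, categorical, numerical):
    """One-hot encode categorical columns and standardize numerical ones.

    Returns (feature_names, matrix); each matrix row lines up with
    feature_names. Hypothetical helper, for illustration only.
    """
    # Learn the category vocabulary and the scaling statistics from the data.
    categories = {col: sorted({row[col] for row in rows}) for col in categorical}
    stats = {}
    for col in numerical:
        values = [row[col] for row in rows]
        # Fall back to 1.0 so a constant column maps to 0.0 instead of dividing by zero.
        stats[col] = (fmean(values), pstdev(values) or 1.0)

    # Generate explicit feature names so the output schema is self-describing.
    feature_names = [f"{col}_{cat}" for col in categorical for cat in categories[col]]
    feature_names += list(numerical)

    matrix = []
    for row in rows:
        encoded = [
            1.0 if row[col] == cat else 0.0
            for col in categorical
            for cat in categories[col]
        ]
        scaled = [(row[col] - stats[col][0]) / stats[col][1] for col in numerical]
        matrix.append(encoded + scaled)
    return feature_names, matrix


# Hypothetical sample data.
rows = [
    {"contract": "monthly", "tenure": 2.0},
    {"contract": "yearly", "tenure": 4.0},
    {"contract": "monthly", "tenure": 6.0},
]
names, X = fit_transform(rows, categorical=["contract"], numerical=["tenure"])
print(names)
```

In a production setting, the learned categories and scaling statistics would be stored and reused on incoming data rather than refit, so that reference and production features share one schema.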
The transformed features are saved as a reference dataset:
data/reference/reference_features.parquet
This dataset serves as the baseline distribution for:
- Data drift detection
- Comparison with future production data
- Informed model retraining decisions
Separating raw data, feature construction, and reference storage enables reliable monitoring and aligns the system with production-grade MLOps practices.
The project includes standardized scripts for model training and evaluation using the engineered reference features.
- The model is trained using features from the reference dataset
- The trained model is serialized for reuse and deployment
- Performance is evaluated on a held-out test split
- Metrics include precision, recall, ROC-AUC, and a confusion matrix
- Evaluation results are persisted for traceability
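The listed metrics can be sketched in plain Python as follows. This is an illustration of the definitions, not the project's evaluation script: the function names, the 0.5 decision threshold, and the sample labels and scores are assumptions.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    return tn, fp, fn, tp


def precision_recall(y_true, y_pred):
    """Precision and recall for the positive (churn) class."""
    _, fp, fn, tp = confusion_matrix(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_auc(y_true, y_score):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(n^2), which is
    fine for a sketch but slow for large test sets.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs both classes present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical held-out labels and predicted churn probabilities.
y_true = [0, 0, 1, 1, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.9, 0.6]
y_pred = [int(s >= 0.5) for s in y_score]  # assumed 0.5 threshold

print(confusion_matrix(y_true, y_pred))
print(precision_recall(y_true, y_pred))
print(roc_auc(y_true, y_score))
```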
The system includes a data drift detection component that compares incoming production features against a reference feature baseline.
- The reference feature dataset represents training-time data
- New production features are statistically compared to this baseline
- Drift is quantified using:
  - Population Stability Index (PSI)
  - Kolmogorov–Smirnov (KS) test
- Feature-wise drift metrics are generated and persisted
- These metrics provide the foundation for monitoring data quality and triggering retraining decisions
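As a rough illustration of the two drift measures for a single numerical feature, the sketch below computes PSI and the two-sample KS statistic with the standard library only. The binning scheme (10 equal-width bins taken from the reference range), the `eps` floor, and the function names are assumptions, not the project's implementation. The sketch returns only the KS statistic, not a p-value.

```python
import math
from bisect import bisect_right


def psi(reference, production, bins=10, eps=1e-6):
    """Population Stability Index over equal-width bins from the reference range."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant features

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp out-of-range production values into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # eps keeps empty bins from producing log(0) or a division by zero.
        return [max(c / len(values), eps) for c in counts]

    ref_p, prod_p = proportions(reference), proportions(production)
    return sum((p - r) * math.log(p / r) for r, p in zip(ref_p, prod_p))


def ks_statistic(reference, production):
    """Largest gap between the two empirical CDFs."""
    ref_sorted, prod_sorted = sorted(reference), sorted(production)
    gap = 0.0
    for x in ref_sorted + prod_sorted:
        cdf_ref = bisect_right(ref_sorted, x) / len(ref_sorted)
        cdf_prod = bisect_right(prod_sorted, x) / len(prod_sorted)
        gap = max(gap, abs(cdf_ref - cdf_prod))
    return gap


# Hypothetical feature values: production is the reference shifted by 0.3.
reference = [i / 100 for i in range(100)]
production = [v + 0.3 for v in reference]

print(psi(reference, production))
print(ks_statistic(reference, production))
```

Comparing a sample with itself yields 0 for both measures, which makes a convenient sanity check before wiring the metrics into monitoring.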
Beyond input data drift, the system monitors prediction drift by tracking changes in the model’s output probabilities over time.
- Prediction probabilities are generated on both reference and production feature sets
- Distribution statistics (mean, spread, range) are compared
- Shifts in prediction behavior are quantified independently of input drift
This enables early detection of silent model degradation, even when input features appear stable.
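One simple way to compare the statistics named above (mean, spread, range) is sketched below. The function names, the choice of population standard deviation as the spread measure, and the sample probabilities are assumptions for illustration.

```python
from statistics import fmean, pstdev


def summarize(probabilities):
    """Mean, spread (population standard deviation), and range endpoints."""
    return {
        "mean": fmean(probabilities),
        "std": pstdev(probabilities),
        "min": min(probabilities),
        "max": max(probabilities),
    }


def prediction_drift(reference_probs, production_probs):
    """Absolute change in each summary statistic between the two sets."""
    ref = summarize(reference_probs)
    prod = summarize(production_probs)
    return {f"{name}_shift": abs(prod[name] - ref[name]) for name in ref}


# Hypothetical churn probabilities from the same model on two datasets.
reference_probs = [0.10, 0.20, 0.15, 0.30, 0.25]
production_probs = [0.30, 0.45, 0.40, 0.55, 0.50]

report = prediction_drift(reference_probs, production_probs)
print(report)
```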
The system converts monitoring signals into actionable alerts using predefined business rules.
- Data drift alerts based on Population Stability Index (PSI)
- Prediction drift alerts based on shifts in model output probabilities
Alerts are designed to:
- Highlight data quality risks
- Surface silent model behavior changes
- Guide retraining and intervention decisions without overreacting to noise
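A rule layer of this kind might look like the sketch below. The thresholds are illustrative and not taken from the project: 0.10 and 0.25 are a widely used rule of thumb for PSI, while the 0.10 limit on the mean prediction shift, the `Alert` class, and the feature names are hypothetical.

```python
from dataclasses import dataclass

# Assumed thresholds; tune them to the business context.
PSI_WARNING = 0.10
PSI_CRITICAL = 0.25
MEAN_SHIFT_LIMIT = 0.10


@dataclass(frozen=True)
class Alert:
    kind: str      # "data_drift" or "prediction_drift"
    subject: str   # feature name, or "predictions"
    severity: str  # "warning" or "critical"
    value: float


def evaluate_alerts(psi_by_feature, prediction_mean_shift):
    """Turn drift metrics into alerts; values below the thresholds stay silent."""
    alerts = []
    for feature, value in sorted(psi_by_feature.items()):
        if value >= PSI_CRITICAL:
            alerts.append(Alert("data_drift", feature, "critical", value))
        elif value >= PSI_WARNING:
            alerts.append(Alert("data_drift", feature, "warning", value))
    if prediction_mean_shift >= MEAN_SHIFT_LIMIT:
        alerts.append(
            Alert("prediction_drift", "predictions", "critical", prediction_mean_shift)
        )
    return alerts


# Hypothetical monitoring output.
alerts = evaluate_alerts(
    {"tenure": 0.31, "monthly_charges": 0.12, "age": 0.02},
    prediction_mean_shift=0.04,
)
for alert in alerts:
    print(alert)
```

Keeping a warning band below the critical threshold is one way to flag emerging drift without triggering retraining on noise.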
The system exposes its core capabilities through a lightweight API layer.
- POST /predict: Returns churn probability and binary prediction for a given feature payload.
- GET /alerts: Surfaces the current alert state derived from data and prediction drift monitoring.
The API decouples monitoring and inference logic from presentation, allowing downstream systems, dashboards, or schedulers to consume model outputs and operational signals reliably.
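A downstream consumer could call the two endpoints roughly as follows, using only the standard library. The payload shape (`{"features": {...}}`) and the feature names are assumptions, since the request schema is not specified here. The base URL assumes the uvicorn default of 127.0.0.1:8000.

```python
import json
from urllib.request import Request, urlopen

BASE_URL = "http://127.0.0.1:8000"  # uvicorn's default host and port


def build_predict_request(base_url, features):
    """Build a POST /predict request; the JSON body shape is hypothetical."""
    body = json.dumps({"features": features}).encode("utf-8")
    return Request(
        f"{base_url.rstrip('/')}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_json(request_or_url, timeout=5.0):
    """Send the request and decode the JSON response body."""
    with urlopen(request_or_url, timeout=timeout) as response:
        return json.load(response)


# Building the request does not touch the network.
request = build_predict_request(BASE_URL, {"tenure": 12, "contract": "monthly"})
print(request.get_method(), request.full_url)

# With the API running locally, the calls would be:
# prediction = fetch_json(request)
# alerts = fetch_json(f"{BASE_URL}/alerts")
```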