This repository investigates the application of Regularized Logistic Regression for predicting customer churn. Moving beyond standard heuristic models, this project focuses on the mathematical rigor of convex loss surfaces, comparing the efficiency and convergence of different numerical solvers (L-BFGS vs. SAGA) to achieve a "Glass-Box" model that balances predictive power with high interpretability.
The project is architected using a Unified Pipeline approach, ensuring that data preprocessing, feature engineering, and inference are encapsulated in a single, atomic artifact for production reliability.
The core objective is the minimization of the L2-regularized log-loss function:
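In standard notation (symbols assumed here: $\sigma$ the sigmoid function, $\lambda$ the regularization strength, $n$ the number of samples), this objective reads:

```math
J(\mathbf{w}) = -\frac{1}{n}\sum_{i=1}^{n}\Big[\,y_i \log \sigma(\mathbf{w}^\top \mathbf{x}_i) + (1 - y_i)\log\big(1 - \sigma(\mathbf{w}^\top \mathbf{x}_i)\big)\Big] + \frac{\lambda}{2}\,\|\mathbf{w}\|_2^2
```

The log-loss term is smooth and convex, and the L2 penalty makes it strongly convex, which is what allows both solvers below to converge reliably to the global minimum.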
This implementation specifically evaluates the performance of:
- L-BFGS (Limited-memory Broyden–Fletcher–Goldfarb–Shanno): A quasi-Newton method that maintains a low-rank approximation of the inverse Hessian, achieving superlinear convergence without ever forming the full second-order matrix.
- SAGA: A variant of Stochastic Average Gradient (SAG) descent that supports non-smooth penalty terms (e.g., L1) via proximal updates and achieves a linear convergence rate on strongly convex objectives.
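A minimal sketch of this solver comparison using scikit-learn. The synthetic dataset, `C` value, and iteration budget below are illustrative, not the project's actual benchmark configuration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Illustrative synthetic data; the real project trains on customer churn features.
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X = StandardScaler().fit_transform(X)  # SAGA in particular benefits from scaled features

# L-BFGS handles only smooth penalties (L2); SAGA also supports the non-smooth L1.
for solver, penalty in [("lbfgs", "l2"), ("saga", "l1")]:
    clf = LogisticRegression(solver=solver, penalty=penalty, C=1.0, max_iter=5000)
    clf.fit(X, y)
    print(f"{solver}/{penalty}: iterations={clf.n_iter_[0]}, "
          f"train accuracy={clf.score(X, y):.3f}")
```

Comparing `n_iter_` across solvers on the same data gives a quick, if rough, view of their relative convergence behavior.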
The system is built on three pillars of Machine Learning Engineering:
- Pipeline Encapsulation: All transformations (scaling, one-hot encoding) are strictly handled within a `ColumnTransformer` to prevent data leakage.
- Production Safety: Categorical encoders are configured with `handle_unknown='ignore'` to keep the API stable when novel category values appear in real-world inputs.
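The encapsulation described above can be sketched as follows. The column names and model settings here are hypothetical stand-ins, not the project's actual schema:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical churn schema, for illustration only.
numeric_cols = ["tenure", "monthly_charges"]
categorical_cols = ["contract_type"]

preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    # handle_unknown='ignore' keeps serving stable when unseen categories arrive
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

pipeline = Pipeline([
    ("preprocess", preprocess),
    ("model", LogisticRegression(solver="lbfgs", max_iter=1000)),
])

train = pd.DataFrame({
    "tenure": [1, 24, 60, 5],
    "monthly_charges": [70.0, 30.5, 99.9, 45.0],
    "contract_type": ["monthly", "yearly", "yearly", "monthly"],
    "churn": [1, 0, 0, 1],
})
pipeline.fit(train.drop(columns="churn"), train["churn"])

# An unseen category ("biennial") is encoded as all-zeros instead of raising.
new = pd.DataFrame({"tenure": [3], "monthly_charges": [55.0],
                    "contract_type": ["biennial"]})
print(pipeline.predict_proba(new))
```

Because the scaler and encoder are fitted inside the pipeline, they only ever see training folds during cross-validation, which is what prevents leakage.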
- Powered by FastAPI, providing an asynchronous RESTful endpoint for low-latency churn scoring.
- Utilizes a single-artifact deployment model (`churn_pipeline.joblib`) to ensure Training-Serving Alignment.
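The single-artifact round trip can be sketched as below; the toy pipeline stands in for the full preprocessing + model pipeline, and only the artifact name comes from the repository:

```python
import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy pipeline for illustration; the real one includes the ColumnTransformer.
pipe = Pipeline([("scale", StandardScaler()),
                 ("model", LogisticRegression())])
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 1.0], [2.0, 3.0]])
y = np.array([0, 1, 1, 0])
pipe.fit(X, y)

# One dump/load round trip: preprocessing and model travel together,
# so serving applies exactly the transformations seen at training time.
joblib.dump(pipe, "churn_pipeline.joblib")
loaded = joblib.load("churn_pipeline.joblib")
assert (loaded.predict(X) == pipe.predict(X)).all()
```

Shipping one artifact removes the classic skew failure where a serving process scales or encodes features differently than the training job did.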
- An interactive Streamlit dashboard designed for stakeholders to simulate customer scenarios and visualize risk probabilities in real-time.
```
├── api.py                   # FastAPI Inference Service
├── dashboard.py             # Streamlit Business Dashboard
├── 01_Model_Training.ipynb  # Research & Solver Benchmarking
├── requirements.txt         # Production Dependencies
```
```bash
pip install -r requirements.txt
```
Run the core research notebook to generate convergence plots and export the model:
Open `01_Model_Training.ipynb` in Jupyter.
Start the API: `uvicorn api:app --reload`
Start the Dashboard: `streamlit run dashboard.py`
Author: Arij Belmabrouk
Focus: Numerical Optimization | Machine Learning Engineering | Systems Architecture