A machine learning system that automates credit card approval decisions using historical application data.
Credit card approvals typically take days to process. Decisions vary between approvers, and manual review is expensive and error-prone.
This system evaluates applicants instantly and consistently while explaining its reasoning. It uses patterns in historical applications to predict default risk and approves qualified applicants immediately.
- Data Collection - Historical credit applications with outcomes
- Preprocessing - Clean missing values, standardize features for ML
- Feature Engineering - Compute meaningful metrics (debt-to-income ratio)
- Model Training - Test Logistic Regression, Random Forest, XGBoost (67/33 split)
- Evaluation - Compare accuracy, precision, recall across models
- Deployment - Web interface for real-time predictions
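The preprocessing step above can be illustrated with a minimal standard-library sketch. The function names `impute_mean` and `standardize` and the sample values are hypothetical and are not taken from `preprocessing.py`, which may well use a library such as pandas or scikit-learn instead.

```python
from statistics import mean, pstdev


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Rescale values to zero mean and unit population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical credit scores with one missing entry (not from data/credit.csv).
credit_scores = [620, None, 710, 680]
cleaned = impute_mean(credit_scores)
scaled = standardize(cleaned)
```

Mean imputation is only one way to handle missing values; it is shown here because it is simple, not because the project is known to use it.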
- Accuracy: 70% on test set
- Recall: 100% on test set (identifies all defaults)
- Speed: < 100ms per decision
- Availability: 24/7 uptime
Install the dependencies with `pip install -r requirements.txt`, then run `python six_step_algorithm.py`. This executes all 6 steps and shows model performance.

To start the prediction interface, run `streamlit run website.py`. In another terminal:
`streamlit run analytics_dashboard.py --server.port 8502`

| File | Purpose |
|---|---|
| `preprocessing.py` | Data cleaning, scaling, feature engineering |
| `train_model.py` | Initial model training script |
| `six_step_algorithm.py` | Complete workflow - start here |
| `model_comparison.py` | Compare 3 algorithms with cross-validation |
| `hyperparameter_tuning.py` | GridSearch/RandomSearch optimization |
| `website.py` | Streamlit prediction interface |
| `analytics_dashboard.py` | Business metrics and compliance dashboard |
| `data/credit.csv` | Training dataset (30 applications) |
The system creates a Debt-to-Income (DTI) Ratio feature:
DTI = Total Debt / Annual Income
This single metric captures financial health better than raw debt or income alone. Mean DTI in training data: 0.22, Max: 0.70.
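As a rough illustration of the formula, the sketch below computes DTI for two made-up applicants. The function name `debt_to_income` and the applicant values are assumptions chosen for the example; they are not rows from `data/credit.csv`.

```python
def debt_to_income(total_debt, annual_income):
    """Return the debt-to-income ratio: total debt divided by annual income."""
    if annual_income <= 0:
        raise ValueError("annual_income must be positive")
    return total_debt / annual_income


# Hypothetical applicants, for illustration only.
applicants = [
    {"total_debt": 11_000, "annual_income": 50_000},
    {"total_debt": 28_000, "annual_income": 40_000},
]

ratios = [
    debt_to_income(a["total_debt"], a["annual_income"]) for a in applicants
]
```

The guard against non-positive income avoids a division by zero on malformed records; how the project itself handles such rows is not stated.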
Three algorithms were tested on the same data:
| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| Logistic Regression | 70% | 40% | 100% | 57% |
| Random Forest | 60% | 33% | 100% | 50% |
| XGBoost | 60% | 33% | 100% | 50% |
Best Model: Logistic Regression (highest accuracy, interpretable)
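To show how the four columns relate to each other, the sketch below derives them from confusion-matrix counts. The counts are an assumption: with a test set of about 10 applications (33% of 30), 2 true positives, 3 false positives, 5 true negatives, and 0 false negatives would reproduce the Logistic Regression row. They are not taken from the project's actual output.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall, and F1 from confusion counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Assumed counts, chosen to be consistent with the Logistic Regression row.
metrics = classification_metrics(tp=2, fp=3, tn=5, fn=0)
```

Under these assumed counts, perfect recall coexists with 40% precision because every positive is caught at the cost of three false alarms. This is why accuracy alone is a weak basis for choosing among the three models on so small a test set.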
The system uses 30 simulated credit applications with:
- Age (18-100)
- Annual Income ($20K-$200K)
- Credit Score (300-850)
- Credit Utilization (0-100%)
- Payment History Score (0-1)
- Total Outstanding Debt ($2K-$28K)
- Target: Approval (1) or Rejection (0)
Split: 67% training, 33% testing (stratified by outcome)
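A stratified split keeps the class proportions roughly equal in the training and test sets. The sketch below is a standard-library stand-in for illustration only. The project may instead rely on scikit-learn's `train_test_split` with its `stratify` argument, and the function name `stratified_split`, the seed, and the 24/6 class balance are all assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.33, seed=42):
    """Return (train_indices, test_indices) preserving class proportions."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Take the same fraction from every class.
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Hypothetical label distribution: 24 of class 0 and 6 of class 1.
labels = [0] * 24 + [1] * 6
train_idx, test_idx = stratified_split(labels)
```

With 30 samples, rounding per class matters: a minority class with only a handful of members contributes just one or two test examples, so per-class metrics on the test set are very coarse.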
- Fair lending: Model evaluated for disparate impact
- Explainable: Every decision shows contributing factors
- Auditable: All predictions logged with input features
- GDPR ready: No sensitive personal data stored
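One common way to check for disparate impact is to compare approval rates between groups, often against the "four-fifths" threshold of 0.8. The sketch below assumes a hypothetical group attribute and made-up decisions; the source does not say which groups or which test the project's evaluation uses.

```python
def disparate_impact_ratio(decisions, groups, protected, reference):
    """Approval rate of the protected group divided by that of the reference group."""

    def approval_rate(group):
        outcomes = [d for d, g in zip(decisions, groups) if g == group]
        if not outcomes:
            raise ValueError(f"no decisions recorded for group {group!r}")
        return sum(outcomes) / len(outcomes)

    reference_rate = approval_rate(reference)
    if reference_rate == 0:
        raise ValueError("reference group has no approvals")
    return approval_rate(protected) / reference_rate


# Hypothetical decisions (1 = approved) and group labels, for illustration only.
decisions = [1, 0, 1, 1, 0, 1, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

ratio = disparate_impact_ratio(decisions, groups, protected="B", reference="A")
flagged = ratio < 0.8  # below the four-fifths threshold
```

A ratio below 0.8 is usually treated as a signal for further review rather than proof of discrimination, particularly when the groups are as small as they are here.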
- Increase training data (currently 30 samples)
- Add more features (employment history, credit age, etc.)
- Implement model monitoring for performance degradation
- Set up automated retraining pipeline
- A/B test against manual approval process
- Deploy to AWS/GCP with API endpoints
Check the documentation files:
- `six_step_algorithm_guide.md` - Detailed technical explanation
- `crisp_dm_methodology.md` - ML methodology and best practices