> *"A new operator running a high-speed setup on a difficult material batch without completing the pre-run checklist. These factors interact in non-linear ways: velocity alone does not cause scrap, but that combination almost always does."*
Scrap in a stamping press line doesn't announce itself. By the time it's counted in the bin at end of shift, the material is already lost — along with the machine time, tooling wear, and downstream scheduling impact that came with it. The standard response is reactive: log the defect, investigate the cause, issue a corrective action. Repeat next week.
This project reframes the question: instead of investigating scrap after it happens, can a model score the risk of a bad run before the press starts? The answer is yes — because the conditions that produce scrap (operator experience, checklist completion, supplier lot quality, press speed) are all known at setup time. They exist in the data. They just haven't been connected to an actionable risk score.
A Decision Tree doesn't find hidden patterns in this problem. It formalizes what experienced engineers already know — and makes those rules auditable, transferable, and documentable.
- 2,000 stamping press production records from a manufacturing environment
- Target: `scrap_risk` — three classes: Low / Medium / High
- Class distribution: Low 19.6% · Medium 50.1% · High 30.3%
- Source: Simulated operational data reflecting real stamping process factor interactions
| Layer | Feature | Description |
|---|---|---|
| Machine | `press_speed_spm` | Press speed in strokes per minute |
| Machine | `raw_material_hardness_hrb` | Material hardness on the HRB scale |
| Operator | `operator_experience_yrs` | Years of operator experience |
| Operator | `shift` | Day / Night / Early_Morning |
| Material | `critical_supplier_lot` | Flag: 1 = lot from critical supplier |
| Environment | `ambient_temp_c` | Shop floor temperature at run start |
| Process | `recent_model_change` | Flag: 1 = model change in last 48 h |
| Process | `setup_checklist_complete` | Flag: 1 = pre-run checklist completed |
Key EDA findings:
- Critical supplier lots: 54.0% High Risk vs 19.9% for standard lots — the largest single structural gap
- Incomplete checklist: 42.5% High Risk vs 17.8% when complete — a process control lever, not a luck factor
- Night shift: 38.7% High Risk vs 23.2% Day — partially explained by operator experience distribution
Algorithm: Decision Tree (Gini, `max_depth=5`) — `sklearn.tree.DecisionTreeClassifier`
Decision Trees are the right model here for a reason that goes beyond performance: the output is a set of if-then-else rules that can be printed and posted at the press. The model doesn't just classify — it generates process documentation. A LinearSVC coefficient communicates direction; a Decision Tree rule communicates the exact threshold and the path to the decision.
This is a multiclass problem (three risk levels), so macro-averaged F1 is the primary metric — it penalizes poor performance on any class equally, regardless of frequency.
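A minimal sketch (toy data, not the project's dataset) of why macro F1 is the right guardrail here: a classifier that never predicts the minority class can still post strong accuracy, but macro F1 exposes the blind spot.

```python
from sklearn.metrics import accuracy_score, f1_score

# Toy illustration with the same class proportions as the project
# (Medium ~50%, High ~30%, Low ~20%); not the real predictions.
y_true = ["Medium"] * 50 + ["High"] * 30 + ["Low"] * 20
y_pred = ["Medium"] * 50 + ["High"] * 30 + ["Medium"] * 20  # ignores "Low"

print(accuracy_score(y_true, y_pred))                              # 0.80
print(f1_score(y_true, y_pred, average="macro", zero_division=0))  # ~0.61
```

The 80% accuracy hides a class the model never predicts; macro F1 drops to roughly 0.61 because the Low class contributes an F1 of zero.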
Why `max_depth=5`, `min_samples_leaf=50`: Deliberately constrained. Each leaf must represent at least 50 production runs — not a single outlier. The tree is slightly less accurate than an unconstrained version, and significantly more generalizable. In manufacturing, that trade is always worth it.
Preprocessing: `OneHotEncoder` on `shift` (three levels), passthrough on everything else. No scaling — trees split on thresholds, not distances.
| Metric | Value |
|---|---|
| Test Accuracy | 66.3% |
| Train Accuracy | 72.1% (small gap — well controlled) |
| F1 Macro | 63.8% |
| F1 Weighted | 65.7% |
| CV Accuracy (5-fold) | 68.8% ± 1.5% |
Per-class performance:
| Class | Precision | Recall | F1 |
|---|---|---|---|
| High | 0.64 | 0.70 | 0.67 |
| Low | 0.89 | 0.41 | 0.56 |
| Medium | 0.64 | 0.74 | 0.69 |
Honest note: Low class recall is 41% — the tree frequently confuses Low with Medium. Operationally, this is the less critical error: sending a Low-risk run through Medium-risk protocols wastes some caution, but doesn't allow scrap to happen undetected. The model prioritizes High-risk detection, which it handles well (70% recall).
Confusion matrix (600 test runs):
| | Pred: High | Pred: Low | Pred: Medium |
|---|---|---|---|
| Actual: High | 128 ✅ | 0 | 54 |
| Actual: Low | 0 | 48 ✅ | 70 |
| Actual: Medium | 72 | 6 | 222 ✅ |
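The per-class numbers above can be re-derived from the confusion matrix counts — a quick sanity check on the reported metrics:

```python
import numpy as np

# Rows = actual (High, Low, Medium), columns = predicted, per the table above.
cm = np.array([
    [128,  0,  54],   # Actual High
    [  0, 48,  70],   # Actual Low
    [ 72,  6, 222],   # Actual Medium
])

recall = cm.diagonal() / cm.sum(axis=1)     # ~[0.70, 0.41, 0.74]
precision = cm.diagonal() / cm.sum(axis=0)  # ~[0.64, 0.89, 0.64]
accuracy = cm.diagonal().sum() / cm.sum()   # ~0.663
print(recall.round(2), precision.round(2), round(float(accuracy), 3))
```

These reproduce the 66.3% test accuracy and the per-class table exactly.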
| Feature | Importance | What it means |
|---|---|---|
| `operator_experience_yrs` | 30.4% | Strongest driver — experience compensates for difficult conditions |
| `setup_checklist_complete` | 21.9% | Process control lever — the most actionable single intervention |
| `critical_supplier_lot` | 20.1% | Material quality — triggers a mandatory change in run conditions |
| `press_speed_spm` | 19.6% | Speed interacts with experience — not dangerous alone |
| `recent_model_change` | 7.9% | Setup instability signal |
| `shift`, hardness, temp | 0.0% | Zero Gini importance in this tree configuration |
The top four features account for 92% of the model's decision power. The tree's top split is operator experience at 2 years — a threshold that any HR or production planning system already tracks.
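Once a tree is fitted, `sklearn.tree.export_text` prints the rules in plain if-then form. The snippet below is a toy sketch on synthetic data (feature ranges and labels are invented) showing the mechanism, not the project's actual tree:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic data: scrap risk is High only when a near-new operator
# (< 2 yrs) runs at high speed (> 50 spm) — a toy version of the
# interaction the real tree captures.
rng = np.random.default_rng(0)
X = rng.uniform([0, 0], [20, 80], size=(3000, 2))
y = np.where((X[:, 0] < 2) & (X[:, 1] > 50), "High", "Low")

tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=50,
                              random_state=0).fit(X, y)
rules = export_text(
    tree, feature_names=["operator_experience_yrs", "press_speed_spm"])
print(rules)  # if-then rules ready to transcribe into an SOP
```

The printed output is indented `|--- feature <= threshold` lines ending in class labels — exactly the format that can be posted at the press.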
```
Process_Decisions_Optimization/
├── 05_DT_Process_Decisions_Optimization.ipynb   # Notebook (no outputs)
├── scrap_risk_data.csv                          # Sample dataset (250 rows)
├── README.md
└── requirements.txt
```
📦 Full Project Pack — complete dataset (2,000 rows), notebook with full outputs including tree visualization and text rules, presentation deck (PPTX + PDF), and an `app.py` pre-run risk simulator, available on Gumroad.
Option 1 — Google Colab: Click the badge above.
Option 2 — Local:
```bash
pip install -r requirements.txt
jupyter notebook 05_DT_Process_Decisions_Optimization.ipynb
```

- The model output is the SOP — `export_text()` produces if-then-else rules that can be transcribed directly into process control documents. No translation needed between model and practice.
- Multiclass F1 Macro is not optional — with three classes and class imbalance, accuracy is misleading. A model that ignores Low entirely can still score 80% accuracy. Macro F1 prevents that illusion.
- Controlling complexity is a design decision — `max_depth=5` and `min_samples_leaf=50` are not limitations imposed by the data. They're choices made to produce a model that generalizes to next week's production, not just last week's.
- Zero-importance features tell the story too — shift, hardness, and ambient temperature carry no Gini importance. This doesn't mean they're irrelevant to scrap — it means their effect is already captured by the features that matter (experience, checklist, supplier lot).
- The interaction structure matters more than individual variables — press speed at 55 spm with an experienced operator is manageable. At 55 spm with a 6-month operator and an incomplete checklist, it isn't. Trees capture this logic without feature engineering.
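That interaction logic can be rendered as nested conditions — the toy function below is an illustrative rephrasing of the narrative above (the thresholds and risk labels are invented for the sketch, not the fitted tree's actual values):

```python
def illustrative_risk(experience_yrs: float,
                      press_speed_spm: float,
                      checklist_complete: bool) -> str:
    """Toy rendering of the interaction described in the text;
    not the model's real decision path."""
    if press_speed_spm > 50 and experience_yrs < 2 and not checklist_complete:
        return "High"      # speed + new operator + skipped checklist
    if press_speed_spm > 50 and experience_yrs < 2:
        return "Medium"    # speed + new operator, checklist done
    return "Low"           # speed alone is manageable

print(illustrative_risk(15, 55, True))    # experienced operator -> "Low"
print(illustrative_risk(0.5, 55, False))  # 6-month operator, no checklist -> "High"
```

A tree learns this nested structure directly from the data; a linear model would need an explicit interaction term for each such combination.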
Luis Lozano | Operational Excellence Manager · Master Black Belt · Machine Learning
GitHub: LozanoLsa · Gumroad: lozanolsa.gumroad.com
Turning Operations into Predictive Systems — Clone it. Fork it. Improve it.