> *"A new operator running a high-speed setup on a difficult material batch without completing the pre-run checklist. These factors interact in non-linear ways: velocity alone does not cause scrap, but that combination almost always does."*
Scrap in a stamping press line doesn't announce itself. By the time it's counted in the bin at end of shift, the material is already lost — along with the machine time, tooling wear, and downstream scheduling impact that came with it. The standard response is reactive: log the defect, investigate the cause, issue a corrective action. Repeat next week.
This project reframes the question: instead of investigating scrap after it happens, can a model score the risk of a bad run before the press starts? The answer is yes — because the conditions that produce scrap (operator experience, checklist completion, supplier lot quality, press speed) are all known at setup time. They exist in the data. They just haven't been connected to an actionable risk score.
A Decision Tree doesn't find hidden patterns in this problem. It formalizes what experienced engineers already know — and makes those rules auditable, transferable, and documentable.
- 2,000 stamping press production records from a manufacturing environment
- Target: `scrap_risk` — three classes: Low / Medium / High
- Class distribution: Low 19.6% · Medium 50.1% · High 30.3%
- Source: Simulated operational data reflecting real stamping process factor interactions
| Layer | Feature | Description |
|---|---|---|
| Machine | `press_speed_spm` | Press speed in strokes per minute |
| Machine | `raw_material_hardness_hrb` | Material hardness on the HRB scale |
| Operator | `operator_experience_yrs` | Years of operator experience |
| Operator | `shift` | Day / Night / Early_Morning |
| Material | `critical_supplier_lot` | Flag: 1 = lot from critical supplier |
| Environment | `ambient_temp_c` | Shop floor temperature at run start |
| Process | `recent_model_change` | Flag: 1 = model change in last 48 h |
| Process | `setup_checklist_complete` | Flag: 1 = pre-run checklist completed |
Key EDA findings:
- Critical supplier lots: 54.0% High Risk vs 19.9% for standard lots — the largest single structural gap
- Incomplete checklist: 42.5% High Risk vs 17.8% when complete — a process control lever, not a luck factor
- Night shift: 38.7% High Risk vs 23.2% Day — partially explained by operator experience distribution
Algorithm: Decision Tree (Gini, `max_depth=5`) — `sklearn.tree.DecisionTreeClassifier`
Decision Trees are the right model here for a reason that goes beyond performance: the output is a set of if-then-else rules that can be printed and posted at the press. The model doesn't just classify — it generates process documentation. A LinearSVC coefficient communicates direction; a Decision Tree rule communicates the exact threshold and the path to the decision.
This is a multiclass problem (three risk levels), so macro-averaged F1 is the primary metric — it penalizes poor performance on any class equally, regardless of frequency.
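A minimal sketch (toy data, not the project's dataset) of why macro F1 is the right guardrail here: a classifier that never predicts the minority class can still post strong accuracy, but macro F1 exposes the blind spot.

```python
from sklearn.metrics import accuracy_score, f1_score

# Toy illustration with the same class proportions as the project
# (Medium ~50%, High ~30%, Low ~20%); not the real predictions.
y_true = ["Medium"] * 50 + ["High"] * 30 + ["Low"] * 20
y_pred = ["Medium"] * 50 + ["High"] * 30 + ["Medium"] * 20  # ignores "Low"

print(accuracy_score(y_true, y_pred))                              # 0.80
print(f1_score(y_true, y_pred, average="macro", zero_division=0))  # ~0.61
```

The 80% accuracy hides a class the model never predicts; macro F1 drops to roughly 0.61 because the Low class contributes an F1 of zero.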
Why `max_depth=5`, `min_samples_leaf=50`: Deliberately constrained. Each leaf must represent at least 50 production runs — not a single outlier. The tree is slightly less accurate than an unconstrained version, and significantly more generalizable. In manufacturing, that trade is always worth it.
Preprocessing: `OneHotEncoder` on `shift` (three levels), passthrough on everything else. No scaling — trees split on thresholds, not distances.
| Metric | Value |
|---|---|
| Test Accuracy | 66.3% |
| Train Accuracy | 72.1% (small gap — well controlled) |
| F1 Macro | 63.8% |
| F1 Weighted | 65.7% |
| CV Accuracy (5-fold) | 68.8% ± 1.5% |
Per-class performance:
| Class | Precision | Recall | F1 |
|---|---|---|---|
| High | 0.64 | 0.70 | 0.67 |
| Low | 0.89 | 0.41 | 0.56 |
| Medium | 0.64 | 0.74 | 0.69 |
Honest note: Low class recall is 41% — the tree frequently confuses Low with Medium. Operationally, this is the less critical error: sending a Low-risk run through Medium-risk protocols wastes some caution, but doesn't allow scrap to happen undetected. The model prioritizes High-risk detection, which it handles well (70% recall).
Confusion matrix (600 test runs):
| | Pred: High | Pred: Low | Pred: Medium |
|---|---|---|---|
| Actual: High | 128 ✅ | 0 | 54 |
| Actual: Low | 0 | 48 ✅ | 70 |
| Actual: Medium | 72 | 6 | 222 ✅ |
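The per-class numbers above can be re-derived from the confusion matrix counts — a quick sanity check on the reported metrics:

```python
import numpy as np

# Rows = actual (High, Low, Medium), columns = predicted, per the table above.
cm = np.array([
    [128,  0,  54],   # Actual High
    [  0, 48,  70],   # Actual Low
    [ 72,  6, 222],   # Actual Medium
])

recall = cm.diagonal() / cm.sum(axis=1)     # ~[0.70, 0.41, 0.74]
precision = cm.diagonal() / cm.sum(axis=0)  # ~[0.64, 0.89, 0.64]
accuracy = cm.diagonal().sum() / cm.sum()   # ~0.663
print(recall.round(2), precision.round(2), round(float(accuracy), 3))
```

These reproduce the 66.3% test accuracy and the per-class table exactly.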
| Feature | Importance | What it means |
|---|---|---|
| `operator_experience_yrs` | 30.4% | Strongest driver — experience compensates for difficult conditions |
| `setup_checklist_complete` | 21.9% | Process control lever — the most actionable single intervention |
| `critical_supplier_lot` | 20.1% | Material quality — triggers a mandatory change in run conditions |
| `press_speed_spm` | 19.6% | Speed interacts with experience — not dangerous alone |
| `recent_model_change` | 7.9% | Setup instability signal |
| `shift`, hardness, temp | 0.0% | Zero Gini importance in this tree configuration |
The top four features account for 92% of the model's decision power. The tree's top split is operator experience at 2 years — a threshold that any HR or production planning system already tracks.
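Once a tree is fitted, `sklearn.tree.export_text` prints the rules in plain if-then form. The snippet below is a toy sketch on synthetic data (feature ranges and labels are invented) showing the mechanism, not the project's actual tree:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic data: scrap risk is High only when a near-new operator
# (< 2 yrs) runs at high speed (> 50 spm) — a toy version of the
# interaction the real tree captures.
rng = np.random.default_rng(0)
X = rng.uniform([0, 0], [20, 80], size=(3000, 2))
y = np.where((X[:, 0] < 2) & (X[:, 1] > 50), "High", "Low")

tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=50,
                              random_state=0).fit(X, y)
rules = export_text(
    tree, feature_names=["operator_experience_yrs", "press_speed_spm"])
print(rules)  # if-then rules ready to transcribe into an SOP
```

The printed output is indented `|--- feature <= threshold` lines ending in class labels — exactly the format that can be posted at the press.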
```
Process_Decisions_Optimization/
├── 05_DT_Process_Decisions_Optimization.ipynb   # Notebook (no outputs)
├── scrap_risk_data.csv                          # Sample dataset (250 rows)
├── README.md
└── requirements.txt
```
📦 Full Project Pack — complete dataset (2,000 rows), notebook with full outputs including tree visualization and text rules, presentation deck (PPTX + PDF), and an `app.py` pre-run risk simulator, available on Gumroad.
Option 1 — Google Colab: Click the badge above.
Option 2 — Local:
```bash
pip install -r requirements.txt
jupyter notebook 05_DT_Process_Decisions_Optimization.ipynb
```

- The model output is the SOP — `export_text()` produces if-then-else rules that can be transcribed directly into process control documents. No translation needed between model and practice.
- Multiclass F1 Macro is not optional — with three classes and class imbalance, accuracy is misleading. A model that ignores Low entirely can still score 80% accuracy. Macro F1 prevents that illusion.
- Controlling complexity is a design decision — `max_depth=5` and `min_samples_leaf=50` are not limitations imposed by the data. They're choices made to produce a model that generalizes to next week's production, not just last week's.
- Zero-importance features tell the story too — shift, hardness, and ambient temperature carry no Gini importance. This doesn't mean they're irrelevant to scrap — it means their effect is already captured by the features that matter (experience, checklist, supplier lot).
- The interaction structure matters more than individual variables — press speed at 55 spm with an experienced operator is manageable. At 55 spm with a 6-month operator and an incomplete checklist, it isn't. Trees capture this logic without feature engineering.
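That interaction logic can be rendered as nested conditions — the toy function below is an illustrative rephrasing of the narrative above (the thresholds and risk labels are invented for the sketch, not the fitted tree's actual values):

```python
def illustrative_risk(experience_yrs: float,
                      press_speed_spm: float,
                      checklist_complete: bool) -> str:
    """Toy rendering of the interaction described in the text;
    not the model's real decision path."""
    if press_speed_spm > 50 and experience_yrs < 2 and not checklist_complete:
        return "High"      # speed + new operator + skipped checklist
    if press_speed_spm > 50 and experience_yrs < 2:
        return "Medium"    # speed + new operator, checklist done
    return "Low"           # speed alone is manageable

print(illustrative_risk(15, 55, True))    # experienced operator -> "Low"
print(illustrative_risk(0.5, 55, False))  # 6-month operator, no checklist -> "High"
```

A tree learns this nested structure directly from the data; a linear model would need an explicit interaction term for each such combination.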
Luis Lozano | Operational Excellence Manager · Master Black Belt · Machine Learning
GitHub: LozanoLsa · Gumroad: lozanolsa.gumroad.com
Turning Operations into Predictive Systems — Clone it. Fork it. Improve it.