Material Stream Identification System

An automated Material Stream Identification (MSI) system that classifies images of waste materials using a classical ML pipeline: dataset preparation, feature extraction (pretrained ResNet50 features, plus optional HOG exploration), feature scaling, and classifier training (SVM / KNN). The repository also includes a live webcam demo and a simple Python inference script.

Overview

The system predicts one of the following classes (IDs are used consistently across the repo):

cardboard (0)
glass (1)
metal (2)
paper (3)
plastic (4)
trash (5)
unknown (6)

An "unknown" class is produced via a simple confidence-based rejection rule.

Features

End-to-end ML pipeline implemented via notebooks in src/.
CNN feature extraction using ResNet50(weights="imagenet", include_top=False) + GlobalAveragePooling2D.
Traditional feature extraction (optional exploration) via HOG in src/Feature_Extraction_HOG.ipynb.
Feature scaling using sklearn.preprocessing.Normalizer(norm="l2").
Two classifiers with model selection via GridSearchCV:
- SVM (src/5-SVM_Model.ipynb)
- KNN (src/5-KNN_Model.ipynb)
Confidence-based rejection to return the unknown class for low-confidence predictions.
Live camera demo (src/6-live_camera_app.ipynb).
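GlobalAveragePooling2D averages each channel of the last convolutional feature map over all spatial positions. For a 224x224 input, ResNet50 with include_top=False produces a 7x7x2048 map, so pooling yields the 2048-d vector used by the later steps. A minimal NumPy sketch of just the pooling step; the random array stands in for real ResNet50 output, and `global_average_pool` is a hypothetical helper, not code from the repo:

```python
import numpy as np

def global_average_pool(feature_map: np.ndarray) -> np.ndarray:
    """Average a (batch, height, width, channels) map over height and width.

    Mirrors what Keras' GlobalAveragePooling2D does with the default
    channels_last data format.
    """
    return feature_map.mean(axis=(1, 2))

# Stand-in for ResNet50(include_top=False) output on one 224x224 image.
rng = np.random.default_rng(0)
fmap = rng.random((1, 7, 7, 2048), dtype=np.float32)

features = global_average_pool(fmap)
print(features.shape)  # (1, 2048)
```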

Repository Structure

Material-Stream-Identification-System/
├── models/                             # Saved model files
│   ├── svm_model.pkl                   # Trained SVM model
│   ├── knn_model.pkl                   # Trained KNN model
│   └── scaler.pkl                      # Feature scaler
├── src/                                # Source code
│   ├── 1-Train_Test_Split.ipynb
│   ├── 2-dataset_aug.ipynb
│   ├── 3-Feature_Extraction_CNN.ipynb
│   ├── 4-Feature_Scaling.ipynb
│   ├── 5-KNN_Model.ipynb
│   ├── 5-SVM_Model.ipynb
│   ├── 6-live_camera_app.ipynb
│   ├── Feature_Extraction_HOG.ipynb
│   ├── rejection.py
│   └── test.py
├── dataset/                            # (Expected) original dataset folder (notebooks read from here)
├── data_split/                         # Train/val/test splits (1-Train_Test_Split.ipynb) and train_aug (2-dataset_aug.ipynb)
├── augmented_dataset/                  # (Optional) augmentation output folder (if used)
├── test/                               # Sample images for quick testing
│   ├── cardboard.jpg
│   ├── glass.jpg
│   ├── metal.jpg
│   ├── paper.jpg
│   ├── plastic.jpg
│   ├── trash.jpg
│   └── tree(unknown).jpg
└── features/                           # Extracted features
    ├── y_test.npy
    ├── y_train.npy
    └── y_val.npy

Requirements

The project dependencies are listed in requirements.txt:

opencv-python
numpy
tqdm
pandas
scikit-learn
joblib
scipy
scikit-image
matplotlib
tensorflow

Installation

Create and activate a virtual environment.
Install dependencies:

pip install -r requirements.txt

Quick Start (Inference)

The repo includes a runnable inference script: src/test.py.

Option A: Run using the defaults (recommended)

src/test.py is written with default paths that work when the current working directory is src/.

From src/, run python test.py.
By default it reads images from ..\test and uses the model ..\models\svm_model.pkl.

Option B: Call the `predict()` function from the repo root

If you prefer to stay in the repository root, you can import predict() by adding src/ to sys.path:

python -c "import sys; sys.path.append('src'); from test import predict; print(predict('test', 'models/svm_model.pkl'))"

Return value: a list of integer class IDs (0-6). See How Prediction Works.
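To turn the returned IDs into readable labels, a small lookup table built from the class list in the Overview is enough. This is a sketch; `CLASS_NAMES` and `ids_to_names` are hypothetical names, not part of src/test.py:

```python
# Class IDs as listed in the Overview section of this README.
CLASS_NAMES = {
    0: "cardboard",
    1: "glass",
    2: "metal",
    3: "paper",
    4: "plastic",
    5: "trash",
    6: "unknown",
}

def ids_to_names(class_ids):
    """Convert the integer IDs returned by predict() into class names."""
    return [CLASS_NAMES[int(i)] for i in class_ids]

# Example with a hand-written list of IDs (not real model output).
print(ids_to_names([0, 4, 6]))  # ['cardboard', 'plastic', 'unknown']
```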

Live Camera Demo

Open and run:

src/6-live_camera_app.ipynb

This notebook:

Loads models/scaler.pkl and an SVM model (default: models/svm_model.pkl)
Captures from webcam (cv2.VideoCapture(0))
Extracts ResNet50 features from a center ROI and predicts the class

Press q to quit.
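The "center ROI" step can be pictured as cropping a centered square out of each frame before resizing it to 224x224 for ResNet50. The sketch below assumes an ROI side of half the shorter frame side (`fraction=0.5`); the notebook's actual ROI size is not documented here, and `center_roi` is a hypothetical helper. It also flips channel order, since OpenCV returns frames in BGR while Keras' ResNet50 preprocess_input expects RGB input:

```python
import numpy as np

def center_roi(frame: np.ndarray, fraction: float = 0.5) -> np.ndarray:
    """Return a centered square crop of an (H, W, 3) frame.

    `fraction` is the ROI side length relative to the shorter frame side;
    0.5 is an assumed value, not the notebook's actual setting.
    """
    h, w = frame.shape[:2]
    side = max(1, int(min(h, w) * fraction))
    top = (h - side) // 2
    left = (w - side) // 2
    return frame[top:top + side, left:left + side]

# Stand-in for a 480x640 BGR frame from cv2.VideoCapture(0).read().
frame_bgr = np.zeros((480, 640, 3), dtype=np.uint8)
roi = center_roi(frame_bgr)
roi_rgb = roi[..., ::-1]  # OpenCV frames are BGR; Keras ResNet50 expects RGB input
print(roi_rgb.shape)  # (240, 240, 3)
```

The crop would then be resized to 224x224 (for example with cv2.resize) and passed through preprocess_input before feature extraction.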

Training Pipeline (Notebooks)

The training workflow is implemented as a set of notebooks under src/ and is designed to be run in order:

1-Train_Test_Split.ipynb
- Input: dataset/ (expected to contain one folder per class)
- Output: data_split/train, data_split/val, data_split/test
2-dataset_aug.ipynb
- Input: data_split/train
- Output: data_split/train_aug
3-Feature_Extraction_CNN.ipynb
- Extracts 2048-d CNN features (ResNet50 + GlobalAveragePooling)
- Reads from:
  - data_split/train_aug
  - data_split/val
  - data_split/test
- Writes to features/:
  - X_train_cnn.npy, X_val_cnn.npy, X_test_cnn.npy
  - y_train.npy, y_val.npy, y_test.npy
Feature_Extraction_HOG.ipynb
- Notebook for exploring HOG-based feature extraction.
4-Feature_Scaling.ipynb
- Loads the CNN features from features/
- Applies Normalizer(norm="l2")
- Writes scaled arrays to features/:
  - X_train_scaled.npy, X_val_scaled.npy, X_test_scaled.npy
- Saves the scaler to: models/scaler.pkl
5-KNN_Model.ipynb
- Loads features/X_*_scaled.npy and features/y_*.npy
- Trains KNN using GridSearchCV
- Saves the trained model to models/knn_model.pkl
5-SVM_Model.ipynb
- Loads features/X_*_scaled.npy and features/y_*.npy
- Trains SVM (SVC(probability=True)) using GridSearchCV
- Saves the trained model to models/svm_model.pkl
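Step 4 uses Normalizer(norm="l2"), which rescales each sample (row) to unit Euclidean length. Unlike StandardScaler it keeps no per-feature statistics, so fitting it on the training set learns nothing and every row is scaled independently. A minimal sketch with random arrays standing in for the real features/X_*_cnn.npy files:

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Synthetic stand-ins for features/X_train_cnn.npy etc. (real arrays are N x 2048).
rng = np.random.default_rng(42)
X_train = rng.random((5, 2048))
X_test = rng.random((3, 2048))

scaler = Normalizer(norm="l2")
X_train_scaled = scaler.fit_transform(X_train)  # fit() learns nothing: Normalizer is stateless
X_test_scaled = scaler.transform(X_test)

# Each row now has unit Euclidean length, independent of the other rows.
print(np.linalg.norm(X_train_scaled, axis=1))
# Same result as dividing every row by its own L2 norm by hand.
manual = X_test / np.linalg.norm(X_test, axis=1, keepdims=True)
print(np.allclose(X_test_scaled, manual))  # True
```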

Generated Artifacts

This repository already contains generated artifacts (so you can run inference without retraining):

models/svm_model.pkl
models/knn_model.pkl
models/scaler.pkl
features/X_*_cnn.npy, features/X_*_scaled.npy, features/y_*.npy

How Prediction Works

In src/test.py:

Each image is resized to 224x224 and preprocessed with preprocess_input.
A ResNet50-based feature extractor produces a feature vector.
If a scaler.pkl file exists in the same folder as the model file (e.g., models/scaler.pkl), it is applied to the feature vector.
The classifier predicts and then a rejection rule is applied:
- For models with predict_proba (e.g., SVM), the max probability must be >= 0.6.
- For KNN, the fraction of neighbors voting for the predicted class must be >= 0.6.
Otherwise, prediction becomes unknown (6).
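The rejection rule above can be sketched as follows. This is an illustration under assumptions, not the code in src/test.py: `predict_with_rejection` is a hypothetical name, the KNN branch needs the training labels passed in explicitly to count neighbor votes, and the predict_proba branch takes the argmax of the probabilities as the label so that label and confidence always agree (SVC.predict can occasionally disagree with predict_proba):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

UNKNOWN_ID = 6
THRESHOLD = 0.6  # default threshold described in this README

def predict_with_rejection(model, X, y_train=None, threshold=THRESHOLD):
    """Return class IDs, replacing low-confidence predictions with UNKNOWN_ID."""
    X = np.asarray(X)
    if isinstance(model, KNeighborsClassifier):
        if y_train is None:
            raise ValueError("y_train is needed to count KNN neighbor votes")
        preds = model.predict(X)
        # Indices of the k nearest training samples; y_train must be in fit order.
        _, neighbor_idx = model.kneighbors(X)
        neighbor_labels = np.asarray(y_train)[neighbor_idx]
        confidence = (neighbor_labels == preds[:, None]).mean(axis=1)
    elif hasattr(model, "predict_proba"):
        proba = model.predict_proba(X)  # e.g. SVC(probability=True)
        preds = model.classes_[proba.argmax(axis=1)]
        confidence = proba.max(axis=1)
    else:
        raise TypeError("model needs predict_proba or must be a KNeighborsClassifier")
    return np.where(confidence >= threshold, preds, UNKNOWN_ID).astype(int).tolist()

# Tiny 1-D example: class 0 around 0-2, class 1 around 10-12.
X_train = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])
knn = KNeighborsClassifier(n_neighbors=4).fit(X_train, y_train)

# x=1.0: 3 of 4 neighbors are class 0 (0.75 >= 0.6) -> kept.
# x=6.0: neighbors split 2/2 (0.5 < 0.6) -> rejected as unknown.
result = predict_with_rejection(knn, [[1.0], [6.0]], y_train=y_train)
print(result)  # [0, 6]
```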

Models

SVM: trained in src/5-SVM_Model.ipynb using sklearn.svm.SVC(probability=True) and saved to models/svm_model.pkl.
KNN: trained in src/5-KNN_Model.ipynb using sklearn.neighbors.KNeighborsClassifier and saved to models/knn_model.pkl.

Both notebooks load the scaled CNN features from features/X_*_scaled.npy.
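Model selection with GridSearchCV looks roughly like the sketch below. The parameter grids, the cv=3 setting, and the synthetic 64-d data are all assumptions for illustration; the notebooks' actual grids are not listed in this README:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import Normalizer
from sklearn.svm import SVC

# Synthetic stand-in for the scaled CNN features (real ones are 2048-d).
X, y = make_classification(n_samples=300, n_features=64, n_informative=16,
                           n_classes=6, random_state=0)
X = Normalizer(norm="l2").fit_transform(X)
X_train, y_train = X[:240], y[:240]
X_val, y_val = X[240:], y[240:]

# Hypothetical search grids; the notebooks' actual grids may differ.
svm_search = GridSearchCV(
    SVC(probability=True, random_state=0),
    param_grid={"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]},
    cv=3,
)
knn_search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [3, 5, 7], "weights": ["uniform", "distance"]},
    cv=3,
)

for name, search in [("SVM", svm_search), ("KNN", knn_search)]:
    search.fit(X_train, y_train)  # cross-validates every grid point on the training set
    print(name, search.best_params_, "val accuracy:", round(search.score(X_val, y_val), 3))
```

After fitting, search.best_estimator_ is the refit model that would be saved (e.g., with joblib) as models/svm_model.pkl or models/knn_model.pkl.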

Dataset

The notebooks expect a dataset under dataset/ with one subfolder per class name, for example:

dataset/
  cardboard/
  glass/
  metal/
  paper/
  plastic/
  trash/

The repository does not enforce a specific dataset source; you can place your own labeled dataset there.
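A sketch of how such a folder layout can be turned into labeled train/val/test splits, in the spirit of 1-Train_Test_Split.ipynb. The 70/15/15 stratified ratios, the accepted file extensions, and the helper names are assumptions; the notebook's actual ratios are not stated here. The demo builds a throwaway folder so it runs without a real dataset:

```python
import tempfile
from pathlib import Path

from sklearn.model_selection import train_test_split

# Same class order and IDs as the Overview section (unknown has no folder).
CLASSES = ["cardboard", "glass", "metal", "paper", "plastic", "trash"]
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}  # assumed extensions

def list_labeled_images(root):
    """Return (paths, labels) for every image under root/<class_name>/."""
    paths, labels = [], []
    for class_id, name in enumerate(CLASSES):
        for p in sorted((Path(root) / name).glob("*")):
            if p.suffix.lower() in IMAGE_EXTS:
                paths.append(p)
                labels.append(class_id)
    return paths, labels

def split_70_15_15(paths, labels, seed=0):
    """Stratified 70/15/15 train/val/test split (ratios are an assumption)."""
    p_train, p_rest, y_train, y_rest = train_test_split(
        paths, labels, test_size=0.30, stratify=labels, random_state=seed)
    p_val, p_test, y_val, y_test = train_test_split(
        p_rest, y_rest, test_size=0.50, stratify=y_rest, random_state=seed)
    return (p_train, y_train), (p_val, y_val), (p_test, y_test)

# Demo on a throwaway folder with 20 empty ".jpg" files per class.
with tempfile.TemporaryDirectory() as tmp:
    for name in CLASSES:
        (Path(tmp) / name).mkdir()
        for i in range(20):
            (Path(tmp) / name / f"{i}.jpg").touch()
    paths, labels = list_labeled_images(tmp)
    train, val, test = split_70_15_15(paths, labels)
    print(len(train[0]), len(val[0]), len(test[0]))  # 84 18 18
```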

Troubleshooting

Running src/test.py from the repo root:
- The script’s default ..\test / ..\models\... paths assume the working directory is src/.
- Use the “Option B” command above if you want to run from the root.
Webcam not opening:
- In src/6-live_camera_app.ipynb, change cv2.VideoCapture(0) to cv2.VideoCapture(1) (or another index).
TensorFlow logs / oneDNN:
- src/test.py sets TF_CPP_MIN_LOG_LEVEL and disables oneDNN optimizations for a quieter/consistent run.
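Both settings are environment variables that TensorFlow reads when it is imported, so they must be set before the import. A minimal sketch; the exact TF_CPP_MIN_LOG_LEVEL value used by src/test.py is not stated here, so "2" is an assumption:

```python
import os

# Must run before TensorFlow is imported; the variables are read at import time.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"   # hide INFO and WARNING messages from TF's C++ core
os.environ["TF_ENABLE_ONEDNN_OPTS"] = "0"  # disable oneDNN custom ops (avoids small float round-off differences)

# import tensorflow as tf  # import only after the variables are set
print(os.environ["TF_CPP_MIN_LOG_LEVEL"], os.environ["TF_ENABLE_ONEDNN_OPTS"])
```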

Notes

The included inference code (src/test.py) runs locally and reads images from disk.
Predictions use a simple rejection threshold (default 0.6) to output the unknown class when confidence is low.
The live camera notebook loads models using relative paths (e.g., ../models/...), so it should be run from src/.

Contributors

Abdelrahman Kadry
- GitHub Profile
- LinkedIn Profile
Ephraim Youssef
- GitHub Profile
- LinkedIn Profile
Omar Ahmed
- GitHub Profile
- LinkedIn Profile
Ramez Ragaay
- GitHub Profile
- LinkedIn Profile
