# Material Stream Identification System

An Automated Material Stream Identification (MSI) system that classifies waste/material images using a classic ML pipeline: dataset preparation, feature extraction (ResNet50 features and optional HOG exploration), feature scaling, and classifier training (SVM / KNN). The repository also includes a live webcam demo and a simple Python inference script.
The system predicts one of the following classes (IDs are used consistently across the repo):
- cardboard (0)
- glass (1)
- metal (2)
- paper (3)
- plastic (4)
- trash (5)
- unknown (6)
An "unknown" class is produced via a simple confidence-based rejection rule.
## Table of Contents

- Features
- Repository Structure
- Requirements
- Installation
- Quick Start (Inference)
- Live Camera Demo
- Training Pipeline (Notebooks)
- Generated Artifacts
- How Prediction Works
- Models
- Dataset
- Troubleshooting
- Notes
- Contributors
## Features

- End-to-end ML pipeline implemented via notebooks in `src/`.
- CNN feature extraction using `ResNet50(weights="imagenet", include_top=False)` + `GlobalAveragePooling2D` (see the sketch after this list).
- Traditional feature extraction (optional exploration) via HOG in `Feature_Extraction_HOG.ipynb`.
- Feature scaling using `sklearn.preprocessing.Normalizer(norm="l2")`.
- Two classifiers with model selection via `GridSearchCV`:
  - SVM (`src/5-SVM_Model.ipynb`)
  - KNN (`src/5-KNN_Model.ipynb`)
- Confidence-based rejection to return the `unknown` class for low-confidence predictions.
- Live camera demo (`src/6-live_camera_app.ipynb`).
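As a rough sketch of the CNN feature extractor described above (the exact wiring lives in `src/3-Feature_Extraction_CNN.ipynb`; treat this as an approximation, not the notebook's code):

```python
import numpy as np
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications.resnet50 import preprocess_input
from tensorflow.keras.layers import GlobalAveragePooling2D
from tensorflow.keras.models import Model

# ImageNet backbone without the classification head; global average
# pooling collapses the final 7x7x2048 feature map into a 2048-d vector.
base = ResNet50(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
extractor = Model(inputs=base.input, outputs=GlobalAveragePooling2D()(base.output))

def extract_features(batch: np.ndarray) -> np.ndarray:
    """batch: RGB images, shape (N, 224, 224, 3), values in 0-255."""
    return extractor.predict(preprocess_input(batch.copy()), verbose=0)
```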
## Repository Structure

```
Material-Stream-Identification-System/
├── models/                 # Saved model files
│   ├── svm_model.pkl       # Trained SVM model
│   ├── knn_model.pkl       # Trained KNN model
│   └── scaler.pkl          # Feature scaler
├── src/                    # Source code
│   ├── 1-Train_Test_Split.ipynb
│   ├── 2-dataset_aug.ipynb
│   ├── 3-Feature_Extraction_CNN.ipynb
│   ├── 4-Feature_Scaling.ipynb
│   ├── 5-KNN_Model.ipynb
│   ├── 5-SVM_Model.ipynb
│   ├── 6-live_camera_app.ipynb
│   ├── Feature_Extraction_HOG.ipynb
│   ├── rejection.py
│   └── test.py
├── dataset/                # (Expected) original dataset folder (notebooks read from here)
├── data_split/             # Train/val/test splits generated by notebook (and train_aug)
├── augmented_dataset/      # (Optional) augmentation output folder (if used)
├── test/                   # Sample images for quick testing
│   ├── cardboard.jpg
│   ├── glass.jpg
│   ├── metal.jpg
│   ├── paper.jpg
│   ├── plastic.jpg
│   ├── trash.jpg
│   └── tree(unknown).jpg
└── features/               # Extracted features
    ├── X_train_cnn.npy
    ├── X_val_cnn.npy
    ├── X_test_cnn.npy
    ├── X_train_scaled.npy
    ├── X_val_scaled.npy
    ├── X_test_scaled.npy
    ├── y_test.npy
    ├── y_train.npy
    └── y_val.npy
```
## Requirements

The project dependencies are listed in `requirements.txt`:

```
opencv-python
numpy
tqdm
pandas
scikit-learn
joblib
scipy
scikit-image
matplotlib
tensorflow
```
## Installation

- Create and activate a virtual environment.
- Install dependencies:

  ```
  pip install -r requirements.txt
  ```

## Quick Start (Inference)

The repo includes a runnable inference script, `src/test.py`. Its default paths assume the current working directory is `src/`.

Option A (from `src/`):

- Run `python test.py`
- It reads images from `..\test` and uses the model `..\models\svm_model.pkl`

Option B (from the repository root): call `predict()` after adding `src/` to `sys.path`:

```
python -c "import sys; sys.path.append('src'); from test import predict; print(predict(r'test', r'models\\svm_model.pkl'))"
```

Return value: a list of integer class IDs (0-6). See "How Prediction Works" below.
## Live Camera Demo

Open and run `src/6-live_camera_app.ipynb`. The notebook:

- Loads `models/scaler.pkl` and an SVM model (default: `models/svm_model.pkl`)
- Captures frames from the webcam (`cv2.VideoCapture(0)`)
- Extracts ResNet50 features from a center ROI and predicts the class

Press `q` to quit.
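In outline, the capture loop looks roughly like the following (a simplified sketch: the ROI crop size and the on-screen overlay are assumptions, and the feature extractor is the one from the sketch in Features):

```python
import cv2
import joblib
import numpy as np
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications.resnet50 import preprocess_input
from tensorflow.keras.layers import GlobalAveragePooling2D
from tensorflow.keras.models import Model

# Feature extractor as in the Features sketch above.
base = ResNet50(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
extractor = Model(base.input, GlobalAveragePooling2D()(base.output))

svm = joblib.load("../models/svm_model.pkl")    # relative paths: run from src/
scaler = joblib.load("../models/scaler.pkl")

cap = cv2.VideoCapture(0)  # try another index if the webcam does not open
while True:
    ok, frame = cap.read()
    if not ok:
        break

    # Square center ROI (the exact crop size is an assumption of this sketch).
    h, w = frame.shape[:2]
    s = min(h, w) // 2
    y0, x0 = (h - s) // 2, (w - s) // 2
    roi = frame[y0:y0 + s, x0:x0 + s]

    # BGR -> RGB, resize to the network input, extract and scale features.
    rgb = cv2.cvtColor(cv2.resize(roi, (224, 224)), cv2.COLOR_BGR2RGB)
    x = preprocess_input(rgb[np.newaxis].astype("float32"))
    feats = extractor.predict(x, verbose=0)
    pred = int(svm.predict(scaler.transform(feats))[0])

    cv2.rectangle(frame, (x0, y0), (x0 + s, y0 + s), (0, 255, 0), 2)
    cv2.putText(frame, f"class {pred}", (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    cv2.imshow("MSI live demo", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
        break

cap.release()
cv2.destroyAllWindows()
```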
## Training Pipeline (Notebooks)

The training workflow is implemented as a set of notebooks under `src/`, designed to be run in order (a condensed code sketch of the scaling and training steps follows this list):

1. `1-Train_Test_Split.ipynb`
   - Input: `dataset/` (expected to contain one folder per class)
   - Output: `data_split/train`, `data_split/val`, `data_split/test`
2. `2-dataset_aug.ipynb`
   - Input: `data_split/train`
   - Output: `data_split/train_aug`
3. `3-Feature_Extraction_CNN.ipynb`
   - Extracts 2048-d CNN features (ResNet50 + GlobalAveragePooling)
   - Reads from: `data_split/train_aug`, `data_split/val`, `data_split/test`
   - Writes to `features/`: `X_train_cnn.npy`, `X_val_cnn.npy`, `X_test_cnn.npy`, `y_train.npy`, `y_val.npy`, `y_test.npy`
4. `Feature_Extraction_HOG.ipynb`
   - Optional notebook for exploring HOG-based feature extraction.
5. `4-Feature_Scaling.ipynb`
   - Loads the CNN features from `features/`
   - Applies `Normalizer(norm="l2")`
   - Writes scaled arrays to `features/`: `X_train_scaled.npy`, `X_val_scaled.npy`, `X_test_scaled.npy`
   - Saves the scaler to `models/scaler.pkl`
6. `5-KNN_Model.ipynb`
   - Loads `features/X_*_scaled.npy` and `features/y_*.npy`
   - Trains KNN using `GridSearchCV`
   - Saves the trained model to `models/knn_model.pkl`
7. `5-SVM_Model.ipynb`
   - Loads `features/X_*_scaled.npy` and `features/y_*.npy`
   - Trains SVM (`SVC(probability=True)`) using `GridSearchCV`
   - Saves the trained model to `models/svm_model.pkl`
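Condensed, the scaling and SVM-training steps amount to something like this (paths are relative to the repo root, and the parameter grid is an illustrative assumption; the actual search space is in the notebooks):

```python
import joblib
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import Normalizer
from sklearn.svm import SVC

# Step 5 (4-Feature_Scaling.ipynb): L2-normalize the CNN feature vectors.
X_train = np.load("features/X_train_cnn.npy")
y_train = np.load("features/y_train.npy")
scaler = Normalizer(norm="l2")
X_train_scaled = scaler.fit_transform(X_train)
np.save("features/X_train_scaled.npy", X_train_scaled)
joblib.dump(scaler, "models/scaler.pkl")

# Step 7 (5-SVM_Model.ipynb): model selection via GridSearchCV.
# The parameter grid here is an assumption, not the notebook's actual grid.
grid = GridSearchCV(
    SVC(probability=True),  # probability=True enables predict_proba for rejection
    param_grid={"C": [1, 10], "kernel": ["rbf", "linear"]},
    cv=3,
    n_jobs=-1,
)
grid.fit(X_train_scaled, y_train)
joblib.dump(grid.best_estimator_, "models/svm_model.pkl")
```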
## Generated Artifacts

This repository already contains generated artifacts, so you can run inference without retraining:

- `models/svm_model.pkl`
- `models/knn_model.pkl`
- `models/scaler.pkl`
- `features/X_*_cnn.npy`, `features/X_*_scaled.npy`, `features/y_*.npy`
## How Prediction Works

In `src/test.py`:

1. Each image is resized to `224x224` and preprocessed with `preprocess_input`.
2. A ResNet50-based feature extractor produces a feature vector.
3. If `models/scaler.pkl` exists next to the model, it is applied.
4. The classifier predicts, then a rejection rule is applied:
   - For models with `predict_proba` (e.g., SVM), the max probability must be `>= 0.6`.
   - For KNN, the fraction of neighbors voting for the predicted class must be `>= 0.6`.
   - Otherwise, the prediction becomes `unknown` (6).
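A sketch of that rejection rule (the actual implementation is in `src/rejection.py` and `src/test.py`; this paraphrase assumes a uniform-weight KNN, for which `predict_proba` equals the neighbor vote fraction):

```python
import numpy as np

UNKNOWN_ID = 6    # class ID returned on rejection
THRESHOLD = 0.6   # default confidence threshold

def predict_with_rejection(model, X, threshold=THRESHOLD):
    """Replace low-confidence predictions with the "unknown" class (6).

    For SVC(probability=True) the confidence is the max class probability;
    for a uniform-weight KNeighborsClassifier, predict_proba returns the
    fraction of neighbors voting for each class, so one check covers both.
    """
    preds = model.predict(X)
    conf = model.predict_proba(X).max(axis=1)
    return np.where(conf >= threshold, preds, UNKNOWN_ID)
```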
## Models

- SVM: trained in `src/5-SVM_Model.ipynb` using `sklearn.svm.SVC(probability=True)` and saved to `models/svm_model.pkl`.
- KNN: trained in `src/5-KNN_Model.ipynb` using `sklearn.neighbors.KNeighborsClassifier` and saved to `models/knn_model.pkl`.

Both notebooks load the scaled CNN features from `features/X_*_scaled.npy`.
## Dataset

The notebooks expect a dataset under `dataset/` with one subfolder per class name, for example:

```
dataset/
    cardboard/
    glass/
    metal/
    paper/
    plastic/
    trash/
```

The repository does not enforce a specific dataset source; you can place your own labeled dataset there.
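For orientation, a minimal split over that layout could look like the following (the 70/15/15 ratios and the use of `train_test_split` are assumptions of this sketch; the actual logic is in `src/1-Train_Test_Split.ipynb`):

```python
import shutil
from pathlib import Path
from sklearn.model_selection import train_test_split

CLASSES = ["cardboard", "glass", "metal", "paper", "plastic", "trash"]

for cls in CLASSES:
    files = sorted(Path("dataset", cls).glob("*.jpg"))
    # 70/15/15 split -- the ratios are an assumption of this sketch.
    train, rest = train_test_split(files, test_size=0.3, random_state=42)
    val, test = train_test_split(rest, test_size=0.5, random_state=42)
    for split, items in [("train", train), ("val", val), ("test", test)]:
        out = Path("data_split", split, cls)
        out.mkdir(parents=True, exist_ok=True)
        for f in items:
            shutil.copy2(f, out / f.name)
```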
## Troubleshooting

- Running `src/test.py` from the repo root:
  - The script's default `..\test` / `..\models\...` paths assume the working directory is `src/`.
  - Use the "Option B" command above if you want to run from the root.
- Webcam not opening:
  - In `src/6-live_camera_app.ipynb`, change `cv2.VideoCapture(0)` to `cv2.VideoCapture(1)` (or another index).
- TensorFlow logs / oneDNN:
  - `src/test.py` sets `TF_CPP_MIN_LOG_LEVEL` and disables oneDNN optimizations for quieter, more consistent runs.
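If you want the same quieter logs in your own scripts, set the environment variables before importing TensorFlow (the exact values `test.py` uses may differ; these are the conventional ones):

```python
import os

# Both variables must be set before tensorflow is imported anywhere.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"    # hide INFO and WARNING logs
os.environ["TF_ENABLE_ONEDNN_OPTS"] = "0"   # disable oneDNN optimizations

import tensorflow as tf  # noqa: E402 (deliberately imported after env setup)
```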
## Notes

- The included inference code (`src/test.py`) runs locally and reads images from disk.
- Predictions use a simple rejection threshold (default `0.6`) to output the `unknown` class when confidence is low.
- The live camera notebook loads models via relative paths (e.g., `../models/...`), so it should be run from `src/`.
## Contributors

- Abdelrahman Kadry
- Ephraim Youssef
- Omar Ahmed
- Ramez Ragaay