
An Automated Material Stream Identification (MSI) system using fundamental Machine Learning (ML) techniques that classifies waste/material images using a classic ML pipeline


EphraimYoussef/MSI-System-ML-Project

 
 


Material Stream Identification System

An Automated Material Stream Identification (MSI) system that classifies waste/material images using a classic ML pipeline: dataset preparation, feature extraction (ResNet50 features and optional HOG exploration), feature scaling, and classifier training (SVM / KNN). The repository also includes a live webcam demo and a simple Python inference script.


Overview

The system predicts one of the following classes (IDs are used consistently across the repo):

  • cardboard (0)
  • glass (1)
  • metal (2)
  • paper (3)
  • plastic (4)
  • trash (5)
  • unknown (6)

The unknown class (6) is produced by a simple confidence-based rejection rule: a prediction whose confidence falls below a threshold (0.6 by default) is relabeled as unknown.
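
For reference, the ID-to-name mapping above can be written as a small lookup table. This is a minimal sketch: the names `CLASS_NAMES` and `ids_to_names` are illustrative and are not taken from the repository's code.

```python
# Illustrative lookup table for the class IDs listed above.
# CLASS_NAMES and ids_to_names are hypothetical names, not the repo's API.
CLASS_NAMES = {
    0: "cardboard",
    1: "glass",
    2: "metal",
    3: "paper",
    4: "plastic",
    5: "trash",
    6: "unknown",
}


def ids_to_names(class_ids):
    """Translate a list of integer class IDs into readable labels."""
    return [CLASS_NAMES[int(i)] for i in class_ids]


print(ids_to_names([0, 4, 6]))  # ['cardboard', 'plastic', 'unknown']
```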


Features

  • End-to-end ML pipeline implemented via notebooks in src/.
  • CNN feature extraction using ResNet50(weights="imagenet", include_top=False) + GlobalAveragePooling2D.
  • Traditional feature extraction (optional exploration) via HOG in Feature_Extraction_HOG.ipynb.
  • Feature scaling using sklearn.preprocessing.Normalizer(norm="l2").
  • Two classifiers with model selection via GridSearchCV:
    • SVM (src/5-SVM_Model.ipynb)
    • KNN (src/5-KNN_Model.ipynb)
  • Confidence-based rejection to return the unknown class for low-confidence predictions.
  • Live camera demo (src/6-live_camera_app.ipynb).
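
To make the scaling step concrete: `Normalizer(norm="l2")` rescales each sample (row) to unit Euclidean length, independently of the other samples. The NumPy sketch below reproduces that row-wise operation on a toy array. `l2_normalize_rows` is an illustrative helper, not part of the repository.

```python
import numpy as np


def l2_normalize_rows(X):
    """Scale every row of X to unit L2 norm (all-zero rows are left as-is)."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0.0] = 1.0  # avoid dividing by zero for empty rows
    return X / norms


# Toy 2-d "features"; the real CNN features are 2048-d.
features = np.array([[3.0, 4.0], [0.0, 0.0], [1.0, 1.0]])
scaled = l2_normalize_rows(features)
print(scaled)
```

Because each row is handled on its own, this transform does not learn any statistics from the training set. It only needs to be applied in the same way at training and inference time.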

Repository Structure

Material-Stream-Identification-System/
├── models/                             # Saved model files
│   ├── svm_model.pkl                   # Trained SVM model
│   ├── knn_model.pkl                   # Trained KNN model
│   └── scaler.pkl                      # Feature scaler
├── src/                                # Source code
│   ├── 1-Train_Test_Split.ipynb
│   ├── 2-dataset_aug.ipynb
│   ├── 3-Feature_Extraction_CNN.ipynb
│   ├── 4-Feature_Scaling.ipynb
│   ├── 5-KNN_Model.ipynb
│   ├── 5-SVM_Model.ipynb
│   ├── 6-live_camera_app.ipynb
│   ├── Feature_Extraction_HOG.ipynb
│   ├── rejection.py
│   └── test.py
├── dataset/                            # (Expected) original dataset folder (notebooks read from here)
├── data_split/                         # Train/val/test splits generated by notebook (and train_aug)
├── augmented_dataset/                  # (Optional) augmentation output folder (if used)
├── test/                               # Sample images for quick testing
│   ├── cardboard.jpg
│   ├── glass.jpg
│   ├── metal.jpg
│   ├── paper.jpg
│   ├── plastic.jpg
│   ├── trash.jpg
│   └── tree(unknown).jpg
└── features/                           # Extracted features
    ├── y_test.npy
    ├── y_train.npy
    └── y_val.npy

Requirements

The project dependencies are listed in requirements.txt:

  • opencv-python
  • numpy
  • tqdm
  • pandas
  • scikit-learn
  • joblib
  • scipy
  • scikit-image
  • matplotlib
  • tensorflow

Installation

  1. Create and activate a virtual environment.
  2. Install dependencies:
pip install -r requirements.txt
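
After installing, a quick check can confirm that every dependency is importable. The sketch below is an optional helper that is not part of the repository. It maps each pip package name from requirements.txt to the module name used in import statements (for example, opencv-python is imported as cv2) and reports any that cannot be found. `IMPORT_NAMES` and `missing_packages` are illustrative names.

```python
from importlib.util import find_spec

# pip package name -> module name used in `import` statements.
IMPORT_NAMES = {
    "opencv-python": "cv2",
    "numpy": "numpy",
    "tqdm": "tqdm",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "joblib": "joblib",
    "scipy": "scipy",
    "scikit-image": "skimage",
    "matplotlib": "matplotlib",
    "tensorflow": "tensorflow",
}


def missing_packages(import_names=IMPORT_NAMES):
    """Return the pip names whose top-level module cannot be found."""
    return [pip_name for pip_name, module in import_names.items()
            if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing packages:", ", ".join(missing) if missing else "none")
```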

Quick Start (Inference)

The repo includes a runnable inference script: src/test.py.

Option A: Run using the defaults (recommended)

src/test.py is written with default paths that work when the current working directory is src/.

  • Run python test.py
  • It will read images from ..\test and use the model ..\models\svm_model.pkl

Option B: Call the predict() function from the repo root

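If you prefer staying in the repository root, you can call predict() by putting src/ at the front of sys.path, so that src/test.py is found before the standard library's test package:

python -c "import sys; sys.path.insert(0, 'src'); from test import predict; print(predict('test', 'models/svm_model.pkl'))"

Forward slashes in these paths are accepted by Python on Windows as well as on Linux and macOS.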

Return value: a list of integer class IDs (0-6). See How Prediction Works.


Live Camera Demo

Open and run:

  • src/6-live_camera_app.ipynb

This notebook:

  • Loads models/scaler.pkl and an SVM model (default: models/svm_model.pkl)
  • Captures from webcam (cv2.VideoCapture(0))
  • Extracts ResNet50 features from a center ROI and predicts the class

Press q to quit.
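
The center-ROI step can be illustrated with plain array slicing. The README does not state the ROI size, so the sketch below assumes a centered square whose side is half of the shorter frame dimension. `center_roi` and its `fraction` parameter are hypothetical, and a dummy array stands in for a webcam frame.

```python
import numpy as np


def center_roi(frame, fraction=0.5):
    """Return a centered square crop whose side is `fraction` of the shorter frame side."""
    height, width = frame.shape[:2]
    side = int(min(height, width) * fraction)
    top = (height - side) // 2
    left = (width - side) // 2
    return frame[top:top + side, left:left + side]


# A dummy 480x640 BGR frame stands in for a webcam capture.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
roi = center_roi(frame)
print(roi.shape)  # (240, 240, 3)
```

The cropped region would then be resized and passed through the same feature extractor that is used for images read from disk.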


Training Pipeline (Notebooks)

The training workflow is implemented as a set of notebooks under src/ and is designed to be run in order:

  • 1-Train_Test_Split.ipynb

    • Input: dataset/ (expected to contain one folder per class)
    • Output: data_split/train, data_split/val, data_split/test
  • 2-dataset_aug.ipynb

    • Input: data_split/train
    • Output: data_split/train_aug
  • 3-Feature_Extraction_CNN.ipynb

    • Extracts 2048-d CNN features (ResNet50 + GlobalAveragePooling)
    • Reads from:
      • data_split/train_aug
      • data_split/val
      • data_split/test
    • Writes to features/:
      • X_train_cnn.npy, X_val_cnn.npy, X_test_cnn.npy
      • y_train.npy, y_val.npy, y_test.npy
  • Feature_Extraction_HOG.ipynb

    • Notebook for exploring HOG-based feature extraction.
  • 4-Feature_Scaling.ipynb

    • Loads the CNN features from features/
    • Applies Normalizer(norm="l2")
    • Writes scaled arrays to features/:
      • X_train_scaled.npy, X_val_scaled.npy, X_test_scaled.npy
    • Saves the scaler to: models/scaler.pkl
  • 5-KNN_Model.ipynb

    • Loads features/X_*_scaled.npy and features/y_*.npy
    • Trains KNN using GridSearchCV
    • Saves the trained model to models/knn_model.pkl
  • 5-SVM_Model.ipynb

    • Loads features/X_*_scaled.npy and features/y_*.npy
    • Trains SVM (SVC(probability=True)) using GridSearchCV
    • Saves the trained model to models/svm_model.pkl
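
The model-selection step in the SVM notebook can be sketched as follows. This is a minimal illustration on synthetic data, not the notebook's code: the parameter grid, the number of folds, and the toy features are all assumptions, since the README does not document the search space.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the scaled feature arrays (the real ones are 2048-d).
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(-2.0, 1.0, size=(30, 8)),
                     rng.normal(2.0, 1.0, size=(30, 8))])
y_train = np.array([0] * 30 + [1] * 30)

# Hypothetical search space; the notebook's actual grid is not documented here.
param_grid = {"C": [0.1, 1.0, 10.0], "kernel": ["linear", "rbf"]}

# probability=True enables predict_proba, which the rejection rule relies on.
search = GridSearchCV(SVC(probability=True), param_grid, cv=3)
search.fit(X_train, y_train)

best_model = search.best_estimator_
print(search.best_params_)
print(best_model.predict_proba(X_train[:1]))
```

The selected estimator could then be serialized, for example with `joblib.dump(best_model, "models/svm_model.pkl")`.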

Generated Artifacts

This repository already contains generated artifacts (so you can run inference without retraining):

  • models/svm_model.pkl
  • models/knn_model.pkl
  • models/scaler.pkl
  • features/X_*_cnn.npy, features/X_*_scaled.npy, features/y_*.npy

How Prediction Works

In src/test.py:

  • Each image is resized to 224x224 and preprocessed with preprocess_input.
  • A ResNet50-based feature extractor produces a feature vector.
  • If scaler.pkl exists next to the model file (e.g., models/scaler.pkl), it is applied to the features before classification.
  • The classifier predicts a class, and a rejection rule then checks its confidence:
    • For models with predict_proba (e.g., SVM), the maximum class probability must be >= 0.6.
    • For KNN, the fraction of neighbors voting for the predicted class must be >= 0.6.
    • If the check fails, the prediction is replaced with unknown (6).
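
The rejection rule described above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the code in src/test.py: the function names `reject_low_confidence` and `neighbor_vote_fraction` are hypothetical, and the probabilities are made-up examples.

```python
import numpy as np

UNKNOWN_ID = 6    # class ID reserved for rejected predictions
THRESHOLD = 0.6   # default confidence threshold described above


def reject_low_confidence(proba, classes, threshold=THRESHOLD):
    """Pick the most probable class per row, or UNKNOWN_ID below the threshold."""
    proba = np.asarray(proba, dtype=float)
    best = proba.argmax(axis=1)
    confident = proba.max(axis=1) >= threshold
    return np.where(confident, np.asarray(classes)[best], UNKNOWN_ID).tolist()


def neighbor_vote_fraction(neighbor_labels, predicted):
    """Fraction of the k nearest neighbors whose label equals the prediction."""
    neighbor_labels = np.asarray(neighbor_labels)
    return float(np.mean(neighbor_labels == predicted))


classes = [0, 1, 2, 3, 4, 5]
proba = [[0.05, 0.85, 0.02, 0.03, 0.03, 0.02],   # confident: glass (1)
         [0.30, 0.25, 0.15, 0.10, 0.10, 0.10]]   # ambiguous: rejected
print(reject_low_confidence(proba, classes))       # [1, 6]
print(neighbor_vote_fraction([4, 4, 4, 2, 5], 4))  # 3 of 5 neighbors agree
```

In the KNN case, the vote fraction plays the role that the maximum probability plays for the SVM: it is compared against the same 0.6 threshold.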

Models

  • SVM: trained in src/5-SVM_Model.ipynb using sklearn.svm.SVC(probability=True) and saved to models/svm_model.pkl.
  • KNN: trained in src/5-KNN_Model.ipynb using sklearn.neighbors.KNeighborsClassifier and saved to models/knn_model.pkl.

Both notebooks load the scaled CNN features from features/X_*_scaled.npy.


Dataset

The notebooks expect a dataset under dataset/ with one subfolder per class name, for example:

dataset/
  cardboard/
  glass/
  metal/
  paper/
  plastic/
  trash/

The repository does not enforce a specific dataset source; you can place your own labeled dataset there.
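
Given this folder layout, a per-class train/val/test split can be sketched as below. The split ratios, the seed, and the helper name `split_files` are assumptions for illustration. The README does not state which ratios 1-Train_Test_Split.ipynb uses.

```python
import random


def split_files(files, val_fraction=0.15, test_fraction=0.15, seed=42):
    """Shuffle one class's file list and cut it into train/val/test parts."""
    files = sorted(files)  # fixed starting order for reproducibility
    random.Random(seed).shuffle(files)
    n_val = round(len(files) * val_fraction)
    n_test = round(len(files) * test_fraction)
    return {
        "val": files[:n_val],
        "test": files[n_val:n_val + n_test],
        "train": files[n_val + n_test:],
    }


# Hypothetical file names for a single class folder such as dataset/glass/.
glass_files = [f"glass_{i:03d}.jpg" for i in range(100)]
splits = split_files(glass_files)
print({name: len(items) for name, items in splits.items()})
```

Splitting each class folder separately keeps the class proportions roughly equal across the three subsets.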


Troubleshooting

  • Running src/test.py from the repo root:
    • The script’s default ..\test / ..\models\... paths assume the working directory is src/.
    • Use the “Option B” command above if you want to run from the root.
  • Webcam not opening:
    • In src/6-live_camera_app.ipynb, change cv2.VideoCapture(0) to cv2.VideoCapture(1) (or another index).
  • TensorFlow logs / oneDNN:
    • src/test.py sets TF_CPP_MIN_LOG_LEVEL and disables oneDNN optimizations for a quieter/consistent run.
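
One way to avoid the working-directory issue altogether is to resolve paths relative to the script file rather than the current directory. The sketch below shows the idea. It is a suggestion, not how src/test.py currently behaves, and `default_paths` is a hypothetical helper.

```python
from pathlib import Path


def default_paths(script_file):
    """Build test-image and model paths relative to the script, not the CWD."""
    repo_root = Path(script_file).resolve().parent.parent  # src/ -> repo root
    return repo_root / "test", repo_root / "models" / "svm_model.pkl"


# Inside src/test.py one would pass __file__; a literal path is used here for illustration.
test_dir, model_path = default_paths("src/test.py")
print(test_dir.name, model_path.name)  # test svm_model.pkl
```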

Notes

  • The included inference code (src/test.py) runs locally and reads images from disk.
  • Predictions use a simple rejection threshold (default 0.6) to output the unknown class when confidence is low.
  • The live camera notebook loads models using relative paths (e.g., ../models/...), so it should be run from src/.
