This project is a complete machine learning pipeline for classifying US traffic signs from the LISA dataset, using deep neural network embeddings together with both standard and custom classifiers.
Dataset:
LISA Traffic Sign Classification Dataset
The pipeline processes raw images, extracts high-level semantic features using MobileNetV2, reduces dimensionality using PCA and MDA, and evaluates several standard classifiers alongside Bayesian classifiers (parametric and non-parametric).
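To make the parametric versus non-parametric distinction concrete, here is a minimal pure-Python sketch. It is not the project's implementation: it assumes the parametric model fits a diagonal Gaussian per class and the non-parametric model uses a Gaussian Parzen window. The function names, the bandwidth `h`, and the toy labels are all hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian(X, y):
    """Estimate per-class prior, mean, and diagonal variance (parametric model)."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    params = {}
    for label, rows in by_class.items():
        n, dims = len(rows), len(rows[0])
        mean = [sum(r[j] for r in rows) / n for j in range(dims)]
        # A small floor keeps every variance strictly positive.
        var = [sum((r[j] - mean[j]) ** 2 for r in rows) / n + 1e-6
               for j in range(dims)]
        params[label] = (n / len(X), mean, var)
    return params


def predict_gaussian(params, x):
    """Pick the class with the largest log prior plus Gaussian log likelihood."""
    def log_posterior(label):
        prior, mean, var = params[label]
        log_lik = sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                      for xi, m, v in zip(x, mean, var))
        return math.log(prior) + log_lik
    return max(params, key=log_posterior)


def predict_parzen(X, y, x, h=1.0):
    """Non-parametric rule: sum Gaussian kernels per class (Parzen window).

    With empirical class priors, the per-class kernel sum is proportional
    to the class posterior, so the largest sum wins.
    """
    scores = defaultdict(float)
    for row, label in zip(X, y):
        sq_dist = sum((a - b) ** 2 for a, b in zip(row, x))
        scores[label] += math.exp(-sq_dist / (2 * h * h))
    return max(scores, key=scores.get)


# Toy 2-D features standing in for the reduced embeddings (illustrative only).
X = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0], [3.0, 3.1], [2.9, 3.2], [3.2, 2.8]]
y = ["stop", "stop", "stop", "yield", "yield", "yield"]
params = fit_gaussian(X, y)
```

The parametric model stores only a few numbers per class, while the Parzen rule keeps every training point and compares against all of them at prediction time.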
- Clone the repository and navigate into the project directory.
- Create and activate a virtual environment.
- Install the required dependencies:
pip install -r requirements.txt
- Run the full pipeline:
python run_pipeline.py
This script will run:
- preprocess.py: Splits the raw dataset into Train/Valid/Test.
- extract_features.py: Extracts 1280-dimensional features using MobileNetV2.
- PCA_dim_reduction.py and MDA_dim_reduction.py: Reduce dimensions using PCA and MDA.
- train.py: Trains SVM, Random Forest, k-NN, and Bayesian Classifiers.
- evaluate.py: Generates confusion matrices, reports, and error.
- repeated_experiments.py: Evaluates variance and stability of classifiers via multiple data splits.
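The stage order listed above can be sketched as a small runner. This is a hypothetical illustration of how a script like run_pipeline.py might chain the stages, not the actual file: `STAGES`, `build_commands`, and `run_all` are assumed names, and the real script may pass arguments or import the modules directly.

```python
import subprocess
import sys

# Stage order taken from the list above; the real run_pipeline.py may differ.
STAGES = [
    "preprocess.py",
    "extract_features.py",
    "PCA_dim_reduction.py",
    "MDA_dim_reduction.py",
    "train.py",
    "evaluate.py",
    "repeated_experiments.py",
]


def build_commands(stages=STAGES, python=sys.executable):
    """Return one argv list per stage, using the given interpreter."""
    return [[python, stage] for stage in stages]


def run_all(commands, runner=subprocess.run):
    """Run the stages in order; check=True aborts on the first failing stage."""
    for cmd in commands:
        print("Running:", " ".join(cmd))
        runner(cmd, check=True)
```

Calling `run_all(build_commands())` from the project root would run each stage in turn and stop with `subprocess.CalledProcessError` if one exits with a non-zero status, so later stages never run on incomplete outputs.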
Average test set accuracy and stability metrics achieved across 5 randomized training splits:
| Classifier | Mean Accuracy | Variance | Std Dev | Mean Time (s) |
|---|---|---|---|---|
| SVM | 91.61% | 0.000027 | 0.0052 | 0.1172 |
| k-NN | 92.34% | 0.000027 | 0.0052 | 0.2569 |
| Bayesian Non-Parametric | 92.34% | 0.000027 | 0.0052 | 0.0754 |
| Bayesian Parametric | 90.66% | 0.000009 | 0.0029 | 0.0055 |
| Random Forest | 86.57% | 0.000167 | 0.0129 | 0.5972 |

| Classifier | Mean Accuracy | Variance | Std Dev | Mean Time (s) |
|---|---|---|---|---|
| k-NN | 91.02% | 0.000030 | 0.0055 | 0.0068 |
| Bayesian Non-Parametric | 90.58% | 0.000061 | 0.0078 | 0.0224 |
| SVM | 90.36% | 0.000056 | 0.0075 | 0.0221 |
| Bayesian Parametric | 89.12% | 0.000061 | 0.0078 | 0.0014 |
| Random Forest | 82.55% | 0.000556 | 0.0236 | 0.0990 |
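The Mean Accuracy, Variance, and Std Dev columns are related in the usual way: the standard deviation is the square root of the variance (for example, sqrt(0.000027) is about 0.0052), which implies the accuracies were treated as fractions rather than percentages. Below is a minimal sketch of that summary. It assumes population variance (repeated_experiments.py may use the sample variance instead), and the accuracy values are made up for illustration.

```python
import statistics


def summarize(accuracies):
    """Return (mean, variance, std dev) of per-split accuracies as fractions.

    Assumption: population statistics (divide by n). Swap in
    statistics.variance / statistics.stdev for the sample versions.
    """
    mean = statistics.fmean(accuracies)
    variance = statistics.pvariance(accuracies)
    std_dev = statistics.pstdev(accuracies)
    return mean, variance, std_dev


# Hypothetical accuracies from 5 randomized splits (not the project's results).
split_accuracies = [0.910, 0.920, 0.915, 0.925, 0.910]
mean, variance, std_dev = summarize(split_accuracies)
print(f"mean={mean:.2%} variance={variance:.6f} std={std_dev:.4f}")
```

A low variance across splits suggests the classifier's accuracy does not depend heavily on which samples land in the training set.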
Aashish Harishchandre
