Research Project by Linuk Perera
Method 1: Gesture Recognition Using CNN (Convolutional Neural Network) and LSTM (Long Short-Term Memory) Networks
This approach captures key points on a human hand using Hand Pose Estimation (HPE) techniques and builds a gesture recognition system from CNN and LSTM networks.
Method 2: Gesture Recognition Using a CNN (Convolutional Neural Network), a Dense Neural Network, and Semi-Supervised Learning (Pipeline of Models)
This approach captures weighted human hand velocity data using Hand Pose Estimation (HPE) techniques with MediaPipe Hands, processes it through a Convolutional Neural Network (CNN) for feature extraction, and employs a Dense Neural Network for gesture classification. Additionally, semi-supervised learning methods (Self-Training and Mean Teacher) are used to leverage unlabeled data, improving model robustness and accuracy. This method is lightweight, accurate, and optimized for real-time gesture recognition, with specific handling for swapped "Swipe Left" (rightward motion) and "Swipe Right" (leftward motion) classifications and recoil filtering.
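To make the weighted-velocity idea concrete, here is a minimal sketch of capturing a weighted hand-velocity signal from MediaPipe Hands landmarks. The landmark weights are illustrative placeholders, not the exact values used in data_collection_NN.py:

```python
import cv2
import mediapipe as mp

# Hypothetical landmark weights emphasizing the wrist (0) and fingertips;
# the weighting actually used in data_collection_NN.py may differ.
WEIGHTS = {0: 2.0, 4: 1.0, 8: 1.0, 12: 1.0, 16: 1.0, 20: 1.0}

def weighted_center(landmarks, weights):
    """Weighted average of selected landmark positions (normalized coords)."""
    total = sum(weights.values())
    x = sum(landmarks[i].x * w for i, w in weights.items()) / total
    y = sum(landmarks[i].y * w for i, w in weights.items()) / total
    return x, y

cap = cv2.VideoCapture(0)
prev_center = prev_time = None
with mp.solutions.hands.Hands(max_num_hands=1) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        now = cv2.getTickCount() / cv2.getTickFrequency()
        if results.multi_hand_landmarks:
            center = weighted_center(results.multi_hand_landmarks[0].landmark, WEIGHTS)
            if prev_center is not None:
                dt = now - prev_time
                # Normalized displacement scaled to pixels per second.
                velocity_x = (center[0] - prev_center[0]) * frame.shape[1] / dt
                velocity_y = (center[1] - prev_center[1]) * frame.shape[0] / dt
                print(velocity_x, velocity_y)
            prev_center, prev_time = center, now
cap.release()
```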
Method 3: Gesture Recognition Using Acceleration Profiles and an LSTM Network
This approach leverages velocity-time graphs (Acceleration Profiles) to evaluate gesture and recoil patterns. By analyzing these profiles, the model filters out unintended recoil movements and focuses on recognizing intentional gestures. The dataset is collected as temporary labeled sequential CSV files capturing velocity data over time. The model achieves an accuracy of 88% when tested against an independent (foreign) dataset, demonstrating robust gesture recognition performance.
Method 1 workflow:
- Collect gesture data using a camera.
- Augment the collected data to increase the dataset's size and diversity.
- Shuffle the augmented data.
- Preprocess the data for training, including normalizing and creating sequences.
- Train an LSTM model for gesture recognition.
- Evaluate the model and visualize the results, including a classification report and confusion matrix.
The dataset is collected at 15 frames per sample, with 100 samples per gesture. Data augmentation expands the dataset 25-fold by applying different transformations (shifting, scaling, etc.).
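As an illustration of how shifting and scaling can multiply a dataset 25-fold, here is a hedged sketch; the sample shape (15 frames x 63 landmark values) and the transform values are assumptions, not the exact parameters of DataAugmentation.py:

```python
import numpy as np

def augment_sample(sample, shift_x=0.0, shift_y=0.0, z_scale=1.0):
    """Shift a gesture sample in the image plane and scale its depth."""
    out = sample.reshape(15, 21, 3).copy()  # 15 frames x 21 landmarks x (x, y, z)
    out[:, :, 0] += shift_x   # horizontal shift
    out[:, :, 1] += shift_y   # vertical shift
    out[:, :, 2] *= z_scale   # Z-axis scaling
    return out.reshape(15, 63)

samples = [np.random.rand(15, 63)]  # placeholder for the collected CSV data

# 5 shifts x 5 depth scales = 25 augmented copies per sample (values illustrative).
shifts = [(-0.05, -0.05), (0.05, -0.05), (-0.05, 0.05), (0.05, 0.05), (0.0, 0.0)]
scales = [0.9, 0.95, 1.0, 1.05, 1.1]
augmented = [augment_sample(s, dx, dy, z)
             for s in samples for dx, dy in shifts for z in scales]
```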
Method 2 workflow:
- Collect gesture data using a camera.
- Evaluate the recorded data and visualize the results.
- Preprocess the data for training, including normalizing and creating sequences.
- Evaluate the preprocessed data and visualize the results.
- Train NN and Semi-Supervised models for gesture recognition and recoil filtering.
- Evaluate the model and visualize the results, including a classification report and confusion matrix.
The dataset is collected continuously (e.g., 1 second per sample, 50 samples per gesture), saved as NumPy arrays, and preprocessed to include features like absolute velocities, directions, and movement angles. Semi-supervised training uses unlabeled data to enhance performance, with recoil filtering to handle unintended movements.
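A minimal sketch of deriving those features from per-frame velocities (the exact formulas in Preprocessor.py may differ):

```python
import numpy as np

def velocity_features(vx, vy):
    """Absolute velocity, direction, and movement angle per frame."""
    speed = np.hypot(vx, vy)                # absolute velocity
    direction_x = np.sign(vx)               # +1 rightward, -1 leftward
    direction_y = np.sign(vy)               # +1 downward, -1 upward (image coords)
    angle = np.degrees(np.arctan2(vy, vx))  # movement angle in degrees
    return np.stack([speed, direction_x, direction_y, angle], axis=1)

# Example: a rightward swipe followed by a brief leftward recoil.
vx = np.array([1200.0, 1500.0, 800.0, -250.0])
vy = np.array([10.0, -20.0, 5.0, 15.0])
print(velocity_features(vx, vy))
```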
Method 3 workflow:
- Collect gesture data using a camera, capturing velocity-time data.
- Generate velocity profiles to analyze gesture and recoil patterns.
- Preprocess the data into temporary labeled sequential CSV files.
- Train an LSTM model optimized for acceleration profile analysis.
- Perform live inference for real-time gesture recognition.
- Evaluate the model and visualize results, including classification reports and confusion matrices.
The dataset is collected as sequential velocity data, stored in temporary CSV files, and processed to filter out recoils, achieving high accuracy in gesture recognition.
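The recoil-filtering idea can be sketched as a mask over a velocity profile: the main stroke sets the gesture's direction, and low-speed frames moving the opposite way are treated as recoil. The 300-unit threshold follows the value quoted later in this README; the rest is illustrative:

```python
import numpy as np

def filter_recoil(velocities, recoil_threshold=300.0):
    """Mask out recoil frames in a 1-D velocity-time profile.

    A recoil is assumed to be low-speed motion opposing the direction
    of the profile's dominant (peak-velocity) stroke.
    """
    v = np.asarray(velocities, dtype=float)
    stroke_sign = np.sign(v[np.argmax(np.abs(v))])  # direction of the main stroke
    is_recoil = (np.sign(v) == -stroke_sign) & (np.abs(v) < recoil_threshold)
    return ~is_recoil  # True for frames to keep

# A rightward swipe whose tail frames drift back leftward (recoil):
profile = [200, 900, 1400, 600, -150, -250, -100]
print(filter_recoil(profile))  # -> [ True  True  True  True False False False]
```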
Method 1 scripts:
- GestureCollection.py:
  - Captures gesture data at 15 frames per sample.
  - Collects 100 samples for each gesture.
  - Includes a user-friendly guide to start and stop gesture collection.
  - Saves the data in CSV format with landmarks and corresponding gesture labels.
- DataAugmentation.py:
  - Increases the dataset size by applying augmentation techniques.
  - Uses shifting (top left, top middle, top right, left middle, right middle, bottom left, and bottom right) and scaling for Z-axis transformation.
- DataShuffle.py:
  - Shuffles the data samples (each sample has 15 frames).
  - Shuffled data reduces the risk of overfitting.
- DataPreprocessor.py:
  - Loads and preprocesses the collected and augmented gesture data.
  - Normalizes the landmarks, creates sequences for gesture frames, encodes labels, and splits the data into training, validation, and test sets.
- ModelEvaluation.py:
  - Loads the trained LSTM model and evaluates its performance.
  - Generates a classification report and confusion matrix.
  - Visualizes the confusion matrix using a heatmap.
- LSTMGestureClassificationModel.py:
  - Defines and trains the LSTM model for gesture recognition (see the sketch after this list).
  - Includes the model architecture, loss function, optimizer, and training steps.
- GestureRecognition.py:
  - Recognizes live gestures. After running all prerequisite scripts (1 through 7), a .h5 model file is created alongside a CSV file used to determine the gesture classes the model can recognize.
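For reference, a compact Keras sketch of such an LSTM classifier; the layer sizes, dropout, and four-class output are assumptions, not the exact architecture of LSTMGestureClassificationModel.py:

```python
import tensorflow as tf

NUM_CLASSES = 4  # e.g., Swipe Up / Down / Left / Right

# Input: sequences of 15 frames x 63 features (21 landmarks x x/y/z).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(15, 63)),
    tf.keras.layers.LSTM(64, return_sequences=True),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=50)
# model.save("gesture_lstm_model.h5")
```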
Method 2 scripts:
- data_collection_NN.py:
  - Captures continuous gesture data (1 second per sample, 50 samples per gesture) using MediaPipe Hands.
  - Collects weighted hand velocities for each gesture.
  - Saves data as NumPy arrays (training_data.npy, labels.npy).
- Data_collection_Semi_Supervised.py:
  - Collects gesture data with recoil detection and the swapped "Swipe Left" (rightward, velocity_x > 1000) and "Swipe Right" (leftward, velocity_x < -1000) thresholds.
  - Records 15 samples per gesture (3 seconds each) with a user-friendly interface.
  - Saves unlabeled data as unlabeled_gesture_data.npy for semi-supervised training.
- Preprocessor.py:
  - Preprocesses collected data for supervised training.
  - Normalizes velocities and angles, creates sequences, and splits data into training, validation, and test sets.
  - Outputs preprocessed data as processed_training_data_with_angles.npy.
- Preprocessor_semi.py:
  - Preprocesses unlabeled data from Data_collection_Semi_Supervised.py for semi-supervised training.
  - Normalizes features and filters outliers.
  - Outputs preprocessed_unlabeled_data.npy.
- npy_to_csv_for_semi.py:
  - Converts NumPy arrays (e.g., processed_training_data_with_angles.npy) to CSV format for compatibility with semi-supervised training and analysis.
- NNModel.py:
  - Defines and trains a Dense Neural Network for supervised gesture classification.
  - Uses features like absolute velocities, directions, and angles.
  - Saves the trained model as gesture_classification_model.h5 (needed to train both semi-supervised models).
- self_training_model_train.py:
  - Implements Self-Training, a semi-supervised method that assigns pseudo-labels to high-confidence unlabeled data.
  - Incorporates recoil filtering (velocity < 300, opposite direction).
  - Saves the trained model as self_training_recoil_model.h5.
- Mean_Teacher_model_train.py:
  - Implements the Mean Teacher semi-supervised method, using a teacher-student model with a consistency loss (see the sketch after this list).
  - Includes recoil filtering and the swapped left/right thresholds.
  - Saves the trained model as mean_teacher_recoil_model.h5.
- Evaluation.py:
  - Evaluates the supervised model (gesture_classification_model.h5) on test data.
  - Generates classification reports and confusion matrices.
- PreProcessedEvaluation.py:
  - Evaluates preprocessed data quality, visualizing feature distributions and recoil detection.
- Model_evaluate_semi-supervised.py:
  - Evaluates the semi-supervised models (self_training_recoil_model.h5, mean_teacher_recoil_model.h5) on test data.
  - Generates classification reports, confusion matrices, and comparative performance plots.
- model_test.py:
  - Tests models on specific test cases or subsets of data for debugging or validation.
- plotClasses.py:
  - Visualizes class distributions or gesture features (e.g., velocity vs. angle scatter plots) to analyze data balance.
- GUI_NN_model.py:
  - Provides a real-time GUI for gesture recognition using the supervised model (gesture_classification_model.h5).
  - Displays predictions with a darkened overlay and hand landmarks.
- gui_Semi_models.py:
  - Provides a real-time GUI for gesture recognition using the semi-supervised models (self_training_recoil_model.h5 or mean_teacher_recoil_model.h5).
  - Incorporates recoil filtering and the swapped left/right thresholds for accurate classification.
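As background for the Mean Teacher script above, here is a sketch of one training step under stated assumptions: student and teacher are identical Keras models emitting softmax probabilities, and x_unlabeled is a batch of preprocessed unlabeled sequences. The real script layers recoil filtering and the swapped left/right thresholds on top of this:

```python
import tensorflow as tf

def mean_teacher_step(student, teacher, x_labeled, y_labeled, x_unlabeled,
                      optimizer, consistency_weight=1.0, ema_decay=0.99):
    with tf.GradientTape() as tape:
        # Supervised loss on the labeled batch.
        preds = student(x_labeled, training=True)
        sup_loss = tf.reduce_mean(
            tf.keras.losses.sparse_categorical_crossentropy(y_labeled, preds))
        # Consistency loss: the student should match the teacher on unlabeled data.
        student_u = student(x_unlabeled, training=True)
        teacher_u = tf.stop_gradient(teacher(x_unlabeled, training=False))
        cons_loss = tf.reduce_mean(tf.square(student_u - teacher_u))
        loss = sup_loss + consistency_weight * cons_loss
    grads = tape.gradient(loss, student.trainable_variables)
    optimizer.apply_gradients(zip(grads, student.trainable_variables))
    # The teacher's weights track an exponential moving average of the student's.
    for tw, sw in zip(teacher.weights, student.weights):
        tw.assign(ema_decay * tw + (1.0 - ema_decay) * sw)
    return loss
```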
Method 3 scripts:
- DataSet_collect.py:
  - Captures velocity-time data for gestures using a camera.
  - Saves data as temporary labeled sequential CSV files for acceleration profile analysis.
- Acceleration_profiles.py:
  - Generates and visualizes velocity-time graphs (Acceleration Profiles) to analyze gesture and recoil patterns.
  - Outputs visualizations for data exploration.
- train_LSTM_model_Acceleration_Profiles.py:
  - Defines and trains an LSTM model optimized for acceleration-profile-based gesture recognition.
  - Filters out recoils and focuses on intentional gestures.
  - Saves the trained model as lstm_acceleration_profiles_model.h5.
- LSTM_Gesture_prediction.py:
  - Performs live inference for real-time gesture recognition using the trained LSTM model.
  - Displays predictions with a user-friendly interface.
To run the scripts, install the following dependencies:
- Python 3.6+ (3.9 recommended for compatibility with OpenCV and TensorFlow; I used Python 3.9.18)
- OpenCV (opencv-python)
- MediaPipe (mediapipe)
- NumPy (numpy)
- Pandas (pandas)
- scikit-learn (scikit-learn)
- Matplotlib (matplotlib)
- Seaborn (seaborn)
- TensorFlow (tensorflow, for model training and evaluation)
Install dependencies using pip:

    pip install opencv-python mediapipe numpy pandas scikit-learn matplotlib seaborn tensorflow

Run GestureCollection.py to collect gesture data. You will be prompted to enter a label for each gesture.

    python GestureCollection.py

- Press 's' to start collecting gestures.
- You will be asked to input a gesture label.
- The script will collect 100 samples per gesture and save them in a CSV file.
Once you've collected enough gesture data, use DataAugmentation.py to augment the data by applying transformations like shifting and scaling.

    python DataAugmentation.py

This will generate augmented versions of the dataset and save them to new files.
Run DataShuffle.py. This step is optional, but it reduces the chance of overfitting during training.

    python DataShuffle.py

This will shuffle the samples in your CSV file.
Run DataPreprocessor.py to preprocess the collected and augmented data. This step normalizes the data, creates sequences, and splits it into training, validation, and test sets.

    python DataPreprocessor.py

This will preprocess the data and save the resulting arrays in .npy format, which will be used to train the model.
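For orientation, a minimal sketch of this stage; sequences.npy is a hypothetical input name, and the normalization and 70/15/15 split are assumptions rather than the exact choices in DataPreprocessor.py:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

X = np.load("sequences.npy")    # hypothetical: (num_samples, 15, 63) sequences
labels = np.load("labels.npy")  # hypothetical: one string label per sample

# Normalize each landmark feature to zero mean and unit variance.
X = (X - X.mean(axis=(0, 1))) / (X.std(axis=(0, 1)) + 1e-8)

encoder = LabelEncoder()
y = encoder.fit_transform(labels)
np.save("label_encoder.npy", encoder.classes_)  # reused by ModelEvaluation.py

# 70/15/15 split into training, validation, and test sets.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.3, stratify=y)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, stratify=y_tmp)
```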
Use LSTMGestureClassificationModel.py to define, compile, and train the LSTM model.

    python LSTMGestureClassificationModel.py

This script trains the model on the preprocessed data and saves the trained model to a file (e.g., gesture_lstm_model.h5).
After training the model, use ModelEvaluation.py to load the trained model and evaluate its performance on the test set.

    python ModelEvaluation.py

This will generate:
- A classification report.
- A confusion matrix heatmap showing the performance of the model.
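A sketch of what such an evaluation boils down to, assuming the test split and label encoder from the preprocessing step are available (the X_test.npy / y_test.npy names are hypothetical):

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
from sklearn.metrics import classification_report, confusion_matrix

model = tf.keras.models.load_model("gesture_lstm_model.h5")
classes = np.load("label_encoder.npy", allow_pickle=True)
X_test = np.load("X_test.npy")  # hypothetical file names for the test split
y_test = np.load("y_test.npy")

y_pred = model.predict(X_test).argmax(axis=1)
print(classification_report(y_test, y_pred, target_names=list(classes)))

sns.heatmap(confusion_matrix(y_test, y_pred), annot=True, fmt="d",
            xticklabels=classes, yticklabels=classes, cmap="Blues")
plt.xlabel("Predicted")
plt.ylabel("True")
plt.savefig("confusion_matrix.png")
```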
For Method 2, run either of the following scripts to collect gesture data:

    python data_collection_NN.py
    python Data_collection_Semi_Supervised.py

- Follow on-screen prompts to perform gestures: Swipe Up, Swipe Down, Swipe Right, Swipe Left.
- Data is saved as NumPy arrays: training_data.npy, unlabeled_gesture_data.npy.
Preprocess the data for model training:

    python Preprocessor.py
    python Preprocessor_semi.py

- Outputs: processed_training_data_with_angles.npy, preprocessed_unlabeled_data.npy.
Optional: convert .npy files to .csv:

    python npy_to_csv_for_semi.py

Train the supervised Dense Neural Network model:

    python NNModel.py

Train the semi-supervised models:

    python self_training_model_train.py
    python Mean_Teacher_model_train.py

- Output models: gesture_classification_model.h5, self_training_recoil_model.h5, mean_teacher_recoil_model.h5.
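The Self-Training script's core loop can be sketched as follows; the 0.9 confidence threshold and the X_train.npy / y_train.npy names are assumptions, and recoil filtering (as described earlier) would be applied to the unlabeled pool beforehand:

```python
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("gesture_classification_model.h5")
X_unlabeled = np.load("preprocessed_unlabeled_data.npy")
X_train = np.load("X_train.npy")  # hypothetical names for the labeled split
y_train = np.load("y_train.npy")

# Pseudo-label only the unlabeled samples the model is confident about.
probs = model.predict(X_unlabeled)
keep = probs.max(axis=1) > 0.9
pseudo_X, pseudo_y = X_unlabeled[keep], probs[keep].argmax(axis=1)

# Retrain on labeled + pseudo-labeled data and save the result.
X_combined = np.concatenate([X_train, pseudo_X])
y_combined = np.concatenate([y_train, pseudo_y])
model.fit(X_combined, y_combined, epochs=20)
model.save("self_training_recoil_model.h5")
```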
Evaluate the models on the test data:

    python Evaluation.py                      # Supervised model
    python Model_evaluate_semi-supervised.py  # Semi-supervised models
    python PreProcessedEvaluation.py          # Data quality check

- Generates:
  - Classification reports
  - Confusion matrices (confusion_matrix.png)
  - Performance comparison plots (model_comparison_plot.png)
Run the real-time GUI interface:

    python GUI_NN_model.py     # Supervised model
    python gui_Semi_models.py  # Semi-supervised models

- Displays:
  - Live gesture predictions
  - Hand landmarks
  - Modern UI overlay
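A rough sketch of the overlay style these GUIs use: darken the camera frame, draw the MediaPipe landmarks, and print the current prediction (here a placeholder string instead of a model call):

```python
import cv2
import numpy as np
import mediapipe as mp

mp_hands = mp.solutions.hands
mp_draw = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)
with mp_hands.Hands(max_num_hands=1) as hands:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        # Darkened overlay: blend the frame 60/40 toward black.
        frame = cv2.addWeighted(frame, 0.6, np.zeros_like(frame), 0.4, 0)
        if results.multi_hand_landmarks:
            mp_draw.draw_landmarks(frame, results.multi_hand_landmarks[0],
                                   mp_hands.HAND_CONNECTIONS)
        prediction = "Swipe Left"  # placeholder for the loaded model's output
        cv2.putText(frame, prediction, (20, 40),
                    cv2.FONT_HERSHEY_SIMPLEX, 1.0, (255, 255, 255), 2)
        cv2.imshow("Gesture Recognition", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
cap.release()
cv2.destroyAllWindows()
```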
For Method 3, run DataSet_collect.py to collect velocity-time data for gestures.

    python DataSet_collect.py

- Follow prompts to perform gestures.
- Data is saved as temporary labeled sequential CSV files.
Run Acceleration_profiles.py to generate and visualize acceleration profiles.

    python Acceleration_profiles.py

- Outputs visualizations of velocity-time graphs for gesture and recoil analysis.
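Plotting such a profile is straightforward; the column names below are assumptions about the CSV headers written by DataSet_collect.py:

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("temporary_sequential_data.csv")
plt.plot(df["time"], df["velocity_x"], label="velocity_x")  # assumed column names
plt.plot(df["time"], df["velocity_y"], label="velocity_y")
plt.axhline(0, color="gray", linewidth=0.5)
plt.xlabel("Time (s)")
plt.ylabel("Velocity")
plt.title("Acceleration profile: gesture stroke vs. recoil")
plt.legend()
plt.show()
```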
Run train_LSTM_model_Acceleration_Profiles.py to train the LSTM model on acceleration profiles.

    python train_LSTM_model_Acceleration_Profiles.py

- Saves the trained model as lstm_acceleration_profiles_model.h5.
Run LSTM_Gesture_prediction.py for live gesture recognition.

    python LSTM_Gesture_prediction.py

- Displays real-time predictions with a user-friendly interface.
The project includes visualizations that provide insight into the gesture data and model performance:
- Average Velocity Scatter Plot: Visualizes the distribution of average velocities for gestures, highlighting patterns in the data.
- Average Velocity (Weighted): Shows weighted velocity distributions to emphasize key gesture features. The features of each class cluster more tightly after weighting, demonstrating its effectiveness.
- Inference Time Distribution: Compares inference times across models to evaluate real-time performance.
- Average Inference Time: Provides a comparative view of average inference times across models.
Method 1 outputs:
- Gesture Data (CSV): Data captured by GestureCollection.py.
- Augmented Data (CSV): Data after applying transformations from DataAugmentation.py.
- Preprocessed Data (NumPy .npy files): Data ready for training.
- Model: The trained LSTM model (gesture_lstm_model.h5).
- Evaluation Results: Classification report and confusion matrix heatmap.
Method 2 outputs:
- Collected Data (NumPy .npy files): Gesture data saved to training_data.npy.
- Label Data (NumPy .npy files): Labels saved to labels.npy.
- Preprocessed Data (NumPy .npy files): Training-ready data in processed_training_data_with_angles.npy.
| Type | Files |
|---|---|
| Collected Data | training_data.npy, unlabeled_gesture_data.npy |
| Preprocessed Data | processed_training_data_with_angles.npy, preprocessed_unlabeled_data.npy |
| Converted Data | CSV files from npy_to_csv_for_semi.py (e.g., processed_data.csv) |
| Trained Models | gesture_classification_model.h5, self_training_recoil_model.h5, mean_teacher_recoil_model.h5 |
| Evaluation Results | confusion_matrix.png, model_comparison_plot.png, classification reports |
Method 3 outputs:

| Type | Files |
|---|---|
| Collected Data | Temporary labeled sequential CSV files from DataSet_collect.py |
| Visualizations | Velocity-time graphs from Acceleration_profiles.py |
| Trained Model | lstm_acceleration_profiles_model.h5 |
| Evaluation Results | Classification reports and confusion matrices |
    .
    ├── GestureCollection.py                 # Script to collect gesture data
    ├── DataAugmentation.py                  # Script for data augmentation
    ├── DataShuffle.py                       # Script for data shuffling
    ├── DataPreprocessor.py                  # Script to preprocess the data
    ├── ModelEvaluation.py                   # Script to evaluate the model
    ├── LSTMGestureClassificationModel.py    # Script to define and train the LSTM model
    ├── GestureRecognition.py                # Script that recognizes live gestures
    ├── augmented_gesture_data.csv           # Example of collected and augmented data
    ├── gesture_lstm_model.h5                # Trained LSTM model (after training)
    ├── label_encoder.npy                    # Label encoder (for evaluation)
    └── confusion_matrix.png                 # Confusion matrix heatmap (for evaluation)
    .
    ├── data_collection_NN.py                      # Collects gesture data for supervised training
    ├── Data_collection_Semi_Supervised.py         # Collects gesture data with recoil and swapped thresholds
    ├── Preprocessor.py                            # Preprocesses data for supervised training
    ├── Preprocessor_semi.py                       # Preprocesses unlabeled data for semi-supervised training
    ├── npy_to_csv_for_semi.py                     # Converts NumPy arrays to CSV
    ├── NNModel.py                                 # Trains supervised Dense NN model
    ├── self_training_model_train.py               # Trains Self-Training semi-supervised model
    ├── Mean_Teacher_model_train.py                # Trains Mean Teacher semi-supervised model
    ├── Evaluation.py                              # Evaluates supervised model
    ├── PreProcessedEvaluation.py                  # Evaluates preprocessed data quality
    ├── Model_evaluate_semi-supervised.py          # Evaluates semi-supervised models
    ├── model_test.py                              # Tests models on specific cases
    ├── plotClasses.py                             # Visualizes class distributions/features
    ├── GUI_NN_model.py                            # GUI for supervised model recognition
    ├── gui_Semi_models.py                         # GUI for semi-supervised model recognition
    ├── LICENSE                                    # MIT License file
    ├── training_data.npy                          # Example collected data (supervised)
    ├── unlabeled_gesture_data.npy                 # Example collected data (semi-supervised)
    ├── processed_training_data_with_angles.npy    # Example preprocessed data
    ├── preprocessed_unlabeled_data.npy            # Example preprocessed unlabeled data
    ├── gesture_classification_model.h5            # Trained supervised model
    ├── self_training_recoil_model.h5              # Trained Self-Training model
    ├── mean_teacher_recoil_model.h5               # Trained Mean Teacher model
    ├── confusion_matrix.png                       # Example confusion matrix heatmap
    └── model_comparison_plot.png                  # Example comparative performance plot
    .
    ├── DataSet_collect.py                            # Collects velocity-time data as CSV files
    ├── Acceleration_profiles.py                      # Generates and visualizes acceleration profiles
    ├── train_LSTM_model_Acceleration_Profiles.py     # Trains LSTM model for acceleration profiles
    ├── LSTM_Gesture_prediction.py                    # Performs live gesture recognition
    ├── lstm_acceleration_profiles_model.h5           # Trained LSTM model (after training)
    └── temporary_sequential_data.csv                 # Example temporary labeled sequential CSV file
After training and evaluating the models, you'll have access to:
- Classification report: A detailed report of precision, recall, f1-score, and support for each gesture class.
- Confusion matrix heatmap: A visual representation of the model's confusion matrix, showing true vs. predicted labels.
- Method 1 (LSTM Model): Prone to overfitting, so this methodology won't be discussed further.
- Method 2 (Neural Network): Achieved 97% accuracy when tested against an independent dataset.
- Method 2 (Self-Training Semi-Supervised): Achieved 69.57% accuracy against an independent dataset.
- Method 2 (Mean Teacher Semi-Supervised): Achieved 63.04% accuracy against an independent dataset.
- Method 3 (Acceleration-Profile-Driven LSTM): Achieved 88% accuracy when tested against an independent dataset, demonstrating strong performance in filtering recoils while recognizing gestures.
The results confirm the success of the proposed methodologies in identifying instinctive gestures using neural networks and machine learning: the supervised Method 2 network achieved the highest accuracy, while Method 3 proved the most effective at filtering recoils.
If you have suggestions or improvements for this project, feel free to open an issue or submit a pull request. Contributions are always welcome!
This project is licensed under the MIT License - see the LICENSE file for details.