Sign Language LSTM Recognition

Real-time sign language gesture recognition using MediaPipe hand landmarks and an LSTM neural network with live webcam prediction.

Problem Statement

People with hearing impairments face significant communication barriers with those who don't understand sign language. Traditional solutions require human interpreters, which aren't always available. This project builds a system that recognizes hand gestures in real-time through a webcam and translates them into text — making communication more accessible without requiring an intermediary.

How It Works

The system operates in two phases:

Phase 1: Data Collection & Training

  1. Capture webcam video using OpenCV
  2. Extract hand landmarks (left + right) using MediaPipe Holistic
  3. Record 30 sequences of 30 frames per gesture → stored as .npy keypoint arrays
  4. Train an LSTM model on the temporal sequences to learn gesture patterns
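
Step 3 fixes the dataset size per gesture: 30 sequences of 30 frames each. The sketch below enumerates the `.npy` files such a recording session would produce. The `MP_Data/<action>/<sequence>/<frame>.npy` layout and the names `DATA_ROOT`, `frame_path`, and `all_frame_paths` are assumptions for illustration and are not taken from the notebooks.

```python
from pathlib import Path

# Assumed values: the README only fixes 30 sequences x 30 frames per gesture.
DATA_ROOT = Path("MP_Data")          # hypothetical directory name
ACTIONS = ["food", "water", "help"]  # example gesture set from this README
NUM_SEQUENCES = 30                   # sequences recorded per gesture
SEQUENCE_LENGTH = 30                 # frames per sequence


def frame_path(action: str, sequence: int, frame: int) -> Path:
    """Return where one frame's keypoint array would be stored."""
    return DATA_ROOT / action / str(sequence) / f"{frame}.npy"


def all_frame_paths(actions=ACTIONS):
    """List every .npy file a full recording session would produce."""
    return [
        frame_path(action, seq, frame)
        for action in actions
        for seq in range(NUM_SEQUENCES)
        for frame in range(SEQUENCE_LENGTH)
    ]


paths = all_frame_paths()
print(len(paths))  # 3 gestures x 30 sequences x 30 frames = 2700 files
```

With this layout, loading one sequence means stacking its 30 frame arrays into a single (30, 126) array before training.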

Phase 2: Live Prediction

  1. Stream webcam feed in real-time
  2. Extract hand keypoints per frame using MediaPipe
  3. Buffer last 30 frames into a sliding window
  4. Predict the gesture using the trained LSTM model
  5. Display predicted text and probability bars on screen
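
The sliding window in step 3 can be sketched with a bounded `deque`, which discards the oldest frame once 30 are buffered. This is a minimal illustration rather than the notebook's code; the `FrameWindow` class and its method names are hypothetical, and the webcam frames are simulated with dummy vectors.

```python
from collections import deque

SEQUENCE_LENGTH = 30   # frames per prediction window (from this README)
NUM_FEATURES = 126     # 2 hands x 21 landmarks x 3 coordinates


class FrameWindow:
    """Keep the most recent SEQUENCE_LENGTH keypoint vectors."""

    def __init__(self, length: int = SEQUENCE_LENGTH):
        # A deque with maxlen drops the oldest frame automatically.
        self._frames = deque(maxlen=length)

    def push(self, keypoints):
        if len(keypoints) != NUM_FEATURES:
            raise ValueError(f"expected {NUM_FEATURES} features, got {len(keypoints)}")
        self._frames.append(list(keypoints))

    def is_full(self) -> bool:
        return len(self._frames) == self._frames.maxlen

    def as_sequence(self):
        """Return the window as a 30 x 126 nested list, oldest frame first."""
        return list(self._frames)


window = FrameWindow()
for frame_index in range(45):            # simulate 45 webcam frames
    window.push([float(frame_index)] * NUM_FEATURES)
    if window.is_full():
        sequence = window.as_sequence()  # this is what would go to the model
```

A Keras model expects a batch dimension, so the sequence would typically be wrapped as `np.expand_dims(sequence, axis=0)` to get shape (1, 30, 126) before calling `model.predict`.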

Pipeline

Webcam Feed
    │
    ▼
MediaPipe Holistic
(Hand Landmark Detection)
    │
    ├── Left Hand: 21 landmarks × 3 (x, y, z) = 63 features
    └── Right Hand: 21 landmarks × 3 (x, y, z) = 63 features
    │
    ▼
126 Features per Frame
    │
    ▼
Sliding Window (30 frames)
    │
    ▼
┌─────────────────────────┐
│     LSTM Network        │
│  LSTM(64) → LSTM(128)   │
│  LSTM(64) → Dense(64)   │
│  Dense(32) → Softmax    │
└─────────────────────────┘
    │
    ▼
Predicted Gesture + Confidence
(displayed on video feed)

Model Architecture

Input (30 timesteps × 126 features)
    │
    ├── LSTM(64, return_sequences=True)     │  48,896 params
    ├── LSTM(128, return_sequences=True)    │  98,816 params
    ├── LSTM(64, return_sequences=False)    │  49,408 params
    ├── Dense(64, ReLU)                     │   4,160 params
    ├── Dense(32, ReLU)                     │   2,080 params
    └── Dense(N, Softmax)                   │      99 params (N = 3)

Total trainable parameters: 203,459
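
The per-layer counts can be reproduced by hand. The sketch below assumes standard Keras layers with biases (an LSTM has four gates, each with input weights, recurrent weights, and a bias) and infers N = 3 output classes from the 99 parameters listed for the final layer, since 33 × N = 99. The helper names are illustrative.

```python
def lstm_params(input_dim: int, units: int) -> int:
    """4 gates, each with input weights, recurrent weights, and a bias."""
    return 4 * ((input_dim + units) * units + units)


def dense_params(input_dim: int, units: int) -> int:
    """One weight per input-output pair plus one bias per output."""
    return input_dim * units + units


NUM_FEATURES = 126
NUM_CLASSES = 3  # inferred from the 99-parameter output layer

layers = [
    ("LSTM(64)", lstm_params(NUM_FEATURES, 64)),
    ("LSTM(128)", lstm_params(64, 128)),
    ("LSTM(64)", lstm_params(128, 64)),
    ("Dense(64)", dense_params(64, 64)),
    ("Dense(32)", dense_params(64, 32)),
    ("Dense(N)", dense_params(32, NUM_CLASSES)),
]

for name, count in layers:
    print(f"{name:<10} {count:>7,}")
total = sum(count for _, count in layers)
print(f"{'Total':<10} {total:>7,}")
```

Only the output layer depends on the number of gestures: each extra class adds 33 parameters.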

| Parameter | Value |
|---|---|
| Optimizer | Adam |
| Loss | Categorical crossentropy |
| Epochs | 200 |
| Input shape | (30, 126): 30 frames × 126 features |
| Prediction threshold | 0.8 confidence |
| Checkpoint | Best model saved via ModelCheckpoint |
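
The 0.8 prediction threshold means a gesture is only reported when the softmax output is confident enough. A minimal sketch of that decision follows. The function name `decide` is hypothetical, and treating exactly 0.8 as accepted is an assumption, since the README does not say whether the comparison is strict.

```python
THRESHOLD = 0.8  # prediction threshold from this README


def decide(probabilities, actions, threshold=THRESHOLD):
    """Return the most likely action, or None if confidence is too low."""
    if len(probabilities) != len(actions):
        raise ValueError("one probability per action is required")
    # Index of the highest softmax probability (argmax).
    best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
    if probabilities[best_index] < threshold:
        return None  # assumption: nothing is displayed below the threshold
    return actions[best_index]


actions = ["food", "water", "help"]
print(decide([0.05, 0.90, 0.05], actions))  # confident prediction
print(decide([0.40, 0.35, 0.25], actions))  # below threshold
```

Suppressing low-confidence outputs keeps the on-screen text from flickering between gestures while the hands are moving into position.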

Feature Extraction

MediaPipe Holistic detects hand landmarks in each frame. Only the hand landmarks are used (pose and face landmarks are excluded for efficiency):

| Hand | Landmarks | Features (x, y, z) |
|---|---|---|
| Left Hand | 21 | 63 |
| Right Hand | 21 | 63 |
| Total | 42 | 126 per frame |

If a hand is not detected in a frame, its 63 features default to zeros. The input therefore stays fixed at 126 features per frame, and single-hand gestures remain valid inputs.
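
The zero-filling rule can be sketched as follows. MediaPipe Holistic results expose `left_hand_landmarks` and `right_hand_landmarks`, each either `None` or an object whose `landmark` list holds points with `x`, `y`, and `z`. The function names here are illustrative, the result object is a stand-in built with `SimpleNamespace` so the example runs without a webcam, and plain lists are used where the notebooks presumably use NumPy arrays.

```python
from types import SimpleNamespace

LANDMARKS_PER_HAND = 21
FEATURES_PER_HAND = LANDMARKS_PER_HAND * 3  # x, y, z per landmark


def flatten_hand(hand):
    """Flatten one hand to 63 floats, or 63 zeros if it was not detected."""
    if hand is None:
        return [0.0] * FEATURES_PER_HAND
    return [value for lm in hand.landmark for value in (lm.x, lm.y, lm.z)]


def extract_keypoints(results):
    """Concatenate left- and right-hand features into one 126-value vector."""
    return flatten_hand(results.left_hand_landmarks) + flatten_hand(
        results.right_hand_landmarks
    )


# Stand-in for a MediaPipe result: right hand detected, left hand missing.
fake_hand = SimpleNamespace(
    landmark=[SimpleNamespace(x=0.5, y=0.25, z=0.0) for _ in range(LANDMARKS_PER_HAND)]
)
fake_results = SimpleNamespace(left_hand_landmarks=None, right_hand_landmarks=fake_hand)

keypoints = extract_keypoints(fake_results)
print(len(keypoints))  # always 126, regardless of which hands were detected
```

Keeping the left hand in the first 63 positions and the right hand in the last 63 means the model always sees each hand in the same slots.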

Project Structure

├── Training.ipynb          # Data collection + model training
├── Testing.ipynb           # Load model + live webcam prediction
├── LICENSE                 # MIT License
└── README.md

Gestures Supported

The system is designed to be easily extensible. Example gesture sets used:

  • Letters: a, b, c, d, e, f
  • Words: food, water, help

To add new gestures, update the actions array and run the data collection cells in Training.ipynb.
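
Extending the gesture set changes the label encoding and the size of the output layer. The sketch below shows that relationship under the assumption that labels are one-hot encoded in the order of the actions list, which is what categorical crossentropy expects. The helper names and the extra gesture "thanks" are hypothetical.

```python
def build_label_map(actions):
    """Map each gesture name to an integer class index."""
    return {action: index for index, action in enumerate(actions)}


def one_hot(action, label_map):
    """Encode a gesture as a one-hot target for categorical crossentropy."""
    vector = [0] * len(label_map)
    vector[label_map[action]] = 1
    return vector


actions = ["food", "water", "help"]   # example set from this README
extended = actions + ["thanks"]       # "thanks" is a hypothetical new gesture

label_map = build_label_map(extended)
num_classes = len(label_map)          # N in Dense(N, Softmax)
print(num_classes, one_hot("thanks", label_map))
```

Because N changes with the list, a model trained on the old gesture set cannot be reused as-is: data for the new gesture has to be collected and the model retrained.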

Tech Stack

  • Python 3.11
  • MediaPipe — hand landmark detection via Holistic model
  • OpenCV — webcam capture and video display
  • TensorFlow / Keras — LSTM model building and training
  • NumPy — keypoint array operations
  • scikit-learn — train/test split, confusion matrix, accuracy score

Getting Started

git clone https://github.com/sidd707/sign-language-lstm-recognition.git
cd sign-language-lstm-recognition
pip install mediapipe opencv-python tensorflow numpy scikit-learn matplotlib

Collect Data & Train

jupyter notebook Training.ipynb
# Run cells sequentially — webcam will open for data collection

Run Live Prediction

jupyter notebook Testing.ipynb
# Loads trained weights and starts real-time recognition

Note: A webcam is required for both data collection and live prediction.

License

This project is licensed under the MIT License — see the LICENSE file for details.
