Sign Language LSTM Recognition

Real-time sign language gesture recognition using MediaPipe hand landmarks and an LSTM neural network with live webcam prediction.

Problem Statement

People with hearing impairments face significant communication barriers with those who don't understand sign language. Traditional solutions rely on human interpreters, who aren't always available. This project builds a system that recognizes hand gestures in real time through a webcam and translates them into text, making communication more accessible without requiring an intermediary.

How It Works

The system operates in two phases:

Phase 1: Data Collection & Training

Capture webcam video using OpenCV
Extract hand landmarks (left + right) using MediaPipe Holistic
Record 30 sequences of 30 frames per gesture → stored as .npy keypoint arrays
Train an LSTM model on the temporal sequences to learn gesture patterns
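
As a rough illustration of how the recorded sequences might be turned into training arrays, the sketch below stacks them into an input tensor and one-hot labels. The `build_dataset` helper and the in-memory `sequences_by_action` dictionary are assumptions made for this example, not code from the notebook; with the .npy files described above, each array would first be loaded with `np.load`.

```python
import numpy as np

FRAMES_PER_SEQUENCE = 30    # from the README: 30 frames per sequence
FEATURES_PER_FRAME = 126    # 2 hands x 21 landmarks x 3 coordinates


def build_dataset(sequences_by_action):
    """Stack per-gesture sequences into X and one-hot y.

    sequences_by_action maps a gesture name to an array of shape
    (num_sequences, 30, 126). Hypothetical helper for illustration only.
    """
    actions = sorted(sequences_by_action)
    x_parts, labels = [], []
    for index, action in enumerate(actions):
        sequences = np.asarray(sequences_by_action[action], dtype=np.float32)
        assert sequences.shape[1:] == (FRAMES_PER_SEQUENCE, FEATURES_PER_FRAME)
        x_parts.append(sequences)
        labels.extend([index] * len(sequences))
    X = np.concatenate(x_parts, axis=0)
    # One-hot rows, as expected by a categorical cross-entropy loss.
    y = np.eye(len(actions), dtype=np.float32)[labels]
    return actions, X, y
```

With three gestures and 30 sequences each, `X` has shape (90, 30, 126) and `y` has shape (90, 3).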

Phase 2: Live Prediction

Stream webcam feed in real-time
Extract hand keypoints per frame using MediaPipe
Buffer last 30 frames into a sliding window
Predict the gesture using the trained LSTM model
Display predicted text and probability bars on screen
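
The buffering and prediction steps above can be sketched as follows. This is a minimal illustration, not the notebook's code: `make_predictor` and `predict_fn` are hypothetical names, with `predict_fn` standing in for the trained model's prediction call, and treating a probability of exactly 0.8 as accepted is an assumption.

```python
from collections import deque

import numpy as np

WINDOW_SIZE = 30   # frames per prediction window (from the README)
THRESHOLD = 0.8    # prediction threshold (from the README)


def make_predictor(predict_fn, actions, window_size=WINDOW_SIZE, threshold=THRESHOLD):
    """Return a per-frame callback that yields a gesture name or None.

    predict_fn is a stand-in for the trained model: it takes an array of
    shape (1, window_size, features) and returns probabilities of shape
    (1, len(actions)).
    """
    window = deque(maxlen=window_size)  # the oldest frame drops out automatically

    def on_frame(keypoints):
        window.append(np.asarray(keypoints, dtype=np.float32))
        if len(window) < window_size:
            return None  # not enough frames buffered yet
        batch = np.expand_dims(np.stack(list(window)), axis=0)
        probabilities = np.asarray(predict_fn(batch))[0]
        best = int(np.argmax(probabilities))
        if probabilities[best] < threshold:
            return None  # below the confidence threshold
        return actions[best]

    return on_frame
```

Calling `on_frame` once per webcam frame returns `None` until 30 frames are buffered, then the most likely gesture whenever its probability reaches the threshold.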

Pipeline

Webcam Feed
    │
    ▼
MediaPipe Holistic
(Hand Landmark Detection)
    │
    ├── Left Hand: 21 landmarks × 3 (x, y, z) = 63 features
    └── Right Hand: 21 landmarks × 3 (x, y, z) = 63 features
    │
    ▼
126 Features per Frame
    │
    ▼
Sliding Window (30 frames)
    │
    ▼
┌─────────────────────────┐
│     LSTM Network        │
│  LSTM(64) → LSTM(128)   │
│  LSTM(64) → Dense(64)   │
│  Dense(32) → Softmax    │
└─────────────────────────┘
    │
    ▼
Predicted Gesture + Confidence
(displayed on video feed)

Model Architecture

Input (30 timesteps × 126 features)
    │
    ├── LSTM(64, return_sequences=True)     │  48,896 params
    ├── LSTM(128, return_sequences=True)    │  98,816 params
    ├── LSTM(64, return_sequences=False)    │  49,408 params
    ├── Dense(64, ReLU)                     │   4,160 params
    ├── Dense(32, ReLU)                     │   2,080 params
    └── Dense(N, Softmax)                   │      99 params (N = 3)

Total trainable parameters: 203,459 (for N = 3 gesture classes)
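
The per-layer counts can be checked by hand. The sketch below assumes standard LSTM layers (four gates, each with input weights, recurrent weights, and a bias) and fully connected Dense layers with biases; the function names are illustrative. The 99 parameters of the output layer correspond to N = 3 classes, since 32 × 3 + 3 = 99.

```python
def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights, and a bias.
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    # One weight per input-output pair plus one bias per unit.
    return input_dim * units + units


def total_params(num_classes, features=126):
    layers = [
        lstm_params(features, 64),
        lstm_params(64, 128),
        lstm_params(128, 64),
        dense_params(64, 64),
        dense_params(64, 32),
        dense_params(32, num_classes),
    ]
    return layers, sum(layers)


layers, total = total_params(num_classes=3)
print(layers)  # [48896, 98816, 49408, 4160, 2080, 99]
print(total)   # 203459
```

Each additional gesture class adds 33 parameters to the output layer, so the total depends on how many gestures are trained.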

Parameter	Value
Optimizer	Adam
Loss	Categorical Crossentropy
Epochs	200
Input Shape	(30, 126): 30 frames × 126 keypoint features
Prediction Threshold	0.8 confidence
Checkpoint	Best model saved via ModelCheckpoint
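
A model matching the architecture and settings above might be defined in Keras roughly as follows. This is a sketch under stated assumptions rather than the notebook's code: the function names, the checkpoint path, the monitored metric, and the accuracy metric are all illustrative, and the LSTM layers are left at Keras' default activation because the README does not specify one.

```python
def build_model(num_classes, timesteps=30, features=126):
    """Build a stacked LSTM classifier with the layer sizes listed above."""
    # Imported inside the function so this file still loads where
    # TensorFlow is not installed.
    from tensorflow.keras.layers import LSTM, Dense, Input
    from tensorflow.keras.models import Sequential

    model = Sequential([
        Input(shape=(timesteps, features)),
        LSTM(64, return_sequences=True),
        LSTM(128, return_sequences=True),
        LSTM(64, return_sequences=False),
        Dense(64, activation="relu"),
        Dense(32, activation="relu"),
        Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",
        metrics=["categorical_accuracy"],  # assumed metric, for monitoring only
    )
    return model


def best_model_checkpoint(path="best_model.keras"):
    """Checkpoint callback that keeps only the best model seen so far."""
    from tensorflow.keras.callbacks import ModelCheckpoint

    # Monitoring the training loss is an assumption; the README only says
    # the best model is saved via ModelCheckpoint.
    return ModelCheckpoint(path, monitor="loss", save_best_only=True)
```

Training would then look like `build_model(3).fit(X_train, y_train, epochs=200, callbacks=[best_model_checkpoint()])`, where `X_train` and `y_train` are placeholders for the collected sequences and their one-hot labels.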

Feature Extraction

MediaPipe Holistic detects hand landmarks in each frame. Only the hand landmarks are used; pose and face landmarks are excluded for efficiency:

Hand	Landmarks	Features (x, y, z)
Left Hand	21	63
Right Hand	21	63
Total	42	126 per frame

If a hand is not detected in a frame, its 63 values default to zeros. The feature vector therefore always has 126 entries, which lets the model handle single-hand gestures without special treatment.
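
The zero-filling behaviour can be sketched as below. MediaPipe Holistic results expose `left_hand_landmarks` and `right_hand_landmarks`, each `None` when the hand is not detected and otherwise holding a `landmark` list of points with `x`, `y`, and `z`. The helper names are illustrative, the left-then-right ordering is assumed from the pipeline diagram, and the `SimpleNamespace` objects are stand-ins so the example runs without a webcam.

```python
from types import SimpleNamespace

import numpy as np

LANDMARKS_PER_HAND = 21


def hand_to_vector(hand_landmarks):
    """Flatten one hand's landmarks to 63 values, or zeros if the hand is missing."""
    if hand_landmarks is None:
        return np.zeros(LANDMARKS_PER_HAND * 3, dtype=np.float32)
    return np.array(
        [[p.x, p.y, p.z] for p in hand_landmarks.landmark], dtype=np.float32
    ).flatten()


def extract_keypoints(results):
    """Concatenate left and right hand vectors into a 126-value frame vector."""
    return np.concatenate([
        hand_to_vector(results.left_hand_landmarks),
        hand_to_vector(results.right_hand_landmarks),
    ])


# Stand-in for a Holistic result where only the right hand is visible.
point = SimpleNamespace(x=0.5, y=0.25, z=-0.1)
fake_results = SimpleNamespace(
    left_hand_landmarks=None,
    right_hand_landmarks=SimpleNamespace(landmark=[point] * LANDMARKS_PER_HAND),
)
frame_vector = extract_keypoints(fake_results)
print(frame_vector.shape)  # (126,)
```

Here the first 63 values are zeros for the missing left hand and the remaining 63 hold the right-hand coordinates.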

Project Structure

├── Training.ipynb          # Data collection + model training
├── Testing.ipynb           # Load model + live webcam prediction
├── LICENSE                 # MIT License
└── README.md

Gestures Supported

The system is designed to be easily extensible. Example gesture sets used:

Letters: a, b, c, d, e, f
Words: food, water, help

To add new gestures, update the actions array and run the data collection cells in Training.ipynb.

Tech Stack

Python 3.11
MediaPipe — hand landmark detection via Holistic model
OpenCV — webcam capture and video display
TensorFlow / Keras — LSTM model building and training
NumPy — keypoint array operations
scikit-learn — train/test split, confusion matrix, accuracy score

Getting Started

git clone https://github.com/sidd707/sign-language-lstm-recognition.git
cd sign-language-lstm-recognition
pip install mediapipe opencv-python tensorflow numpy scikit-learn matplotlib

Collect Data & Train

jupyter notebook Training.ipynb
# Run cells sequentially — webcam will open for data collection

Run Live Prediction

jupyter notebook Testing.ipynb
# Loads trained weights and starts real-time recognition

Note: A webcam is required for both data collection and live prediction.

License

This project is licensed under the MIT License — see the LICENSE file for details.
