The project proposes a hybrid Vision Transformer (ViT) and Bidirectional LSTM (BiLSTM) model with an attention-based fusion mechanism to accurately classify the degree of foot-ground contact during the long jump, using video captured at only 25 frames per second.
Highlights: Achieved 91.87% classification accuracy and a processing speed of 8.18 ms/frame on a resource-constrained GPU (8 GB VRAM, 321 TOPS).
- Low-Frame-Rate Analysis: Designed to work effectively with videos captured at standard frame rates, overcoming motion blur issues.
- Fine-Grained Classification: Classifies foot contact into 5 distinct labels (0: No Contact, 1-3: Progressive Ground Contact Stages, 4: Sandpit Contact), offering more detail than binary classification.
- Hybrid ViT-LSTM Architecture: Combines the spatial feature extraction power of Vision Transformers with the temporal modeling capabilities of LSTMs.
- Attention-Based Fusion: Fuses visual features (from cropped ankle images) and 2D pose data using an attention mechanism to focus on relevant information.
- Efficient Processing: Optimized for performance, achieving fast processing speeds even on hardware with limited computational resources.
- Robust Training Strategy: Employs pose normalization, data augmentation, 5-fold cross-validation, and weighted cross-entropy loss to handle data limitations and class imbalance.
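As an illustration of the class-imbalance handling mentioned above, the snippet below sketches weighted cross-entropy in PyTorch. The label counts and the inverse-frequency weighting scheme are placeholder assumptions for demonstration, not the dataset's actual distribution or the paper's exact weights.

```python
import torch
import torch.nn as nn

# Hypothetical per-label frame counts for labels 0..4 (illustrative only):
# "No Contact" frames typically dominate, so rarer contact stages get
# larger loss weights via inverse class frequency.
label_counts = torch.tensor([5000.0, 300.0, 280.0, 310.0, 900.0])
weights = label_counts.sum() / (len(label_counts) * label_counts)

criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(8, 5)             # a batch of per-frame class scores
targets = torch.randint(0, 5, (8,))    # ground-truth contact labels
loss = criterion(logits, targets)
print(loss.item())
```

With this weighting, misclassifying a rare contact-stage frame costs more than misclassifying an abundant no-contact frame, which counteracts the skewed label distribution during training.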
Distribution of 19 joint points:
Model Architecture:
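As a rough sketch of how such a hybrid could be wired up in PyTorch: per-frame visual features (standing in for ViT embeddings of the cropped ankle image) and 2D pose vectors (19 joints × (x, y) = 38 values, per the joint layout above) are fused with a learned attention weighting, then passed through a BiLSTM and a per-frame classification head. All layer sizes here are placeholder assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class AttentionFusionNet(nn.Module):
    """Illustrative sketch of a ViT-BiLSTM hybrid with attention fusion.

    Visual features are assumed to be precomputed per frame (a pretrained
    ViT backbone would normally produce them); dimensions are placeholders.
    """
    def __init__(self, vis_dim=192, pose_dim=38, fused_dim=128, n_classes=5):
        super().__init__()
        self.vis_proj = nn.Linear(vis_dim, fused_dim)    # project ViT features
        self.pose_proj = nn.Linear(pose_dim, fused_dim)  # project pose vector
        self.attn = nn.Linear(fused_dim * 2, 2)          # modality attention
        self.bilstm = nn.LSTM(fused_dim, 64, batch_first=True,
                              bidirectional=True)
        self.head = nn.Linear(128, n_classes)            # 5 contact labels

    def forward(self, vis_feats, pose_feats):
        v = self.vis_proj(vis_feats)                     # (B, T, fused_dim)
        p = self.pose_proj(pose_feats)                   # (B, T, fused_dim)
        # Attention weights decide how much each modality contributes per frame.
        w = torch.softmax(self.attn(torch.cat([v, p], dim=-1)), dim=-1)
        fused = w[..., :1] * v + w[..., 1:] * p          # (B, T, fused_dim)
        seq, _ = self.bilstm(fused)                      # (B, T, 128)
        return self.head(seq)                            # per-frame logits

model = AttentionFusionNet()
logits = model(torch.randn(2, 16, 192), torch.randn(2, 16, 38))
print(logits.shape)  # torch.Size([2, 16, 5])
```

The BiLSTM lets each frame's prediction draw on both past and future frames, which helps disambiguate motion-blurred contact frames at 25 fps.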
Environment: This project was developed with Python 3.10.16 and PyTorch 2.6.0+cu118. You can choose the PyTorch build that matches your GPU driver.
Clone the repository:
git clone https://github.com/fangevo/ViT-LSTM-Foot-Contact-Detection.git
cd ViT-LSTM-Foot-Contact-Detection

Dataset (frame sequences extracted from 30 video clips): https://drive.google.com/file/d/13hf_kXzegg2eVV8V31Rg6dn1gqT6wMtb/view?usp=sharing
Put the data folder in ./
Model weights: https://drive.google.com/file/d/1fAFRAi2CZWLprRo158a0964dfXdIXCnX/view?usp=drive_link

Put the model weight file in ./weight/
Train:
python main.py --mode train

Prediction:
python main.py --mode predict

Some useful tools: The scripts in the utilis folder include a visual annotation tool, confusion-matrix computation, ankle-image cropping, and pose normalization. Using these scripts requires manually modifying the file paths inside them.
Citation:

@misc{fang:hal-05090038,
TITLE = {{Computer vision-based foot contact detection for long jump using a monocular normal-speed camera}},
AUTHOR = {Fang, Yangtao and Gan, Qi and Nguyen, Sao Mai},
URL = {https://hal.science/hal-05090038},
NOTE = {Poster},
HOWPUBLISHED = {{Journ{\'e}e commune EGC/AFIA Gestion et Analyse de donn{\'e}es Sportives (GAS'25)}},
ORGANIZATION = {{Nida Meddouri and Albrecht Zimmermann and Cl{\'e}ment Iphar and Aur{\'e}lie Leborgne and Lo{\"i}c Salmon}},
YEAR = {2025},
MONTH = May,
PDF = {https://hal.science/hal-05090038v1/file/GAS%2725_GAST_Fang_et_al.pdf},
HAL_ID = {hal-05090038},
HAL_VERSION = {v1},
}