Vision Transformer Model Training Examples

  • Vision Transformers (ViTs) apply the transformer architecture, originally designed for natural language processing, to image data. An image is divided into fixed-size patches, which are flattened and linearly embedded into vectors.
  • These patch embeddings, together with positional encodings, are passed through a stack of transformer layers. Each layer uses self-attention to capture relationships between patches and a feed-forward network to transform the features; the output of the final layer is fed to a classification head for prediction (see the sketch after this list).
  • Because self-attention can weigh relevant regions across the entire input, ViTs excel at capturing global context in images.
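
A minimal sketch of this pipeline, assuming PyTorch. The repository's own code is not shown here, so the `ViTClassifier` class name and its hyperparameters are illustrative, not the repo's actual model:

```python
import torch
import torch.nn as nn

class ViTClassifier(nn.Module):
    """Illustrative Vision Transformer for binary image classification."""

    def __init__(self, image_size=224, patch_size=16, dim=256,
                 depth=6, heads=8, num_classes=2):
        super().__init__()
        num_patches = (image_size // patch_size) ** 2
        # Split the image into patches and linearly embed each one
        # (a strided convolution is equivalent to flatten + linear projection).
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)
        # Learnable [CLS] token and positional encodings.
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))
        # Stack of transformer layers: self-attention + feed-forward per layer.
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        # Classification head on the final [CLS] representation.
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):                    # x: (B, 3, H, W)
        x = self.patch_embed(x)              # (B, dim, H/p, W/p)
        x = x.flatten(2).transpose(1, 2)     # (B, num_patches, dim)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embed
        x = self.encoder(x)                  # attention across all patches
        return self.head(x[:, 0])            # logits from the [CLS] token

logits = ViTClassifier()(torch.randn(2, 3, 224, 224))  # shape: (2, 2)
```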

These are the datasets used to train this Vision Transformer model for binary image classification:
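
For reference, a hedged sketch of a binary-classification training loop over such a dataset, reusing the `ViTClassifier` sketch above; the `data/train` path and all hyperparameters are hypothetical placeholders, not the repository's actual setup:

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Hypothetical dataset location; substitute one of the example datasets,
# laid out in torchvision ImageFolder format (one subfolder per class).
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
train_set = datasets.ImageFolder("data/train", transform=transform)
loader = DataLoader(train_set, batch_size=32, shuffle=True)

model = ViTClassifier(num_classes=2)  # sketch from the section above
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
criterion = torch.nn.CrossEntropyLoss()

model.train()
for epoch in range(10):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```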

About

Training a Vision Transformer image-classification model on example datasets.
