
This notebook is Assignment 1 of the CS-503 Visual Intelligence course at EPFL, taught by Prof. Amir Zamir.

The goals of this assignment are to:

- Implement a Vision Transformer for MNIST classification
- Implement a GPT decoder model for image generation
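
Both goals start from turning an image into a sequence of tokens. As a rough illustration only, the sketch below splits a 28x28 image into non-overlapping patches with NumPy. The function name `patchify`, the patch size of 7, and the random stand-in image are assumptions made for this example; the notebook's actual tokenization may differ.

```python
import numpy as np


def patchify(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W) image into flattened, non-overlapping patches.

    Returns an array of shape (num_patches, patch_size * patch_size),
    with patches ordered row by row. Hypothetical helper, not from the notebook.
    """
    h, w = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image size must be divisible by patch_size")
    rows, cols = h // patch_size, w // patch_size
    # (rows, p, cols, p) -> (rows, cols, p, p) -> (rows * cols, p * p)
    patches = image.reshape(rows, patch_size, cols, patch_size)
    patches = patches.transpose(0, 2, 1, 3)
    return patches.reshape(rows * cols, patch_size * patch_size)


# Stand-in for one 28x28 MNIST digit (random values, not real data).
image = np.random.default_rng(0).random((28, 28))
tokens = patchify(image, patch_size=7)  # patch size 7 is an assumption
print(tokens.shape)  # (16, 49)
```

Each row of `tokens` would then typically be mapped to the model dimension by a learned linear layer before positional encodings are added.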

Topics covered in this assignment:

- Self-attention
- Basic tokenization
- Basic positional encodings
- Transformer encoder-only (e.g. ViT) and decoder-only (e.g. GPT) models
- Vision Transformer (ViT)
- Supervised training
- Autoregressive modelling
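
To make the first and fourth topics concrete, here is a minimal NumPy sketch of single-head scaled dot-product self-attention. It is not the notebook's implementation: the names `self_attention`, `w_q`, `w_k`, `w_v` and all dimensions are assumptions chosen for illustration. The `causal` flag shows the usual difference between the two model families: an encoder-only model such as ViT lets every token attend to every other token, while a decoder-only model such as GPT masks out future positions so it can be trained autoregressively.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, w_q, w_k, w_v, causal=False):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    x:   (seq_len, d_model) token embeddings
    w_*: (d_model, d_head) projection matrices (hypothetical names)
    Returns the attended values and the attention weights.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    if causal:
        # Token i may only attend to tokens 0..i, so mask the strict upper triangle.
        future = np.triu(np.ones(scores.shape, dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


rng = np.random.default_rng(0)
seq_len, d_model, d_head = 5, 8, 4  # arbitrary sizes for the example
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))

out, weights = self_attention(x, w_q, w_k, w_v, causal=True)
print(out.shape)  # (5, 4)
print(np.allclose(np.triu(weights, k=1), 0.0))  # True: no attention to future tokens
```

Calling the same function with `causal=False` gives the bidirectional attention pattern used in an encoder-only model.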