This notebook is Assignment 1 of the CS-503 Visual Intelligence course at EPFL, taught by Prof. Amir Zamir.

The goals of this assignment are to:

  • Implement a Vision Transformer for MNIST classification
  • Implement a GPT decoder model for image generation

Topics covered in this assignment:

  • Self-attention
  • Basic tokenization
  • Basic positional encodings
  • Encoder-only (e.g. ViT) and decoder-only (e.g. GPT) Transformer models
  • Vision Transformer (ViT)
  • Supervised training
  • Autoregressive modelling
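
To make the first and last topics concrete, here is a minimal sketch of single-head scaled dot-product self-attention in plain Python. It is not taken from the assignment notebook: the function names `softmax` and `self_attention` are hypothetical, and the query, key, and value projections are assumed to be the identity to keep the example short. With `causal=False` every token attends to every other token, as in an encoder-only model such as ViT; with `causal=True` each token attends only to itself and earlier tokens, which is the masking that decoder-only, autoregressive models such as GPT rely on.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, causal=False):
    """Single-head self-attention over a list of token vectors.

    Assumption for brevity: queries, keys, and values are the inputs
    themselves (identity projections); a real layer learns these projections.
    Returns (outputs, weights), where weights[i][j] is how much token i
    attends to token j.
    """
    d = len(x[0])
    scale = 1.0 / math.sqrt(d)
    outputs, weights = [], []
    for i, q in enumerate(x):
        scores = []
        for j, k in enumerate(x):
            if causal and j > i:
                # Mask out future positions so token i cannot see them.
                scores.append(float("-inf"))
            else:
                scores.append(scale * sum(a * b for a, b in zip(q, k)))
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of the value vectors.
        outputs.append(
            [sum(w[j] * x[j][c] for j in range(len(x))) for c in range(d)]
        )
    return outputs, weights


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    _, full = self_attention(tokens)
    _, masked = self_attention(tokens, causal=True)
    print("bidirectional weights:", full)
    print("causal weights:", masked)
```

In the causal case the first token can only attend to itself, so its attention row puts all weight on position 0, while the bidirectional rows spread weight over all three tokens.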