This repository contains a from-scratch implementation of several fundamental components of the Transformer architecture. The goal is to understand and recreate the core mechanisms described in the paper "Attention Is All You Need" by Vaswani et al.
The main components implemented in this project are:
- Self-Attention: Computing attention weights between words in a sequence.
- Positional Encoding: Adding positional information to preserve word order in sequences.
- Encoder: Encoding block comprising attention, feed-forward layers, and residual connections.
- Decoder: Decoding block similar to the encoder with an additional attention mechanism over the encoder's output.
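As a rough illustration of the first two components, here is a minimal NumPy sketch of scaled dot-product self-attention and sinusoidal positional encoding. Function names and shapes are illustrative, not the repository's actual API:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_k = q.shape[-1]
    weights = softmax(q @ k.T / np.sqrt(d_k))  # (seq_len, seq_len) attention weights
    return weights @ v

def positional_encoding(seq_len, d_model):
    # Sinusoidal encodings: sine on even dimensions, cosine on odd dimensions
    pos = np.arange(seq_len)[:, None]
    i = np.arange(0, d_model, 2)[None, :]
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# Example: 4 tokens with model dimension 8
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8)) + positional_encoding(4, 8)
w_q, w_k, w_v = (rng.standard_normal((8, 8)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)  # shape (4, 8)
```

Each row of the attention-weight matrix sums to 1, so every output token is a convex combination of the value vectors.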
The image above illustrates the full Transformer architecture as described in the paper "Attention is All You Need".
- Vaswani, A., et al. (2017). Attention Is All You Need. arXiv:1706.03762.
