A custom Transformer Encoder-Decoder implementation in PyTorch designed to generate quotes based on specific styles or categories. By providing a style (e.g., love, life, inspirational) to the encoder, the model learns to generate contextually relevant and stylistically consistent quotes.
The model is trained on the Quotes-500k Dataset from Kaggle.
- Columns: quote, author, category (multiple tags)
- Data Processing:
- Trained on a randomized subset of ~200,000 entries.
  - Targeted Styles: The model uses the first tag from the category column as the source input.
  - Quality Control: Extremely long or short quotes were excluded during preprocessing to maintain structural consistency and prevent padding-related noise.
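
The preprocessing described above (first tag as the style, length filtering, random subset) could look roughly like the sketch below. It is a minimal illustration, not the project's actual code: the function name `build_pairs`, the word-count thresholds, and the assumption that tags are comma-separated are all hypothetical.

```python
import random

# Hypothetical length bounds (in whitespace-separated words); the README does
# not state the thresholds that were actually used.
MIN_WORDS = 4
MAX_WORDS = 60


def build_pairs(rows, sample_size, seed=0):
    """Turn raw rows into (style, quote) pairs.

    `rows` is an iterable of dicts with "quote" and "category" keys, where
    "category" holds comma-separated tags (an assumption about the CSV layout).
    """
    pairs = []
    for row in rows:
        quote = (row.get("quote") or "").strip()
        tags = [t.strip() for t in (row.get("category") or "").split(",") if t.strip()]
        if not tags:
            continue  # no style tag to condition on
        n_words = len(quote.split())
        if n_words < MIN_WORDS or n_words > MAX_WORDS:
            continue  # drop extremely short or long quotes
        pairs.append((tags[0], quote))  # the first tag is the source input

    # Randomized subset, capped at sample_size.
    rng = random.Random(seed)
    rng.shuffle(pairs)
    return pairs[:sample_size]
```

Filtering before sampling keeps the subset size predictable; with the real dataset, `sample_size` would be around 200,000.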
- Custom Transformer: Built from scratch based on the "Attention Is All You Need" paper.
- Optimized Architecture: Model dimension set to 256 to keep generations varied while preventing memorization (overfitting).
- Advanced Sampling: Supports Top-K and Temperature sampling for diverse and natural text generation.
- Automated Evaluation: Integrated Perplexity, BLEU and METEOR metrics for performance tracking.
- TensorBoard Integration: Real-time monitoring of training loss, validation loss, and evaluation metrics.
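
To illustrate how Top-K and Temperature sampling combine, here is a small dependency-free sketch that works on a plain list of logits for a single decoding step. The function name `sample_top_k` and its defaults are assumptions for illustration; the project's own implementation operates on PyTorch tensors and may differ.

```python
import math
import random


def sample_top_k(logits, k=10, temperature=1.0, rng=random):
    """Sample a token id from `logits` using temperature scaling and Top-K filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    k = max(1, min(k, len(logits)))

    # Keep the ids of the k highest-scoring tokens.
    top_ids = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]

    # Temperature-scaled softmax weights over the surviving logits
    # (the maximum is subtracted for numerical stability).
    scaled = [logits[i] / temperature for i in top_ids]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]

    # Draw one id in proportion to its weight.
    return rng.choices(top_ids, weights=weights, k=1)[0]
```

Lower temperatures sharpen the distribution toward the most likely token, while a smaller `k` removes the unlikely tail entirely; `k=1` reduces to greedy decoding.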
Through rigorous testing, the following configuration was selected to optimize learning stability:
| Parameter | Value |
|---|---|
| Model Dimension | 256 |
| Heads | 8 |
| Layers | 6 Encoder / 6 Decoder |
| Context Size | 96 tokens |
| Dropout | 0.3 |
| Label Smoothing | 0.15 |
| Batch Size | 128 - 256 |
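
As a rough sketch, the values in the table could be grouped into a single configuration object. The class and field names here are illustrative assumptions and are not taken from config.py; the divisibility check reflects the standard requirement that the model dimension split evenly across attention heads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    # Values come from the table; the field names are illustrative.
    d_model: int = 256
    n_heads: int = 8
    n_encoder_layers: int = 6
    n_decoder_layers: int = 6
    context_size: int = 96
    dropout: float = 0.3
    label_smoothing: float = 0.15
    batch_size: int = 128  # the table gives a range of 128 - 256

    def __post_init__(self):
        # Multi-head attention splits d_model evenly across the heads.
        if self.d_model % self.n_heads != 0:
            raise ValueError("d_model must be divisible by n_heads")

    @property
    def head_dim(self) -> int:
        return self.d_model // self.n_heads


cfg = ModelConfig()
print(cfg.head_dim)  # 256 // 8 = 32
```

With 256 dimensions and 8 heads, each attention head works on a 32-dimensional slice.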
Initially, a larger model was prone to memorizing the training data (overfitting).
Solution:
- Reduced model capacity to 256 dimensions.
- Increased Dropout to 0.3.
- Implemented a Learning Rate Scheduler (ReduceLROnPlateau) to handle the steep loss gradients seen in later epochs.
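
For reference, a minimal sketch of wiring `ReduceLROnPlateau` into a validation loop is shown below. The stand-in model, the optimizer, the learning rate, `factor`, `patience`, and the loss values are all illustrative assumptions, not the settings used in train.py.

```python
import torch
from torch import nn
from torch.optim.lr_scheduler import ReduceLROnPlateau

model = nn.Linear(4, 4)  # stand-in for the Transformer
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # illustrative optimizer and lr

# Halve the learning rate once validation loss has stopped improving for
# more than `patience` epochs (factor and patience are illustrative).
scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.5, patience=2)

val_losses = [3.0, 2.5, 2.5, 2.5, 2.5, 2.5]  # made-up values that plateau
for epoch, val_loss in enumerate(val_losses):
    # In a real loop, training and validation for the epoch run before this call.
    scheduler.step(val_loss)  # the scheduler needs the monitored metric
    print(epoch, optimizer.param_groups[0]["lr"])
```

Unlike step-based schedulers, this one reacts to the monitored metric, so it only lowers the learning rate when validation loss actually stalls.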
- Essential Environment Fixes
  !pip install -q "protobuf==3.20.3" --force-reinstall
  !pip install -q evaluate datasets
- Repo Management
  !git clone https://github.com/aannjjiiccaa/transformer.git
  %cd transformer
- Visualization & Training
  %load_ext tensorboard
  %tensorboard --logdir runs/quotes
  !python train.py

After training the model on Kaggle, follow these steps to run it locally:
- Download Model Assets
  Move the following files from your Kaggle output to your local project directory:
  - best_model.pt -> place in weights/
  - tokenizer_src.json and tokenizer_tgt.json -> place in the root folder.
- Environment Setup
  Create a virtual environment to keep your dependencies isolated:
  - Create venv
    python -m venv venv
  - Activate venv
    - Windows:
      .\venv\Scripts\activate
    - Linux/macOS:
      source venv/bin/activate
- Install Requirements
  Install the necessary libraries for inference:
  pip install torch tokenizers
- Run the Generator
  Start the interactive inference session:
  python inference.py
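
Before starting the generator, it can help to confirm that the downloaded files sit where the steps above place them. The following is a small optional helper, not part of the repository; the function name `missing_assets` is hypothetical, and the paths are the ones listed under Download Model Assets.

```python
from pathlib import Path

# Expected layout from the download step, relative to the project root.
REQUIRED_ASSETS = [
    Path("weights") / "best_model.pt",
    Path("tokenizer_src.json"),
    Path("tokenizer_tgt.json"),
]


def missing_assets(root="."):
    """Return the required files that are absent under `root`."""
    root = Path(root)
    return [p for p in REQUIRED_ASSETS if not (root / p).is_file()]


if __name__ == "__main__":
    missing = missing_assets()
    if missing:
        print("Missing:", ", ".join(str(p) for p in missing))
    else:
        print("All model assets found.")
```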
- Categories: The model was trained on specific tags (e.g. love, life, inspirational). Try those first for the best results.
- Exit: Simply type exit to close the program.
- Performance: The script is optimized to load the weights once, so subsequent generations are near-instant.
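
The load-once behaviour and the exit command can be pictured with a loop like the one below. This is a hedged sketch of the general pattern, not the contents of inference.py: `run_cli` and its `generate` argument are hypothetical names, and the caller is assumed to load the weights and tokenizers once before entering the loop.

```python
def run_cli(generate, read=input, write=print):
    """Prompt for categories until the user types 'exit'.

    `generate` is a callable mapping a category string to a quote. The model
    behind it is loaded once by the caller, before the loop starts, so each
    iteration only pays for generation.
    """
    while True:
        category = read("Category (or 'exit'): ").strip()
        if category.lower() == "exit":
            break
        if not category:
            continue  # ignore empty input and prompt again
        write(generate(category))
```

Passing `read` and `write` as parameters keeps the loop easy to exercise without a terminal, for example with a stub generator such as `lambda c: f"quote about {c}"`.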
- model.py: Core Transformer architecture (Attention, MultiHead, Encoder/Decoder).
- dataset.py: Data loading and custom Tokenizer logic.
- train.py: Training loop with validation and checkpointing.
- config.py: Centralized hyperparameters and path management.
- test.py: Inference engine with Top-K sampling logic.
- inference.py: An interactive CLI application that allows users to generate quotes by category.