A high-performance C++ implementation of the Transformer architecture from scratch, optimized for CPU computation.
TransformerCPP is a complete implementation of the Transformer model architecture described in the "Attention Is All You Need" paper. This project aims to provide an efficient C++ implementation without external dependencies on deep learning frameworks.
The core design principles of this project are:
- Performance: The implementation is optimized for CPU execution, with multi-threading support for computationally intensive operations (see the sketch after this list).
- Modularity: The codebase is organized in a modular way, with clear separation between components (tensor operations, layers, models).
- Flexibility: The architecture supports both training and inference modes, with configurable parameters.
- Minimal Dependencies: The implementation relies only on the C++ standard library, with no external dependencies on deep learning frameworks.
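As a rough sketch of that kind of multi-threading (not the project's actual thread-pool API, and with an arbitrary worker count of 4), the snippet below splits an element-wise multiply across `std::thread` workers:

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <iostream>
#include <thread>
#include <vector>

// Compute out[i] = a[i] * b[i] over the half-open range [begin, end).
void multiply_range(const std::vector<float>& a, const std::vector<float>& b,
                    std::vector<float>& out, std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i) out[i] = a[i] * b[i];
}

int main() {
    const std::size_t n = 1 << 20;
    std::vector<float> a(n, 2.0f), b(n, 3.0f), out(n);

    const std::size_t num_threads = 4;  // arbitrary worker count for this sketch
    const std::size_t chunk = (n + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;

    // Each worker fills a contiguous, non-overlapping slice of the output.
    for (std::size_t t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(n, begin + chunk);
        if (begin >= end) break;
        workers.emplace_back(multiply_range, std::cref(a), std::cref(b),
                             std::ref(out), begin, end);
    }
    for (auto& w : workers) w.join();

    std::cout << out[0] << '\n';  // prints 6
}
```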
The project is organized into several main components:
- Custom tensor implementation with support for broadcasting, reshaping, and basic arithmetic operations
- Thread-pooled execution for performance-critical operations
- Automatic differentiation for backpropagation
- Linear layers with weights and biases
- Multi-head attention mechanism (see the sketch after this list)
- Position-wise feed-forward networks
- Layer normalization
- Dropout for regularization
- Embedding and positional encoding
- Encoder stack with self-attention
- Decoder stack with masked self-attention and encoder-decoder attention
- Full Transformer model combining encoder and decoder
- Character-level tokenization
- Batch processing and sequence handling
- DataLoader for training and inference
- Configuration parser for model hyperparameters
- Thread pool implementation for parallel execution
- Helper functions for various operations
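As a rough illustration of the math behind the multi-head attention component (per head), the standalone snippet below evaluates scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, on plain `std::vector` matrices; it is not the project's actual tensor or attention code:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <iostream>
#include <vector>

using Matrix = std::vector<std::vector<float>>;  // row-major [rows][cols]

// Naive matrix multiply: (n x k) * (k x m) -> (n x m).
Matrix matmul(const Matrix& a, const Matrix& b) {
    const std::size_t n = a.size(), k = b.size(), m = b[0].size();
    Matrix out(n, std::vector<float>(m, 0.0f));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t p = 0; p < k; ++p)
            for (std::size_t j = 0; j < m; ++j)
                out[i][j] += a[i][p] * b[p][j];
    return out;
}

Matrix transpose(const Matrix& a) {
    Matrix out(a[0].size(), std::vector<float>(a.size()));
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = 0; j < a[i].size(); ++j)
            out[j][i] = a[i][j];
    return out;
}

// Row-wise softmax with max subtraction for numerical stability.
void softmax_rows(Matrix& scores) {
    for (auto& row : scores) {
        const float max_val = *std::max_element(row.begin(), row.end());
        float sum = 0.0f;
        for (float& v : row) { v = std::exp(v - max_val); sum += v; }
        for (float& v : row) v /= sum;
    }
}

// attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
Matrix attention(const Matrix& Q, const Matrix& K, const Matrix& V) {
    Matrix scores = matmul(Q, transpose(K));
    const float scale = 1.0f / std::sqrt(static_cast<float>(K[0].size()));
    for (auto& row : scores)
        for (float& v : row) v *= scale;
    softmax_rows(scores);
    return matmul(scores, V);
}

int main() {
    // Two query positions attending over three key/value positions, d_k = 4.
    const Matrix Q = {{1, 0, 1, 0}, {0, 1, 0, 1}};
    const Matrix K = {{1, 0, 1, 0}, {0, 1, 0, 1}, {1, 1, 0, 0}};
    const Matrix V = {{1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12}};

    for (const auto& row : attention(Q, K, V)) {
        for (float v : row) std::cout << v << ' ';
        std::cout << '\n';
    }
}
```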
Building the project requires:

- A C++17-compatible compiler
- CMake (version 3.14 or higher)
```bash
# Clone the repository
git clone https://github.com/KrishM123/transformer.cpp.git
cd transformer.cpp

# Create build directory
mkdir build
cd build

# Configure and build
cmake ..
make
```

The project can be run in two modes: training and inference.
Before running, you can modify the parameters in `config.ini`:

```ini
# Model mode
inference_mode = true # Set to false for training
load_existing_weights = true # Whether to load pre-trained weights
weights_filename = transformer_weights.bin
data_filename = ../data/tiny_shakespeare.txt
# Model architecture
embed_dim = 256 # Embedding dimension
max_sequence_length = 100 # Maximum sequence length
num_layers = 8 # Number of encoder/decoder layers
num_heads = 8 # Number of attention heads
ff_hidden_dim = 1024 # Feed-forward hidden dimension
dropout_rate = 0.1 # Dropout rate
pad_token_id = 0.0 # Padding token ID
# Training parameters
learning_rate = 0.0005 # Learning rate for Adam optimizer
num_epochs = 100 # Number of training epochs
batch_size = 16 # Batch size
input_seq_length = 10 # Input sequence length
decoder_seq_length = 10 # Decoder sequence length
# Inference parameters
max_generate_length = 100 # Maximum length to generate
initial_prompt = ROMEO: # Initial prompt for text generation
# Performance parameters
num_threads = 500 # Number of threads for parallel execution
```

To train the model:
- Set `inference_mode = false` in `config.ini`
- Configure training parameters as needed
- Run the executable:

```bash
./neural_network
```

The model will train on the specified dataset and save the weights to the specified file.
To generate text with a trained model:
- Set `inference_mode = true` in `config.ini`
- Make sure `load_existing_weights = true` is set and `weights_filename` points to a valid weights file
- Configure `initial_prompt` and `max_generate_length` as desired
- Run the executable:

```bash
./neural_network
```

The model will load the weights and generate text based on the initial prompt.
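For a rough idea of what character-level generation looks like, the standalone sketch below runs a greedy decoding loop; `next_char_logits` is a hypothetical stand-in for the model's forward pass, and the project's actual decoding strategy and interfaces may differ:

```cpp
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical stand-in for a trained model's forward pass: it returns one
// score per candidate character. A real run would call the Transformer here.
std::vector<float> next_char_logits(const std::string& context,
                                    const std::string& vocab) {
    std::vector<float> logits(vocab.size(), 0.0f);
    // Toy rule so the example runs standalone: favour repeating the last character.
    if (!context.empty()) {
        const std::size_t pos = vocab.find(context.back());
        if (pos != std::string::npos) logits[pos] = 1.0f;
    }
    return logits;
}

int main() {
    const std::string vocab = "abcdefghijklmnopqrstuvwxyz :\n";
    std::string text = "romeo: ";             // stands in for initial_prompt
    const std::size_t max_generate_length = 20;

    for (std::size_t step = 0; step < max_generate_length; ++step) {
        const std::vector<float> logits = next_char_logits(text, vocab);
        // Greedy decoding: append the highest-scoring character and repeat.
        const std::size_t best = static_cast<std::size_t>(std::distance(
            logits.begin(), std::max_element(logits.begin(), logits.end())));
        text.push_back(vocab[best]);
    }
    std::cout << text << '\n';
}
```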
The project includes a test suite for the tensor operations:
```bash
# Run the tensor tests
./test_tensor
```

- The `num_threads` parameter in `config.ini` controls parallel execution. For optimal performance, set it to a value appropriate for your hardware (see the snippet after this list).
- Multi-threading is applied to computationally intensive operations such as matrix multiplication, element-wise operations, and attention calculations.
- The implementation uses SIMD optimizations when compiled with appropriate flags.
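To pick a `num_threads` value that matches your machine, a small standalone helper (not part of the project) can query the hardware concurrency, and the result can then be copied into `config.ini`:

```cpp
#include <iostream>
#include <thread>

int main() {
    // hardware_concurrency() may return 0 if the value cannot be determined.
    const unsigned int n = std::thread::hardware_concurrency();
    std::cout << "suggested num_threads = " << (n ? n : 1u) << '\n';
}
```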