# Sequence to sequence models

## Install

To install, run `pip install .`

## Translation using deep neural networks - RNN (part 1)

Code for the article *Translation using deep neural networks* (part 1) is here.

Trained model weights, tokenizer, and datasets are here.

### Train

To train the model from scratch, run:

```sh
export MODEL_WEIGHTS_DIR="/path/to/output/model/weights"
export SENTENCEPIECE_MODEL_DIR="/path/to/sentencepiece/model"  # tokenizer/30000 (from download link)
export DATASETS_DIR="/path/to/wmt14/dataset"  # datasets/wmt14_train (from download link)
./scripts/train.sh --with-attention
```

If you don't want to train the model with attention, omit the `--with-attention` flag.

### Inference

To evaluate the model on the WMT'14 test set, run:

```sh
export MODEL_WEIGHTS_PATH="/path/to/trained/model/weights"
export SENTENCEPIECE_MODEL_DIR="/path/to/sentencepiece/model"  # tokenizer/30000 (from download link)
export DATASETS_DIR="/path/to/wmt14/dataset"  # datasets/wmt14_train (from download link)
export EVAL_OUT_PATH="/path/to/output/eval"
./scripts/inference.sh --with-attention
```

### Example

An example of using the model for inference is here.

It uses an out-of-sample sentence from the article.
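
For orientation, here is a minimal sketch of what inference looks like at the tokenizer level. Only the `sentencepiece` calls are real; the model loading and `translate` call are hypothetical placeholders for the repo's actual API (see the linked example for the real code):

```python
import sentencepiece as spm

# Load the trained SentencePiece tokenizer (tokenizer/30000 from the download link);
# the exact file name is an assumption.
sp = spm.SentencePieceProcessor(model_file="tokenizer/30000/spm.model")

# Encode an out-of-sample English sentence to token ids.
source_ids = sp.encode("The weather is nice today.", out_type=int)

# Hypothetical stand-ins for the repo's model API:
# model = load_model(MODEL_WEIGHTS_PATH, with_attention=True)
# target_ids = model.translate(source_ids)
# print(sp.decode(target_ids))
```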

## Translation using deep neural networks - Transformer (part 2)

Encoder-decoder:

- Trained model weights
- Tokenizer

Decoder-only (multitask loss):

- Trained model weights
- Tokenizer

Pre-tokenized training data, stored as numpy arrays, is here; it was generated via `seq2seq_translation/tokenization/tokenize_and_write_to_disk.py` (a sketch of the general idea follows below).
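
For illustration only, a minimal sketch of the tokenize-and-write step, assuming a SentencePiece tokenizer and a flat-tokens-plus-offsets layout; the file names, layout, and dtype are assumptions, not the repo's actual format:

```python
import numpy as np
import sentencepiece as spm

# Path is an assumption; use the tokenizer/30000 model from the download link.
sp = spm.SentencePieceProcessor(model_file="tokenizer/30000/spm.model")

def tokenize_and_write(sentences, out_path):
    """Tokenize sentences and store them as numpy arrays on disk."""
    # uint16 is enough for a 30,000-token vocabulary.
    ids = [np.asarray(sp.encode(s, out_type=int), dtype=np.uint16) for s in sentences]
    # One flat token array plus an offsets index is a common on-disk layout.
    flat = np.concatenate(ids)
    offsets = np.cumsum([0] + [len(x) for x in ids])
    np.savez(out_path, tokens=flat, offsets=offsets)

tokenize_and_write(["Hello world.", "How are you?"], "wmt14_train_en.npz")
```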

To train either model, run:

```sh
torchrun --standalone --nproc_per_node=3 -m seq2seq_translation.run --config_path <path to config>
```

where `--nproc_per_node` should match the number of available GPUs.

Encoder-decoder config:

```json
{
  "architecture_type": "transformer",
  "sentence_piece_model_dir": "<path to tokenizer>",
  "weights_out_dir": "<path to weights dir>",
  "num_layers": 6,
  "d_model": 512,
  "n_head": 8,
  "feedforward_hidden_dim": 2048,
  "dropout": 0.1,
  "n_epochs": 3,
  "batch_size": 128,
  "seed": 1234,
  "label_smoothing": 0.1,
  "source_lang": "en",
  "target_lang": "fr",
  "max_input_length": 128,
  "fixed_length": 128,
  "decoder_num_timesteps": 80,
  "train_frac": 0.999,
  "weight_decay": 0.1,
  "decay_learning_rate": true,
  "eval_iters": 70,
  "use_ddp": true,
  "use_wandb": true,
  "use_mixed_precision": true,
  "norm_first": false,
  "activation": "relu",
  "tokenizer_type": "sentencepiece"
}
```
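
The architecture fields above correspond to the arguments of a stock PyTorch transformer. As an illustration only (the repo's own model code may differ), this is how the config would map onto `torch.nn.Transformer`:

```python
import json

import torch.nn as nn

# Hypothetical file name for the config shown above.
with open("encoder_decoder_config.json") as f:
    cfg = json.load(f)

model = nn.Transformer(
    d_model=cfg["d_model"],                         # 512
    nhead=cfg["n_head"],                            # 8
    num_encoder_layers=cfg["num_layers"],           # 6
    num_decoder_layers=cfg["num_layers"],           # 6
    dim_feedforward=cfg["feedforward_hidden_dim"],  # 2048
    dropout=cfg["dropout"],                         # 0.1
    activation=cfg["activation"],                   # "relu"
    norm_first=cfg["norm_first"],                   # false => post-norm
)
```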

Decoder-only config:

```json
{
  "architecture_type": "transformer",
  "n_epochs": 2,
  "batch_size": 256,
  "seed": 1234,
  "label_smoothing": 0.1,
  "dropout": 0.1,
  "source_lang": "en",
  "target_lang": "fr",
  "train_frac": 0.999,
  "weight_decay": 0.0001,
  "decay_learning_rate": true,
  "loss_eval_interval": 2000,
  "accuracy_eval_interval": 30000,
  "eval_iters": 70,
  "use_ddp": true,
  "use_mixed_precision": true,
  "tokenizer_type": "sentencepiece",
  "d_model": 512,
  "num_layers": 19,
  "n_head": 8,
  "activation": "gelu",
  "norm_first": true,
  "feedforward_hidden_dim": 2048,
  "positional_encoding_type": "sinusoidal",
  "decoder_only": true,
  "sentence_piece_model_dir": "<path to tokenizer>",
  "decoder_num_timesteps": 80,
  "dtype": "float16",
  "loss_type": "autoencode_translation",
  "fixed_length": 260,
  "tokenized_dir": "<path to preprocessed tokens>",
  "weights_out_dir": "<weights out dir>"
}
```
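
`"loss_type": "autoencode_translation"` suggests the decoder-only model is trained on one packed sequence covering both tasks: next-token prediction autoencodes the source segment and then translates into the target segment. The exact format is not documented here, so the sketch below is one plausible reading; the special-token ids and layout are assumptions:

```python
import numpy as np

PAD_ID, SEP_ID = 0, 1  # assumed special-token ids
FIXED_LENGTH = 260     # matches "fixed_length" in the config above

def pack_example(source_ids, target_ids):
    """Pack source and target into a single decoder-only training sequence."""
    seq = source_ids + [SEP_ID] + target_ids
    # Truncate/pad to the fixed length used at training time.
    seq = seq[:FIXED_LENGTH] + [PAD_ID] * max(0, FIXED_LENGTH - len(seq))
    tokens = np.asarray(seq, dtype=np.int64)
    # Causal LM shift: predict token t+1 from tokens <= t, so the loss covers
    # both the source (autoencoding) and the target (translation) segments.
    inputs, labels = tokens[:-1], tokens[1:].copy()
    labels[labels == PAD_ID] = -100  # PyTorch convention for ignored positions
    return inputs, labels

inputs, labels = pack_example([42, 17, 99], [55, 23])
```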

To run inference, add the following to the config (the wmt14 test arrow file is obtained via `WMT14().download()`):

```json
{
  "is_test": true,
  "evaluate_only": true,
  "dataset_path": "<path to wmt14 test arrow file>",
  "load_from_checkpoint_path": "<path to weights>",
  "eval_out_path": "<eval out path>"
}
```
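
Translation quality on WMT'14 is conventionally reported as BLEU. A typical way to score generated translations against the references, assuming they were written out as plain-text files (the file names below are assumptions), is `sacrebleu`:

```python
import sacrebleu

# Hypothetical file names; use whatever the inference run writes to eval_out_path.
with open("eval/hypotheses.txt") as f:
    hypotheses = [line.strip() for line in f]
with open("eval/references.txt") as f:
    references = [line.strip() for line in f]

bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU = {bleu.score:.2f}")
```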