This repository contains code for the paper *On the Effectiveness of Offline RL for Dialogue Response Generation*, presented at ICML 2023.
```bash
git clone git@github.com:asappresearch/dialogue-offline-rl.git
cd dialogue-offline-rl
pyenv virtualenv dialogue-offline-rl
pyenv activate dialogue-offline-rl
```

Install the required packages:

```bash
pip install -r requirements.txt
```

| Model | Links |
|---|---|
| Base model (tf) | ABCD, MultiWoz-2.2, TaskMaster-3 |
| Fine-tune on Top Returns (tf_top) | ABCD, MultiWoz-2.2, TaskMaster-3 |
| Decision Transformers: Condition on Return (dt) | ABCD, MultiWoz-2.2, TaskMaster-3 |
| Off-policy Q-learning (ilql) | ABCD, MultiWoz-2.2, TaskMaster-3 |
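As a rough conceptual illustration only (this is not code from this repository), the Python sketch below contrasts the data side of two of the methods in the table. It assumes each training example is a hypothetical `(context, response, return)` tuple with a return in [0, 1]: a tf_top-style step keeps only the highest-return fraction of examples for ordinary fine-tuning, and a dt-style step prefixes the context with a discretized return token so generation can be conditioned on a target return. The function names, the `fraction` and `num_bins` parameters, and the `<return_k>` token format are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical example format (assumption): (dialogue context, response, return in [0, 1]).
Example = Tuple[str, str, float]


def top_return_subset(examples: List[Example], fraction: float = 0.25) -> List[Example]:
    """tf_top-style filtering (assumed): keep only the highest-return examples."""
    ranked = sorted(examples, key=lambda ex: ex[2], reverse=True)
    # Always keep at least one example, even for very small fractions.
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]


def return_conditioned_context(context: str, ret: float, num_bins: int = 10) -> str:
    """dt-style conditioning (assumed): prefix the context with a return-bucket token."""
    clipped = min(max(ret, 0.0), 1.0)
    # Map [0, 1] to bucket ids 0..num_bins-1; a return of exactly 1.0 goes to the top bucket.
    bucket = min(int(clipped * num_bins), num_bins - 1)
    return f"<return_{bucket}> {context}"


if __name__ == "__main__":
    data = [
        ("Hi, I need to reset my password.", "Sure, what is your username?", 0.9),
        ("Hi, I need to reset my password.", "Okay.", 0.2),
        ("Can I change my order?", "Yes, which item would you like to change?", 0.7),
        ("Can I change my order?", "Please hold.", 0.4),
    ]
    print(top_return_subset(data, fraction=0.5))
    # At inference time, a DT-style model would be conditioned on a high target return.
    print(return_conditioned_context("Can I change my order?", 1.0))
```

The sketch deliberately leaves out model training; it only shows how the two methods would reshape the same offline data before fine-tuning.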
Download and create datasets for training the base TF model:
```bash
for dataset in abcd multi_woz taskmaster3; do
    bash scripts/process_data/download_process_${dataset}.sh
done
```

Train the base TF model by executing:
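```bash
bash scripts/train/train_base_tf_distilgpt2.sh {dataset} {ngpu}
```

where `{dataset}` is one of `abcd`, `multi_woz`, `taskmaster3` and `{ngpu}` is the number of GPUs. For example, `bash scripts/train/train_base_tf_distilgpt2.sh abcd 4`.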
To generate the datasets for all three methods (tf_top, dt, ilql), set `model_path_tf` to the path of the trained base TF model and `save_path` to the location where the generated data should be saved, then run:
```bash
for split in train val test; do
    python scripts/process_data/prepare_offline_rl_data.py --model_path_tf {model_path_tf} --save_path {save_path} --split ${split}
done
```

For training, we provide scripts for each of the three methods (tf_top, dt, ilql):
Fine-tune on top returns (tf_top):

```bash
bash scripts/train/train_offline_rl_distilgpt2.sh tf_top {dataset} {ngpu}
```

For example, `bash scripts/train/train_offline_rl_distilgpt2.sh tf_top abcd 4`.

Decision Transformers, conditioning on return (dt):

```bash
bash scripts/train/train_offline_rl_distilgpt2.sh dt {dataset} {ngpu}
```

For example, `bash scripts/train/train_offline_rl_distilgpt2.sh dt abcd 4`.
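To train both tf_top and dt on all three datasets, the commands above can be generated in a loop. The Python sketch below reuses the script path and the positional `{method} {dataset} {ngpu}` arguments from the usage lines; the helper names and the choice of 4 GPUs are illustrative assumptions. By default it only prints the commands, and it launches them sequentially only when `run=True`. ilql is left out because it is trained separately with trlx.

```python
import shlex
import subprocess
from typing import List

SCRIPT = "scripts/train/train_offline_rl_distilgpt2.sh"
DATASETS = ["abcd", "multi_woz", "taskmaster3"]


def offline_rl_train_commands(methods: List[str], ngpu: int) -> List[List[str]]:
    """Build one `bash <script> <method> <dataset> <ngpu>` argument list per run."""
    return [
        ["bash", SCRIPT, method, dataset, str(ngpu)]
        for method in methods
        for dataset in DATASETS
    ]


def launch(commands: List[List[str]], run: bool = False) -> None:
    """Print each command; execute them one after another only when run=True."""
    for cmd in commands:
        print(shlex.join(cmd))
        if run:
            # check=True stops the sweep at the first failing run.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: prints the six tf_top/dt commands without executing them.
    launch(offline_rl_train_commands(["tf_top", "dt"], ngpu=4))
```

Passing each command as an argument list (rather than a single shell string) avoids any quoting issues when `subprocess.run` executes it.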
Off-policy Q-learning (ilql): first, install trlx from a fork at this location. Then execute the command:

```bash
python scripts/training/run_trlx_ilql.py --config_path config/trlx_ilql_gpt2med.yml --data_path {ilql_data_path}
```

To evaluate all the models:
```bash
python scripts/evaluation/evaluate_reward_metrics.py \
    --dataset {dataset} \
    --method {method} \
    --model_path {model_path} \
    --metrics '["bert_score", "bleurt_score", "meteor", "bleu"]' \
    --save_path {save_path} \
    --num_samples 1000
```

where `method` is one of `tf`, `tf_top`, `dt`, `ilql`; `dataset` is one of `abcd`, `multi_woz`, `taskmaster3`; and `model_path` is the path to the corresponding trained model. The script saves all predictions and metrics to a `.csv` file at `save_path`.
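Evaluating every method on every dataset means up to twelve runs of the command above. The Python sketch below builds those argument lists from a mapping of trained model paths; the `model_paths` layout, the per-run `save_path` naming scheme, and the helper names are hypothetical, while the flags and metric names come from the command above. Using `json.dumps` on the metric list produces exactly the `'["bert_score", "bleurt_score", "meteor", "bleu"]'` string that the shell command passes in single quotes.

```python
import json
import shlex
from typing import Dict, List

METHODS = ["tf", "tf_top", "dt", "ilql"]
DATASETS = ["abcd", "multi_woz", "taskmaster3"]
METRICS = ["bert_score", "bleurt_score", "meteor", "bleu"]


def eval_command(dataset: str, method: str, model_path: str, save_path: str,
                 num_samples: int = 1000) -> List[str]:
    """Build the argument list for one evaluate_reward_metrics.py run."""
    return [
        "python", "scripts/evaluation/evaluate_reward_metrics.py",
        "--dataset", dataset,
        "--method", method,
        "--model_path", model_path,
        # Same string the shell command passes inside single quotes.
        "--metrics", json.dumps(METRICS),
        "--save_path", save_path,
        "--num_samples", str(num_samples),
    ]


def eval_sweep(model_paths: Dict[str, Dict[str, str]], out_dir: str) -> List[List[str]]:
    """One command per (method, dataset) pair that has a trained model path."""
    cmds = []
    for method in METHODS:
        for dataset in DATASETS:
            path = model_paths.get(method, {}).get(dataset)
            if path is None:
                continue  # skip models that have not been trained yet
            # Hypothetical naming scheme for per-run outputs.
            save = f"{out_dir}/{method}_{dataset}"
            cmds.append(eval_command(dataset, method, path, save))
    return cmds


if __name__ == "__main__":
    # Hypothetical checkpoint locations; replace with your own model paths.
    paths = {"tf": {"abcd": "outputs/tf/abcd"}, "dt": {"abcd": "outputs/dt/abcd"}}
    for cmd in eval_sweep(paths, "results"):
        print(shlex.join(cmd))
```

Each printed line can be pasted into a shell as-is, since `shlex.join` re-adds the single quotes around the `--metrics` value; the lists can also be passed directly to `subprocess.run`.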
If you find our code or paper useful, please consider citing:
```bibtex
@inproceedings{sodhi2023offlinerl,
  title={On the Effectiveness of Offline RL for Dialogue Response Generation},
  author={Sodhi, Paloma and Wu, Felix and Elenberg, Ethan R and Weinberger, Kilian Q and McDonald, Ryan},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2023}
}
```

This project is licensed under the terms of the MIT license.