This directory contains code and data used to generate different rankings of machine translation systems from human assessment data, as described in:
The data are derived from several past editions of the Workshop on Machine Translation organized by the ACL Special Interest Group on Machine Translation. Note that only data from the 2010 and 2011 editions were used for the paper, although data from the last five workshops are included.
To generate the rankings, run the top-level script generate_rankings.sh. It extracts pairwise comparisons from the raw data and runs the various ranking algorithms. Most of the code consists of simple Bash and Python scripts.
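As a rough illustration of the extraction step, the sketch below turns ranking judgments into pairwise win counts. It assumes each judgment is a mapping from system name to rank, with lower ranks being better and ties ignored. The function name `pairwise_wins`, the system names, and the data layout are hypothetical and do not reflect the actual raw-data format or the scripts in bin.

```python
from collections import Counter
from itertools import combinations


def pairwise_wins(judgments):
    """Count how often each system is ranked strictly better than another.

    `judgments` is an iterable of dicts mapping system name -> rank
    (lower is better). This layout is a hypothetical stand-in for the
    raw data, not the format the repository's scripts read.
    """
    wins = Counter()
    for ranks in judgments:
        for a, b in combinations(sorted(ranks), 2):
            if ranks[a] < ranks[b]:
                wins[(a, b)] += 1
            elif ranks[b] < ranks[a]:
                wins[(b, a)] += 1
            # Tied pairs contribute nothing in this sketch.
    return wins


# Two made-up judgments over hypothetical systems.
judgments = [
    {"sysA": 1, "sysB": 2, "sysC": 2},
    {"sysA": 2, "sysB": 1},
]
wins = pairwise_wins(judgments)
```

Here `wins[(a, b)]` holds the number of judgments in which `a` was ranked above `b`. Whether ties should be dropped or split between the two systems is a design choice; this sketch simply drops them.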
- raw_data contains the raw assessment data from five incarnations of the workshop, obtained from these public URLs:
- bin contains utility scripts in Python and Bash to extract pairwise rankings from the raw data, compute rankings from tournaments, compute the cost of feedback arc sets, and compute Spearman's rho.
- data contains rankings and intermediate data produced by the scripts. This directory is produced by the top-level script generate_rankings.sh.
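For readers unfamiliar with the two measures that the bin utilities compute, here is a minimal sketch under stated assumptions. It uses one common definition of the feedback arc set cost of a ranking (the total weight of pairwise comparisons that contradict it) and the textbook tie-free formula for Spearman's rho. The function names, the `wins` layout, and the system names are hypothetical, and the repository's scripts may define or compute these quantities differently.

```python
def feedback_arc_cost(ranking, wins):
    """Total weight of comparisons that contradict `ranking`.

    `ranking` lists systems from best to worst; `wins[(a, b)]` is the
    number of times a was judged better than b (hypothetical layout).
    """
    position = {system: i for i, system in enumerate(ranking)}
    return sum(
        count
        for (winner, loser), count in wins.items()
        if position[winner] > position[loser]  # winner is ranked below loser
    )


def spearman_rho(ranking_x, ranking_y):
    """Spearman's rho for two tie-free rankings of the same systems."""
    n = len(ranking_x)
    pos_y = {system: i for i, system in enumerate(ranking_y)}
    d_squared = sum((i - pos_y[s]) ** 2 for i, s in enumerate(ranking_x))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Made-up win counts over hypothetical systems.
wins = {
    ("sysA", "sysB"): 3,
    ("sysB", "sysA"): 1,
    ("sysB", "sysC"): 2,
    ("sysC", "sysA"): 1,
}
cost = feedback_arc_cost(["sysA", "sysB", "sysC"], wins)
rho = spearman_rho(["sysA", "sysB", "sysC"], ["sysA", "sysC", "sysB"])
```

A lower cost means the ranking agrees with more of the pairwise judgments, and a rho closer to 1 means the two rankings order the systems more similarly. The rho formula above is only valid when neither ranking contains ties and both cover the same set of at least two systems.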