add method to compute comet metric and a test script for it #51
@tanhaow Is there specific sample data I should be using for the test script? Add it to the test dataset project drive if you haven't yet.
I didn’t create a specific one. You could just use a portion of the data we already have.
laurejt left a comment
It's good enough for now. If there's a way to reduce the warnings and logging generated by HuggingFace, that'd be nice; it's a lot of text.
In the future, we should probably look into ways to avoid repeatedly loading the same HuggingFace models for COMET, but that can be a future task. It may be quite important for #16.
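As a rough illustration of both suggestions above (quieter output and loading the model only once), here is a minimal Python sketch. It is not code from this PR: `quiet_huggingface_logs` and `make_cached_loader` are hypothetical helper names, and the logger names are an assumption about where the noise comes from. In real use the loader passed in would presumably be `evaluate.load`; the sketch takes it as an argument so it runs without downloading any model.

```python
import functools
import logging


def quiet_huggingface_logs() -> None:
    """Raise the log threshold for loggers that are assumed to be noisy.

    The logger names below are an assumption; adjust them to whatever
    actually shows up in the output.
    """
    for name in ("transformers", "huggingface_hub", "pytorch_lightning"):
        logging.getLogger(name).setLevel(logging.ERROR)


def make_cached_loader(load_fn):
    """Wrap a metric loader so each metric name is loaded at most once.

    `load_fn` is any callable that takes a metric name and returns the
    loaded metric object, for example `evaluate.load`.
    """

    @functools.lru_cache(maxsize=None)
    def load(name: str):
        return load_fn(name)

    return load


# Hypothetical usage (requires the `evaluate` package and a model download):
#   import evaluate
#   get_metric = make_cached_loader(evaluate.load)
#   comet = get_metric("comet")  # loaded on first call
#   comet = get_metric("comet")  # reused from the cache afterwards
```

The cache only lives for the duration of one Python process, so it helps when several COMET computations run in the same session, not across separate script invocations.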
  This module provides functions for computing various MT evaluation metrics
- including ChrF, and potentially COMET, BLEU, and others in the future.
+ including ChrF, COMET, and potentially BLEU and others in the future.
While we're editing this file, let's get rid of the mention of BLEU since it's a notoriously bad evaluation metric for machine translation.
- including ChrF, COMET, and potentially BLEU and others in the future.
+ including ChrF, COMET, and potentially others in the future.
Associated Issue(s): resolves #15
Changes in this PR
compute_comet() function for computing COMET scores using HuggingFace's evaluate library
test_scripts/test_compute_comet.py test script
Notes
tr_text (translation), src_text (source), and ref_text (reference translation)
~/.cache/huggingface/hub/
Reviewer Checklist