add method to compute comet metric and a test script for it #51

Open
tanhaow wants to merge 1 commit into develop from feature/compute-comet-metric
Conversation


@tanhaow tanhaow commented Mar 11, 2026

Associated Issue(s): resolves #15

Changes in this PR

  • Added compute_comet() function for computing COMET scores using HuggingFace's evaluate library
  • Created test_scripts/test_compute_comet.py test script

Notes

  • Automatically uses GPU acceleration (MPS on macOS, CUDA on Linux/Windows) when available
  • The function takes three parameters: tr_text (translation), src_text (source), and ref_text (reference translation)
  • Returns a float in the range [0, 1], where 0 indicates a poor translation and 1 indicates a perfect translation
  • First execution will download the COMET model (~500MB) to ~/.cache/huggingface/hub/
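Based on the description above, a function like this might look roughly as follows. This is a minimal sketch, not the PR's actual code: the signature matches the parameters named in the Notes, but the use of `evaluate.load("comet")` and the `mean_score` result key are assumptions about how the HuggingFace evaluate library is being called here.

```python
def compute_comet(tr_text: str, src_text: str, ref_text: str) -> float:
    """Compute a COMET score for a single translation.

    Sketch only; parameter names follow the PR description
    (tr_text = translation, src_text = source, ref_text = reference).
    """
    # Import kept local: loading the metric downloads the COMET model
    # (~500MB) on first use, so we avoid paying that cost at import time.
    import evaluate

    comet = evaluate.load("comet")  # assumed metric id
    result = comet.compute(
        predictions=[tr_text],  # the machine translation being scored
        sources=[src_text],     # the source-language text
        references=[ref_text],  # the human reference translation
    )
    # evaluate's COMET metric returns a dict; "mean_score" is assumed here.
    return result["mean_score"]
```

With a single segment, the mean over one score is just that segment's score, which is why the sketch returns `mean_score` directly.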

Reviewer Checklist

  • Review the code
  • Run test script with sample data

@tanhaow tanhaow requested a review from laurejt March 11, 2026 13:23
@tanhaow tanhaow self-assigned this Mar 11, 2026

laurejt commented Mar 11, 2026

@tanhaow Is there specific sample data I should be using for the test script? Add it to the test dataset project drive if you haven't yet.


tanhaow commented Mar 11, 2026

> @tanhaow Is there specific sample data I should be using for the test script? Add it to the test dataset project drive if you haven't yet.

I didn’t create a specific one. You could just use a portion of the data we already have.


@laurejt laurejt left a comment


It's good enough for now. If there's a way to reduce all of the warnings/logging generated by HuggingFace, that'd be nice; it's a lot of text.

In the future, we should probably look into ways to avoid repeatedly loading the same HuggingFace models for COMET, but that can be a future task. It may be quite important for #16.
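One lightweight way to avoid reloading the same model each call is to memoize the metric load per process. This is a sketch of that idea, not part of the PR; `load_metric_cached` is a hypothetical helper name, and it assumes the evaluate library's `load()` entry point.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def load_metric_cached(metric_name: str):
    """Load a HuggingFace evaluate metric once per process and reuse it.

    Repeated calls with the same metric_name return the cached object
    instead of re-downloading/re-initializing the underlying model.
    """
    import evaluate  # heavy import kept local to the first call

    return evaluate.load(metric_name)
```

A `compute_comet()` call could then fetch the metric via `load_metric_cached("comet")`, so only the first invocation pays the model-loading cost.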


  This module provides functions for computing various MT evaluation metrics
- including ChrF, and potentially COMET, BLEU, and others in the future.
+ including ChrF, COMET, and potentially BLEU and others in the future.

While we're editing this file, let's get rid of the mention of BLEU since it's a notoriously bad evaluation metric for machine translation.

Suggested change
- including ChrF, COMET, and potentially BLEU and others in the future.
+ including ChrF, COMET, and potentially others in the future.
