add method to compute comet metric and a test script for it #51

Open
tanhaow wants to merge 1 commit into develop from feature/compute-comet-metric
Conversation


@tanhaow tanhaow commented Mar 11, 2026

Associated Issue(s): resolves #15

Changes in this PR

  • Added compute_comet() function for computing COMET scores using HuggingFace's evaluate library
  • Created test_scripts/test_compute_comet.py test script

Notes

  • Automatically uses GPU acceleration (MPS on macOS, CUDA on Linux/Windows) when available
  • The function takes three parameters: tr_text (translation), src_text (source), and ref_text (reference translation)
  • Returns a float in the range [0, 1], where 0 indicates a poor translation and 1 indicates a perfect translation
  • First execution will download the COMET model (~500MB) to ~/.cache/huggingface/hub/
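Based on the description above, a function like this might look roughly as follows. This is a minimal sketch, not the PR's actual code: the signature matches the parameters named in the Notes, but the use of `evaluate.load("comet")` and the `mean_score` result key are assumptions about how the HuggingFace evaluate library is being called here.

```python
def compute_comet(tr_text: str, src_text: str, ref_text: str) -> float:
    """Compute a COMET score for a single translation.

    Sketch only; parameter names follow the PR description
    (tr_text = translation, src_text = source, ref_text = reference).
    """
    # Import kept local: loading the metric downloads the COMET model
    # (~500MB) on first use, so we avoid paying that cost at import time.
    import evaluate

    comet = evaluate.load("comet")  # assumed metric id
    result = comet.compute(
        predictions=[tr_text],  # the machine translation being scored
        sources=[src_text],     # the source-language text
        references=[ref_text],  # the human reference translation
    )
    # evaluate's COMET metric returns a dict; "mean_score" is assumed here.
    return result["mean_score"]
```

With a single segment, the mean over one score is just that segment's score, which is why the sketch returns `mean_score` directly.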

Reviewer Checklist

  • Review the code
  • Run test script with sample data

@tanhaow tanhaow requested a review from laurejt March 11, 2026 13:23
@tanhaow tanhaow self-assigned this Mar 11, 2026

laurejt commented Mar 11, 2026

@tanhaow Is there specific sample data I should be using for the test script? Add it to the test dataset project drive if you haven't yet.


tanhaow commented Mar 11, 2026

> @tanhaow Is there specific sample data I should be using for the test script? Add it to the test dataset project drive if you haven't yet.

I didn’t create a specific one. You could just use a portion of the data we already have.


@laurejt laurejt left a comment


It's good enough for now. If there's a way to reduce all of the warnings/logging generated by HuggingFace, that'd be nice; it's a lot of text.

In the future, we should probably look into ways to avoid repeatedly loading the same HuggingFace models for COMET, but that can be a future task. It may be quite important for #16.
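One lightweight way to avoid reloading the same model each call is to memoize the metric load per process. This is a sketch of that idea, not part of the PR; `load_metric_cached` is a hypothetical helper name, and it assumes the evaluate library's `load()` entry point.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def load_metric_cached(metric_name: str):
    """Load a HuggingFace evaluate metric once per process and reuse it.

    Repeated calls with the same metric_name return the cached object
    instead of re-downloading/re-initializing the underlying model.
    """
    import evaluate  # heavy import kept local to the first call

    return evaluate.load(metric_name)
```

A `compute_comet()` call could then fetch the metric via `load_metric_cached("comet")`, so only the first invocation pays the model-loading cost.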


  This module provides functions for computing various MT evaluation metrics
- including ChrF, and potentially COMET, BLEU, and others in the future.
+ including ChrF, COMET, and potentially BLEU and others in the future.

While we're editing this file, let's get rid of the mention of BLEU since it's a notoriously bad evaluation metric for machine translation.

Suggested change
- including ChrF, COMET, and potentially BLEU and others in the future.
+ including ChrF, COMET, and potentially others in the future.
