
Feature/compute chrf metric #27

Merged
tanhaow merged 9 commits into develop from feature/compute-ChrF-metric
Feb 19, 2026

Conversation

@tanhaow

@tanhaow tanhaow commented Feb 16, 2026

Associated Issue(s): resolves #14

Changes in this PR

  • Added compute_chrf() function in src/muse/evaluation/metrics.py to compute ChrF scores for machine translations
  • Created new evaluation module for MT evaluation metrics
  • Added HuggingFace dependencies: evaluate, datasets, and sacrebleu to pyproject.toml
  • Added test script test_scripts/test_compute_chrf.py

Notes

  • The compute_chrf() function takes two parameters: tr_text (machine translation) and ref_text (human reference), and returns a float score in range [0, 100].
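As background on what the returned score measures: ChrF is a character n-gram F-score (precision and recall over character n-grams, combined with beta = 2 so recall is weighted more heavily). A minimal pure-Python sketch of the idea — an illustration only, not the sacrebleu-backed implementation that compute_chrf() actually wraps via the evaluate library — could look like:

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, ignoring whitespace (roughly sacrebleu's default)."""
    text = text.replace(" ", "")
    return Counter(text[i : i + n] for i in range(len(text) - n + 1))


def simple_chrf(tr_text: str, ref_text: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified ChrF: average n-gram precision/recall for n=1..max_n, F-beta score in [0, 100]."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(tr_text, n), char_ngrams(ref_text, n)
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        if sum(hyp.values()) and sum(ref.values()):
            precisions.append(overlap / sum(hyp.values()))
            recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return 100 * (1 + beta**2) * p * r / (beta**2 * p + r)
```

An identical hypothesis and reference scores 100.0; completely disjoint strings score 0.0. The real metric also handles corpus-level aggregation and tokenization details that this sketch omits.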

Reviewer Checklist

  • Download the test data (test_data_mt_tencent.jsonl) from Google Drive and run the test script
  • Verify the function signature matches the issue requirements (tr_text, ref_text inputs; float output)

@tanhaow tanhaow requested a review from laurejt February 16, 2026 19:50
@tanhaow tanhaow self-assigned this Feb 16, 2026
@tanhaow tanhaow changed the base branch from main to develop February 16, 2026 19:50
@laurejt

laurejt commented Feb 17, 2026

Note: a developer should never have to explicitly run uv sync, since uv will run this on its own if the pyproject.toml has changed.

@laurejt

laurejt commented Feb 17, 2026

@tanhaow What is the expected output for your test script given the test data?

The test script ran, but I'm not sure what I'm supposed to conclude from that.


@laurejt laurejt left a comment


Overall, this looks pretty good but the code/files need to be reorganized as outlined in my comments below.


Move the method of interest into the new file metrics.py within the existing translation module. This new file should contain all machine translation metrics we create (similar to what was done for generating translations).


Delete this file; it should not be needed. (Normally the metrics submodule would instead be registered in the top-level __init__.py, i.e. src/muse/__init__.py, but don't do that here since this submodule is being removed.)


Update your test script so that it will work with the updated code organization.

@@ -0,0 +1,58 @@
#!/usr/bin/env python3


Why are you including this shebang? Are you running this as an executable?

Comment on lines +28 to +31
if not tr_text or not tr_text.strip():
raise ValueError("Translation text cannot be empty")
if not ref_text or not ref_text.strip():
raise ValueError("Reference text cannot be empty")


Remove these checks, they are unnecessary.

import sys
from pathlib import Path

from muse.metrics import compute_chrf


This line should look something like this:

Suggested change
from muse.metrics import compute_chrf
from muse.translation.metrics import compute_chrf

tanhaow added a commit that referenced this pull request Feb 17, 2026
- Move compute_chrf() from src/muse/metrics/chrf.py to src/muse/translation/metrics.py
- Remove src/muse/metrics/ submodule entirely
- Remove unnecessary empty string validation checks
- Update test script import path
- Remove shebang from test script

Addresses review comments from @laurejt in PR #27

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@tanhaow tanhaow force-pushed the feature/compute-ChrF-metric branch from 499239e to d23a35a on February 17, 2026 20:24
@tanhaow
Author

tanhaow commented Feb 17, 2026

@tanhaow What is the expected output for your test script given the test data?

The test script ran, but I'm not sure what I'm supposed to conclude from that.

The test script is just there to verify that the compute_chrf() function works correctly; it's not meant to produce any meaningful analysis or conclusions. We initially added test scripts like this because Rebecca and I decided we do not need unit tests at the current stage, and a test script makes it easier to test the code locally. Feel free to remove it if you find it unnecessary.

What's the output you got?

- Created new src/muse/evaluation module for MT evaluation metrics
- Moved compute_chrf() from src/muse/translation/metrics.py to src/muse/evaluation/metrics.py
- Updated test script import path
- Added evaluation to module exports

This separates evaluation concerns from translation generation, allowing for future expansion with other metrics (COMET, BLEU, etc.)
@tanhaow
Author

tanhaow commented Feb 19, 2026

Thanks for your review, @laurejt!

Changes: Created new evaluation module for MT evaluation metrics and moved compute_chrf() from src/muse/translation/metrics.py to src/muse/evaluation/metrics.py based on the whiteboard session yesterday. So the evaluation module is now separate from translation, which allows for future expansion with other metrics like COMET and BLEU.

@tanhaow tanhaow requested a review from laurejt February 19, 2026 20:22

@laurejt laurejt left a comment


This can be merged, but the following two changes must be made:

  1. Remove src/muse/evaluation/__init__.py
  2. Update the pyproject.toml to be as general with dependencies as possible. Only provide versioning requirements when necessary.

pyproject.toml Outdated
Comment on lines +34 to +36
"evaluate>=0.4.0",
"datasets>=2.0.0", # required by evaluate
"sacrebleu>=2.0.0", # required by evaluate for ChrF metric


Why is the versioning to the patch level specified? If this is necessary write a comment explaining why.
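Per this suggestion, the dependency block could be loosened to unconstrained names — a sketch only; whether any of these packages genuinely needs a minimum version would have to be confirmed before dropping the pins:

```toml
[project]
dependencies = [
    "evaluate",
    "datasets",   # required by evaluate
    "sacrebleu",  # required by evaluate for the ChrF metric
]
```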


Delete this unnecessary file.

@tanhaow tanhaow merged commit 43f20e3 into develop Feb 19, 2026
1 check passed
@tanhaow tanhaow deleted the feature/compute-ChrF-metric branch February 19, 2026 21:53