
Feature/compute chrf metric #27

Merged
tanhaow merged 9 commits into develop from feature/compute-ChrF-metric
Feb 19, 2026

Conversation

@tanhaow

@tanhaow tanhaow commented Feb 16, 2026

Associated Issue(s): resolves #14

Changes in this PR

  • Added compute_chrf() function in src/muse/evaluation/metrics.py to compute ChrF scores for machine translations
  • Created new evaluation module for MT evaluation metrics
  • Added HuggingFace dependencies: evaluate, datasets, and sacrebleu to pyproject.toml
  • Added test script test_scripts/test_compute_chrf.py

Notes

  • The compute_chrf() function takes two parameters: tr_text (machine translation) and ref_text (human reference), and returns a float score in range [0, 100].
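As background on what the returned score measures: ChrF is a character n-gram F-score (precision and recall over character n-grams, combined with beta = 2 so recall is weighted more heavily). A minimal pure-Python sketch of the idea — an illustration only, not the sacrebleu-backed implementation that compute_chrf() actually wraps via the evaluate library — could look like:

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, ignoring whitespace (roughly sacrebleu's default)."""
    text = text.replace(" ", "")
    return Counter(text[i : i + n] for i in range(len(text) - n + 1))


def simple_chrf(tr_text: str, ref_text: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified ChrF: average n-gram precision/recall for n=1..max_n, F-beta score in [0, 100]."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(tr_text, n), char_ngrams(ref_text, n)
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        if sum(hyp.values()) and sum(ref.values()):
            precisions.append(overlap / sum(hyp.values()))
            recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return 100 * (1 + beta**2) * p * r / (beta**2 * p + r)
```

An identical hypothesis and reference scores 100.0; completely disjoint strings score 0.0. The real metric also handles corpus-level aggregation and tokenization details that this sketch omits.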

Reviewer Checklist

  • Download the test data (test_data_mt_tencent.jsonl) from Google Drive and run the test script
  • Verify the function signature matches the issue requirements (tr_text, ref_text inputs; float output)

@tanhaow tanhaow requested a review from laurejt February 16, 2026 19:50
@tanhaow tanhaow self-assigned this Feb 16, 2026
@tanhaow tanhaow changed the base branch from main to develop February 16, 2026 19:50
@laurejt

laurejt commented Feb 17, 2026

Note: a developer should never have to explicitly run uv sync, since uv will run this on its own if the pyproject.toml has changed.

@laurejt

laurejt commented Feb 17, 2026

@tanhaow What is the expected output for your test script given the test data?

The test script ran, but I'm not sure what I'm supposed to conclude from that.


@laurejt laurejt left a comment


Overall, this looks pretty good but the code/files need to be reorganized as outlined in my comments below.


Move the method of interest into the new file metrics.py within the existing translation module. This new file should contain all machine translation metrics we create (similar to what was done for generating translations).


Delete this file; it should not be needed. (Normally the metrics submodule would instead be registered in the top-level __init__.py, i.e. src/muse/__init__.py, but don't do that here since this submodule is being removed.)


Update your test script so that it will work with the updated code organization.

@@ -0,0 +1,58 @@
#!/usr/bin/env python3


Why are you including this shebang? Are you running this as an executable?

Comment on lines +28 to +31
if not tr_text or not tr_text.strip():
raise ValueError("Translation text cannot be empty")
if not ref_text or not ref_text.strip():
raise ValueError("Reference text cannot be empty")


Remove these checks, they are unnecessary.

import sys
from pathlib import Path

from muse.metrics import compute_chrf


This line should look something like this:

Suggested change
from muse.metrics import compute_chrf
from muse.translation.metrics import compute_chrf

tanhaow added a commit that referenced this pull request Feb 17, 2026
- Move compute_chrf() from src/muse/metrics/chrf.py to src/muse/translation/metrics.py
- Remove src/muse/metrics/ submodule entirely
- Remove unnecessary empty string validation checks
- Update test script import path
- Remove shebang from test script

Addresses review comments from @laurejt in PR #27

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@tanhaow tanhaow force-pushed the feature/compute-ChrF-metric branch from 499239e to d23a35a on February 17, 2026 20:24
@tanhaow
Author

tanhaow commented Feb 17, 2026

@tanhaow What is the expected output for your test script given the test data?

The test script ran, but I'm not sure what I'm supposed to conclude from that.

The test script is just there to verify that the compute_chrf() function works correctly; it's not meant to produce any meaningful analysis or conclusions. We initially added test scripts like this because Rebecca and I decided we do not need unit tests at the current stage, and a test script makes it easier to test the code locally. Feel free to remove it if you find it unnecessary.

What's the output you got?

- Created new src/muse/evaluation module for MT evaluation metrics
- Moved compute_chrf() from src/muse/translation/metrics.py to src/muse/evaluation/metrics.py
- Updated test script import path
- Added evaluation to module exports

This separates evaluation concerns from translation generation, allowing for future expansion with other metrics (COMET, BLEU, etc.)
@tanhaow
Author

tanhaow commented Feb 19, 2026

Thanks for your review, @laurejt!

Changes: Created new evaluation module for MT evaluation metrics and moved compute_chrf() from src/muse/translation/metrics.py to src/muse/evaluation/metrics.py based on the whiteboard session yesterday. So the evaluation module is now separate from translation, which allows for future expansion with other metrics like COMET and BLEU.

@tanhaow tanhaow requested a review from laurejt February 19, 2026 20:22

@laurejt laurejt left a comment


This can be merged, but the following two changes must be made:

  1. Remove src/muse/evaluation/__init__.py
  2. Update the pyproject.toml to be as general with dependencies as possible. Only provide versioning requirements when necessary.

pyproject.toml Outdated
Comment on lines +34 to +36
"evaluate>=0.4.0",
"datasets>=2.0.0", # required by evaluate
"sacrebleu>=2.0.0", # required by evaluate for ChrF metric


Why is the versioning to the patch level specified? If this is necessary write a comment explaining why.
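Per this suggestion, the dependency block could be loosened to unconstrained names — a sketch only; whether any of these packages genuinely needs a minimum version would have to be confirmed before dropping the pins:

```toml
[project]
dependencies = [
    "evaluate",
    "datasets",   # required by evaluate
    "sacrebleu",  # required by evaluate for the ChrF metric
]
```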


Delete this unnecessary file.

@tanhaow tanhaow merged commit 43f20e3 into develop Feb 19, 2026
1 check passed
@tanhaow tanhaow deleted the feature/compute-ChrF-metric branch February 19, 2026 21:53