Resume translation and max generation logic #36

Merged: laurejt merged 5 commits into develop from feature/resume-translation on Feb 24, 2026

laurejt commented Feb 24, 2026

Associated Issue(s): N/A

Changes in this PR

  • Updates translate.py so that all HuggingFace translate functions set max_new_tokens for generation to double the input length.
  • Adds resume logic to translate_corpus.py so that translation can continue where it left off if the script is interrupted (e.g., by a failure to connect to the Google Cloud server).
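
A minimal sketch of the max_new_tokens change described above, not the actual code in translate.py. It assumes an encoder-decoder model, so that the output of generate() contains only the translation, and the function names `generation_budget` and `translate` are hypothetical. The tokenizer and model are passed in as arguments so the logic can be read without loading any weights.

```python
def generation_budget(input_length: int, factor: int = 2) -> int:
    """Return the max_new_tokens cap for an input of `input_length` tokens."""
    return factor * input_length


def translate(text, tokenizer, model):
    """Translate `text`, capping generation at double the input length.

    `tokenizer` and `model` are expected to behave like a Hugging Face
    tokenizer and seq2seq model (hypothetical wiring, for illustration).
    """
    inputs = tokenizer(text, return_tensors="pt")
    # input_ids has shape (batch_size, sequence_length)
    input_length = inputs["input_ids"].shape[1]
    outputs = model.generate(
        **inputs,
        max_new_tokens=generation_budget(input_length),
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
```

Tying the cap to the input length lets long inputs produce long translations while still bounding generation if the model fails to emit an end-of-sequence token.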

Reviewer Checklist

  • Confirm that hymt, nllb, and madlad translation works as expected.
  • Check that translate_corpus.py runs successfully without resuming.
  • Check that translate_corpus.py runs successfully with resume. Confirm that existing translations are skipped and the rest are translated.
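
The resume behavior to check (existing translations skipped, the rest translated) could look roughly like the sketch below. This is an illustration under stated assumptions, not the logic in translate_corpus.py: it assumes a JSON Lines output file with one record per line and an `id` field, and the names `load_done_ids`, `translate_corpus`, and `translate_fn` are hypothetical.

```python
import json
from pathlib import Path


def load_done_ids(output_path):
    """Collect the ids of records already present in the output file."""
    path = Path(output_path)
    if not path.exists():
        return set()
    done = set()
    with path.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                # Assumes every line was written whole before any interruption.
                done.add(json.loads(line)["id"])
    return done


def translate_corpus(records, output_path, translate_fn, resume=False):
    """Translate records, skipping ids already in the output when resuming.

    Returns the number of records translated in this run.
    """
    done = load_done_ids(output_path) if resume else set()
    mode = "a" if resume else "w"  # append when resuming, overwrite otherwise
    n_new = 0
    with open(output_path, mode, encoding="utf-8") as out:
        for record in records:
            if record["id"] in done:
                continue  # already translated in an earlier run
            row = {"id": record["id"], "translation": translate_fn(record["text"])}
            out.write(json.dumps(row, ensure_ascii=False) + "\n")
            out.flush()  # keep finished work on disk if the run is interrupted
            n_new += 1
    return n_new
```

Writing and flushing one record at a time means an interrupted run loses at most the record in progress, and a later run with resume=True picks up from the first missing id.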

Removing the option restraining output length appears to fix
translation errors where the model did not appear to complete
its generation but instead repeatedly output the same ending
phrase.
laurejt requested a review from tanhaow on February 24, 2026 15:50
laurejt self-assigned this on Feb 24, 2026
laurejt added the "👇this sprint Add Issue to ZenHub" label on Feb 24, 2026

tanhaow left a comment

Thank you for adding the new feature 🚀

laurejt merged commit 2b96cc7 into develop on Feb 24, 2026
1 check passed
laurejt deleted the feature/resume-translation branch on February 24, 2026 20:46