Updates to translation pipeline #34

Merged
laurejt merged 4 commits into develop from feature/update-translation
Feb 20, 2026

@laurejt laurejt commented Feb 20, 2026

Associated Issue(s): None

Changes in this PR

  • Updated translate.py so that the correct (default) HuggingFace models are used. Earlier in development, we decided to work with the smaller hymt1.5 and madlad400 models. Also updated the generic translate function so that it takes model shorthand identifiers.
  • Updated translate_corpus.py so that it fails early (i.e., halts on the first error encountered). Also simplified the tqdm logic.
  • Updated the logic for inferring the Google Cloud Project ID from the ADC file.
  • Updated the Google Cloud setup documentation in DEVELOPERNOTES.md to provide more detail on the ADC file, especially when switching between projects.
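For readers unfamiliar with the ADC (Application Default Credentials) file, here is a minimal sketch of how a project ID might be read from it. This is not the code from this PR: the function name `infer_project_id` is hypothetical, and the sketch assumes the usual layout in which user credentials carry a `quota_project_id` field and service account keys carry a `project_id` field.

```python
import json
import os
from pathlib import Path


def infer_project_id(adc_path=None):
    """Return the project ID recorded in an ADC JSON file, or None.

    Hypothetical helper for illustration; not the implementation in this PR.
    """
    if adc_path is None:
        # GOOGLE_APPLICATION_CREDENTIALS, if set, overrides the default
        # gcloud location (shown here for Linux/macOS only).
        adc_path = os.environ.get(
            "GOOGLE_APPLICATION_CREDENTIALS",
            Path.home() / ".config" / "gcloud" / "application_default_credentials.json",
        )
    path = Path(adc_path)
    if not path.is_file():
        return None
    with path.open(encoding="utf-8") as f:
        creds = json.load(f)
    # Assumption: user credentials store "quota_project_id", while
    # service account keys store "project_id".
    return creds.get("quota_project_id") or creds.get("project_id")
```

Because the value comes from the file on disk, a stale ADC file would keep pointing at the previous project after a switch, which is presumably why the updated notes cover that case.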

Reviewer Checklist

  • Check that the updated Google Cloud setup instructions are legible and sufficient
  • Check that translate_corpus.py runs successfully for both HuggingFace models and Google's TLLM

- General translate method now uses underlying model defaults
- General translate method now expects shortened model identifiers
- Updated translate_corpus script so it fails fast
- Simplified tqdm progress bar logic
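
As a rough illustration of the shorthand identifiers and fail-fast behavior described in these commits, the pattern might look like the sketch below. All names here (`MODEL_IDS`, `resolve_model`, `translate_corpus`, the `progress` parameter) are hypothetical, and the repository IDs are placeholders rather than the values used in translate.py.

```python
# Hypothetical shorthand -> HuggingFace repo ID table. The values are
# placeholders, not the IDs used in translate.py.
MODEL_IDS = {
    "hymt": "<hymt1.5 repo id>",
    "madlad": "<madlad400 repo id>",
}


def resolve_model(shorthand):
    """Map a shorthand identifier to a full model ID, rejecting unknown names."""
    try:
        return MODEL_IDS[shorthand]
    except KeyError:
        known = ", ".join(sorted(MODEL_IDS))
        raise ValueError(
            f"Unknown model {shorthand!r}; expected one of: {known}"
        ) from None


def translate_corpus(records, translate_fn, progress=None):
    """Translate records in order, halting on the first error."""
    # progress is any callable that wraps an iterable, e.g. tqdm
    iterable = progress(records) if progress is not None else records
    results = []
    for i, text in enumerate(iterable):
        try:
            results.append(translate_fn(text))
        except Exception as err:
            # Fail fast: re-raise instead of logging and continuing
            raise RuntimeError(f"Translation failed on record {i}") from err
    return results
```

Passing `progress=tqdm` (after `from tqdm import tqdm`) would wrap the loop in a single progress bar, which keeps the progress logic to one line.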
@laurejt laurejt self-assigned this Feb 20, 2026
@laurejt laurejt added the 👇this sprint Add Issue to ZenHub label Feb 20, 2026
@laurejt laurejt requested a review from tanhaow February 20, 2026 14:30
@tanhaow tanhaow left a comment

Looks great! 🚀

tanhaow commented Feb 20, 2026

@laurejt The script runs without problems, and the document is sufficient and clear enough. Thank you for working on this!

@laurejt laurejt merged commit 468961b into develop Feb 20, 2026
1 check passed
@laurejt laurejt deleted the feature/update-translation branch February 20, 2026 15:31
Labels

👇this sprint Add Issue to ZenHub

2 participants