AI-powered YAML locale file translator for Rails and Ruby projects
BetterTranslate automatically translates your YAML locale files using cutting-edge AI providers (ChatGPT, Google Gemini, and Anthropic Claude). It's designed for Rails applications but works with any Ruby project that uses YAML-based internationalization.
🎯 Why BetterTranslate?
- ✅ Production-Ready: Tested with real APIs via VCR cassettes (18 cassettes, 260KB)
- ✅ Interactive Demo: Try it in 2 minutes with `ruby spec/dummy/demo_translation.rb`
- ✅ Variable Preservation: `%{name}` placeholders maintained in translations
- ✅ Nested YAML Support: Complex structures preserved perfectly
- ✅ Multiple Providers: Choose ChatGPT, Gemini, or Claude
| Provider | Model | Speed | Quality | Cost |
|---|---|---|---|---|
| ChatGPT | GPT-5-nano | ⚡⚡⚡ Fast | ⭐⭐⭐⭐⭐ Excellent | 💰💰 Medium |
| Gemini | gemini-2.0-flash-exp | ⚡⚡⚡⚡ Very Fast | ⭐⭐⭐⭐ Very Good | 💰 Low |
| Claude | Claude 3.5 | ⚡⚡ Medium | ⭐⭐⭐⭐⭐ Excellent | 💰💰💰 High |
- 🤖 Multiple AI Providers: Support for ChatGPT (GPT-5-nano), Google Gemini (gemini-2.0-flash-exp), and Anthropic Claude
- ⚡ Intelligent Caching: LRU cache with optional TTL reduces API costs and speeds up repeated translations
- 🔄 Translation Modes: Choose between override (replace entire files) or incremental (merge with existing translations)
- 🎯 Smart Strategies: Automatic selection between deep translation (< 50 strings) and batch translation (≥ 50 strings)
- 🚫 Flexible Exclusions: Global exclusions for all languages + language-specific exclusions for fine-grained control
- 🎨 Translation Context: Provide domain-specific context for medical, legal, financial, or technical terminology
- 📊 Similarity Analysis: Built-in Levenshtein distance analyzer to identify similar translations
- 🔍 Orphan Key Analyzer: Find unused translation keys in your codebase with comprehensive reports (text, JSON, CSV)
- 📝 Automatic File Creation: Input files are automatically created if they don't exist
- 🔧 Initializer Priority: Rake task now checks for initializer configuration before YAML config
- 🐛 Fixed Loop Issues: Removed problematic `after_initialize` hook that caused deadlocks
- 🔄 Ruby 3.4.0 Support: Added explicit CSV dependency for compatibility
- 🎛️ Provider-Specific Options: Fine-tune AI behavior with `model`, `temperature`, and `max_tokens`
- 💾 Automatic Backups: Configurable backup rotation before overwriting files (`.bak`, `.bak.1`, `.bak.2`)
- 📦 JSON Support: Full support for JSON locale files (React, Vue, modern JS frameworks)
- ⚡ Parallel Translation: Translate multiple languages concurrently with thread-based execution
- 📁 Multiple Files: Translate multiple files with arrays or glob patterns (`**/*.en.yml`)
- 🧪 Comprehensive Testing: Unit tests + integration tests with VCR cassettes (18 cassettes, 260KB)
- 🎬 Rails Dummy App: Interactive demo with real translations (`ruby spec/dummy/demo_translation.rb`)
- 🔒 VCR Integration: Record real API responses, test without API keys, CI/CD friendly
- 🛡️ Type-Safe Configuration: Comprehensive validation with detailed error messages
- 📚 YARD Documentation: Complete API documentation with examples
- 🔁 Retry Logic: Exponential backoff for failed API calls (3 attempts, configurable)
- 🚦 Rate Limiting: Thread-safe rate limiter prevents API overload
Clone the repo and run the demo to see BetterTranslate in action:
git clone https://github.com/alessiobussolari/better_translate.git
cd better_translate
bundle install
# Set your OpenAI API key
export OPENAI_API_KEY=your_key_here
# Run the demo!
ruby spec/dummy/demo_translation.rb

What happens:
- ✅ Reads `en.yml` with 16 translation keys
- ✅ Translates to Italian and French using ChatGPT
- ✅ Generates `it.yml` and `fr.yml` files
- ✅ Shows progress, results, and sample translations
- ✅ Takes ~2 minutes (real API calls)
Sample Output:
# en.yml (input)
en:
  hello: "Hello"
  users:
    greeting: "Hello %{name}"

# it.yml (generated) ✅
it:
  hello: "Ciao"
  users:
    greeting: "Ciao %{name}" # Variable preserved!

# fr.yml (generated) ✅
fr:
  hello: "Bonjour"
  users:
    greeting: "Bonjour %{name}" # Variable preserved!

See spec/dummy/USAGE_GUIDE.md for more examples.
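To make the variable-preservation behavior concrete, here is a minimal sketch of a check you could run against generated files. The `missing_placeholders` helper and the `PLACEHOLDER_PATTERN` constant are hypothetical names used for illustration only; they are not part of the BetterTranslate API.

```ruby
# Hypothetical helper (not part of BetterTranslate): verifies that every
# %{variable} placeholder in the source string survives in the translation.
PLACEHOLDER_PATTERN = /%\{\w+\}/

# Returns the placeholders found in the source but absent from the translation.
def missing_placeholders(source, translation)
  source.scan(PLACEHOLDER_PATTERN).uniq - translation.scan(PLACEHOLDER_PATTERN).uniq
end

puts missing_placeholders("Hello %{name}", "Ciao %{name}").inspect # prints []
puts missing_placeholders("Hello %{name}", "Ciao nome").inspect    # prints ["%{name}"]
```

An empty result means all placeholders were kept; anything else points at a translation worth reviewing by hand.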
# config/initializers/better_translate.rb
BetterTranslate.configure do |config|
  config.provider = :chatgpt
  config.openai_key = ENV["OPENAI_API_KEY"]

  # IMPORTANT: Set these manually to match your Rails I18n configuration
  # (I18n.default_locale and I18n.available_locales are not yet available)
  config.source_language = "en" # Should match config.i18n.default_locale
  config.target_languages = [
    { short_name: "it", name: "Italian" },
    { short_name: "fr", name: "French" },
    { short_name: "es", name: "Spanish" }
  ]

  config.input_file = "config/locales/en.yml"
  config.output_folder = "config/locales"

  # Optional: Provide context for better translations
  config.translation_context = "E-commerce application with product catalog"
end

# Translate all files
BetterTranslate.translate_all

Add this line to your application's Gemfile:
gem "better_translate"

And then execute:

bundle install

Or install it yourself as:

gem install better_translate

For Rails applications, generate the initializer:

rails generate better_translate:install

This creates `config/initializers/better_translate.rb` with example configuration for all supported providers.
Important Notes (v1.1.1+):
- The initializer now uses manual language configuration instead of `I18n.default_locale`
- You must set `source_language` and `target_languages` to match your `config/application.rb` I18n settings
- This prevents loop/deadlock issues when running rake tasks
- Input files are automatically created if they don't exist
BetterTranslate.configure do |config|
  config.provider = :chatgpt
  config.openai_key = ENV["OPENAI_API_KEY"]

  # Optional: customize model settings (defaults shown)
  config.request_timeout = 30 # seconds
  config.max_retries = 3
  config.retry_delay = 2.0 # seconds

  # 🆕 v1.1.0: Provider-specific options
  config.model = "gpt-5-nano" # Specify model (optional)
  config.temperature = 0.3 # Creativity (0.0-2.0, default: 0.3)
  config.max_tokens = 2000 # Response length limit
end

Get your API key from OpenAI Platform.
BetterTranslate.configure do |config|
  config.provider = :gemini
  config.google_gemini_key = ENV["GOOGLE_GEMINI_API_KEY"]

  # Same optional settings as ChatGPT
  config.request_timeout = 30
  config.max_retries = 3
end

Get your API key from Google AI Studio.
BetterTranslate.configure do |config|
  config.provider = :anthropic
  config.anthropic_key = ENV["ANTHROPIC_API_KEY"]

  # Same optional settings
  config.request_timeout = 30
  config.max_retries = 3
end

Get your API key from Anthropic Console.
Protect your translation files with automatic backup creation:
config.create_backup = true # Enable backups (default: true)
config.max_backups = 5 # Keep up to 5 backup versions

Backup files are created with rotation:
- First backup: `it.yml.bak`
- Second backup: `it.yml.bak.1`
- Third backup: `it.yml.bak.2`
- Older backups are automatically deleted
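The rotation scheme above could be implemented along these lines. This is only a sketch under stated assumptions: `rotate_backups` is a hypothetical helper, and it assumes the newest backup always lives in `.bak` while older ones shift to `.bak.1`, `.bak.2`, and so on. The gem's actual ordering may differ.

```ruby
require "fileutils"

# Hypothetical helper (not the gem's implementation). Assumes the newest
# backup is "<file>.bak" and older ones are "<file>.bak.1", "<file>.bak.2", ...
def rotate_backups(path, max_backups: 5)
  return if max_backups < 1 || !File.exist?(path)

  # Backup slots, newest first.
  slots = Array.new(max_backups) { |i| i.zero? ? "#{path}.bak" : "#{path}.bak.#{i}" }

  # Drop the oldest slot, then shift every remaining backup one slot older.
  FileUtils.rm_f(slots.last)
  (max_backups - 2).downto(0) do |i|
    FileUtils.mv(slots[i], slots[i + 1]) if File.exist?(slots[i])
  end

  # Copy the current file into the newest slot before it gets overwritten.
  FileUtils.cp(path, slots.first)
end
```

Calling `rotate_backups("config/locales/it.yml", max_backups: 3)` before each write would keep at most three previous versions on disk.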
Translate JSON locale files for modern JavaScript frameworks:
# Automatically detects JSON format from file extension
config.input_file = "config/locales/en.json"
config.output_folder = "config/locales"
# All features work with JSON: backups, incremental mode, exclusions, etc.

Example JSON file:
{
  "en": {
    "common": {
      "greeting": "Hello %{name}"
    }
  }
}

Translate multiple languages concurrently for faster processing:
config.target_languages = [
{ short_name: "it", name: "Italian" },
{ short_name: "fr", name: "French" },
{ short_name: "es", name: "Spanish" },
{ short_name: "de", name: "German" }
]
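config.max_concurrent_requests = 4 # Translate 4 languages at once

Performance improvement: With 4 languages and `max_concurrent_requests = 4`, translation time can drop by up to ~75% compared to sequential processing, because all four languages are translated at the same time. Actual gains depend on API latency and rate limits.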
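As a rough illustration of thread-based execution with a concurrency cap, the sketch below runs a block once per language on a bounded pool of worker threads. `translate_in_parallel` is a hypothetical helper, and the block stands in for a real provider call; neither is taken from the gem.

```ruby
# Hypothetical sketch of bounded, thread-based parallelism (not the gem's code).
# The block stands in for "translate the whole file into this language".
def translate_in_parallel(languages, max_concurrent_requests: 4, &translator)
  queue = Queue.new
  languages.each { |lang| queue << lang }

  results = {}
  lock = Mutex.new
  worker_count = [max_concurrent_requests, languages.size].min

  workers = Array.new(worker_count) do
    Thread.new do
      loop do
        lang = begin
          queue.pop(true) # non-blocking pop raises ThreadError when the queue is empty
        rescue ThreadError
          break
        end
        translated = translator.call(lang)
        lock.synchronize { results[lang[:short_name]] = translated }
      end
    end
  end

  workers.each(&:join)
  results
end

languages = [
  { short_name: "it", name: "Italian" },
  { short_name: "fr", name: "French" }
]
results = translate_in_parallel(languages, max_concurrent_requests: 2) do |lang|
  "translated into #{lang[:name]}"
end
puts results.inspect
```

With four languages and four workers, total time approaches that of the slowest single language, which is where the "up to ~75%" figure comes from (1 - 1/4).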
Translate multiple files in a single run:
# Array of specific files
config.input_files = [
"config/locales/common.en.yml",
"config/locales/errors.en.yml",
"config/locales/admin.en.yml"
]
# Or use glob patterns (recommended)
config.input_files = "config/locales/**/*.en.yml"
# Or combine both approaches
config.input_files = [
"config/locales/**/*.en.yml",
"app/javascript/translations/*.en.json"
]

Output files preserve the original structure:
- `common.en.yml` → `common.it.yml`
- `errors.en.yml` → `errors.it.yml`
- `admin/settings.en.yml` → `admin/settings.it.yml`
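The input-to-output naming shown above can be sketched as a small path transformation. `output_path_for` is a hypothetical helper written for this example; it assumes the language code appears as the segment right before the file extension (or as the whole base name, as in `en.yml`).

```ruby
# Hypothetical helper (not part of the gem): swaps the language segment that
# precedes the extension, e.g. common.en.yml -> common.it.yml, en.yml -> it.yml.
def output_path_for(input_path, source_lang, target_lang)
  dir = File.dirname(input_path)
  name = File.basename(input_path)
  pattern = /(\A|\.)#{Regexp.escape(source_lang)}(\.\w+)\z/
  translated = name.sub(pattern) { "#{Regexp.last_match(1)}#{target_lang}#{Regexp.last_match(2)}" }
  dir == "." ? translated : File.join(dir, translated)
end

puts output_path_for("common.en.yml", "en", "it")         # prints common.it.yml
puts output_path_for("admin/settings.en.yml", "en", "it") # prints admin/settings.it.yml
```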
config.source_language = "en" # ISO 639-1 code (2 letters)
config.target_languages = [
{ short_name: "it", name: "Italian" },
{ short_name: "fr", name: "French" },
{ short_name: "de", name: "German" },
{ short_name: "es", name: "Spanish" },
{ short_name: "pt", name: "Portuguese" },
{ short_name: "ja", name: "Japanese" },
{ short_name: "zh", name: "Chinese" }
]

config.input_file = "config/locales/en.yml" # Source file
config.output_folder = "config/locales" # Output directory

Note (v1.1.1+): If the input file doesn't exist, it will be automatically created with a minimal valid structure (e.g., `{ "en": {} }`).
Override mode replaces the entire target file with fresh translations:
config.translation_mode = :override # default

Use when: Starting fresh or regenerating all translations.
Incremental mode merges with existing translations, translating only the missing keys:
config.translation_mode = :incremental

Use when: Preserving manual corrections or adding new keys to existing translations.
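The merge behavior of incremental mode can be pictured with a small recursive merge that never overwrites an existing value. `merge_missing` is a hypothetical helper for illustration; a real implementation would also avoid sending already-translated keys to the API in the first place.

```ruby
# Hypothetical sketch of an incremental merge (not the gem's code):
# existing translations win, and only absent keys are filled in.
def merge_missing(existing, fresh)
  fresh.each_with_object(existing.dup) do |(key, value), merged|
    if merged[key].is_a?(Hash) && value.is_a?(Hash)
      merged[key] = merge_missing(merged[key], value) # recurse into nested sections
    elsif !merged.key?(key)
      merged[key] = value # key was missing, take the fresh translation
    end
  end
end

existing = { "hello" => "Salve", "users" => { "greeting" => "Ciao %{name}" } }
fresh    = { "hello" => "Ciao", "users" => { "greeting" => "Ciao %{name}", "farewell" => "Arrivederci" } }
puts merge_missing(existing, fresh).inspect
```

Here the manually corrected `hello` value ("Salve") is kept, while the new `users.farewell` key is added.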
The LRU (Least Recently Used) cache stores translations to reduce API costs:
config.cache_enabled = true # default: true
config.cache_size = 1000 # default: 1000 items
config.cache_ttl = 3600 # optional: 1 hour in seconds (nil = no expiration)

Cache key format: `"#{text}:#{target_lang_code}"`
Benefits:
- Reduces API costs for repeated translations
- Speeds up re-runs during development
- Thread-safe with Mutex protection
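For readers unfamiliar with the technique, here is a minimal sketch of a thread-safe LRU cache with optional TTL. The `LruCache` class and its `get`/`set` methods are hypothetical names chosen for this example and are not the gem's `Cache` API; only the key format is taken from the text above.

```ruby
# Hypothetical LRU cache sketch (not the gem's Cache class).
class LruCache
  def initialize(capacity: 1000, ttl: nil)
    @capacity = capacity
    @ttl = ttl
    @store = {} # Ruby hashes keep insertion order, so the first entry is the oldest
    @mutex = Mutex.new
  end

  def get(key)
    @mutex.synchronize do
      entry = @store.delete(key)
      next nil if entry.nil? || expired?(entry) # expired entries stay deleted

      @store[key] = entry # re-insert to mark as most recently used
      entry[:value]
    end
  end

  def set(key, value)
    @mutex.synchronize do
      @store.delete(key)
      @store.shift if @store.size >= @capacity # evict the least recently used entry
      @store[key] = { value: value, stored_at: now }
    end
    value
  end

  private

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end

  def expired?(entry)
    !@ttl.nil? && (now - entry[:stored_at]) > @ttl
  end
end

cache = LruCache.new(capacity: 2)
text = "Hello"
target_lang_code = "it"
cache.set("#{text}:#{target_lang_code}", "Ciao")
puts cache.get("Hello:it") # prints Ciao
```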
Prevent API overload with built-in rate limiting:
config.max_concurrent_requests = 3 # default: 3

The rate limiter enforces a 0.5-second delay between requests by default. This is handled automatically by the `BaseHttpProvider`.
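A minimum-delay rate limiter of the kind described here can be sketched in a few lines. `SimpleRateLimiter` and its `wait` method are hypothetical; the only detail taken from the text is the 0.5-second default delay.

```ruby
# Hypothetical sketch of a thread-safe, minimum-delay rate limiter
# (not the gem's RateLimiter class).
class SimpleRateLimiter
  def initialize(delay: 0.5)
    @delay = delay
    @mutex = Mutex.new
    @last_request_at = nil
  end

  # Blocks until at least `delay` seconds have passed since the previous call.
  # Holding the mutex while sleeping deliberately serializes callers.
  def wait
    @mutex.synchronize do
      if @last_request_at
        remaining = @delay - (monotonic_now - @last_request_at)
        sleep(remaining) if remaining.positive?
      end
      @last_request_at = monotonic_now
    end
  end

  private

  def monotonic_now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end

limiter = SimpleRateLimiter.new(delay: 0.1)
3.times do |i|
  limiter.wait
  puts "request #{i + 1} sent"
end
```

A provider would call `limiter.wait` right before each HTTP request, so concurrent threads share one pacing clock.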
Keys excluded from translation in all target languages (useful for brand names, product codes, etc.):
config.global_exclusions = [
"app.name", # "MyApp" should never be translated
"app.company", # "ACME Inc." stays the same
"product.sku" # "SKU-12345" is language-agnostic
]

Keys excluded only for specific languages (useful for manually translated legal text, locale-specific content, etc.):
config.exclusions_per_language = {
"it" => ["legal.terms", "legal.privacy"], # Italian legal text manually reviewed
"de" => ["legal.terms", "legal.privacy"], # German legal text manually reviewed
"fr" => ["marketing.slogan"] # French slogan crafted by marketing team
}

Example:
- `legal.terms` is translated for Spanish, Portuguese, etc.
- But excluded for Italian and German (already manually translated)
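The interaction between global and per-language exclusions can be sketched as a simple filter over flattened keys. Both helper methods below are hypothetical and exist only to illustrate the rule; the gem's internals may differ.

```ruby
# Hypothetical helpers (not the gem's API) showing how the two exclusion
# lists could combine for a given target language.
def excluded_keys_for(lang, global_exclusions, exclusions_per_language)
  (global_exclusions + exclusions_per_language.fetch(lang, [])).uniq
end

def translatable_strings(flat_strings, lang, global_exclusions, exclusions_per_language)
  excluded = excluded_keys_for(lang, global_exclusions, exclusions_per_language)
  flat_strings.reject { |key, _value| excluded.include?(key) }
end

strings = {
  "app.name" => "MyApp",
  "legal.terms" => "Terms of service",
  "home.title" => "Welcome"
}
global = ["app.name"]
per_language = { "it" => ["legal.terms"] }

puts translatable_strings(strings, "it", global, per_language).keys.inspect # prints ["home.title"]
puts translatable_strings(strings, "es", global, per_language).keys.inspect # prints ["legal.terms", "home.title"]
```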
Provide domain-specific context to improve translation accuracy:
config.translation_context = "Medical terminology for healthcare applications"

This context is included in the AI system prompt, helping with specialized terminology in fields like:
- 🏥 Medical/Healthcare: "patient", "diagnosis", "treatment"
- ⚖️ Legal: "plaintiff", "defendant", "liability"
- 💰 Financial: "dividend", "amortization", "escrow"
- 🛒 E-commerce: "checkout", "cart", "inventory"
- 🔧 Technical: "API", "endpoint", "authentication"
BetterTranslate automatically selects the optimal strategy based on content size:

Deep translation (fewer than 50 strings):
- Translates each string individually
- Detailed progress tracking
- Best for small to medium files

Batch translation (50 or more strings):
- Processes in batches of 10 strings
- Faster for large files
- Reduced API overhead
You don't need to configure this - it's automatic! 🎯
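The selection rule can be written down in a few lines. The constants and method names below are hypothetical; only the threshold of 50 strings and the batch size of 10 come from this README.

```ruby
# Hypothetical sketch of the strategy choice (not the gem's code).
DEEP_THRESHOLD = 50 # below this many strings, translate one by one
BATCH_SIZE = 10     # batch strategy groups strings in tens

def select_strategy(string_count)
  string_count < DEEP_THRESHOLD ? :deep : :batch
end

def batches(strings)
  strings.each_slice(BATCH_SIZE).to_a
end

puts select_strategy(16)                          # prints deep
puts select_strategy(120)                         # prints batch
puts batches((1..25).to_a).map(&:size).inspect    # prints [10, 10, 5]
```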
BetterTranslate provides three Rails generators:
Generate the initializer with example configuration:
rails generate better_translate:install

Creates: `config/initializers/better_translate.rb`
Run the translation process:
rails generate better_translate:translate

This triggers the translation based on your configuration and displays progress messages.
Note (v1.1.1+): The generator now prioritizes configuration from config/initializers/better_translate.rb over YAML config files. If no configuration is found, it provides helpful error messages suggesting both configuration methods.
Analyze translation similarities using Levenshtein distance:
rails generate better_translate:analyze

Output:
- Console summary with similar translation pairs
- Detailed JSON report: `tmp/translation_similarity_report.json`
- Human-readable summary: `tmp/translation_similarity_summary.txt`
Use cases:
- Identify potential translation inconsistencies
- Find duplicate or near-duplicate translations
- Quality assurance for translation output
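For reference, Levenshtein distance counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. The sketch below is a standard two-row dynamic-programming version; the `levenshtein` and `similarity` method names, and the normalization to a 0..1 score, are assumptions for illustration rather than the analyzer's actual API.

```ruby
# Standard Levenshtein distance using two rows of the DP table.
# Method names are hypothetical; this is not the gem's analyzer.
def levenshtein(a, b)
  return b.length if a.empty?
  return a.length if b.empty?

  previous = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    current = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [
        previous[j] + 1,       # deletion
        current[j - 1] + 1,    # insertion
        previous[j - 1] + cost # substitution (or match)
      ].min
    end
    previous = current
  end
  previous.last
end

# Normalized similarity in 0.0..1.0 (1.0 means identical strings).
def similarity(a, b)
  longest = [a.length, b.length].max
  return 1.0 if longest.zero?

  1.0 - levenshtein(a, b).to_f / longest
end

puts levenshtein("kitten", "sitting") # prints 3
puts similarity("Ciao", "Ciao")       # prints 1.0
```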
texts = ["Hello", "Goodbye", "Thank you"]
target_langs = [
{ short_name: "it", name: "Italian" },
{ short_name: "fr", name: "French" }
]
results = BetterTranslate::TranslationHelper.translate_texts_to_languages(texts, target_langs)
# Results structure:
# {
# "it" => ["Ciao", "Arrivederci", "Grazie"],
# "fr" => ["Bonjour", "Au revoir", "Merci"]
# }

text = "Welcome to our application"
target_langs = [
{ short_name: "it", name: "Italian" },
{ short_name: "es", name: "Spanish" }
]
results = BetterTranslate::TranslationHelper.translate_text_to_languages(text, target_langs)
# Results:
# {
# "it" => "Benvenuto nella nostra applicazione",
# "es" => "Bienvenido a nuestra aplicación"
# }

# Separate configuration for different domains
medical_config = BetterTranslate::Configuration.new
medical_config.provider = :chatgpt
medical_config.openai_key = ENV["OPENAI_API_KEY"]
medical_config.translation_context = "Medical terminology for patient records"
medical_config.validate!
# Use the custom config...

Test your configuration without writing files:
config.dry_run = true

This validates everything and simulates the translation process without creating output files.
Enable detailed logging for debugging:
config.verbose = true

The Orphan Key Analyzer helps you find unused translation keys in your codebase. It scans your YAML locale files and compares them against your actual code usage, generating comprehensive reports.
Find orphan keys from the command line:
# Basic text report (default)
better_translate analyze \
--source config/locales/en.yml \
--scan-path app/
# JSON format (great for CI/CD)
better_translate analyze \
--source config/locales/en.yml \
--scan-path app/ \
--format json
# CSV format (easy to share with team)
better_translate analyze \
--source config/locales/en.yml \
--scan-path app/ \
--format csv
# Save to file
better_translate analyze \
--source config/locales/en.yml \
--scan-path app/ \
--output orphan_report.txt

Text format:
============================================================
Orphan Keys Analysis Report
============================================================
Statistics:
Total keys: 50
Used keys: 45
Orphan keys: 5
Usage: 90.0%
Orphan Keys (5):
------------------------------------------------------------
Key: users.old_message
Value: This feature was removed
Key: products.deprecated_label
Value: Old Label
...
============================================================
JSON format:
{
"orphans": ["users.old_message", "products.deprecated_label"],
"orphan_details": {
"users.old_message": "This feature was removed",
"products.deprecated_label": "Old Label"
},
"orphan_count": 5,
"total_keys": 50,
"used_keys": 45,
"usage_percentage": 90.0
}

Use the analyzer in your Ruby code:
# Scan YAML file
key_scanner = BetterTranslate::Analyzer::KeyScanner.new("config/locales/en.yml")
all_keys = key_scanner.scan # Returns Hash of all keys
# Scan code for used keys
code_scanner = BetterTranslate::Analyzer::CodeScanner.new("app/")
used_keys = code_scanner.scan # Returns Set of used keys
# Detect orphans
detector = BetterTranslate::Analyzer::OrphanDetector.new(all_keys, used_keys)
orphans = detector.detect
# Get statistics
puts "Orphan count: #{detector.orphan_count}"
puts "Usage: #{detector.usage_percentage}%"
# Generate report
reporter = BetterTranslate::Analyzer::Reporter.new(
orphans: orphans,
orphan_details: detector.orphan_details,
total_keys: all_keys.size,
used_keys: used_keys.size,
usage_percentage: detector.usage_percentage,
format: :text
)
puts reporter.generate
reporter.save_to_file("orphan_report.txt")

The analyzer recognizes these i18n patterns:
- `t('key')` - Rails short form
- `t("key")` - Rails short form with double quotes
- `I18n.t(:key)` - Symbol syntax
- `I18n.t('key')` - String syntax
- `I18n.translate('key')` - Full method name
- `<%= t('key') %>` - ERB templates
- `I18n.t('key', param: value)` - With parameters
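A regular expression covering the call shapes listed above might look like the following. This is a rough sketch, not the `CodeScanner` implementation: `I18N_CALL_PATTERN` and `extract_i18n_keys` are hypothetical names, and the pattern only handles literal string or symbol keys.

```ruby
require "set"

# Hypothetical pattern (not the gem's CodeScanner): matches t(...), I18n.t(...)
# and I18n.translate(...) when the first argument is a literal string or symbol.
I18N_CALL_PATTERN = /(?:\bI18n\.(?:t|translate)|(?<![\w.])t)\s*\(\s*(?:["']([^"']+)["']|:(\w+))/

# Returns the set of keys referenced in the given source text.
def extract_i18n_keys(source)
  Set.new(source.scan(I18N_CALL_PATTERN).flatten.compact)
end

sample = <<~'CODE'
  <h1><%= t('users.profile.title') %></h1>
  flash[:notice] = I18n.t(:saved)
  label = I18n.translate("forms.submit")
  greeting = I18n.t('users.greeting', name: user.name)
CODE

puts extract_i18n_keys(sample).to_a.inspect
```

Dynamically built keys (for example `t("errors.#{code}")`) cannot be resolved by a static pattern like this, so such keys may show up as false orphans.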
Nested keys:
en:
  users:
    profile:
      title: "Profile" # Detected as: users.profile.title

Use cases:
- Clean up unused translations before deployment
- Identify dead code after refactoring
- Reduce locale file size
- Improve translation maintenance
- Generate reports for translation teams
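The dotted-key notation used for nested keys, and the orphan calculation built on it, can be sketched as follows. `flatten_keys` is a hypothetical helper, the sample locale data is made up, and the orphan step is a plain array difference; the gem's `KeyScanner` and `OrphanDetector` may work differently.

```ruby
require "yaml"

# Hypothetical helper (not the gem's HashFlattener): turns nested hashes into
# a flat hash keyed by dotted paths such as "users.profile.title".
def flatten_keys(hash, prefix = nil)
  hash.each_with_object({}) do |(key, value), flat|
    full_key = [prefix, key].compact.join(".")
    if value.is_a?(Hash)
      flat.merge!(flatten_keys(value, full_key))
    else
      flat[full_key] = value
    end
  end
end

# Made-up sample data for illustration.
locale = YAML.safe_load(<<~YML)
  en:
    users:
      profile:
        title: "Profile"
      old_message: "This feature was removed"
YML

all_keys = flatten_keys(locale["en"])   # drop the top-level locale code
used_keys = ["users.profile.title"]     # e.g. collected by scanning app/
orphans = all_keys.keys - used_keys

puts orphans.inspect # prints ["users.old_message"]
```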
BetterTranslate includes comprehensive testing infrastructure with unit tests, integration tests, and a Rails dummy app for realistic testing.
spec/
├── better_translate/ # Unit tests (fast, no API calls)
│ ├── cache_spec.rb
│ ├── configuration_spec.rb
│ ├── providers/
│ │ ├── chatgpt_provider_spec.rb
│ │ └── gemini_provider_spec.rb
│ └── ...
│
├── integration/ # Integration tests (real API via VCR)
│ ├── chatgpt_integration_spec.rb
│ ├── gemini_integration_spec.rb
│ ├── rails_dummy_app_spec.rb
│ └── README.md
│
├── dummy/ # Rails dummy app for testing
│ ├── config/
│ │ └── locales/
│ │ ├── en.yml # Source file
│ │ ├── it.yml # Generated translations
│ │ └── fr.yml
│ ├── demo_translation.rb # Interactive demo script
│ └── USAGE_GUIDE.md
│
└── vcr_cassettes/ # Recorded API responses (18 cassettes, 260KB)
├── chatgpt/ (7)
├── gemini/ (7)
└── rails/ (4)
# Run all tests (unit + integration)
bundle exec rake spec
# or
bundle exec rspec
# Run only unit tests (fast, no API calls)
bundle exec rspec spec/better_translate/
# Run only integration tests (uses VCR cassettes)
bundle exec rspec spec/integration/
# Run specific test file
bundle exec rspec spec/better_translate/configuration_spec.rb
# Run tests with documentation-format output
bundle exec rspec --format documentation

BetterTranslate uses VCR (Video Cassette Recorder) to record real API interactions for integration tests. This allows:
- ✅ Realistic testing with actual provider responses
- ✅ No API keys needed after initial recording
- ✅ Fast test execution (no real API calls)
- ✅ CI/CD friendly (cassettes committed to repo)
- ✅ API keys anonymized (safe to commit)
# Copy environment template
cp .env.example .env
# Edit .env and add your API keys
OPENAI_API_KEY=sk-...
GEMINI_API_KEY=...
ANTHROPIC_API_KEY=sk-ant-...

# Delete and re-record all cassettes
rm -rf spec/vcr_cassettes/
bundle exec rspec spec/integration/
# Re-record specific provider
rm -rf spec/vcr_cassettes/chatgpt/
bundle exec rspec spec/integration/chatgpt_integration_spec.rb

Note: The `.env` file is gitignored. API keys in cassettes are automatically replaced with `<OPENAI_API_KEY>`, `<GEMINI_API_KEY>`, etc.
Test BetterTranslate with a realistic Rails app:
# Run interactive demo
ruby spec/dummy/demo_translation.rb

Output:
🚀 Starting translation...
[BetterTranslate] Italian | hello | 6.3%
[BetterTranslate] Italian | world | 12.5%
...
✅ Success: 2 language(s)
✓ it.yml generated (519 bytes)
✓ fr.yml generated (511 bytes)
Generated files:
- `spec/dummy/config/locales/it.yml` - Italian translation
- `spec/dummy/config/locales/fr.yml` - French translation
See spec/dummy/USAGE_GUIDE.md for more examples.
# Run RuboCop linter
bundle exec rubocop
# Auto-fix violations
bundle exec rubocop -a
# Run both tests and linter
bundle exec rake

# Generate YARD documentation
bundle exec yard doc
# Start documentation server (http://localhost:8808)
bundle exec yard server
# Check documentation coverage
bundle exec yard stats

# Load the gem in an interactive console
bin/console

# Check for security vulnerabilities
bundle exec bundler-audit check --update

All providers inherit from `BaseHttpProvider`:
BaseHttpProvider (abstract)
├── ChatGPTProvider
├── GeminiProvider
└── AnthropicProvider
BaseHttpProvider responsibilities:
- HTTP communication via Faraday
- Retry logic with exponential backoff
- Rate limiting
- Timeout handling
- Error wrapping
- Configuration: Type-safe config with validation
- Cache: LRU cache with optional TTL
- RateLimiter: Thread-safe request throttling
- Validator: Input validation (language codes, text, paths, keys)
- HashFlattener: Converts nested YAML ↔ flat structure
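The retry logic listed among the `BaseHttpProvider` responsibilities follows the usual exponential-backoff pattern. The sketch below is illustrative only: `with_retries` is a hypothetical helper, the doubling factor is an assumption, and the parameter names simply mirror the `max_retries` and `retry_delay` settings shown earlier.

```ruby
# Hypothetical sketch of retry with exponential backoff (not the gem's code).
# Assumes max_retries counts total attempts and the delay doubles each time.
# The sleeper is injectable so the waiting can be skipped in tests.
def with_retries(max_retries: 3, retry_delay: 2.0, sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    attempt += 1
    yield(attempt)
  rescue StandardError
    raise if attempt >= max_retries # out of attempts, surface the last error

    sleeper.call(retry_delay * (2**(attempt - 1))) # 2.0s, 4.0s, 8.0s, ...
    retry
  end
end

delays = []
result = with_retries(max_retries: 3, retry_delay: 2.0, sleeper: ->(s) { delays << s }) do |attempt|
  raise "temporary failure" if attempt < 3

  "translated"
end
puts result         # prints translated
puts delays.inspect # prints [2.0, 4.0]
```

In a real provider you would likely rescue only retryable errors (timeouts, rate limits) rather than every `StandardError`.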
All errors inherit from BetterTranslate::Error:
BetterTranslate::Error
├── ConfigurationError
├── ValidationError
├── TranslationError
├── ProviderError
├── ApiError
├── RateLimitError
├── FileError
├── YamlError
└── ProviderNotFoundError
- USAGE_GUIDE.md - Complete guide to dummy app and demos
- VCR Testing Guide - How to test with VCR cassettes
- CLAUDE.md - Developer guide for AI assistants (Claude Code)
- YARD Docs - Complete API documentation
better_translate/
├── README.md # This file (main documentation)
├── CLAUDE.md # Development guide (commands, architecture)
├── spec/
│ ├── dummy/
│ │ ├── USAGE_GUIDE.md # 📖 Interactive demo guide
│ │ └── demo_translation.rb # 🚀 Runnable demo script
│ └── integration/
│ └── README.md # 🧪 VCR testing guide
└── docs/
└── implementation/ # Design docs
Bug reports and pull requests are welcome on GitHub at https://github.com/alessiobussolari/better_translate.
- TDD (Test-Driven Development): Always write tests before implementing features
- YARD Documentation: Document all public methods with `@param`, `@return`, `@raise`, and `@example`
- RuboCop Compliance: Ensure code passes `bundle exec rubocop` before committing
- Frozen String Literals: Include `# frozen_string_literal: true` at the top of all files
- HTTP Client: Use Faraday for all HTTP requests (never Net::HTTP or HTTParty)
- VCR Cassettes: Record integration tests with real API responses for CI/CD
# 1. Clone and setup
git clone https://github.com/alessiobussolari/better_translate.git
cd better_translate
bundle install
# 2. Create a feature branch
git checkout -b my-feature
# 3. Write tests first (TDD)
# Edit spec/better_translate/my_feature_spec.rb
# 4. Implement the feature
# Edit lib/better_translate/my_feature.rb
# 5. Ensure tests pass and code is clean
bundle exec rspec
bundle exec rubocop
# 6. Commit and push
git add .
git commit -m "Add my feature"
git push origin my-feature
# 7. Create a Pull Request

Releases are automated via GitHub Actions:
# 1. Update version
vim lib/better_translate/version.rb # VERSION = "1.0.1"
# 2. Update CHANGELOG
vim CHANGELOG.md
# 3. Commit and tag
git add -A
git commit -m "chore: Release v1.0.1"
git tag v1.0.1
git push origin main
git push origin v1.0.1
# 4. GitHub Actions automatically:
# ✅ Runs tests
# ✅ Builds gem
# ✅ Publishes to RubyGems.org
# ✅ Creates GitHub Release

Setup: See `.github/RUBYGEMS_SETUP.md` for configuring RubyGems trusted publishing (no API keys needed!).
The gem is available as open source under the terms of the MIT License.
Everyone interacting in the BetterTranslate project's codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.
Made with ❤️ by Alessio Bussolari