# WiMarka

WiMarka is a Python library and CLI tool for evaluating machine translations of Philippine languages, combining syntactic and semantic analysis with detailed, interpretable results.
- Overview
- Features
- Supported Languages
- Prerequisites
- Installation
- Usage
- Documentation
- Project Structure
- How It Works
- Example
- Development
- Contributing
- License
- Authors
## Overview

WiMarka addresses the critical need for accurate machine translation evaluation in Philippine languages. It goes beyond simple metrics by providing:
- Error Detection: Identifies specific translation errors between source and target texts
- Multi-dimensional Scoring: Evaluates translations across fluency, adequacy, and overall quality
- Explainability: Generates human-readable explanations for detected errors
- Correction Suggestions: Provides corrected translation alternatives
- Philippine Language Focus: Specialized support for Cebuano (CEB), Ilocano (ILO), and Tagalog (TGT)
## Features

- Error Detection: Advanced algorithms to identify translation inconsistencies and errors
- Multi-dimensional Scoring:
  - Fluency Score: Measures how natural the translation reads
  - Adequacy Score: Evaluates semantic completeness and accuracy
  - Overall Quality Score: Comprehensive translation quality assessment
- Explainable Results: Detailed explanations for each detected error
- Correction Suggestions: AI-powered suggestions for improving translations
- Dual Interface: Both Python library and CLI for flexible integration
- Philippine Language Support: Specialized models for CEB, ILO, and TGT
- Batch Processing: Evaluate multiple sentence pairs efficiently
## Supported Languages

| Code | Language | Role |
|---|---|---|
| EN | English | Source/Target |
| CEB | Cebuano | Target |
| ILO | Ilocano | Target |
| TGT | Tagalog | Target |
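The table above can also be expressed as a small lookup structure. The sketch below is purely illustrative; WiMarka's own configuration may represent language codes differently.

```python
# Illustrative lookup for WiMarka's language codes, mirroring the table
# above; not WiMarka's actual configuration.
LANGUAGES = {
    'EN':  ('English', 'Source/Target'),
    'CEB': ('Cebuano', 'Target'),
    'ILO': ('Ilocano', 'Target'),
    'TGT': ('Tagalog', 'Target'),
}

def describe(code: str) -> str:
    """Return a one-line description for a language code."""
    name, role = LANGUAGES[code]
    return f'{code}: {name} ({role})'
```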
## Prerequisites

Before installing WiMarka, ensure you have:

- Python >= 3.12
- Microsoft Visual Studio with CMake installed
  - Download Visual Studio
  - Required for building native dependencies (llama-cpp-python)
## Installation

Install directly from GitHub:

```bash
pip install git+https://github.com/wimarka-uic/WiMarka.git
```

Or install from source:

```bash
# Clone the repository
git clone https://github.com/wimarka-uic/WiMarka.git
cd WiMarka

# Install in development mode
pip install -e .
```

## Usage

### Python Library

Use WiMarka programmatically in your Python projects:
```python
from wimarka.main import wmk_eval

# Evaluate translations
wmk_eval(
    src_file_path='source_file.txt',  # Path to source text file
    src_lang='EN',                    # Source language code
    tgt_file_path='target_file.txt',  # Path to target translation file
    tgt_lang='CEB'                    # Target language code
)
```

Both source and target files should be plain text files with:
- One sentence per line
- UTF-8 encoding
- Equal number of lines in both files
Example `source_file.txt`:

```text
Good morning!
How are you today?
```

Example `target_file.txt`:

```text
Maayong buntag!
Kumusta ka karon?
```
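Before running an evaluation, it can be useful to verify that a file pair meets the requirements above. The helper below is a hypothetical sketch, not part of WiMarka:

```python
from pathlib import Path

def check_parallel_files(src_path: str, tgt_path: str) -> int:
    """Validate a source/target file pair: UTF-8 encoded, one sentence
    per line, and an equal number of lines in both files.
    Returns the number of sentence pairs."""
    src_lines = Path(src_path).read_text(encoding='utf-8').splitlines()
    tgt_lines = Path(tgt_path).read_text(encoding='utf-8').splitlines()
    if len(src_lines) != len(tgt_lines):
        raise ValueError(
            f'Line count mismatch: {len(src_lines)} source vs '
            f'{len(tgt_lines)} target'
        )
    return len(src_lines)
```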
### Command Line

Evaluate translations directly from the terminal:

```bash
wimarka --src_file_path source_file.txt \
        --src_lang EN \
        --tgt_file_path target_file.txt \
        --tgt_lang CEB
```

| Option | Description | Required |
|---|---|---|
| `--src_file_path` | Path to the source text file | Yes |
| `--src_lang` | Source language code (EN, CEB, ILO, TGT) | Yes |
| `--tgt_file_path` | Path to the target text file | Yes |
| `--tgt_lang` | Target language code (CEB, ILO, TGT) | Yes |
| `-h, --help` | Show help message | No |
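As a rough sketch, a CLI with these options could be wired up with `argparse` as below. The option names mirror the table, but this is an illustration, not WiMarka's actual `cli.py`:

```python
import argparse

# Language codes from the Supported Languages table (assumption: the CLI
# validates codes with argparse choices; the real implementation may differ).
LANG_CODES = ['EN', 'CEB', 'ILO', 'TGT']

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog='wimarka',
        description='Evaluate machine translations for Philippine languages.',
    )
    parser.add_argument('--src_file_path', required=True,
                        help='Path to the source text file')
    parser.add_argument('--src_lang', required=True, choices=LANG_CODES,
                        help='Source language code')
    parser.add_argument('--tgt_file_path', required=True,
                        help='Path to the target text file')
    parser.add_argument('--tgt_lang', required=True,
                        choices=['CEB', 'ILO', 'TGT'],
                        help='Target language code')
    return parser

# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args([
    '--src_file_path', 'source_file.txt', '--src_lang', 'EN',
    '--tgt_file_path', 'target_file.txt', '--tgt_lang', 'CEB',
])
```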
## Documentation

For comprehensive documentation, visit WiMarka Documentation on Read the Docs.

The documentation includes:

- User Manual: Installation, usage guides, examples, and best practices
- Technical Manual: Architecture, API reference, and development guides

Quick links:

- Installation Guide
- Quick Start
- Python Library Usage
- CLI Usage
- API Reference
- Architecture
## Project Structure

```
WiMarka/
├── wimarka/                     # Main package directory
│   ├── __init__.py              # Package initialization
│   ├── main.py                  # Core evaluation logic
│   ├── cli.py                   # Command-line interface
│   ├── config.py                # Configuration settings
│   ├── tasks/                   # Task modules
│   │   ├── error_detection.py   # Error detection logic
│   │   ├── scoring.py           # Translation scoring
│   │   ├── explanation.py       # Error explanation generation
│   │   └── correction.py        # Correction suggestion generation
│   └── utils/                   # Utility modules
│       ├── helper.py            # Helper functions
│       ├── logger.py            # Logging utilities
│       ├── model.py             # Model loading and management
│       ├── cache.py             # Caching utilities
│       └── torch.py             # PyTorch utilities
├── test/                        # Test files and examples
│   ├── main.py                  # Test script
│   ├── source_file.txt          # Sample source file
│   └── target_file.txt          # Sample target file
├── setup.py                     # Package installation configuration
├── requirements.txt             # Python dependencies
├── LICENSE                      # MIT License
└── README.md                    # This file
```
## How It Works

WiMarka follows a four-stage evaluation pipeline:

1. **Error Detection**
   - Analyzes source and target sentences
   - Identifies syntactic and semantic errors
   - Categorizes error types

2. **Scoring**
   - Calculates fluency score (0-100)
   - Calculates adequacy score (0-100)
   - Computes overall quality score (0-100)

3. **Explanation Generation**
   - Provides human-readable explanations for each error
   - Contextualizes issues in terms of linguistic quality
   - Highlights specific problematic segments

4. **Correction Suggestion**
   - Generates improved translation alternatives
   - Addresses identified errors
   - Maintains semantic integrity
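The four stages can be sketched as a simple pipeline. This is a schematic with stubbed stages: the function names, and the assumption that the overall score is the mean of fluency and adequacy, are illustrative, not WiMarka's actual internals.

```python
# Schematic of the four-stage pipeline; all stage functions are stubs.

def detect_errors(src: str, tgt: str) -> list:
    """Stage 1 (stub): return a list of error descriptions."""
    return []

def overall_score(fluency: float, adequacy: float) -> float:
    """Stage 2: combine per-dimension scores. A simple mean is assumed
    here (e.g. fluency 95 and adequacy 40 give 67.5)."""
    return (fluency + adequacy) / 2

def explain(errors: list) -> list:
    """Stage 3 (stub): one human-readable explanation per error."""
    return [f'Issue detected: {e}' for e in errors]

def suggest_correction(tgt: str, errors: list) -> str:
    """Stage 4 (stub): return an improved translation alternative."""
    return tgt

def evaluate_pair(src: str, tgt: str, fluency: float, adequacy: float) -> dict:
    """Run all four stages on one sentence pair and collect the results."""
    errors = detect_errors(src, tgt)
    return {
        'errors': errors,
        'fluency': fluency,
        'adequacy': adequacy,
        'overall': overall_score(fluency, adequacy),
        'explanations': explain(errors),
        'correction': suggest_correction(tgt, errors),
    }
```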
## Example

Here's a complete example demonstrating WiMarka's capabilities:

**Input Files:**

`en_source.txt`:

```text
Good morning!
How are you today?
```

`ceb_translation.txt`:

```text
Magandang gabi!
Kamusta ka na ngayon?
```

**Python Code:**

```python
from wimarka.main import wmk_eval

wmk_eval(
    src_file_path='en_source.txt',
    src_lang='EN',
    tgt_file_path='ceb_translation.txt',
    tgt_lang='CEB'
)
```

**Sample Output:**
```text
Evaluating line 1/2
Detecting errors...
Scoring translation...
Generating explanation...
Correcting translation...
Evaluating line 2/2
Detecting errors...
Scoring translation...
Generating explanation...
Correcting translation...

=== Evaluation Results ===
----------------------------------------
Line 1:
Source: Good morning!
Target: Magandang gabi!
Errors: [Semantic mismatch: "morning" vs "gabi" (evening)]
Fluency Score: 95/100
Adequacy Score: 40/100
Overall Score: 67.5/100
Explanation: The translation has incorrect time reference...
Suggested Correction: Maayong buntag!
----------------------------------------
Evaluation completed.
```
## Development

Set up a development environment:

```bash
# Clone the repository
git clone https://github.com/wimarka-uic/WiMarka.git
cd WiMarka

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Install in editable mode
pip install -e .
```

Run the test script:

```bash
cd test
python main.py
```

## Contributing

We welcome contributions from the community! To contribute:
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
Please ensure your code follows the project's coding standards and includes appropriate tests.
## License

This project is licensed under the MIT License - see the LICENSE file for details.
Copyright 2025 University of the Immaculate Conception - College of Computer Studies
## Citation

If you use WiMarka in your research, please cite:
```bibtex
@software{wimarka2025,
  title={WiMarka: A Reference-free Evaluation Metric for Machine
         Translation of Philippine Languages},
  author={University of the Immaculate Conception},
  year={2025},
  url={https://github.com/wimarka-uic/WiMarka}
}
```

## Authors

- University of the Immaculate Conception - College of Computer Studies
- WiMarka Research Team:
  - Mindanao Natural Language Processing Research and Development Laboratory
  - Annotators and Evaluators via Lakra
For questions, issues, or suggestions:
- Issues: GitHub Issues
- Discussions: GitHub Discussions