WiMarka

WiMarka is a comprehensive Python library and CLI tool designed for evaluating machine translations with advanced syntactic and semantic analysis, providing detailed interpretability for Philippine Languages.




πŸ” Overview

WiMarka addresses the critical need for accurate machine translation evaluation in Philippine languages. It goes beyond simple metrics by providing:

  • Error Detection: Identifies specific translation errors between source and target texts
  • Multi-dimensional Scoring: Evaluates translations across fluency, adequacy, and overall quality
  • Explainability: Generates human-readable explanations for detected errors
  • Correction Suggestions: Provides corrected translation alternatives
  • Philippine Language Focus: Specialized support for Cebuano (CEB), Ilocano (ILO), and Tagalog (TGT)

✨ Features

  • πŸ” Error Detection: Advanced algorithms to identify translation inconsistencies and errors
  • πŸ“Š Multi-dimensional Scoring:
    • Fluency Score: Measures how natural the translation reads
    • Adequacy Score: Evaluates semantic completeness and accuracy
    • Overall Quality Score: Comprehensive translation quality assessment
  • πŸ’‘ Explainable Results: Detailed explanations for each detected error
  • πŸ”§ Correction Suggestions: AI-powered suggestions for improving translations
  • πŸ–₯️ Dual Interface: Both Python library and CLI for flexible integration
  • 🌏 Philippine Language Support: Specialized models for CEB, ILO, and TGT
  • πŸ“ Batch Processing: Evaluate multiple sentence pairs efficiently

🌐 Supported Languages

Code   Language   Role
EN     English    Source/Target
CEB    Cebuano    Target
ILO    Ilocano    Target
TGT    Tagalog    Target
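For scripting purposes, the code-to-language mapping above can be captured as a small lookup. The dictionary mirrors the table; the helper function itself is hypothetical and not part of the WiMarka API:

```python
# Illustrative lookup of the language codes WiMarka uses.
# The mapping mirrors the table above; language_name() is a hypothetical helper.
LANGUAGE_CODES = {
    'EN': 'English',
    'CEB': 'Cebuano',
    'ILO': 'Ilocano',
    'TGT': 'Tagalog',
}

def language_name(code: str) -> str:
    """Return the full language name for a WiMarka code (case-insensitive)."""
    try:
        return LANGUAGE_CODES[code.upper()]
    except KeyError:
        raise ValueError(f'Unsupported language code: {code!r}')

print(language_name('ceb'))  # Cebuano
```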

📦 Prerequisites

Before installing WiMarka, ensure you have:

  • Python >= 3.12
  • Microsoft Visual Studio with CMake installed

🚀 Installation

Using pip (Recommended)

Install directly from GitHub:

pip install git+https://github.com/wimarka-uic/WiMarka.git

From Source

# Clone the repository
git clone https://github.com/wimarka-uic/WiMarka.git
cd WiMarka

# Install in development mode
pip install -e .

💻 Usage

Python Library

Use WiMarka programmatically in your Python projects:

from wimarka.main import wmk_eval

# Evaluate translations
wmk_eval(
    src_file_path='source_file.txt',  # Path to source text file
    src_lang='EN',                     # Source language code
    tgt_file_path='target_file.txt',  # Path to target translation file
    tgt_lang='CEB'                     # Target language code
)

Input File Format

Both source and target files should be plain text files with:

  • One sentence per line
  • UTF-8 encoding
  • Equal number of lines in both files

Example source_file.txt:

Good morning!
How are you today?

Example target_file.txt:

Maayong buntag!
Kumusta ka karon?
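Because the evaluator expects parallel, line-aligned files, a quick pre-flight check can catch mismatches before a long evaluation run. This helper is a sketch, not part of WiMarka:

```python
# Hypothetical pre-flight check for WiMarka input files: both files must be
# UTF-8 encoded, one sentence per line, with an equal number of lines.
from pathlib import Path

def check_parallel_files(src_path: str, tgt_path: str) -> int:
    """Return the number of sentence pairs, or raise ValueError on mismatch."""
    src_lines = Path(src_path).read_text(encoding='utf-8').splitlines()
    tgt_lines = Path(tgt_path).read_text(encoding='utf-8').splitlines()
    if len(src_lines) != len(tgt_lines):
        raise ValueError(
            f'Line count mismatch: {len(src_lines)} source vs {len(tgt_lines)} target'
        )
    return len(src_lines)

# Recreate the example files shown above, then validate them:
Path('source_file.txt').write_text('Good morning!\nHow are you today?\n', encoding='utf-8')
Path('target_file.txt').write_text('Maayong buntag!\nKumusta ka karon?\n', encoding='utf-8')
print(check_parallel_files('source_file.txt', 'target_file.txt'))  # 2
```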

Command-Line Interface (CLI)

Evaluate translations directly from the terminal:

wimarka --src_file_path source_file.txt \
        --src_lang EN \
        --tgt_file_path target_file.txt \
        --tgt_lang CEB

CLI Options

Option            Description                                Required
--src_file_path   Path to the source text file               Yes
--src_lang        Source language code (EN, CEB, ILO, TGT)   Yes
--tgt_file_path   Path to the target text file               Yes
--tgt_lang        Target language code (CEB, ILO, TGT)       Yes
-h, --help        Show help message                          No

📖 Documentation

For comprehensive documentation, visit WiMarka Documentation on ReadtheDocs.

The documentation includes:

  • User Manual: Installation, usage guides, examples, and best practices
  • Technical Manual: Architecture, API reference, and development guides

οΏ½πŸ“ Project Structure

WiMarka/
├── wimarka/                   # Main package directory
│   ├── __init__.py            # Package initialization
│   ├── main.py                # Core evaluation logic
│   ├── cli.py                 # Command-line interface
│   ├── config.py              # Configuration settings
│   ├── tasks/                 # Task modules
│   │   ├── error_detection.py     # Error detection logic
│   │   ├── scoring.py             # Translation scoring
│   │   ├── explanation.py         # Error explanation generation
│   │   └── correction.py          # Correction suggestion generation
│   └── utils/                 # Utility modules
│       ├── helper.py              # Helper functions
│       ├── logger.py              # Logging utilities
│       ├── model.py               # Model loading and management
│       ├── cache.py               # Caching utilities
│       └── torch.py               # PyTorch utilities
├── test/                      # Test files and examples
│   ├── main.py                # Test script
│   ├── source_file.txt        # Sample source file
│   └── target_file.txt        # Sample target file
├── setup.py                   # Package installation configuration
├── requirements.txt           # Python dependencies
├── LICENSE                    # MIT License
└── README.md                  # This file

βš™οΈ How It Works

WiMarka follows a four-stage evaluation pipeline:

  1. Error Detection 🔍

    • Analyzes source and target sentences
    • Identifies syntactic and semantic errors
    • Categorizes error types
  2. Scoring 📊

    • Calculates fluency score (0-100)
    • Calculates adequacy score (0-100)
    • Computes overall quality score (0-100)
  3. Explanation Generation 💡

    • Provides human-readable explanations for each error
    • Contextualizes issues in terms of linguistic quality
    • Highlights specific problematic segments
  4. Correction Suggestion 🔧

    • Generates improved translation alternatives
    • Addresses identified errors
    • Maintains semantic integrity
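The four stages above can be sketched as a toy pipeline. Everything in this sketch is illustrative: the function names are not WiMarka internals, and aggregating the overall score as the mean of fluency and adequacy is only an assumption consistent with the sample output in the Example section (95 and 40 averaging to 67.5):

```python
# Toy sketch of the four-stage pipeline described above. None of these
# function names come from WiMarka, and the overall score being the mean
# of fluency and adequacy is an assumption, not documented behavior.

def detect_errors(src: str, tgt: str) -> list[str]:
    # Placeholder: a real detector would perform syntactic/semantic analysis.
    return ['Semantic mismatch'] if 'morning' in src and 'gabi' in tgt else []

def overall_score(fluency: float, adequacy: float) -> float:
    # Assumed aggregation: simple mean of the two 0-100 scores.
    return (fluency + adequacy) / 2

def evaluate(src: str, tgt: str, fluency: float, adequacy: float) -> dict:
    errors = detect_errors(src, tgt)                       # stage 1
    overall = overall_score(fluency, adequacy)             # stage 2
    explanation = ('Possible error detected.' if errors    # stage 3
                   else 'No issues found.')
    return {'errors': errors, 'overall': overall, 'explanation': explanation}
    # stage 4 (correction suggestion) is omitted from this sketch

result = evaluate('Good morning!', 'Magandang gabi!', fluency=95, adequacy=40)
print(result['overall'])  # 67.5
```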

πŸ“ Example

Here's a complete example demonstrating WiMarka's capabilities:

Input Files:

en_source.txt:

Good morning!
How are you today?

ceb_translation.txt:

Magandang gabi!
Kamusta ka na ngayon?

Python Code:

from wimarka.main import wmk_eval

wmk_eval(
    src_file_path='en_source.txt',
    src_lang='EN',
    tgt_file_path='ceb_translation.txt',
    tgt_lang='CEB'
)

Sample Output:

Evaluating line 1/2
Detecting errors...
Scoring translation...
Generating explanation...
Correcting translation...

Evaluating line 2/2
Detecting errors...
Scoring translation...
Generating explanation...
Correcting translation...

=== Evaluation Results ===
----------------------------------------
Line 1:
  Source: Good morning!
  Target: Magandang gabi!
  Errors: [Semantic mismatch: "morning" vs "gabi" (evening)]
  Fluency Score: 95/100
  Adequacy Score: 40/100
  Overall Score: 67.5/100
  Explanation: The translation has incorrect time reference...
  Suggested Correction: Maayong buntag!
----------------------------------------

Evaluation completed.

πŸ› οΈ Development

Setting Up Development Environment

# Clone the repository
git clone https://github.com/wimarka-uic/WiMarka.git
cd WiMarka

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Install in editable mode
pip install -e .

Running Tests

cd test
python main.py

🤝 Contributing

We welcome contributions from the community! To contribute:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Please ensure your code follows the project's coding standards and includes appropriate tests.


📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

Copyright 2025 University of the Immaculate Conception - College of Computer Studies

👥 Authors

University of the Immaculate Conception - College of Computer Studies


📚 Citation

If you use WiMarka in your research, please cite:

@software{wimarka2025,
  title={WiMarka: A Reference-free Evaluation Metric for Machine Translation of Philippine Languages},
  author={University of the Immaculate Conception},
  year={2025},
  url={https://github.com/wimarka-uic/WiMarka}
}

πŸ™ Acknowledgments


📧 Contact & Support

For questions, issues, or suggestions:


Made with ❤️ for Philippine Languages
