
evaluation pipeline implemented #33

Merged
FacuSentena merged 2 commits into main from feature/judges on Jan 5, 2026
Conversation

@FacuSentena
Collaborator

PR: SQL Translation Evaluation Pipeline

🚀 Summary

Implemented an automated framework to benchmark Snowflake → Databricks SQL translation models using MLflow and LLM-as-a-judge.
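The deduction-based LLM-as-a-judge scoring this summary describes could look roughly like the following minimal sketch, assuming the judge returns a list of named issues, each carrying a point deduction (all names here — `JudgeIssue`, `score_from_issues`, the example descriptions — are hypothetical, not from the PR):

```python
from dataclasses import dataclass

@dataclass
class JudgeIssue:
    # One problem the judge flagged in a translated query (hypothetical shape).
    category: str     # e.g. "compliance" or "best_practices"
    description: str
    deduction: int    # points to subtract from a perfect score

def score_from_issues(issues: list[JudgeIssue], max_score: int = 100) -> int:
    """Deduction-based scoring: start from max_score, subtract each
    issue's deduction, and floor the result at 0."""
    return max(0, max_score - sum(i.deduction for i in issues))

issues = [
    JudgeIssue("compliance", "DATEADD argument order not adjusted", 20),
    JudgeIssue("best_practices", "missing column comments", 5),
]
print(score_from_issues(issues))  # 75
```

Starting from a perfect score and subtracting per-issue penalties (rather than asking the judge for a single holistic number) tends to make LLM grading more consistent, since each deduction is tied to a concrete, reviewable finding.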

✨ Key Features

  • Strict Scoring: Deduction-based scoring for Compliance (functional correctness) and Best Practices (optimization and documentation).
  • A/B Comparison: Uses Nested MLflow Runs to enable side-by-side grouped bar charts for model comparison.
  • Diagnostics: Exports top_issues_summary.txt and issues_table.json to pinpoint model weaknesses.
  • Interfaces:
    • CLI: run_local_benchmark.py for batch runs.
    • Notebook: benchmark_interactive.ipynb for visual research.
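The diagnostics exports named above could be produced by aggregating judge findings across the benchmark; a rough standard-library sketch (only the two output filenames come from the PR — the issue fields and helper name are assumptions):

```python
import json
from collections import Counter

def export_diagnostics(all_issues: list[dict], out_dir: str = ".") -> Counter:
    """Write issues_table.json (raw per-query issues) and
    top_issues_summary.txt (issue descriptions ranked by frequency)."""
    with open(f"{out_dir}/issues_table.json", "w") as f:
        json.dump(all_issues, f, indent=2)

    counts = Counter(i["description"] for i in all_issues)
    lines = [f"{n:4d}  {desc}" for desc, n in counts.most_common(10)]
    with open(f"{out_dir}/top_issues_summary.txt", "w") as f:
        f.write("\n".join(lines))
    return counts

# Hypothetical issues collected from several judged translations.
issues = [
    {"category": "compliance", "description": "DATEADD argument order not adjusted", "deduction": 20},
    {"category": "compliance", "description": "DATEADD argument order not adjusted", "deduction": 20},
    {"category": "best_practices", "description": "missing table comment", "deduction": 5},
]
counts = export_diagnostics(issues)
print(counts.most_common(1)[0])
```

Ranking recurring descriptions, as the `top_issues_summary.txt` export suggests, turns per-query judge output into an actionable view of a model's systematic weaknesses.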

📁 Changes

  • Added src/artifact_translation_package/evaluation/ module.
  • Consolidated documentation into the evaluation module's README.md.
  • Updated requirements.txt with MLflow and Databricks integrations.

@FacuSentena FacuSentena merged commit ec9c580 into main Jan 5, 2026
1 check passed
