
pRAGmata: Evidence-grounded evaluation for RAG systems


pRAGmata (πράγματα, Ancient Greek: roughly “matters of fact”) is a Python framework for empirically evaluating retrieval-augmented generation (RAG) systems. It evaluates systems over annotated query–response examples with explicit, decomposed evaluation dimensions, rather than relying on opaque or heuristic scoring. The framework consists of three components:

  • An AI workflow for controlled synthetic query generation and variation
  • A web-based annotation interface for domain experts to label query–response pairs across retrieval, grounding, and generation aspects
  • A label-based evaluation pipeline with model-assisted inference using transformer cross-encoders for scalable scoring

These components can be used independently or combined into an evaluation workflow:

query generation → annotation → evaluation
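
To make the "label-based evaluation" idea concrete, here is a minimal sketch of how decomposed, annotated examples might be aggregated into per-dimension scores. The class and function names (`AnnotatedExample`, `evaluate`) and the dimension labels are illustrative assumptions, not the pRAGmata API:

```python
from dataclasses import dataclass, field

# Hypothetical data model -- names are illustrative, not the pRAGmata API.
@dataclass
class AnnotatedExample:
    query: str
    response: str
    # Decomposed evaluation dimensions, each labelled in [0, 1] by a domain
    # expert (or, at scale, inferred by a model-assisted cross-encoder).
    labels: dict = field(default_factory=dict)

DIMENSIONS = ("retrieval", "grounding", "generation")

def evaluate(examples):
    """Average each evaluation dimension over all annotated examples."""
    scores = {}
    for dim in DIMENSIONS:
        vals = [ex.labels[dim] for ex in examples if dim in ex.labels]
        scores[dim] = sum(vals) / len(vals) if vals else None
    return scores

examples = [
    AnnotatedExample("q1", "r1", {"retrieval": 1.0, "grounding": 0.5, "generation": 1.0}),
    AnnotatedExample("q2", "r2", {"retrieval": 0.0, "grounding": 1.0, "generation": 0.5}),
]
print(evaluate(examples))
# → {'retrieval': 0.5, 'grounding': 0.75, 'generation': 0.75}
```

Keeping the dimensions separate, rather than collapsing them into a single score, is what lets the pipeline distinguish a retrieval failure from a grounding or generation failure.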

Note

The project is under active development, and the public API may change before the first stable release.
