pRAGmata: Evidence-grounded evaluation for RAG systems

pRAGmata (πράγματα, Ancient Greek: roughly “matters of fact”) is a Python framework for empirically evaluating retrieval-augmented generation (RAG) systems. It evaluates systems over annotated query–response examples with explicit, decomposed evaluation dimensions, rather than relying on opaque or heuristic scoring. The framework consists of three components:

An AI workflow for controlled synthetic query generation and variation
A web-based annotation interface for domain experts to label query–response pairs across retrieval, grounding, and generation aspects
A label-based evaluation pipeline with model-assisted inference using transformer cross-encoders for scalable scoring

These components can be used independently or combined into an evaluation workflow:

query generation → annotation → evaluation

Note

The project is under active construction and the public API may change before the first stable release.

Name		Name	Last commit message	Last commit date
Latest commit History 71 Commits
.github		.github
deploy/annotation		deploy/annotation
docs		docs
src/pragmata		src/pragmata
tests		tests
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE.md		LICENSE.md
Makefile		Makefile
README.md		README.md
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

pRAGmata: Evidence-grounded evaluation for RAG systems

About

Uh oh!

Releases

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

pRAGmata: Evidence-grounded evaluation for RAG systems

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Contributors

Uh oh!

Languages