krippendorff-alpha

Here are 2 public repositories matching this topic...

lizhiyao / oh-my-knowledge

Evaluation framework for LLM knowledge inputs — prompts, RAG corpora, skills, agent workflows. Fix the model, vary the artifact. Built-in statistical rigor: bootstrap CI, Krippendorff α, length-debias, saturation curves.

benchmark ai evaluation-framework claude knowledge-engineering skill-evaluation llm prompt-engineering prompt-testing llm-evaluation rag-evaluation llm-judge claude-code agent-evaluation bootstrap-ci krippendorff-alpha evaluation-as-code multi-judge-ensemble

Updated May 30, 2026
TypeScript

WatchTree-19 / llm-judge-calibration

Measure how much your LLM judges actually agree. Inter-judge agreement metrics for LLM-as-a-judge evaluations.

python evaluation calibration eval agreement cohens-kappa inter-rater-agreement llm llm-as-judge krippendorff-alpha

Updated May 14, 2026
Python
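
For readers unfamiliar with the agreement metric both repositories above are tagged with, here is a minimal sketch of Krippendorff's α for nominal labels, built from the usual coincidence-matrix definition. It is not taken from either repository. The function name `krippendorff_alpha_nominal` and the input layout (one list of ratings per item, `None` for a missing rating) are assumptions made for illustration.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels (hypothetical helper).

    units: iterable of per-item rating lists, one entry per judge;
           None marks a missing rating.
    """
    # Coincidence counts: each ordered pair of ratings given to the same
    # item by two different judges contributes 1 / (m - 1), where m is the
    # number of ratings that item received.
    coincidences = Counter()
    for ratings in units:
        values = [r for r in ratings if r is not None]
        m = len(values)
        if m < 2:
            continue  # an item with fewer than two ratings cannot be paired
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per category and the total number of pairable ratings.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("need at least one item rated by two judges")

    # Observed vs. chance-expected disagreement (both scaled by n, which
    # cancels in the ratio).
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        return float("nan")  # only one category was ever used: alpha is undefined
    return 1.0 - observed / expected


if __name__ == "__main__":
    # Two judges, three items, one disagreement.
    print(krippendorff_alpha_nominal([["a", "a"], ["a", "b"], ["b", "b"]]))
```

With two judges who agree on two of three items, as in the usage lines above, the result is 4/9 (about 0.444). Identical ratings on every item give 1.0, and systematic disagreement gives a negative value.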
