ai-evals

Here are 5 public repositories matching this topic...

solana8800 / langeval

Evaluation Infrastructure for AI Agents

ai-evaluation agent-evaluation ai-evals

Updated Feb 25, 2026
TypeScript

RafaelParonis / jailbench

🔍 Benchmark jailbreak resilience in LLMs with JailBench for clear insights and improved model defenses against jailbreak attempts.

python flask analytics openai alignment model-evaluation ai-safety security-testing red-teaming model-robustness anthropic litellm content-safety llm-jailbreaks tool-calling llm-benchmark ai-evals textual-tui

Updated Mar 1, 2026
Python

vibheksoni / jailbench

Benchmark LLM jailbreak resilience across providers with standardized tests, adversarial mode, rich analytics, and a clean Web UI.

Updated Aug 12, 2025
Python

MohsinCreed / LangfuseOllama

Free, local Langfuse OSS setup with Ollama for LLM evaluation, scoring, and datasets.

docker open-source self-hosted free no-cost local-llm ollama langfuse llm-evaluation prompt-evaluation offline-ai llm-as-judge llm-observability ai-evals

Updated Feb 5, 2026
TypeScript

khushi491 / evalloop

EvalLoop is a self-improving agent that iterates on its own outputs using evals and automatic policy patches. It runs a task, scores the result against a rubric, updates its rules, and re-runs until it hits a target score. A UI shows attempts, score trends, violations, and policy diffs.

typescript nextjs agents prisma self-improving agentic-ai ai-evals

Updated Feb 19, 2026
TypeScript
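
The run, score, patch, re-run cycle described for EvalLoop can be sketched roughly as follows. This is a minimal illustration and not code from the EvalLoop repository: every name here (`RubricRule`, `runTask`, `evaluate`, `evalLoop`) is hypothetical, the "agent" is a plain string transform standing in for a model call, and the score is assumed to be the fraction of rubric rules that pass.

```typescript
// Hypothetical sketch of an eval-driven improvement loop.
// None of these names come from the EvalLoop repository.

type Policy = string[];

interface RubricRule {
  name: string;
  check: (output: string) => boolean; // true when the output satisfies the rule
  patch: string;                      // policy rule to add when the check fails
}

interface Attempt {
  iteration: number;
  output: string;
  score: number;
  violations: string[];
  policy: Policy; // snapshot of the policy used for this attempt
}

// Stand-in for a model call: the "agent" simply obeys its policy rules.
function runTask(task: string, policy: Policy): string {
  let output = task;
  if (policy.indexOf("end with a period") !== -1 && output.slice(-1) !== ".") {
    output += ".";
  }
  if (policy.indexOf("use lowercase") !== -1) {
    output = output.toLowerCase();
  }
  return output;
}

// Score = fraction of rubric rules satisfied (assumed scoring scheme).
function evaluate(output: string, rubric: RubricRule[]) {
  const violations = rubric.filter(r => !r.check(output)).map(r => r.name);
  const score =
    rubric.length === 0 ? 1 : (rubric.length - violations.length) / rubric.length;
  return { score, violations };
}

function evalLoop(
  task: string,
  rubric: RubricRule[],
  targetScore: number,
  maxIterations: number
): Attempt[] {
  let policy: Policy = [];
  const attempts: Attempt[] = [];
  for (let i = 1; i <= maxIterations; i++) {
    const output = runTask(task, policy);
    const { score, violations } = evaluate(output, rubric);
    attempts.push({ iteration: i, output, score, violations, policy: policy.slice() });
    if (score >= targetScore) break;
    // Policy patch: add the rule attached to each violated rubric item.
    const patches = rubric
      .filter(r => violations.indexOf(r.name) !== -1)
      .map(r => r.patch)
      .filter(p => policy.indexOf(p) === -1);
    if (patches.length === 0) break; // nothing new to try, stop early
    policy = policy.concat(patches);
  }
  return attempts;
}

const rubric: RubricRule[] = [
  { name: "ends-with-period", check: o => o.slice(-1) === ".", patch: "end with a period" },
  { name: "lowercase", check: o => o === o.toLowerCase(), patch: "use lowercase" },
];

const attempts = evalLoop("Summarize The Report", rubric, 1.0, 5);
console.log(attempts.map(a => a.score)); // score trend across attempts
```

The returned `attempts` array holds what the description says the UI displays: each attempt's output, its score, its violations, and a policy snapshot from which diffs between iterations could be computed.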
