Clinician-built benchmark and live leaderboard for medical AI safety evaluation.
ai-safety evaluation-framework failure-analysis patient-safety clinical-nlp medical-ai language-model-evaluation healthcare-ai huggingface-spaces medical-llm llm-evaluation llm-safety ai-benchmark source-verification medical-ai-safety clinician-review medical-language-models failure-atlas turkish-medical-ai medfailbench
Updated Jul 3, 2026 - Python