Pinned
-
eval-resume
Public · Honesty benchmark for Chinese-language LLM résumé rewriting: 20 anonymized résumés × 3 models × 4 dimensions · promptfoo + LLM-as-judge · Includes an online report
HTML
-
judgebuddy
Public · Single-file labeling tool for LLM-as-judge calibration. Three-pane comparison and multi-dimensional scoring. No deployment required.
HTML
-
llm-long-context-eval-zh
Public · Chinese-language long-context LLM evaluation framework · Quantitatively verifies the Lost in the Middle phenomenon · Compares DeepSeek, Kimi, and Qwen-Long
HTML
-