melody-ling-L

Pinned

eval-resume Public

Chinese-language benchmark of LLM honesty in resume rewriting: 20 anonymized resumes × 3 models × 4 dimensions · promptfoo + LLM-as-judge · includes an online report

HTML

judgebuddy Public

Single-file labeling tool for LLM-as-judge calibration. Three-pane comparison + multi-dimensional scoring. No deployment required.

HTML

llm-long-context-eval-zh Public

Chinese long-context LLM evaluation framework · quantitatively verifies the Lost in the Middle phenomenon · compares DeepSeek, Kimi, and Qwen-Long

HTML

prompt-learning-journey Public

Python