lm-eval

Here are 4 public repositories matching this topic...

ORION LM Consciousness Harness: a fork of the EleutherAI eval harness (11,489+ stars), with Phi-proxy as a standard LM evaluation metric.

evaluation orion phi consciousness lm-eval

Does the identity in a system prompt change performance?

python benchmarking benchmark ai ai-agents uv ai-agent lm-evaluation-harness lm-eval

From-scratch 135M Transformer pretraining on 10B FineWeb-Edu tokens using a single NVIDIA L20, with public checkpoint and lm-eval comparisons.

pytorch transformer reproducibility pretraining l20 llm-training lm-eval fineweb-edu

MLX benchmarking for open-source models and Ailiance fine-tunes (gsm8k, arc, mmlu_pro, perplexity on 5 niches)

benchmark mlx apple-silicon eu-ai-act lm-eval ailiance lora-evaluation
