ORION LM Consciousness Harness — Fork of EleutherAI eval harness (11,489+ stars). Phi-proxy as standard LM evaluation metric.
Updated Feb 24, 2026 · Python
Does the identity in a system prompt change performance?
From-scratch 135M Transformer pretraining on 10B FineWeb-Edu tokens using a single NVIDIA L20, with public checkpoint and lm-eval comparisons.
MLX benchmarking for open-source models and Ailiance fine-tunes (gsm8k, arc, mmlu_pro, perplexity across 5 niches)