The static track of ARC Prize 2026 — what it is, how it's scored, and why 2026 matters.
A focused look at the ARC-AGI-2 track of ARC Prize 2026 — the static,
input→output grid benchmark. Background and the wider story live in the
arc-agi hub; this repo zooms in on the
static track specifically.
Round 1 — public explainer. This first pass is a sourced explainer of the track. A later round will add the working material — an approach write-up and Kaggle notebooks for experimentation and competition submission — see the Roadmap.
Dating discipline. Competition details change. Claims below are dated and linked in Sources, re-verified 2026-06-20. Re-check
arcprize.org and Kaggle before relying on any figure.
ARC-AGI-2 is the classic ARC format, made harder. Each task gives you a few input→output grid pairs; you infer the single transformation rule they share and apply it to a held-out test input. The grids are simple and human-readable, but every task has its own rule and the evaluation tasks are novel, so memorising data doesn't help. (For why the benchmark is built this way, namely Chollet's skill-acquisition-efficiency framing, see the hub.)
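To make the task format concrete, here is a minimal sketch. It assumes the JSON layout used by the public fchollet/ARC-AGI tasks (a `train` list and a `test` list of `input`/`output` grids, each grid a list of rows of small integers). The toy task and the `flip_horizontal` rule are invented for illustration; they are not real benchmark tasks.

```python
import json

# A toy task in the public ARC JSON layout (invented for illustration).
TOY_TASK = json.loads("""
{
  "train": [
    {"input": [[1, 0], [2, 0]], "output": [[0, 1], [0, 2]]},
    {"input": [[0, 3, 3]], "output": [[3, 3, 0]]}
  ],
  "test": [
    {"input": [[4, 0, 0], [0, 5, 0]]}
  ]
}
""")


def flip_horizontal(grid):
    """Candidate rule: mirror each row left-to-right."""
    return [list(reversed(row)) for row in grid]


def fits_all_pairs(rule, task):
    """A rule is only plausible if it reproduces every training output exactly."""
    return all(rule(pair["input"]) == pair["output"] for pair in task["train"])


if fits_all_pairs(flip_horizontal, TOY_TASK):
    # The rule explains all demonstrations, so apply it to the held-out input.
    prediction = flip_horizontal(TOY_TASK["test"][0]["input"])
    print(prediction)
```

Real tasks differ in that the rule is not known in advance and is different for every task, which is why a fixed library of rules like this one does not scale.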
What's specific to the -2 generation and the 2026 competition:
- Two attempts per task. A task counts as solved if either of two predicted output grids is exactly correct.
- 85% target, within efficiency limits. The Grand Prize bar is 85% on the private evaluation set, achieved inside Kaggle's compute and time budget — not at any cost.
- Redesigned to resist 2024-era recipes. ARC-AGI-2 was introduced in 2025 to push back on the test-time-training-heavy ensembles and large-model program search that took the previous benchmark to 55.5%, so a high score again has to reflect genuine generalisation.
- 2026 is the final year ARC-AGI-2 runs as an official Kaggle competition.
- Sandboxed evaluation: no internet during Kaggle scoring — so no hosted-API models (GPT/Claude/etc.). Solutions run self-contained under compute/time limits. In practice this favours small, self-contained, adaptive systems.
- Open-source for eligibility: prize-eligible code and methods must be open-sourced under a permissive licence (CC0 / MIT-0 in practice), attached to a Solution Writeup within seven days of the deadline.
- Key dates (2026): opens Mar 25, submissions due Nov 2, winners announced Dec 4.
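The two-attempt rule and the 85% bar can be sketched as a small scoring function. This is a simplified illustration, not the official Kaggle metric: it assumes one test output per task, and the names `task_solved`, `score`, and `GRAND_PRIZE_BAR` are hypothetical.

```python
GRAND_PRIZE_BAR = 0.85  # the 85% target described above


def task_solved(attempts, target):
    """True if any of the first two attempts matches the target grid exactly."""
    return any(attempt == target for attempt in attempts[:2])


def score(predictions, targets):
    """Fraction of tasks solved; both dicts are keyed by a task id."""
    solved = sum(
        task_solved(predictions.get(task_id, []), grid)
        for task_id, grid in targets.items()
    )
    return solved / len(targets)


# Toy data: three tasks, each with up to two attempted output grids.
targets = {"a": [[1]], "b": [[2, 2]], "c": [[0], [3]]}
predictions = {
    "a": [[[0]], [[1]]],           # second attempt is correct
    "b": [[[2]], [[2, 2, 2]]],     # both attempts have the wrong shape
    "c": [[[0], [3]]],             # single attempt, correct
}

result = score(predictions, targets)
print(f"score={result:.3f}, meets bar: {result >= GRAND_PRIZE_BAR}")
```

Exact match means the whole grid, including its dimensions, so a prediction that is right in every cell but one still counts as a miss.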
The full lineage — brute-force program synthesis, augmentation/ensembling,
test-time training, LLM-guided search, and tiny recursive models — is documented in
the hub's approaches tour.
The short version: progress on the static track has come from adapting to the
specific task (search for a fitting program, fine-tune on its examples, or recurse
a small model), and the no-internet sandbox rules push eligible work toward compact,
self-contained models. Tiny recursive models (TRM/HRM) are a particularly good fit
for these constraints — explored in
recursive-reasoning-models.
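As a rough illustration of "search for a fitting program", the sketch below brute-forces short compositions of grid primitives until one reproduces every training pair. The four primitives and the `search_program` helper are assumptions chosen for brevity; real program-synthesis entries use far richer DSLs and smarter search.

```python
from itertools import product


def identity(g):
    return [row[:] for row in g]


def flip_h(g):
    return [row[::-1] for row in g]


def flip_v(g):
    return [row[:] for row in g[::-1]]


def transpose(g):
    return [list(col) for col in zip(*g)]


PRIMITIVES = {
    "identity": identity,
    "flip_h": flip_h,
    "flip_v": flip_v,
    "transpose": transpose,
}


def search_program(train_pairs, max_depth=2):
    """Return (names, fn) for the first primitive sequence fitting all pairs."""
    for depth in range(1, max_depth + 1):
        for names in product(PRIMITIVES, repeat=depth):
            def run(grid, names=names):
                # Apply the primitives in order, left to right.
                for name in names:
                    grid = PRIMITIVES[name](grid)
                return grid

            if all(run(inp) == out for inp, out in train_pairs):
                return names, run
    return None


# Toy demonstrations of a 90-degree clockwise rotation.
train_pairs = [
    ([[1, 2], [3, 4]], [[3, 1], [4, 2]]),
    ([[5, 0, 0]], [[5], [0], [0]]),
]

names, program = search_program(train_pairs)
print(names, program([[7, 8], [0, 0]]))
```

The adaptation happens per task: nothing is learned ahead of time, and the search is rerun from the demonstrations of each new task.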
The approach I'm exploring is an ensemble with two branches, both adapted to the ARC-AGI-2 dataset:
- (a) an LLM branch — a Qwen model with test-time training (TTT); and
- (b) a TRM branch — a tiny recursive model (see
recursive-reasoning-models).
Both fit the no-internet sandbox (small, self-contained), and the two branches are meant to cover each other's failure modes. This is a direction under active exploration, not a finished result.
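One possible way to merge the two branches into the two allowed attempts is sketched below. It is an assumption for illustration, not the design this repo has settled on: each branch is imagined to return a ranked list of candidate grids, and `pick_two_attempts` is a hypothetical helper that prefers grids both branches agree on.

```python
from collections import defaultdict


def pick_two_attempts(llm_candidates, trm_candidates):
    """Merge two ranked candidate lists into at most two distinct attempts."""
    votes = defaultdict(int)
    best_rank = {}
    for candidates in (llm_candidates, trm_candidates):
        seen = set()  # count each grid at most once per branch
        for rank, grid in enumerate(candidates):
            key = tuple(map(tuple, grid))  # lists are unhashable; use tuples
            if key in seen:
                continue
            seen.add(key)
            votes[key] += 1
            best_rank[key] = min(rank, best_rank.get(key, rank))
    # More votes first; ties broken by the best rank either branch gave.
    ordered = sorted(votes, key=lambda k: (-votes[k], best_rank[k]))
    return [[list(row) for row in key] for key in ordered[:2]]


# Toy candidates: both branches propose [[2]], but neither ranks it first.
llm = [[[1]], [[2]]]
trm = [[[3]], [[2]]]
attempts = pick_two_attempts(llm, trm)
print(attempts)
```

Voting rewards agreement between branches; an alternative that leans harder on covering each other's failure modes would be to submit each branch's top candidate as one attempt.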
Planned working material for the next round:
arc-agi-2/
├── docs/approach.md # the two-branch ensemble — design + rationale
├── notebooks/ # Kaggle notebooks: experimenting with approaches,
│ # plus competition submission notebooks
├── experiments/ # supporting experiment code
└── data/ # (gitignored) public tasks fetched, never committed
Competition-rules note for that work: the public ARC-AGI tasks
(fchollet/ARC-AGI) are openly licensed — analyse freely, but fetch, don't
commit them; never commit private/competition eval data. data/ stays gitignored.
- 🧭 arc-agi — the hub (start here)
- 🕹️ arc-agi-3 — the interactive / agentic track
- 🔁 recursive-reasoning-models — a sandbox-friendly approach
Re-verified 2026-06-20. Re-check before reuse.
- ARC Prize 2026 — ARC-AGI-2 track — https://arcprize.org/competitions/2026/arc-agi-2
- ARC Prize 2026 — competition overview — https://arcprize.org/competitions/2026
- ARC-AGI-2 on Kaggle — https://www.kaggle.com/competitions/arc-prize-2026-arc-agi-2
- F. Chollet, On the Measure of Intelligence (2019) — https://arxiv.org/abs/1911.01547
- Public ARC-AGI tasks — https://github.com/fchollet/ARC-AGI
Prose and figures in this repo are © 2026 Antonio Rodriguez-Moral, licensed CC BY 4.0; code is MIT.
🌐 arodmor.me · 💻 github.com/arodmor · ✉️ antonio.rodriguez.moral@pm.me
Part of a series: AI/ML Lab · voice-ai-landscape · arc-agi · recursive-reasoning-models