arc-agi-2

The static track of ARC Prize 2026 — what it is, how it's scored, and why 2026 matters.

Why this repo

A focused look at the ARC-AGI-2 track of ARC Prize 2026 — the static, input→output grid benchmark. Background and the wider story live in the arc-agi hub; this repo zooms in on the static track specifically.

Round 1: public explainer. This first pass explains the track and links its sources. A later round will add the working material (an approach write-up and Kaggle notebooks for experimentation and competition submission); see the Roadmap.

Dating discipline. Competition details change. Claims below are dated and linked in Sources, re-verified 2026-06-20. Re-check arcprize.org and Kaggle before relying on any figure.

What ARC-AGI-2 is

ARC-AGI-2 is the classic ARC format, made harder. Each task gives you a few input→output grid pairs; you infer the single transformation rule they share and apply it to a held-out test input. The grids are simple and human-readable, but every task has its own rule and the evaluation tasks are novel, so memorising data doesn't help. (For why the benchmark is built this way, following Chollet's skill-acquisition-efficiency framing, see the hub.)
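A minimal sketch of that format, assuming the `train`/`test` layout used by the public ARC-AGI JSON files. The toy grids (rule: mirror left-right) and the helper `grid_shape` are invented for illustration and are not taken from any real task.

```python
import json

# Toy task in the layout of the public ARC-AGI JSON files:
# "train" holds demonstration pairs, "test" holds held-out inputs.
# These grids are made up; the hidden rule is "mirror left-right".
TASK_JSON = """
{
  "train": [
    {"input": [[1, 0], [2, 0]], "output": [[0, 1], [0, 2]]},
    {"input": [[3, 3, 0]], "output": [[0, 3, 3]]}
  ],
  "test": [
    {"input": [[0, 5], [0, 0]]}
  ]
}
"""


def grid_shape(grid):
    """Return (rows, cols) of a rectangular grid stored as a list of lists."""
    return len(grid), len(grid[0])


task = json.loads(TASK_JSON)
for pair in task["train"]:
    print(grid_shape(pair["input"]), "->", grid_shape(pair["output"]))
print("test inputs:", len(task["test"]))
```

Each cell is a small integer standing for a colour, and grid sizes can differ between pairs of the same task, which is why the sketch reads the shape from each grid rather than assuming one.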

What's specific to the -2 generation and the 2026 competition:

Two attempts per task. A task counts as solved if either of two predicted output grids is exactly correct.
85% target, within efficiency limits. The Grand Prize bar is 85% on the private evaluation set, achieved inside Kaggle's compute and time budget — not at any cost.
Redesigned to resist 2024-era recipes. ARC-AGI-2 was introduced in 2025 to push back on the test-time-training-heavy ensembles and large-model program search that took the previous benchmark to 55.5%, so a high score again has to reflect genuine generalisation.
2026 is the final year ARC-AGI-2 runs as an official Kaggle competition.
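The two-attempt rule above can be sketched as follows. This is a simplification, not the official metric: it assumes one test input per task, and the names `task_solved` and `score` are hypothetical. Check the Kaggle evaluation page for how tasks with several test inputs are handled.

```python
def task_solved(attempts, expected):
    """True if either of the first two attempts equals the expected grid exactly."""
    return any(attempt == expected for attempt in attempts[:2])


def score(predictions, solutions):
    """Fraction of tasks solved.

    predictions: task id -> list of attempted output grids (at most two are used)
    solutions:   task id -> the correct output grid
    """
    solved = sum(
        task_solved(predictions.get(task_id, []), grid)
        for task_id, grid in solutions.items()
    )
    return solved / len(solutions)


solutions = {"a": [[1]], "b": [[2, 2]]}
predictions = {
    "a": [[[0]], [[1]]],     # second attempt is exact, so task "a" counts
    "b": [[[2]], [[2, 0]]],  # neither attempt matches, no partial credit
}
print(score(predictions, solutions))
```

Exact match means every cell and the grid dimensions must agree; a grid that is one cell off scores the same as an empty answer.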

The rules that shape solutions

Sandboxed evaluation: no internet during Kaggle scoring, so no hosted-API models (GPT/Claude/etc.). Solutions must run self-contained within the compute and time limits, which in practice favours small, adaptive systems.
Open-source for eligibility: prize-eligible code and methods must be open-sourced under a permissive licence (CC0 / MIT-0 in practice), attached to a Solution Writeup within seven days of the deadline.
Key dates (2026): opens Mar 25, submissions due Nov 2, winners announced Dec 4.

How solvers approach the static track

The full lineage — brute-force program synthesis, augmentation/ensembling, test-time training, LLM-guided search, and tiny recursive models — is documented in the hub's approaches tour. The short version: progress on the static track has come from adapting to the specific task (search for a fitting program, fine-tune on its examples, or recurse a small model), and the no-internet sandbox rules push eligible work toward compact, self-contained models. Tiny recursive models (TRM/HRM) are a particularly good fit for these constraints — explored in recursive-reasoning-models.
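To make "search for a fitting program" concrete, here is a toy sketch of brute-force program synthesis. The four-primitive DSL and the function names are hypothetical and far smaller than anything a real solver uses; the point is only the shape of the loop: enumerate candidate programs, keep the first one that reproduces every training pair, then apply it to the test input.

```python
from itertools import product


# Hypothetical mini-DSL of grid transformations (illustration only).
def identity(g):
    return [row[:] for row in g]


def flip_h(g):
    return [row[::-1] for row in g]


def flip_v(g):
    return [row[:] for row in g[::-1]]


def transpose(g):
    return [list(col) for col in zip(*g)]


PRIMITIVES = {
    "identity": identity,
    "flip_h": flip_h,
    "flip_v": flip_v,
    "transpose": transpose,
}


def run(program, grid):
    """Apply a sequence of primitive names to a grid, left to right."""
    for name in program:
        grid = PRIMITIVES[name](grid)
    return grid


def search(train_pairs, max_depth=2):
    """Return the first primitive sequence that fits all training pairs, or None."""
    for depth in range(1, max_depth + 1):
        for program in product(PRIMITIVES, repeat=depth):
            if all(run(program, p["input"]) == p["output"] for p in train_pairs):
                return program
    return None


# Toy task whose hidden rule is a 90-degree clockwise rotation.
train = [
    {"input": [[1, 2], [3, 4]], "output": [[3, 1], [4, 2]]},
    {"input": [[5, 6, 7]], "output": [[5], [6], [7]]},
]
program = search(train)
print(program)
print(run(program, [[1, 2, 3], [4, 5, 6]]))
```

The search space grows exponentially with program depth, which is one reason later approaches in the lineage guide or replace enumeration with learned models.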

Roadmap

The approach I'm exploring is an ensemble with two branches, both adapted to the ARC-AGI-2 dataset:

(a) an LLM branch — a Qwen model with test-time training (TTT); and
(b) a TRM branch — a tiny recursive model (see recursive-reasoning-models).

Both fit the no-internet sandbox (small, self-contained), and the two branches are meant to cover each other's failure modes. This is a direction under active exploration, not a finished result.
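How the two branches will be combined is not settled here. Purely as an illustration, one simple option is to pool each branch's candidate grids and submit the two that receive the most votes. The function `pick_two_attempts`, its arguments, and the tie-breaking rule below are assumptions for this sketch, not the design of this repo.

```python
from collections import Counter


def to_key(grid):
    """Convert a list-of-lists grid into a hashable tuple of tuples."""
    return tuple(tuple(row) for row in grid)


def pick_two_attempts(llm_candidates, trm_candidates):
    """Pick up to two distinct grids, preferring those proposed most often.

    Ties are broken by first appearance, with the LLM branch listed first
    (an arbitrary choice made for this sketch).
    """
    keys = [to_key(g) for g in llm_candidates + trm_candidates]
    votes = Counter(keys)
    first_seen = {}
    for index, key in enumerate(keys):
        first_seen.setdefault(key, index)
    ranked = sorted(votes, key=lambda k: (-votes[k], first_seen[k]))
    return [[list(row) for row in key] for key in ranked[:2]]


llm = [[[1, 1]], [[2, 2]]]  # hypothetical candidates from the LLM branch
trm = [[[2, 2]], [[3, 3]]]  # hypothetical candidates from the TRM branch
print(pick_two_attempts(llm, trm))
```

A grid that both branches agree on takes the first slot, and the second slot goes to the next-best candidate, so a branch that is right when the other is wrong can still contribute an attempt.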

Planned working material for the next round:

arc-agi-2/
├── docs/approach.md   # the two-branch ensemble — design + rationale
├── notebooks/         # Kaggle notebooks: experimenting with approaches,
│                      #   plus competition submission notebooks
├── experiments/       # supporting experiment code
└── data/              # (gitignored) public tasks fetched, never committed

Competition-rules note for that work: the public ARC-AGI tasks (fchollet/ARC-AGI) are openly licensed — analyse freely, but fetch, don't commit them; never commit private/competition eval data. data/ stays gitignored.

Siblings

🧭 arc-agi — the hub (start here)
🕹️ arc-agi-3 — the interactive / agentic track
🔁 recursive-reasoning-models — a sandbox-friendly approach

Sources

Re-verified 2026-06-20. Re-check before reuse.

ARC Prize 2026 — ARC-AGI-2 track — https://arcprize.org/competitions/2026/arc-agi-2
ARC Prize 2026 — competition overview — https://arcprize.org/competitions/2026
ARC-AGI-2 on Kaggle — https://www.kaggle.com/competitions/arc-prize-2026-arc-agi-2
F. Chollet, On the Measure of Intelligence (2019) — https://arxiv.org/abs/1911.01547
Public ARC-AGI tasks — https://github.com/fchollet/ARC-AGI

🌐 arodmor.me · 💻 github.com/arodmor · ✉️ antonio.rodriguez.moral@pm.me

Part of a series: AI/ML Lab · voice-ai-landscape · arc-agi · recursive-reasoning-models
