arodmor/arc-agi-2

arc-agi-2

The static track of ARC Prize 2026 — what it is, how it's scored, and why 2026 matters.

License: MIT · Prose: CC BY 4.0 · Hub: arc-agi · Website Verified

Why this repo

A focused look at the ARC-AGI-2 track of ARC Prize 2026 — the static, input→output grid benchmark. Background and the wider story live in the arc-agi hub; this repo zooms in on the static track specifically.

Round 1: public explainer. This first pass is a sourced explainer of the track. A later round will add the working material (an approach write-up and Kaggle notebooks for experimentation and competition submission); see the Roadmap.

Dating discipline. Competition details change. Claims below are dated and linked in Sources, re-verified 2026-06-20. Re-check arcprize.org and Kaggle before relying on any figure.

What ARC-AGI-2 is

ARC-AGI-2 is the classic ARC format, made harder. Each task gives you a few input→output grid pairs; you infer the single transformation rule they share and apply it to a held-out test input. The grids are simple and human-readable, but every task has its own rule and the evaluation tasks are novel, so memorising data doesn't help. (For why the benchmark is built this way, namely Chollet's skill-acquisition-efficiency framing, see the hub.)
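To make the format concrete, here is a minimal sketch of one task and one candidate rule. It assumes the JSON layout of the public ARC task files (`train` and `test` lists of `input`/`output` grids, with cells as small integers); the toy grids and the `mirror_left_right` rule are invented for illustration and do not come from the dataset.

```python
# Toy task in the public ARC JSON layout. The grids are made up for this sketch.
task = {
    "train": [
        {"input": [[1, 0], [0, 0]], "output": [[0, 1], [0, 0]]},
        {"input": [[0, 0], [2, 0]], "output": [[0, 0], [0, 2]]},
    ],
    "test": [{"input": [[3, 0], [0, 4]]}],
}


def mirror_left_right(grid):
    """Hypothetical candidate rule: reverse every row."""
    return [list(reversed(row)) for row in grid]


def fits_all_examples(rule, task):
    """A rule is acceptable only if it reproduces every training output exactly."""
    return all(rule(pair["input"]) == pair["output"] for pair in task["train"])


# Apply the rule to the held-out test input only if it explains the examples.
if fits_all_examples(mirror_left_right, task):
    prediction = mirror_left_right(task["test"][0]["input"])
else:
    prediction = None
```

The hard part of a real task is finding the rule; checking a candidate against the demonstration pairs, as above, is the easy part.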

What's specific to the -2 generation and the 2026 competition:

  • Two attempts per task. A task counts as solved if either of two predicted output grids is exactly correct.
  • 85% target, within efficiency limits. The Grand Prize bar is 85% on the private evaluation set, achieved inside Kaggle's compute and time budget — not at any cost.
  • Redesigned to resist 2024-era recipes. ARC-AGI-2 was introduced in 2025 to push back on the test-time-training-heavy ensembles and large-model program search that took the previous benchmark, ARC-AGI-1, to 55.5%, so a high score again has to reflect genuine generalisation.
  • 2026 is the final year ARC-AGI-2 runs as an official Kaggle competition.
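The two-attempt rule can be sketched as a small scoring function. This is a simplified illustration, not the official metric: it assumes one test output per task, and the names `task_solved` and `score` are hypothetical. The official scorer may differ in detail, for instance on tasks with several test inputs.

```python
def task_solved(attempts, target):
    """True if any of the first two attempts matches the target grid exactly."""
    return any(attempt == target for attempt in attempts[:2])


def score(predictions, solutions):
    """Fraction of tasks solved.

    predictions: task id -> list of attempted output grids (only two are counted)
    solutions:   task id -> the correct output grid
    """
    solved = sum(
        task_solved(predictions.get(task_id, []), target)
        for task_id, target in solutions.items()
    )
    return solved / len(solutions)


# Toy example: task "a" is solved by its second attempt, task "b" is not solved.
solutions = {"a": [[1]], "b": [[2]]}
predictions = {"a": [[[0]], [[1]]], "b": [[[0]], [[3]]]}
result = score(predictions, solutions)
```

Matching is all-or-nothing: a grid that is wrong in a single cell earns no credit.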

The rules that shape solutions

  • Sandboxed evaluation: no internet during Kaggle scoring, so no hosted-API models (GPT/Claude/etc.). Solutions run offline under compute/time limits. In practice this favours small, self-contained, adaptive systems.
  • Open-source for eligibility: prize-eligible code and methods must be open-sourced under a permissive licence (CC0 / MIT-0 in practice), attached to a Solution Writeup within seven days of the deadline.
  • Key dates (2026): opens Mar 25, submissions due Nov 2, winners announced Dec 4.

How solvers approach the static track

The full lineage (brute-force program synthesis, augmentation/ensembling, test-time training, LLM-guided search, and tiny recursive models) is documented in the hub's approaches tour. The short version: progress on the static track has come from adapting to the specific task (searching for a fitting program, fine-tuning on its examples, or running a small model recursively), and the no-internet sandbox rules push eligible work toward compact, self-contained models. Tiny recursive models (TRM/HRM) are a particularly good fit for these constraints; they are explored in recursive-reasoning-models.
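As a toy illustration of the first of these ideas, searching for a program that fits the task's examples, the sketch below brute-forces short compositions of grid primitives. The four-primitive DSL and the names `run_program` and `search_program` are assumptions made for this example; real solvers use far richer languages and smarter search.

```python
from itertools import product

# Hypothetical four-primitive DSL, chosen only for illustration.
def identity(grid):
    return [row[:] for row in grid]

def flip_h(grid):
    return [row[::-1] for row in grid]

def flip_v(grid):
    return [row[:] for row in grid[::-1]]

def transpose(grid):
    return [list(col) for col in zip(*grid)]

PRIMITIVES = {
    "identity": identity,
    "flip_h": flip_h,
    "flip_v": flip_v,
    "transpose": transpose,
}


def run_program(names, grid):
    """Apply the named primitives to the grid, left to right."""
    for name in names:
        grid = PRIMITIVES[name](grid)
    return grid


def search_program(train_pairs, max_depth=2):
    """Return the first primitive sequence that reproduces every training pair."""
    for depth in range(1, max_depth + 1):
        for names in product(PRIMITIVES, repeat=depth):
            if all(run_program(names, p["input"]) == p["output"] for p in train_pairs):
                return names
    return None


# One demonstration pair showing a 90-degree clockwise rotation.
train_pairs = [{"input": [[1, 2], [3, 4]], "output": [[3, 1], [4, 2]]}]
program = search_program(train_pairs)
prediction = run_program(program, [[5, 6, 7]]) if program else None
```

No single primitive rotates a grid, so the search has to reach depth 2 before it finds a fitting composition. The number of candidates grows exponentially with depth, which is one reason pure brute force stalls on harder tasks.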

Roadmap

The approach I'm exploring is an ensemble with two branches, both adapted to the ARC-AGI-2 dataset:

  • (a) an LLM branch — a Qwen model with test-time training (TTT); and
  • (b) a TRM branch — a tiny recursive model (see recursive-reasoning-models).

Both fit the no-internet sandbox (small, self-contained), and the two branches are meant to cover each other's failure modes. This is a direction under active exploration, not a finished result.
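One way two branches could be merged into the two allowed attempts is simple voting over their candidate grids. The sketch below is purely illustrative and is not the design under exploration here: `pick_two_attempts`, the vote-then-first-seen ordering, and the toy candidates are all assumptions.

```python
from collections import Counter


def grid_key(grid):
    """Hashable form of a grid so identical predictions can be counted."""
    return tuple(tuple(row) for row in grid)


def pick_two_attempts(llm_candidates, trm_candidates):
    """Rank candidate grids by vote count, breaking ties by first appearance.

    Both arguments are lists of grids proposed by the (hypothetical) branches.
    Returns at most two grids.
    """
    votes = Counter()
    first_seen = {}
    for grid in list(llm_candidates) + list(trm_candidates):
        key = grid_key(grid)
        votes[key] += 1
        first_seen.setdefault(key, len(first_seen))
    ranked = sorted(votes, key=lambda k: (-votes[k], first_seen[k]))
    return [[list(row) for row in key] for key in ranked[:2]]


# Toy candidates: both branches propose [[2]], so it becomes the first attempt.
attempts = pick_two_attempts(
    llm_candidates=[[[1]], [[2]]],
    trm_candidates=[[[2]], [[3]]],
)
```

Agreement between independently built branches is weak evidence that a grid is correct, which is the intuition behind letting one branch cover the other's failure modes.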

Planned working material for the next round:

arc-agi-2/
├── docs/approach.md   # the two-branch ensemble — design + rationale
├── notebooks/         # Kaggle notebooks: experimenting with approaches,
│                      #   plus competition submission notebooks
├── experiments/       # supporting experiment code
└── data/              # (gitignored) public tasks fetched, never committed

Competition-rules note for that work: the public ARC-AGI tasks (fchollet/ARC-AGI) are openly licensed — analyse freely, but fetch, don't commit them; never commit private/competition eval data. data/ stays gitignored.

Siblings


Sources

Re-verified 2026-06-20. Re-check before reuse.

Prose and figures in this repo are © 2026 Antonio Rodriguez-Moral, licensed CC BY 4.0; code is MIT.


🌐 arodmor.me · 💻 github.com/arodmor · ✉️ antonio.rodriguez.moral@pm.me

Part of a series: AI/ML Lab · voice-ai-landscape · arc-agi · recursive-reasoning-models
