Become a sponsor to LongC

Motivation

Reading the ITBench paper was genuinely exciting for me. It immediately connected with my background in quantitative modelling, AI evaluation, and failure-mode analysis, and raised many questions I wanted to explore more deeply.

Goal

My goal is to study the paper in depth, covering its assumptions, evaluation design, data sources, and limitations, and to build open, reproducible implementations that let these ideas be tested, extended, and better understood in practice.

Focus Areas

This work focuses on:

Deep reading and validation of the ITBench methodology and sources
Reproducing core evaluation setups using open-source models
Examining evaluation design choices, including why specific metrics and tasks were selected
Investigating robustness, edge cases, and failure modes
Identifying gaps in current AI evaluation and exploring how they could be improved

Perspective

I am particularly interested in the evaluation needs highlighted by the paper, and in learning how benchmarks can better reflect real-world reasoning, risk, and decision-making, rather than surface-level performance alone.

Outputs & Support

All outputs will be shared openly as notebooks, code, and clear documentation.
Sponsorship supports the time required for careful reading, experimentation, validation, and communication of insights.


🟢 Tier 1 — AU$5 / month · Supporter

Support open learning and exploration in AI evaluation
Helps sustain careful study of ITBench, reproducible notebooks, and clear documentation shared openly with the community

🔵 Tier 2 — AU$15 / month · Contributor

Support hands-on reproduction of AI evaluation benchmarks
Enables deeper work on reproducing ITBench setups, analysing metrics, robustness, and failure modes, and publishing well-documented notebooks

🟣 Tier 3 — AU$50 / month · Research Supporter

Support sustained, research-grade work in open AI evaluation
Extends benchmark analysis, robustness studies, and thoughtful documentation, translating ITBench insights into practical, real-world evaluation guidance
