Become a sponsor to LongC
Motivation
Reading the ITBench paper was genuinely exciting for me. It immediately connected with my background in quantitative modelling, AI evaluation, and failure-mode analysis, and raised many questions I wanted to explore more deeply.
Goal
My goal is to study the paper in depth, including its assumptions, evaluation design, data sources, and limitations, and to build open, reproducible implementations that allow these ideas to be tested, extended, and better understood in practice.
Focus Areas
This work focuses on:
- Deep reading and validation of the ITBench methodology and sources
- Reproducing core evaluation setups using open-source models
- Examining evaluation design choices, including why specific metrics and tasks were selected
- Investigating robustness, edge cases, and failure modes
- Identifying gaps in current AI evaluation and exploring how they could be improved
Perspective
I am particularly interested in the evaluation needs highlighted by the paper, and in learning how benchmarks can better reflect real-world reasoning, risk, and decision-making, rather than surface-level performance alone.
Outputs & Support
All outputs will be shared openly as notebooks, code, and clear documentation.
Sponsorship supports the time required for careful reading, experimentation, validation, and communication of insights.
Featured work
- longchung90/weather_forecast (JavaScript): Originally developed as a Coursera assignment and expanded into a fully featured global weather application with enhanced UX, automation, and scalable city management.
- longchung90/Portfolio_Project (HTML): The portfolio began as a course project from the IBM Coursera program. By integrating AI-augmented workflows and JavaScript-driven interactivity, it evolved into a lively, dynamic webpage that goes…
$5 a month
🟢 Tier 1 · AU$5 / month · Supporter
- Support open learning and exploration in AI evaluation
- Help sustain careful study of ITBench, reproducible notebooks, and clear documentation shared openly with the community
$15 a month
🔵 Tier 2 · AU$15 / month · Contributor
- Support hands-on reproduction of AI evaluation benchmarks
- Enable deeper work on reproducing ITBench setups, analysing metrics, robustness, and failure modes, and publishing well-documented notebooks
$50 a month
🟣 Tier 3 · AU$50 / month · Research Supporter
- Support sustained, research-grade work in open AI evaluation
- Extend benchmark analysis, robustness studies, and thoughtful documentation, translating ITBench insights into practical, real-world evaluation guidance