botafogo-EPFL/CaLlm_pub

AI meets Mathematics Education: Supporting Instructors in Large Mathematics Classes with Context-Aware AI

Abstract:

Large-enrollment university courses face persistent challenges in delivering timely and scalable instructional support. While generative AI holds promise, its educational use requires trust, reliability, and pedagogical alignment. We present a human-centered framework for AI-assisted course support, co-designed and evaluated in partnership with course instructors to foreground instructional oversight. For a large introductory mathematics course, we fine-tune a lightweight, cost-efficient language model on 2,588 historical student–instructor interactions and evaluate it against a new benchmark of 150 representative questions, annotated by five instructors. The model achieves 75.3% accuracy and excels on the medium-difficulty queries that dominate student requests; in 36% of cases, its answers are rated equal to or better than instructors’. Yet, instructors identified consistent cases where oversight is essential, highlighting the risks of unsupervised deployment. Our findings demonstrate how hybrid human–AI systems, when integrated into structured course workflows, can enhance instructional capacity while setting realistic expectations for AI in education.

The existing course platform hosts the course data, exercises, quizzes, and a forum where students can ask questions (see here). The goal is to build a large language model that can answer students' questions in real time. Models and datasets can be found here.

CaLlm system overview

Figure: Overview of our human-guided research process: (1) collecting and annotating student–instructor Q&A pairs, (2) instructor-led model evaluation and fine-tuning, and (3) expert evaluation with five course instructors through surveys and interviews.

Note: This repository provides the code and tooling to train, evaluate, and deploy models for the CaLlm (Calculus Large Language Model) project. It does not include the raw training data, dataset-generation or translation assets, the evaluation interfaces, the evaluation results (for privacy reasons), or the notebooks used for data analysis. For access to those materials, please contact the project maintainers.

Project objective:

The main objective of this project is to fine-tune a large language model with strong mathematical reasoning capabilities that can answer students' questions. Due to hardware limitations, the model must have fewer than 14B parameters and be hosted on EPFL's local cluster.

Tested models:

| Model | Model Family | Context Window |
| --- | --- | --- |
| Mathstral-7B-v0.1 | mistral | 32k |
| Gemma-2-9b-it | gemma | 32k |
| Llama-3.1-8B-Inst | llama | 128k |
| Mathstral-7B | mistral | 32k |
| Qwen-2.5-Math-7B | qwen | 4k |
| OpenR1-Qwen-7B | qwen | 32k |
| Gemma-3-12B-Inst | gemma | 128k |
| DeepSeek-R1-Distill-14B | qwen | 128k |

Human eval results

Figure: Evaluation of base and fine-tuned models on 40 test questions by a single expert. Results are broken down by question difficulty: 10 easy, 20 medium, and 10 hard questions. Base models: Llama 3.1, Mathstral, Qwen 2.5, Gemma 3, and DeepSeek R1 Distill. Models suffixed "-QA" are fine-tuned on our dataset; models suffixed "OpenMath220k" are fine-tuned on the corresponding math dataset. Scores shown are averages; 95% confidence intervals are computed via non-parametric bootstrap with 10,000 resamples.
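For reference, the non-parametric bootstrap used for the confidence intervals can be sketched as follows. This is an illustrative implementation of the standard percentile bootstrap, not the project's evaluation code; the function name and defaults are our own.

```python
import numpy as np

def bootstrap_ci(scores, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of per-question scores.

    scores: per-question scores (e.g. expert ratings of model answers).
    Returns (mean, (ci_low, ci_high)) for a (1 - alpha) interval.
    """
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    # Draw n_resamples datasets of the same size, sampling with replacement,
    # and record the mean of each resampled dataset.
    idx = rng.integers(0, len(scores), size=(n_resamples, len(scores)))
    resample_means = scores[idx].mean(axis=1)
    # The CI is given by the alpha/2 and 1 - alpha/2 quantiles of the means.
    ci_low, ci_high = np.quantile(resample_means, [alpha / 2, 1 - alpha / 2])
    return scores.mean(), (ci_low, ci_high)
```

With 40 questions per model, as in the figure, this yields an interval around each model's average score without assuming normality of the rating distribution.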

Data:

Please refer to the Data Readme for more information about the data and datasets used in this project.

Model statistics

Daily model statistics and usage can be found here.

Installation and project usage:

This release provides four main independent tools; install and use whichever fit your needs:

  • Fine-tuning
  • Automated evaluation
  • Local inference for fast prototyping and testing
  • Inference server for broader usage

This project was developed and tested on a Windows system, and models were trained and evaluated on EPFL's Kuma cluster.

In any case, please follow these first steps to clone the repository and install the necessary dependencies (for local usage):

```bash
git clone git@github.com:botafogo-EPFL/CaLlm_pub.git
conda create -n callm python=3.11.9
conda activate callm
pip install -r requirements.txt
```

Once this is done, refer to the corresponding Readme file for each tool for more information on how to use it.
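As a rough illustration of what the local-inference tool does, here is a minimal sketch assuming a Hugging Face-style causal LM checkpoint. The prompt template, model id, and generation settings below are illustrative assumptions, not the repository's actual configuration (see the tool's Readme for that).

```python
def build_prompt(question: str) -> str:
    # Hypothetical prompt template; the actual template used by the
    # project ships with the corresponding tool.
    return (
        "You are a teaching assistant for a university calculus course. "
        "Answer the student's question clearly and step by step.\n\n"
        f"Question: {question}\nAnswer:"
    )

def generate_answer(model_name: str, question: str, max_new_tokens: int = 512) -> str:
    # Deferred import so the prompt helper is usable without the heavy dependency.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
    inputs = tokenizer(build_prompt(question), return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Strip the prompt tokens and decode only the newly generated answer.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )

if __name__ == "__main__":
    # Any of the sub-14B checkpoints from the table above would fit here.
    print(generate_answer("mistralai/Mathstral-7B-v0.1",
                          "What is the derivative of x^2?"))
```

Greedy decoding (`do_sample=False`) is a reasonable default for graded Q&A, since students expect a single deterministic answer rather than sampled variations.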

If you want to install the project on EPFL's clusters (Kuma or Izar is recommended), you can follow the steps that are provided here. This will install the project in a virtual environment and set up the necessary libraries and dependencies.

People involved:

  • Jeremy Barghorn, Data Science Master's student at EPFL, the project's code developer and maintainer, jeremy.barghorn@epfl.ch

  • Prof. Sacha Friedli, EPFL, project initiator and supervisor, in charge of human annotation and the course platform, sacha.friedli@epfl.ch

  • Anna Sotnikova, postdoctoral researcher, Natural Language Processing Lab, EPFL, in charge of the technical and scientific supervision of the project, anna.sotnikova@epfl.ch

About

This repository contains the code to reproduce the training and deployment of the Calculus LLM used on the Botafogo website.
