andresnowak/mnlp_mcqa_model
Link to paper

Paper

Project Structure Overview

dataset_and_model_inspection/

This directory contains notebooks for:

  • Exploring the datasets used for MCQA (Multiple-Choice Question Answering)
  • Inspecting the base models
  • Analyzing the logit outputs of likelihood-based MCQA models

Use these notebooks for diagnostics and interpretability work, to understand how the data and models interact.


generate_data/

Scripts for generating MCQA datasets and instruction fine-tuning data.

⚠️ Hugging Face login is required. Run:

huggingface-cli login

Key Files

  • create_dataset_IF_Mixture.py
    Creates the instruction-finetuned dataset.
    Use the --hub-dataset-name argument to specify your Hugging Face repo ID.

  • create_m3_mcqa_dataset.py
    Builds the M3-style MCQA dataset.
    Also supports --hub-dataset-name for pushing to the Hub.

  • create_dataset_mcqa.py
    Another variant for MCQA dataset creation.
    Also supports --hub-dataset-name for pushing to the Hub.
    This script accepts a --config argument to specify a dataset configuration YAML file:

    • MCQA_datasets_m2.yaml
    • MCQA_datasets_m3.yaml
      Both files are located in the config/ directory within generate_data/.
  • generate_mcqa_10_choices.py
    Generates MCQA-style questions with 10 answer choices.
    Requires installation of a local .whl package:

    pip install artifacts/gpt_wrapper-0.2.0-py3-none-any.whl

    You also need to place a file named secrets_env.py in the same directory as this script.
    This file must define the following environment variables:

    • PARROT_API_BASE
    • PARROT_API_KEY
  • create_mcqa_from_generated_10_options.py
    Converts the generated data into a valid MCQA dataset. It requires:

    • A pre-existing JSON file generated by generate_mcqa_10_choices.py
    • A Hugging Face repo ID

    python create_mcqa_from_generated_10_options.py --repo-id <your_repo_id>

Training

All of the training code uses Hydra for configuration, so any config value can be overridden from the command line.
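As an illustration of what the Hydra-style overrides used below (e.g. `model.hub_model_id=path/to/repo`) do, here is a minimal plain-Python sketch of how a dotted `key=value` override maps onto a nested config; the real scripts use the hydra library, which additionally coerces value types (this sketch keeps everything as strings):

```python
# Illustration only: Hydra-style dotted overrides applied to a nested dict.
# Real Hydra also parses types ("false" -> bool); this sketch keeps strings.

def apply_overrides(config, overrides):
    """Apply "a.b.c=value" style overrides to a nested dict in place."""
    for override in overrides:
        dotted_key, value = override.split("=", 1)
        keys = dotted_key.split(".")
        node = config
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = value
    return config

config = {"model": {"hub_model_id": None}, "training": {"push_to_hub": True}}
apply_overrides(config, ["model.hub_model_id=user/repo",
                         "training.push_to_hub=false"])
```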

To train the MCQA model using the 10-option format, you have two options:


Option 1: Login to Weights & Biases and Hugging Face

⚠️ Hugging Face login is required. Run:

huggingface-cli login
wandb login

You must also provide a Hugging Face Hub repo ID when training:

python finetune_mcqa_10_options_m3.py model.hub_model_id=path/to/repo

Option 2: Run Offline (No Login Required)

You can disable both Hugging Face and Weights & Biases integration using:

WANDB_MODE=disabled python finetune_mcqa_10_options_m3.py training.push_to_hub=false

This will:

  • Prevent pushing models to the Hugging Face Hub
  • Disable wandb logging

Training Scripts

This repository contains multiple training scripts for different model configurations and training approaches:

Base Model Training

finetune_mcqa_10_options_m3.py - Trains the M3 model using the generated 10-option MCQA dataset.

  • Starts from base model
  • Uses maximum likelihood training with any number of choices
  • Incorporates synthetic 10-choice MMLU auxiliary training dataset
  • Uses random templates for robustness to different prompts
  • Model uploaded to M3 MNLP

Example (offline mode):

python finetune_mcqa_10_options_m3.py training.push_to_hub=false
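A minimal sketch of the likelihood-based scoring idea behind these models (my assumption about the general technique, not the exact code in the script): each candidate answer is scored by the total log-probability of its tokens under the model, and the highest-scoring choice wins. Because scoring is per-choice, the same procedure covers 4-option and 10-option questions alike.

```python
# Sketch of likelihood-based MCQA scoring with fake per-token log-probs.
def score_choice(token_logprobs):
    # Total log-likelihood of the answer continuation.
    return sum(token_logprobs)

def pick_answer(choices_logprobs):
    scores = [score_choice(lp) for lp in choices_logprobs]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy example with 3 choices of different lengths:
fake = [[-2.0, -1.5], [-0.5, -0.7], [-3.0]]
print(pick_answer(fake))  # choice 1 has the highest total log-likelihood
```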

MCQA likelihood models

finetune_mcqa_10_options_m3.py - Same script as described under Base Model Training above; model uploaded to M3 MNLP.

finetune_mcqa.py - MCQA model with 4-option likelihood training starting from base model.

  • Model uploaded to M2 MNLP

finetune_mcqa_10_options.py - Similar to finetune_mcqa_10_options_m3.py but without random templates.

finetune_IF_mcqa.py - MCQA model with 4-option likelihood starting from the instruction fine-tuned language modeling model.

Instruction Fine-tuned Models

finetune_lm.py - Instruction fine-tuned model using language modeling loss.

finetune_cl.py - Instruction fine-tuned model using completion-only loss.

Sequence-to-Sequence Models

finetune_base_mcqa_text.py - Seq2seq model trained to output [letter]. [text of the answer].

  • Starts from base model
  • Uses random templates

finetune_IF_cl_mcqa_text.py - Seq2seq model trained to output [letter]. [text of the answer].

  • Starts from instruction fine-tuned completion-only loss model
  • Uses random templates
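A sketch of how the random-template idea and the [letter]. [text of the answer] target format fit together (the template strings below are illustrative assumptions, not the repo's actual templates):

```python
import random

# Hypothetical prompt templates; sampling one per example makes the model
# robust to different prompt phrasings.
TEMPLATES = [
    "Question: {q}\nChoices:\n{choices}\nAnswer:",
    "{q}\n{choices}\nThe correct option is",
]

def build_example(question, options, answer_idx, rng=random):
    letters = "ABCDEFGHIJ"
    choices = "\n".join(f"{letters[i]}. {o}" for i, o in enumerate(options))
    prompt = rng.choice(TEMPLATES).format(q=question, choices=choices)
    # Seq2seq target in the "[letter]. [text of the answer]" format:
    target = f"{letters[answer_idx]}. {options[answer_idx]}"
    return prompt, target

p, t = build_example("Capital of France?", ["Rome", "Paris"], 1)
print(t)  # "B. Paris"
```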

Reinforcement Learning Models

finetune_base_text_mcqa_rl.py - Model trained with RLVR using GRPO.

  • Starts from finetune_base_mcqa_text.py
  • Reward: +1 when [letter]. is found in the output, -1 otherwise

finetune_IF_cl_mcqa_text_rl.py - Model trained with RLVR using GRPO.

  • Starts from finetune_IF_cl_mcqa_text.py
  • Reward structure:
    • +2 when [letter]. [text of the answer] is found
    • +1 when only [letter]. is found
    • -1 when neither is found

Other Libraries

Evaluation Framework: lighteval-epfl-mnlp

  • This forked repository was used for evaluating MCQA and Chain-of-Thought (CoT) models

Links

The links to the datasets are:

The links to the models are:

About

Finetuning of Qwen3 0.6B for MCQA tasks