This directory contains notebooks for:
- Exploring the datasets used for MCQA (Multiple-Choice Question Answering)
- Inspecting the base models
- Analyzing the logit outputs of likelihood-based MCQA models
Use these notebooks for diagnostics and interpretability work to understand data-model interactions.
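Likelihood-based MCQA scoring (the kind analyzed in these notebooks) picks the choice whose continuation the model assigns the highest log-likelihood. A minimal sketch of the idea, assuming per-token log-probabilities are already available (the notebooks may store logits in a different shape):

```python
def score_choices(choice_logprobs):
    """Score each answer choice by its length-normalized log-likelihood.

    `choice_logprobs` maps a choice letter to the list of per-token
    log-probabilities the model assigned to that choice's continuation.
    (Hypothetical input shape -- purely illustrative.)
    """
    scores = {}
    for letter, logprobs in choice_logprobs.items():
        # Length-normalize so longer answers are not penalized unfairly.
        scores[letter] = sum(logprobs) / len(logprobs)
    return scores

def predict(choice_logprobs):
    """Return the letter of the highest-scoring choice."""
    scores = score_choices(choice_logprobs)
    return max(scores, key=scores.get)

# Toy example: choice "B" has the highest average log-likelihood.
example = {
    "A": [-2.0, -1.5],
    "B": [-0.3, -0.4, -0.2],
    "C": [-1.0],
}
print(predict(example))  # -> B
```

Length normalization is one common convention; raw summed log-likelihood is another, and the two can disagree when choices differ in length.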
Scripts for generating MCQA datasets and instruction fine-tuning data.
⚠️ Hugging Face login is required. Run: `huggingface-cli login`
- `create_dataset_IF_Mixture.py`
  Creates the instruction fine-tuning dataset.
  Use the `--hub-dataset-name` argument to specify your Hugging Face repo ID.
- `create_m3_mcqa_dataset.py`
  Builds the M3-style MCQA dataset.
  Also supports `--hub-dataset-name` for pushing to the Hub.
- `create_dataset_mcqa.py`
  Another variant for MCQA dataset creation.
  Also supports `--hub-dataset-name` for pushing to the Hub.
  This script accepts a `--config` argument to specify a dataset configuration YAML file: `MCQA_datasets_m2.yaml` or `MCQA_datasets_m3.yaml`.
  Both files are located in the `config/` directory within `generate_data/`.
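The README does not show the contents of these YAML files; a hypothetical sketch of what such a dataset configuration might look like (every field name and value below is an assumption, not taken from the actual files):

```yaml
# Hypothetical structure -- check MCQA_datasets_m3.yaml for the real schema.
datasets:
  - name: some-org/some-mcqa-source   # Hugging Face dataset ID (illustrative)
    split: train
    num_samples: 5000
seed: 42
```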
- `generate_mcqa_10_choices.py`
  Generates MCQA-style questions with 10 answer choices.
  Requires installation of a local `.whl` package: `pip install artifacts/gpt_wrapper-0.2.0-py3-none-any.whl`
  You also need to place a file named `secrets_env.py` in the same directory as this script.
  This file must define the following environment variables: `PARROT_API_BASE` and `PARROT_API_KEY`.
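The contents of `secrets_env.py` are not shown here; a minimal sketch, assuming the file only needs to export the two variables into the process environment (the placeholder values are obviously not real credentials):

```python
# secrets_env.py -- minimal sketch. The real file just needs to make
# PARROT_API_BASE and PARROT_API_KEY available as environment variables.
import os

os.environ["PARROT_API_BASE"] = "https://example.invalid/api"  # placeholder URL
os.environ["PARROT_API_KEY"] = "sk-REPLACE_ME"                 # placeholder key
```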
- `create_mcqa_from_generated_10_options.py`
  Converts generated data into a valid MCQA dataset. Requires:
  - A pre-existing JSON file generated by `generate_mcqa_10_choices.py`
  - A Hugging Face repo ID

  `python create_mcqa_from_generated_10_options.py --repo-id <your_repo_id>`
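The conversion step amounts to mapping raw generated items onto MCQA rows with lettered choices. A sketch of that mapping, with hypothetical field names (`question`, `options`, `answer_index`) -- the real JSON produced by `generate_mcqa_10_choices.py` may use different keys:

```python
import json
import string

def to_mcqa_rows(generated):
    """Convert generated multi-option items into MCQA rows.

    Field names here are hypothetical; adapt them to the actual JSON schema.
    """
    rows = []
    for item in generated:
        # Assign A, B, C, ... to however many options the item has (up to 26).
        letters = string.ascii_uppercase[: len(item["options"])]
        rows.append({
            "question": item["question"],
            "choices": dict(zip(letters, item["options"])),
            "answer": letters[item["answer_index"]],
        })
    return rows

raw = json.loads("""[
  {"question": "2+2?", "options": ["3", "4", "5"], "answer_index": 1}
]""")
print(to_mcqa_rows(raw)[0]["answer"])  # -> B
```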
All of these scripts use Hydra for configuration management.
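Hydra configuration is why the training commands below pass dotted `key=value` overrides such as `model.hub_model_id=path/to/repo`. A stdlib-only sketch of that override idea (the scripts themselves use the real Hydra library, not this helper):

```python
def apply_overrides(config, overrides):
    """Mimic Hydra-style `a.b=value` command-line overrides on a nested dict.

    Illustrative only: real Hydra also type-converts values (e.g. "false"
    becomes a boolean), which this sketch does not do.
    """
    for override in overrides:
        dotted_key, _, value = override.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = config
        for key in parents:
            node = node.setdefault(key, {})  # create nested sections as needed
        node[leaf] = value
    return config

cfg = {"training": {"push_to_hub": True}}
apply_overrides(cfg, ["training.push_to_hub=false", "model.hub_model_id=path/to/repo"])
print(cfg["model"]["hub_model_id"])  # -> path/to/repo
```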
To train the MCQA model using the 10-option format, you have two options:
⚠️ Hugging Face and Weights & Biases logins are required. Run: `huggingface-cli login` and `wandb login`
You must also provide a Hugging Face Hub repo ID when training:
`python finetune_mcqa_10_options_m3.py model.hub_model_id=path/to/repo`

You can disable both Hugging Face and Weights & Biases integration using:

`WANDB_MODE=disabled python finetune_mcqa_10_options_m3.py training.push_to_hub=false`

This will:
- Prevent pushing models to the Hugging Face Hub
- Disable wandb logging
This repository contains multiple training scripts for different model configurations and training approaches:
finetune_mcqa_10_options_m3.py - Trains the M3 model using the generated 10-option MCQA dataset.
- Starts from base model
- Uses maximum likelihood training with any number of choices
- Incorporates synthetic 10-choice MMLU auxiliary training dataset
- Uses random templates for robustness to different prompts
- Model uploaded to M3 MNLP
Example (offline mode):
`python finetune_mcqa_10_options_m3.py training.push_to_hub=false`
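The "random templates for robustness" idea mentioned above can be sketched as follows; the template strings are illustrative, not the ones actually used in the training scripts:

```python
import random

# Illustrative prompt templates -- the real templates are defined in the
# training scripts, not in this README.
TEMPLATES = [
    "Question: {q}\n{opts}\nAnswer:",
    "{q}\nChoose one of:\n{opts}\nYour answer:",
    "Answer the following question.\n{q}\n{opts}\n",
]

def format_example(question, choices, rng):
    """Render one MCQA example with a randomly chosen prompt template."""
    opts = "\n".join(f"{letter}. {text}" for letter, text in choices.items())
    template = rng.choice(TEMPLATES)  # a fresh template per example
    return template.format(q=question, opts=opts)

rng = random.Random(0)  # seeded for reproducibility
prompt = format_example("2+2?", {"A": "3", "B": "4"}, rng)
print(prompt)
```

Sampling a different template per example prevents the model from overfitting to one prompt phrasing, which is what makes it more robust at evaluation time.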
finetune_mcqa.py - MCQA model with 4-option likelihood training starting from base model.
- Model uploaded to M2 MNLP
finetune_mcqa_10_options.py - Similar to finetune_mcqa_10_options_m3.py but without random templates.
finetune_IF_mcqa.py - MCQA model with 4-option likelihood starting from the instruction fine-tuned language modeling model.
finetune_lm.py - Instruction fine-tuned model using language modeling loss.
finetune_cl.py - Instruction fine-tuned model using completion-only loss.
finetune_base_mcqa_text.py - Seq2seq model trained to output `[letter]. [text of the answer]`.
- Starts from base model
- Uses random templates
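The `[letter]. [text of the answer]` target format can be built and parsed back as follows (a minimal sketch; the exact parsing rules in the scripts are not shown here):

```python
import re

def format_target(letter, answer_text):
    """Build the seq2seq training target in '[letter]. [text]' form."""
    return f"{letter}. {answer_text}"

def parse_letter(generation):
    """Pull the predicted letter back out of a model generation, if present.

    Assumes up to 10 choices, hence letters A-J.
    """
    match = re.match(r"\s*([A-J])\.", generation)
    return match.group(1) if match else None

print(parse_letter(format_target("C", "Paris")))  # -> C
```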
finetune_IF_cl_mcqa_text.py - Seq2seq model trained to output `[letter]. [text of the answer]`.
- Starts from instruction fine-tuned completion-only loss model
- Uses random templates
finetune_base_text_mcqa_rl.py - Model trained with RLVR using GRPO.
- Starts from finetune_base_mcqa_text.py
- Reward: +1 when `[letter].` is found, -1 otherwise
finetune_IF_cl_mcqa_text_rl.py - Model trained with RLVR using GRPO.
- Starts from finetune_IF_cl_mcqa_text.py
- Reward structure:
  - +2 when `[letter]. [text of the answer]` is found
  - +1 when `[letter].` is found
  - -1 when neither is found
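The tiered reward above can be sketched as a verifiable reward function; the matching rules here are illustrative, and the scripts' exact string checks may differ:

```python
import re

def grpo_reward(generation, gold_letter, gold_text):
    """Tiered verifiable reward for RLVR, as described above.

    +2 if the generation contains the full '[letter]. [text]' answer,
    +1 if it at least contains '[letter].',
    -1 otherwise.
    """
    if f"{gold_letter}. {gold_text}" in generation:
        return 2
    if re.search(rf"\b{re.escape(gold_letter)}\.", generation):
        return 1
    return -1

print(grpo_reward("The answer is B. 4", "B", "4"))  # -> 2
```

Dropping the +2 tier recovers the simpler +1/-1 reward used by finetune_base_text_mcqa_rl.py.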
Evaluation Framework: lighteval-epfl-mnlp
- This forked repository was used for evaluating MCQA and Chain-of-Thought (CoT) models
Links to the datasets:
- https://huggingface.co/datasets/andresnowak/MNLP_M3_mcqa_dataset
- https://huggingface.co/datasets/andresnowak/MNLP_M2_mcqa_dataset
- https://huggingface.co/datasets/andresnowak/MNLP_MCQA_dataset
- https://huggingface.co/datasets/andresnowak/MNLP_MCQA_dataset_2
- https://huggingface.co/datasets/andresnowak/mmlu-auxiliary-train-10-choices
Links to the models:
- https://huggingface.co/andresnowak/Qwen3-0.6B-instruction-finetuned
- https://huggingface.co/andresnowak/Qwen3-0.6B-instruction-finetuned_v2
- https://huggingface.co/andresnowak/MNLP_M2_mcqa_model
- https://huggingface.co/andresnowak/Qwen3-0.6B-MNLP_mcqa_model_text
- https://huggingface.co/andresnowak/Qwen3-0.6B-instruction-finetuned-MCQA
- https://huggingface.co/andresnowak/Qwen3-0.6B-MNLP_mcqa_rl
- https://huggingface.co/andresnowak/Qwen3-0.6B-MNLP_mcqa_model_text_2
- https://huggingface.co/andresnowak/Qwen3-0.6B-CoT
- https://huggingface.co/andresnowak/Qwen3-0.6B-MNLP_IF_v2_text_mcqa_rl