This directory contains notebooks for:
- Exploring the datasets used for MCQA (Multiple-Choice Question Answering)
- Inspecting the base models
- Analyzing the logit outputs of likelihood-based MCQA models
Use these notebooks for diagnostics and interpretability work to understand data-model interactions.
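Likelihood-based MCQA scoring (the kind analyzed in these notebooks) picks the choice whose continuation the model assigns the highest log-likelihood. A minimal sketch of the idea, assuming per-token log-probabilities are already available (the notebooks may store logits in a different shape):

```python
def score_choices(choice_logprobs):
    """Score each answer choice by its length-normalized log-likelihood.

    `choice_logprobs` maps a choice letter to the list of per-token
    log-probabilities the model assigned to that choice's continuation.
    (Hypothetical input shape -- purely illustrative.)
    """
    scores = {}
    for letter, logprobs in choice_logprobs.items():
        # Length-normalize so longer answers are not penalized unfairly.
        scores[letter] = sum(logprobs) / len(logprobs)
    return scores

def predict(choice_logprobs):
    """Return the letter of the highest-scoring choice."""
    scores = score_choices(choice_logprobs)
    return max(scores, key=scores.get)

# Toy example: choice "B" has the highest average log-likelihood.
example = {
    "A": [-2.0, -1.5],
    "B": [-0.3, -0.4, -0.2],
    "C": [-1.0],
}
print(predict(example))  # -> B
```

Length normalization is one common convention; raw summed log-likelihood is another, and the two can disagree when choices differ in length.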
Scripts for generating MCQA datasets and instruction fine-tuning data.
⚠️ Hugging Face login is required. Run: `huggingface-cli login`
- `create_dataset_IF_Mixture.py`
  Creates the instruction fine-tuning dataset.
  Use the `--hub-dataset-name` argument to specify your Hugging Face repo ID.
- `create_m3_mcqa_dataset.py`
  Builds the M3-style MCQA dataset.
  Also supports `--hub-dataset-name` for pushing to the Hub.
- `create_dataset_mcqa.py`
  Another variant for MCQA dataset creation.
  Also supports `--hub-dataset-name` for pushing to the Hub.
  This script accepts a `--config` argument to specify a dataset configuration YAML file: `MCQA_datasets_m2.yaml` or `MCQA_datasets_m3.yaml`.
  Both files are located in the `config/` directory within `generate_data/`.
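The README does not show the contents of these YAML files; a hypothetical sketch of what such a dataset configuration might look like (every field name and value below is an assumption, not taken from the actual files):

```yaml
# Hypothetical structure -- check MCQA_datasets_m3.yaml for the real schema.
datasets:
  - name: some-org/some-mcqa-source   # Hugging Face dataset ID (illustrative)
    split: train
    num_samples: 5000
seed: 42
```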
- `generate_mcqa_10_choices.py`
  Generates MCQA-style questions with 10 answer choices.
  Requires installation of a local `.whl` package: `pip install artifacts/gpt_wrapper-0.2.0-py3-none-any.whl`
  You also need to place a file named `secrets_env.py` in the same directory as this script.
  This file must define the following environment variables: `PARROT_API_BASE` and `PARROT_API_KEY`.
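The contents of `secrets_env.py` are not shown here; a minimal sketch, assuming the file only needs to export the two variables into the process environment (the placeholder values are obviously not real credentials):

```python
# secrets_env.py -- minimal sketch. The real file just needs to make
# PARROT_API_BASE and PARROT_API_KEY available as environment variables.
import os

os.environ["PARROT_API_BASE"] = "https://example.invalid/api"  # placeholder URL
os.environ["PARROT_API_KEY"] = "sk-REPLACE_ME"                 # placeholder key
```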
- `create_mcqa_from_generated_10_options.py`
  Converts generated data into a valid MCQA dataset. Requires:
  - A pre-existing JSON file generated by `generate_mcqa_10_choices.py`
  - A Hugging Face repo ID

  `python create_mcqa_from_generated_10_options.py --repo-id <your_repo_id>`
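The conversion step amounts to mapping raw generated items onto MCQA rows with lettered choices. A sketch of that mapping, with hypothetical field names (`question`, `options`, `answer_index`) -- the real JSON produced by `generate_mcqa_10_choices.py` may use different keys:

```python
import json
import string

def to_mcqa_rows(generated):
    """Convert generated multi-option items into MCQA rows.

    Field names here are hypothetical; adapt them to the actual JSON schema.
    """
    rows = []
    for item in generated:
        # Assign A, B, C, ... to however many options the item has (up to 26).
        letters = string.ascii_uppercase[: len(item["options"])]
        rows.append({
            "question": item["question"],
            "choices": dict(zip(letters, item["options"])),
            "answer": letters[item["answer_index"]],
        })
    return rows

raw = json.loads("""[
  {"question": "2+2?", "options": ["3", "4", "5"], "answer_index": 1}
]""")
print(to_mcqa_rows(raw)[0]["answer"])  # -> B
```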
All of these scripts use Hydra for configuration management.
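Hydra configuration is why the training commands below pass dotted `key=value` overrides such as `model.hub_model_id=path/to/repo`. A stdlib-only sketch of that override idea (the scripts themselves use the real Hydra library, not this helper):

```python
def apply_overrides(config, overrides):
    """Mimic Hydra-style `a.b=value` command-line overrides on a nested dict.

    Illustrative only: real Hydra also type-converts values (e.g. "false"
    becomes a boolean), which this sketch does not do.
    """
    for override in overrides:
        dotted_key, _, value = override.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = config
        for key in parents:
            node = node.setdefault(key, {})  # create nested sections as needed
        node[leaf] = value
    return config

cfg = {"training": {"push_to_hub": True}}
apply_overrides(cfg, ["training.push_to_hub=false", "model.hub_model_id=path/to/repo"])
print(cfg["model"]["hub_model_id"])  # -> path/to/repo
```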
To train the MCQA model using the 10-option format, you have two options:
⚠️ Hugging Face and Weights & Biases logins are required. Run: `huggingface-cli login` and `wandb login`
You must also provide a Hugging Face Hub repo ID when training:
`python finetune_mcqa_10_options_m3.py model.hub_model_id=path/to/repo`

You can disable both Hugging Face and Weights & Biases integration using:

`WANDB_MODE=disabled python finetune_mcqa_10_options_m3.py training.push_to_hub=false`

This will:
- Prevent pushing models to the Hugging Face Hub
- Disable wandb logging
This repository contains multiple training scripts for different model configurations and training approaches:
finetune_mcqa_10_options_m3.py - Trains the M3 model using the generated 10-option MCQA dataset.
- Starts from base model
- Uses maximum likelihood training with any number of choices
- Incorporates synthetic 10-choice MMLU auxiliary training dataset
- Uses random templates for robustness to different prompts
- Model uploaded to M3 MNLP
Example (offline mode):
`python finetune_mcqa_10_options_m3.py training.push_to_hub=false`
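The "random templates for robustness" idea mentioned above can be sketched as follows; the template strings are illustrative, not the ones actually used in the training scripts:

```python
import random

# Illustrative prompt templates -- the real templates are defined in the
# training scripts, not in this README.
TEMPLATES = [
    "Question: {q}\n{opts}\nAnswer:",
    "{q}\nChoose one of:\n{opts}\nYour answer:",
    "Answer the following question.\n{q}\n{opts}\n",
]

def format_example(question, choices, rng):
    """Render one MCQA example with a randomly chosen prompt template."""
    opts = "\n".join(f"{letter}. {text}" for letter, text in choices.items())
    template = rng.choice(TEMPLATES)  # a fresh template per example
    return template.format(q=question, opts=opts)

rng = random.Random(0)  # seeded for reproducibility
prompt = format_example("2+2?", {"A": "3", "B": "4"}, rng)
print(prompt)
```

Sampling a different template per example prevents the model from overfitting to one prompt phrasing, which is what makes it more robust at evaluation time.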
finetune_mcqa.py - MCQA model with 4-option likelihood training starting from base model.
- Model uploaded to M2 MNLP
finetune_mcqa_10_options.py - Similar to finetune_mcqa_10_options_m3.py but without random templates.
finetune_IF_mcqa.py - MCQA model with 4-option likelihood starting from the instruction fine-tuned language modeling model.
finetune_lm.py - Instruction fine-tuned model using language modeling loss.
finetune_cl.py - Instruction fine-tuned model using completion-only loss.
finetune_base_mcqa_text.py - Seq2seq model trained to output `[letter]. [text of the answer]`.
- Starts from base model
- Uses random templates
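The `[letter]. [text of the answer]` target format can be built and parsed back as follows (a minimal sketch; the exact parsing rules in the scripts are not shown here):

```python
import re

def format_target(letter, answer_text):
    """Build the seq2seq training target in '[letter]. [text]' form."""
    return f"{letter}. {answer_text}"

def parse_letter(generation):
    """Pull the predicted letter back out of a model generation, if present.

    Assumes up to 10 choices, hence letters A-J.
    """
    match = re.match(r"\s*([A-J])\.", generation)
    return match.group(1) if match else None

print(parse_letter(format_target("C", "Paris")))  # -> C
```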
finetune_IF_cl_mcqa_text.py - Seq2seq model trained to output `[letter]. [text of the answer]`.
- Starts from instruction fine-tuned completion-only loss model
- Uses random templates
finetune_base_text_mcqa_rl.py - Model trained with RLVR using GRPO.
- Starts from finetune_base_mcqa_text.py
- Reward: +1 when `[letter].` is found, -1 otherwise
finetune_IF_cl_mcqa_text_rl.py - Model trained with RLVR using GRPO.
- Starts from finetune_IF_cl_mcqa_text.py
- Reward structure:
  - +2 when `[letter]. [text of the answer]` is found
  - +1 when `[letter].` is found
  - -1 when neither is found
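The tiered reward above can be sketched as a verifiable reward function; the matching rules here are illustrative, and the scripts' exact string checks may differ:

```python
import re

def grpo_reward(generation, gold_letter, gold_text):
    """Tiered verifiable reward for RLVR, as described above.

    +2 if the generation contains the full '[letter]. [text]' answer,
    +1 if it at least contains '[letter].',
    -1 otherwise.
    """
    if f"{gold_letter}. {gold_text}" in generation:
        return 2
    if re.search(rf"\b{re.escape(gold_letter)}\.", generation):
        return 1
    return -1

print(grpo_reward("The answer is B. 4", "B", "4"))  # -> 2
```

Dropping the +2 tier recovers the simpler +1/-1 reward used by finetune_base_text_mcqa_rl.py.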
Evaluation Framework: lighteval-epfl-mnlp
- This forked repository was used for evaluating MCQA and Chain-of-Thought (CoT) models
Links to the datasets:
- https://huggingface.co/datasets/andresnowak/MNLP_M3_mcqa_dataset
- https://huggingface.co/datasets/andresnowak/MNLP_M2_mcqa_dataset
- https://huggingface.co/datasets/andresnowak/MNLP_MCQA_dataset
- https://huggingface.co/datasets/andresnowak/MNLP_MCQA_dataset_2
- https://huggingface.co/datasets/andresnowak/mmlu-auxiliary-train-10-choices
Links to the models:
- https://huggingface.co/andresnowak/Qwen3-0.6B-instruction-finetuned
- https://huggingface.co/andresnowak/Qwen3-0.6B-instruction-finetuned_v2
- https://huggingface.co/andresnowak/MNLP_M2_mcqa_model
- https://huggingface.co/andresnowak/Qwen3-0.6B-MNLP_mcqa_model_text
- https://huggingface.co/andresnowak/Qwen3-0.6B-instruction-finetuned-MCQA
- https://huggingface.co/andresnowak/Qwen3-0.6B-MNLP_mcqa_rl
- https://huggingface.co/andresnowak/Qwen3-0.6B-MNLP_mcqa_model_text_2
- https://huggingface.co/andresnowak/Qwen3-0.6B-CoT
- https://huggingface.co/andresnowak/Qwen3-0.6B-MNLP_IF_v2_text_mcqa_rl