This project trains a T5-based model to perform common-sense multiple-choice reasoning and generate human-like explanations.
We use the CommonsenseQA dataset, which provides:
- A question
- Multiple-choice answer options
- The correct answer
- A human-written rationale (explanation)
## Project Structure

```text
.
├── csqa_explanation_generation.py
│     # Main script for generating high-quality explanations for CSQA
│     # using a large language model (here we use Qwen2.5 7B)
├── training_t5_CoSE.ipynb
│     # Jupyter notebook for training and fine-tuning T5 models
│     # (T5-Small baseline and T5-Large + LoRA)
│     # Includes:
│     #   - data preprocessing & cleaning
│     #   - tokenization
│     #   - LoRA configuration
│     #   - training & evaluation
│     #   - baseline vs fine-tuned comparison
├── data/
│   ├── csqa_full.jsonl
│   │     # Cleaned CommonsenseQA-style dataset with
│   │     # question, choices, answer, and generated explanations
│   │     # (JSONL format, one example per line)
│   │
│   └── csqa_full.csv
│         # CSV version of the same dataset for analysis,
│         # visualization, or third-party tools
├── t5_csqa_lora/
│     # Output directory for LoRA fine-tuning checkpoints
│   ├── checkpoint-*/              # Intermediate training checkpoints
│   │   ├── adapter_config.json
│   │   ├── adapter_model.safetensors
│   │   └── trainer_state.json
│   │
│   └── trainer_state.json         # Final trainer metadata
├── t5_csqa_lora_merged/
│     # Final merged T5 model (base T5 + LoRA weights)
│     # Fully loadable for inference
│   ├── config.json
│   ├── generation_config.json
│   ├── model.safetensors
│   ├── tokenizer.json
│   ├── tokenizer_config.json
│   └── special_tokens_map.json
└── README.md
      # Project overview, methodology, and results
```
## Setup

Follow the steps below to set up the environment and run the project.

### 1. Clone the repository

```bash
git clone https://github.com/kkli08/common-sense-reasoning.git
cd <your-repo-name>
```

### 2. Create a virtual environment

We recommend using a Python virtual environment to avoid dependency conflicts.

macOS / Linux:

```bash
python3 -m venv venv
source venv/bin/activate
```

Windows:

```bash
python -m venv venv
venv\Scripts\activate
```

### 3. Install dependencies

All required packages are listed in `requirements.txt`.

```bash
pip install --upgrade pip
pip install -r requirements.txt
```

✅ **Note (macOS / Apple Silicon):** Training is supported via Metal Performance Shaders (MPS). No CUDA or `bitsandbytes` is required.
### 4. Verify the installation

You can quickly verify that PyTorch and Transformers are installed correctly:

```bash
python - <<EOF
import torch
import transformers
print("Torch version:", torch.__version__)
print("Transformers version:", transformers.__version__)
print("MPS available:", torch.backends.mps.is_available())
EOF
```

## Data Generation

This project includes a data generation pipeline that creates high-quality short reasoning explanations for the CommonsenseQA dataset using a locally deployed large language model. Why not CoSE?
The generation process uses a two-stage reasoning approach:
- Generate a correct long explanation for each question
- Compress it into a clean, single-sentence short explanation
Only the final short explanations are kept and saved for downstream model training.
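The two-stage loop can be sketched as a small helper that composes two prompts around any LLM call. This is a minimal illustration: the prompt wording and the `generate` wrapper are assumptions, not the actual code in `csqa_explanation_generation.py`.

```python
def two_stage_explanation(generate, question, choices, answer):
    """Sketch of the two-stage pipeline.

    `generate(prompt) -> str` is any LLM call (e.g. a locally deployed
    Qwen2.5 7B). Prompt wording here is illustrative only.
    """
    # Stage 1: produce a full reasoning chain for the gold answer.
    long_exp = generate(
        f"Question: {question}\n"
        f"Choices: {', '.join(choices)}\n"
        f"Correct answer: {answer}\n"
        f"Explain step by step why this answer is correct."
    )
    # Stage 2: compress the chain into one clean short sentence.
    short_exp = generate(
        "Rewrite this explanation as a single short sentence "
        f"starting with 'Because':\n{long_exp}"
    )
    return short_exp.strip()
```

Only the stage-2 output is kept; the long chain is discarded after compression.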
Run the following command:

```bash
python3 csqa_explanation_generation.py
```

This will:
- Load the CommonsenseQA training set
- Generate explanations incrementally with progress tracking
- Automatically resume from partially generated files
- Save results to:
  - `data/csqa_full.jsonl`
  - `data/csqa_full.csv`
Each generated sample follows this format:

```json
{
  "question": "...",
  "choices": ["...", "...", "...", "...", "..."],
  "answer": "B",
  "short_explanation": "Because ..."
}
```

The dataset is expected to be located in:

```text
data/
├── csqa_full.jsonl
└── csqa_full.csv
```
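Once generated, the JSONL file can be read back with a few lines of standard-library Python (the path and field names follow the sample format above):

```python
import json

def load_csqa(path="data/csqa_full.jsonl"):
    """Read the generated dataset: one JSON object per line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Each returned dict has the keys `question`, `choices`, `answer`, and `short_explanation`.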
## Training

To train and evaluate the model, open `training_t5_CoSE.ipynb`.
The notebook covers:
- Data cleaning and formatting
- Tokenization for T5
- LoRA fine-tuning
- Baseline vs fine-tuned evaluation
- Model merging and inference
Run the notebook sequentially to reproduce results.
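The LoRA step broadly follows the standard `peft` recipe for T5. The hyperparameters below are illustrative assumptions, not the notebook's actual values:

```python
from peft import LoraConfig, TaskType

# Hypothetical hyperparameters -- the notebook's actual values may differ.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,  # T5 is an encoder-decoder model
    r=16,                       # low-rank dimension (assumption)
    lora_alpha=32,              # scaling factor (assumption)
    lora_dropout=0.05,
    target_modules=["q", "v"],  # T5 attention query/value projections
)
```

Wrapping the base model with `peft.get_peft_model(model, lora_config)` then trains only the low-rank adapter weights, which is why the checkpoints above contain small `adapter_model.safetensors` files rather than full model weights.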
After training, the merged model will be saved under `t5_csqa_lora_merged/`.
You can load it directly using Hugging Face Transformers for inference.
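For example, a minimal inference sketch using the standard Transformers API; the `question: ... choices: ...` prompt format shown here is an assumption and should match whatever format the notebook used during training:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the merged model (base T5 + LoRA weights) from the local directory.
model = AutoModelForSeq2SeqLM.from_pretrained("t5_csqa_lora_merged")
tokenizer = AutoTokenizer.from_pretrained("t5_csqa_lora_merged")

prompt = (
    "question: Where do you store fresh vegetables? "
    "choices: garage, refrigerator, bookshelf, bathroom, attic"
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```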
TBD
## Results

We compare both models on the same 8 baseline CSQA questions.
| Model | Accuracy | Correct / Total |
|---|---|---|
| T5-Small (Baseline) | 0% | 0 / 8 |
| T5-Large + LoRA Fine-Tuned | 100% | 8 / 8 |
Fine-tuning transformed the model from complete failure to perfect accuracy on this set.
Below, each question shows outputs from both models, allowing direct comparison.
### Question 1
Q: Sammy wanted to go to where the people were. Where might he go?
Choices: race track, populated areas, the desert, apartment, roadblock
Gold: B
| Model | Output |
|---|---|
| T5-Small | answer: E. Because Sammy wanted to go to where the people were. ❌ Predicted: E |
| T5-Large + LoRA | answer: B. Because "populated areas" best fits Sammy's desire to visit populated areas. ✔ Correct |
### Question 2
Q: Where do you store fresh vegetables?
Choices: garage, refrigerator, bookshelf, bathroom, attic
Gold: B
| Model | Output |
|---|---|
| T5-Small | answer: E. Because refrigerators are used to store fresh vegetables. ❌ Predicted: E |
| T5-Large + LoRA | answer: B. Because refrigerators are ideal for storing fresh vegetables. ✔ Correct |
### Question 3
Q: If you heat water to 100°C, what will happen?
Choices: freeze, boil, rust, glow, evaporate
Gold: B
| Model | Output |
|---|---|
| T5-Small | answer: E. Because heat water to 100°C will freeze and rust. ❌ Predicted: E |
| T5-Large + LoRA | answer: B. Because boiling occurs when water reaches 100°C. ✔ Correct |
### Question 4
Q: What do people usually use to dry their hands after washing?
Choices: towel, hammer, blanket, pillow, shoe
Gold: A
| Model | Output |
|---|---|
| T5-Small | answer: E. Because a towel is used to dry hands after washing. ❌ Predicted: E |
| T5-Large + LoRA | answer: A. Because a towel is commonly used to dry hands after washing. ✔ Correct |
### Question 5
Q: Where would you typically find books to read?
Choices: library, swimming pool, factory, garage, freeway
Gold: A
| Model | Output |
|---|---|
| T5-Small | answer: D. Because libraries are the most popular places to read books. ❌ Predicted: D |
| T5-Large + LoRA | answer: A. Because libraries offer a wide variety of books for reading, making them the best choice. ✔ Correct |
### Question 6
Q: If someone wants to relax and reduce stress, what might they do?
Choices: meditate, argue, shout, work more, run into danger
Gold: A
| Model | Output |
|---|---|
| T5-Small | answer: D. Because "work more" is the best way to relax and reduce stress. ❌ Predicted: D |
| T5-Large + LoRA | answer: A. Because meditation is the most effective way to relax and reduce stress. ✔ Correct |
### Question 7
Q: What tool is commonly used to tighten screws?
Choices: screwdriver, spoon, pencil, comb, fork
Gold: A
| Model | Output |
|---|---|
| T5-Small | answer: D. Because a fork is commonly used to tighten screws. ❌ Predicted: D |
| T5-Large + LoRA | answer: A. Because a screwdriver is the most common tool for tightening screws. ✔ Correct |
### Question 8
Q: Where would you likely find many wild animals living together?
Choices: forest, kitchen, bathroom, rooftop, office
Gold: A
| Model | Output |
|---|---|
| T5-Small | answer: D. Because "office" is the most popular place for wild animals living together. ❌ Predicted: D |
| T5-Large + LoRA | answer: A. Because forests are ideal habitats for wild animals to live together. ✔ Correct |