This project implements an efficient training approach for fine-tuning Large Language Models (LLMs) on OCaml code generation tasks. The system combines data pruning techniques with parameter-efficient fine-tuning and reinforcement learning to optimize model performance while minimizing computational resources.
The pipeline consists of four main components:
- Data Cleaning and Pruning (data)
- Fine-Tuning (sft)
- Reinforcement Learning (rl)
- Evaluation (eval)
Purpose: Clean and prune the OCaml dataset (data).
Summary: This part of the pipeline takes as input the synthetic OCaml dataset developed in the paper "Knowledge Transfer from High-Resource to Low-Resource Programming Languages for Code LLMs". It then applies two data-efficient pruning methods, one based on kernel density estimation and the other based on pairwise pruning, as specified in the paper "Code Less, Align More: Efficient LLM Fine-tuning for Code Generation with Data Pruning". The pruned data is split into two halves: the first half is used for parameter-efficient fine-tuning on general OCaml tasks, and the second half is used for RL. Each half is further split into training and validation sets.
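The two pruning ideas can be illustrated on embedding vectors with a small stdlib-only sketch. This is not the paper's exact procedure. It assumes that each sample is already represented as a numeric vector and that the KDE step favors samples in sparse (less redundant) regions. It also reads "pairwise pruning" as greedy removal of near-duplicates by cosine similarity, which is an assumption. `kde_prune`, `pairwise_prune`, `bandwidth`, and `threshold` are hypothetical names.

```python
import math


def gaussian_kde_scores(points: list[list[float]], bandwidth: float = 1.0) -> list[float]:
    """Leave-one-out Gaussian kernel density estimate for each point (unnormalized)."""
    scores = []
    for i, p in enumerate(points):
        total = 0.0
        for j, q in enumerate(points):
            if i == j:
                continue  # leave-one-out: a point does not count toward its own density
            sq_dist = sum((a - b) ** 2 for a, b in zip(p, q))
            total += math.exp(-sq_dist / (2 * bandwidth ** 2))
        scores.append(total / max(len(points) - 1, 1))
    return scores


def kde_prune(points: list[list[float]], keep: int, bandwidth: float = 1.0) -> list[int]:
    """Keep the indices of the `keep` points in the lowest-density regions."""
    scores = gaussian_kde_scores(points, bandwidth)
    ranked = sorted(range(len(points)), key=lambda i: scores[i])
    return sorted(ranked[:keep])


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pairwise_prune(points: list[list[float]], threshold: float = 0.95) -> list[int]:
    """Greedily drop any point whose similarity to an already-kept point reaches the threshold."""
    kept: list[int] = []
    for i, p in enumerate(points):
        if all(cosine(p, points[k]) < threshold for k in kept):
            kept.append(i)
    return kept
```

Both functions are O(n²) in the number of samples, which is fine for illustration. A real dataset would need clustering or approximate nearest neighbors first.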
Output: A pruned, high-quality OCaml dataset, split into a fine-tuning half and an RL half, each with its own training and validation sets.
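The split described in the summary (two halves, each with a training and validation set) can be sketched as follows. The 10% validation fraction, the fixed seed, and the `split_for_stages` name are illustrative assumptions, not values taken from the project.

```python
import random


def split_for_stages(items, val_fraction: float = 0.1, seed: int = 0) -> dict:
    """Shuffle, halve into SFT and RL pools, then carve a validation set from each."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    shuffled = list(items)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    splits = {}
    for name, pool in (("sft", shuffled[:mid]), ("rl", shuffled[mid:])):
        n_val = int(len(pool) * val_fraction)
        splits[name] = {"train": pool[n_val:], "val": pool[:n_val]}
    return splits
```

Splitting before either stage trains keeps the RL data unseen during fine-tuning.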
Purpose: Orchestrates the first stage of training, parameter-efficient fine-tuning (sft).
Summary: This folder contains training_stage_1_sft.py and sft_merge_hf.py. training_stage_1_sft.py is a complete workflow for applying parameter-efficient fine-tuning to a model; its training logs are saved in sft/sft_training_logs.txt. sft_merge_hf.py takes the output of training_stage_1_sft.py, merges it with the original model used for fine-tuning, and pushes the complete model to Hugging Face.
Output: A model fine-tuned for OCaml code generation and training logs for that model.
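The README does not name the PEFT method. Assuming it is LoRA, a common choice, the merge step folds each low-rank update back into the dense weight as W' = W + (alpha / r) · B · A. The stdlib sketch below shows that arithmetic on plain lists. With the Hugging Face `peft` library the equivalent operation is usually `PeftModel.from_pretrained(base_model, adapter_path).merge_and_unload()`, but whether sft_merge_hf.py works this way is not stated. `merge_lora` is a hypothetical name.

```python
def matmul(x: list[list[float]], y: list[list[float]]) -> list[list[float]]:
    """Naive matrix product of x (m x k) and y (k x n)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merge_lora(weight, lora_a, lora_b, alpha: float, rank: int):
    """Return W + (alpha / rank) * B @ A, the dense weight with the adapter folded in."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # (d_out x r) @ (r x d_in) -> (d_out x d_in)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]
```

After merging, inference needs only the dense weights, which is why a merged model can be pushed and loaded like any ordinary checkpoint.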
Purpose: Orchestrates the second stage of training, direct preference optimization (rl).
Summary: This folder contains all files used in the second stage of training. build_dpo_dataset.py and build_dpo_valset.py build the training and validation datasets for direct preference optimization (DPO). dpo_train.py runs the efficient DPO training (logs are saved in rl_training_logs.txt). Finally, rl_merge_hf.py takes the output of dpo_train.py, merges it with the original model used for fine-tuning, and pushes the complete model to Hugging Face.
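DPO training data is commonly stored as records with `prompt`, `chosen`, and `rejected` fields. The sketch below assumes, hypothetically, that preferences come from whether a sampled OCaml completion passed its tests. The README does not say how build_dpo_dataset.py ranks completions, and `Candidate` and `build_preference_pairs` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    completion: str
    passed: bool  # hypothetical signal: did this completion pass the task's tests?


def build_preference_pairs(prompt: str, candidates: list[Candidate]) -> list[dict]:
    """Pair every passing completion with every failing one for the same prompt."""
    good = [c.completion for c in candidates if c.passed]
    bad = [c.completion for c in candidates if not c.passed]
    return [
        {"prompt": prompt, "chosen": g, "rejected": b}
        for g in good
        for b in bad
    ]
```

Prompts where every candidate passes, or every candidate fails, yield no pairs, since DPO needs a contrast within the same prompt.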
Output: A model fully optimized for OCaml code generation and training logs for that model.
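At the core of DPO is a per-example loss on sequence log-probabilities under the trained policy and a frozen reference model: -log σ(β[(log π(y_w|x) − log π_ref(y_w|x)) − (log π(y_l|x) − log π_ref(y_l|x))]). The function below is a plain-Python sketch of that formula. The default β = 0.1 is illustrative and not taken from dpo_train.py.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio))."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(z)) == softplus(-z); the stable form avoids overflow for large |z|.
    return softplus(-logits)
```

When the policy equals the reference, the loss is log 2. It falls as the policy raises the chosen completion's likelihood relative to the rejected one.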
Purpose: Evaluates the models produced by the fine-tuning and RL stages, steps 2 and 3 (eval).
Summary: This file contains functions for evaluating model performance.
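The README does not say which metric the evaluation functions compute. A common choice for code generation is pass@k, estimated without bias from n samples of which c pass: 1 − C(n−c, k) / C(n, k). The sketch below uses the numerically stable product form of that estimator, and its use here is an assumption.

```python
def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k from n samples with c correct: 1 - C(n - c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    prob_all_fail = 1.0
    for i in range(n - c + 1, n + 1):
        prob_all_fail *= 1.0 - k / i
    return 1.0 - prob_all_fail
```

Averaging this value over all tasks gives the benchmark-level pass@k.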
- Data Efficiency: Reduces dataset size and improves data quality through KDE-based and pairwise pruning
- Two-Stage Training: Combines general fine-tuning with task-specific reinforcement learning via DPO
- Parameter Efficiency: Uses PEFT techniques to minimize computational requirements
- Specialization: Focuses on practical OCaml code completion tasks