OCaml Code LLM

Overview

This project implements an efficient training approach for fine-tuning Large Language Models (LLMs) on OCaml code generation tasks. The system combines data pruning techniques with parameter-efficient fine-tuning and reinforcement learning to optimize model performance while minimizing computational resources.

Architecture

The pipeline consists of four main components:

  1. Data Cleaning and Pruning (data)
  2. Fine-Tuning (sft)
  3. Reinforcement Learning (rl)
  4. Evaluation (eval)

Component Details

1. data

Purpose: Prune the OCaml dataset.

Summary: This part of the pipeline takes the synthetically created OCaml dataset developed in the paper "Knowledge Transfer from High-Resource to Low-Resource Programming Languages for Code LLMs". It then applies two forms of data-efficient pruning, one based on kernel density estimation and the other based on pairwise pruning, as specified in the paper "Code Less, Align More: Efficient LLM Fine-tuning for Code Generation with Data Pruning". The data is then split into two halves: the first half is used for parameter-efficient fine-tuning on general OCaml tasks, and the second half is used for RL. Each half is further split into training and validation sets.
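
As a rough illustration of the two pruning passes, the sketch below scores samples with a Gaussian kernel density estimate and drops the densest ones, then removes near-duplicates pairwise. It is not the repository's code: the feature vectors, the Jaccard token similarity, the function names, and every threshold are assumptions made for this example, and the metrics used in the paper may differ.

```python
import math


def gaussian_kde_scores(points, bandwidth=1.0):
    """Unnormalized Gaussian KDE density estimate at each point (O(n^2))."""
    scores = []
    for p in points:
        total = 0.0
        for q in points:
            sq_dist = sum((a - b) ** 2 for a, b in zip(p, q))
            total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        scores.append(total / len(points))
    return scores


def prune_by_density(points, keep_fraction=0.5, bandwidth=1.0):
    """Return sorted indices of the lowest-density points.

    Assumption: samples in dense regions are treated as redundant,
    so the highest-density samples are the ones dropped.
    """
    scores = gaussian_kde_scores(points, bandwidth)
    n_keep = max(1, int(len(points) * keep_fraction))
    order = sorted(range(len(points)), key=lambda i: scores[i])
    return sorted(order[:n_keep])


def jaccard(a, b):
    """Jaccard similarity between the whitespace-token sets of two strings."""
    sa, sb = set(a.split()), set(b.split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def prune_pairwise(samples, threshold=0.8):
    """Greedily keep a sample only if it is not too similar to any kept one."""
    kept = []
    for s in samples:
        if all(jaccard(s, k) < threshold for k in kept):
            kept.append(s)
    return kept
```

In a real pipeline the points would more plausibly be embedding vectors of the code samples, and the quadratic loops would be replaced by a vectorized or approximate method.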

Output: A pruned, high-quality OCaml dataset optimized for efficient training.
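
The splitting step described in the summary can be sketched as follows. The function name, the 10% validation fraction, the fixed seed, and the "sft" and "rl" keys are assumptions for illustration; the README does not state the actual ratios.

```python
import random


def split_dataset(samples, val_fraction=0.1, seed=0):
    """Shuffle, split into two halves, then split each half into train/val.

    Assumption: the first half feeds fine-tuning ("sft") and the second
    half feeds the RL stage ("rl"); val_fraction is a hypothetical value.
    """
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)

    mid = len(shuffled) // 2
    halves = {"sft": shuffled[:mid], "rl": shuffled[mid:]}

    splits = {}
    for name, half in halves.items():
        n_val = max(1, int(len(half) * val_fraction))
        splits[name] = {"train": half[n_val:], "val": half[:n_val]}
    return splits
```

Calling split_dataset(list(range(100))) yields 45 training and 5 validation samples for each half, with no sample shared between splits.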


2. sft

Purpose: Orchestrates the first stage of training (fine-tuning).

Summary: This folder contains training_stage_1_sft.py and sft_merge_hf.py. The former is a complete workflow that applies parameter-efficient fine-tuning to a model; its training logs are saved in sft/sft_training_logs.txt. The latter takes the output of training_stage_1_sft.py, merges it with the original model used for fine-tuning, and pushes the complete model to Hugging Face.

Output: A model fine-tuned for OCaml code generation and training logs for that model.
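
To make the merge step concrete, here is a toy sketch of what merging an adapter into a base weight matrix means. The README only says "parameter-efficient fine-tuning", so the LoRA-style low-rank update W + (alpha / r) * B * A used here is an assumption, as are the function names; the actual script presumably relies on a library rather than hand-written matrix code.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Fold a low-rank adapter into the base weight.

    weight: (out x in) base matrix
    lora_b: (out x rank), lora_a: (rank x in)  -- assumed LoRA factors
    Returns weight + (alpha / rank) * (lora_b @ lora_a).
    """
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]
```

After merging, the adapter matrices are no longer needed at inference time, which is why a merged model can be published as a single standalone checkpoint.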


3. rl

Purpose: Orchestrates the second stage of training (direct preference optimization).

Summary: This folder contains all files used in the second stage of training. build_dpo_dataset.py and build_dpo_valset.py build the datasets used for direct preference optimization (DPO). dpo_train.py conducts efficient training of the model, with results saved in rl_training_logs.txt. Finally, rl_merge_hf.py takes the output of dpo_train.py, merges it with the original model used for fine-tuning, and pushes the complete model to Hugging Face.

Output: A model fully optimized for OCaml code generation and training logs for that model.
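
For readers unfamiliar with direct preference optimization, the sketch below computes the standard DPO loss for a single preference pair from sequence log-probabilities. It is a self-contained illustration, not an excerpt from dpo_train.py; the function name and the beta value of 0.1 are assumptions.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair.

    loss = -log(sigmoid(beta * margin)), where margin is how much more the
    policy prefers the chosen answer than the frozen reference model does.
    """
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    z = beta * margin
    # Numerically stable form of -log(sigmoid(z)) = log(1 + exp(-z)).
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))
```

When the policy and the reference assign identical log-probabilities, the loss equals log 2; it falls as the policy shifts probability toward the chosen completion relative to the reference.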

4. eval

Purpose: Evaluates the models created in steps 2 and 3.

Summary: This component consists of functions that evaluate the model's performance.
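
The README does not say which metrics the evaluation functions compute. One common choice for code generation is pass@k; the sketch below shows the usual unbiased estimator purely as an example of what such a function might look like. The function name and its use here are assumptions, not a description of this repository's eval code.

```python
import math


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate.

    n: completions sampled per problem
    c: completions that passed the tests
    k: sample budget being evaluated (k <= n)
    Returns 1 - C(n - c, k) / C(n, k).
    """
    if n - c < k:
        # Every size-k subset must contain at least one passing completion.
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)
```

Averaging this value over all benchmark problems gives the overall pass@k score.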


Key Features

  • Data Efficiency: Reduces dataset size and improves data quality through intelligent pruning
  • Two-Stage Training: Combines general fine-tuning with task-specific reinforcement learning
  • Parameter Efficiency: Uses PEFT techniques to minimize computational requirements
  • Specialization: Focuses on practical OCaml code completion tasks
