OCaml Code LLM

Overview

This project implements an efficient training approach for fine-tuning Large Language Models (LLMs) on OCaml code generation tasks. The system combines data pruning techniques with parameter-efficient fine-tuning and reinforcement learning to optimize model performance while minimizing computational resources.

Architecture

The pipeline consists of four main components:

Data Cleaning and Pruning (data)
Fine-Tuning (sft)
Reinforcement Learning (rl)
Evaluation (eval)

Component Details

1. `data`

Purpose: Prune the OCaml dataset.

Summary: This part of the pipeline takes the synthetically generated OCaml dataset developed in the paper "Knowledge Transfer from High-Resource to Low-Resource Programming Languages for Code LLMs". It then applies two forms of data-efficient pruning, one based on kernel density estimation and the other based on pairwise pruning, as specified in the paper "Code Less, Align More: Efficient LLM Fine-tuning for Code Generation with Data Pruning". The pruned data is split into two halves: the first half is used for parameter-efficient fine-tuning on general OCaml tasks, and the second half is used for RL. Each half is further split into training and validation sets.
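The two pruning ideas named above can be sketched as follows. This is a minimal illustration, not the code in this repository: it assumes each sample has already been mapped to a small numeric embedding, that density-based pruning drops samples from the densest regions, and that pairwise pruning drops near-duplicates by distance. The function names, the `bandwidth`, `keep`, and `min_dist` parameters, and the toy embeddings are all hypothetical, and the selection rule in the cited paper may differ.

```python
import math

def kde_density(points, bandwidth=1.0):
    """Unnormalized Gaussian kernel density estimate at each point."""
    scores = []
    for p in points:
        total = 0.0
        for q in points:
            sq_dist = sum((a - b) ** 2 for a, b in zip(p, q))
            total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        scores.append(total / len(points))
    return scores

def kde_prune(points, keep, bandwidth=1.0):
    """Return sorted indices of the `keep` points with the lowest density.

    Assumption: low-density samples are the least redundant, so they are kept.
    """
    density = kde_density(points, bandwidth)
    order = sorted(range(len(points)), key=lambda i: density[i])
    return sorted(order[:keep])

def pairwise_prune(points, min_dist):
    """Greedily keep a point only if it is at least `min_dist` away
    from every point that has already been kept."""
    kept = []
    for i, p in enumerate(points):
        if all(math.dist(p, points[j]) >= min_dist for j in kept):
            kept.append(i)
    return kept

# Toy embeddings: three near-duplicates around the origin plus two outliers.
embeddings = [(0.0, 0.0), (0.05, 0.0), (0.0, 0.05), (5.0, 5.0), (9.0, 1.0)]
survivors = pairwise_prune(embeddings, min_dist=0.1)
sparse = kde_prune(embeddings, keep=2)
```

Both functions return indices rather than samples, so the same selection can be applied to the original code strings that the embeddings were computed from.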

Output: A pruned, high-quality OCaml dataset optimized for efficient training
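The split described in the summary (two halves, each with its own training and validation set) could look roughly like the sketch below. The function name, the 10% validation fraction, and the fixed seed are assumptions made for illustration; the source does not state the ratios or the shuffling strategy actually used.

```python
import random

def split_for_two_stages(samples, val_fraction=0.1, seed=0):
    """Shuffle, cut into an SFT half and an RL half, then carve a
    validation set out of each half."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    mid = len(shuffled) // 2
    halves = {"sft": shuffled[:mid], "rl": shuffled[mid:]}
    splits = {}
    for stage, half in halves.items():
        n_val = max(1, int(len(half) * val_fraction))
        splits[stage] = {"val": half[:n_val], "train": half[n_val:]}
    return splits

splits = split_for_two_stages(range(100))
```

Keeping the two halves disjoint means the RL stage never sees samples that were already used for fine-tuning.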

2. `sft`

Purpose: Orchestrates the first stage of training (fine-tuning).

Summary: This folder contains training_stage_1_sft.py and sft_merge_hf.py. The first script is a complete workflow that applies parameter-efficient fine-tuning to a model; its training logs are saved in sft/sft_training_logs.txt. The second script takes the output of training_stage_1_sft.py, merges it with the original model used for fine-tuning, and pushes the complete model to Hugging Face.
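To make the merge step concrete, here is the arithmetic behind merging an adapter into a base weight, assuming the parameter-efficient method is LoRA-style (the source says only "parameter-efficient fine-tuning", so this is an assumption). Under that assumption the merged weight is W + (alpha / rank) * B A. The function names and the toy matrices are hypothetical and are not taken from sft_merge_hf.py.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * (B @ A) as a new matrix."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]

# 2x2 frozen base weight with a rank-1 adapter.
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_b = [[1.0], [2.0]]   # shape (2, 1)
lora_a = [[0.5, 0.0]]     # shape (1, 2)
merged = merge_lora(weight, lora_a, lora_b, alpha=2.0, rank=1)
```

After merging, the model no longer needs the adapter at inference time, which is why a merged checkpoint can be published as a single complete model.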

Output: A model fine-tuned for OCaml code generation and training logs for that model.

3. `rl`

Purpose: Orchestrates the second stage of training (direct preference optimization).

Summary: This folder contains all files used in the second stage of training. build_dpo_dataset.py and build_dpo_valset.py build the datasets used for direct preference optimization (DPO). dpo_train.py runs the efficient training of the model, with results saved in rl_training_logs.txt. Finally, rl_merge_hf.py takes the output of dpo_train.py, merges it with the original model used for fine-tuning, and pushes the complete model to Hugging Face.
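As a rough illustration of what this stage optimizes, the sketch below shows one preference record and the standard per-pair DPO loss, -log sigmoid(beta * ((log pi(chosen) - log pi_ref(chosen)) - (log pi(rejected) - log pi_ref(rejected)))). The record fields, the OCaml snippets, the `beta` value, and the log-probabilities are hypothetical; the actual layout produced by build_dpo_dataset.py is not described in the source.

```python
import math

# One hypothetical preference pair: a prompt with a preferred and a
# dispreferred completion.
pair = {
    "prompt": "(* Sum a list of integers *)\nlet rec sum lst =",
    "chosen": " match lst with [] -> 0 | x :: xs -> x + sum xs",
    "rejected": " List.fold_left (+) lst 0",  # ill-typed: arguments swapped
}

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss from summed sequence log-probabilities."""
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)
    # -log(sigmoid(m)) == softplus(-m), written in a numerically stable form.
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

# Policy identical to the reference: the loss is log(2).
neutral = dpo_loss(-5.0, -5.0, -5.0, -5.0)
# Policy has moved toward the chosen completion: the loss drops below log(2).
improved = dpo_loss(-4.0, -6.0, -5.0, -5.0)
```

The loss only depends on how far the policy has shifted relative to the frozen reference model, which is why the reference log-probabilities appear alongside the policy's.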

Output: A model fully optimized for OCaml code generation and training logs for that model.

4. `eval`

Purpose: Evaluates the models created in steps 2 and 3.

Summary: This component contains functions for evaluating the models' performance.
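The source does not name the metric these functions compute. One common choice for code generation is pass@k, sketched here purely as an assumption: given n generated samples per problem of which c pass the tests, the unbiased estimate is 1 - C(n - c, k) / C(n, k). The function name and the example numbers are hypothetical.

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that passed, k: sampling budget.
    """
    if n - c < k:
        # Fewer than k failures exist, so any k samples include a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# 10 samples with 3 passing, scored at k=1 and k=5.
score_k1 = pass_at_k(10, 3, 1)
score_k5 = pass_at_k(10, 3, 5)
```

Averaging this value over all evaluation problems gives a single score that can be compared between the models from steps 2 and 3.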

Key Features

Data Efficiency: Reduces dataset size and improves data quality through kernel-density-based and pairwise pruning
Two-Stage Training: Combines general fine-tuning with task-specific reinforcement learning
Parameter Efficiency: Uses PEFT techniques to minimize computational requirements
Specialization: Focuses on practical OCaml code completion tasks

