A memory-efficient framework for fine-tuning large language models on consumer GPUs using QLoRA. Designed for NVIDIA RTX 5090 (Blackwell architecture) but compatible with RTX 30/40 series.
- Memory Efficient: Fine-tune 7B parameter models with only ~16GB VRAM using 4-bit quantization
- QLoRA Training: Parameter-efficient fine-tuning with Low-Rank Adaptation
- RTX 5090 Ready: Full support for Blackwell architecture (sm_120) via PyTorch nightly
- Flexible: Works with Mistral, Llama, Qwen, and other HuggingFace models
- Easy to Use: Simple CLI interface for training and inference
- Python 3.10+
- NVIDIA GPU with 16GB+ VRAM
- CUDA 12.8+ (for RTX 50 series) or CUDA 12.1+ (for RTX 30/40 series)
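To see why 4-bit quantization brings a 7B model within a 16GB budget, here is a back-of-envelope estimate. All figures (LoRA fraction, activation overhead) are rough illustrative assumptions, not measurements from this project:

```python
# Back-of-envelope VRAM estimate for 4-bit QLoRA fine-tuning.
# The LoRA fraction and activation overhead are rough assumptions;
# real usage depends on sequence length, batch size, and framework overhead.

def qlora_vram_estimate_gb(n_params: float, lora_frac: float = 0.01) -> float:
    """Approximate peak VRAM in GB for 4-bit QLoRA fine-tuning."""
    base_weights = n_params * 0.5 / 1e9              # 4-bit = 0.5 bytes/param
    lora_weights = n_params * lora_frac * 2 / 1e9    # LoRA adapters in bf16 (2 bytes)
    optimizer = lora_weights * 2 * 2                 # Adam keeps two states per trainable param
    activations_overhead = 6.0                       # activations, gradients, CUDA context (rough)
    return base_weights + lora_weights + optimizer + activations_overhead

print(f"{qlora_vram_estimate_gb(7e9):.1f} GB")  # roughly 10 GB of the ~16 GB budget
```

Only the frozen 4-bit base weights dominate; the trainable LoRA parameters and their optimizer states are comparatively tiny, which is what makes the technique fit on consumer GPUs.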
git clone https://github.com/amyanger/local-llm-project.git
cd local-llm-project
# Create virtual environment
python -m venv venv
source venv/bin/activate # Linux/Mac
venv\Scripts\activate # Windows
# Install PyTorch (RTX 5090 / Blackwell)
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
# Install dependencies
pip install -r requirements.txt

# Fine-tune OpenHermes Mistral 7B on UltraChat dataset (default)
python src/train.py
# Or customize the training
python src/train.py \
--model teknium/OpenHermes-2.5-Mistral-7B \
--dataset HuggingFaceH4/ultrachat_200k \
--epochs 1 \
--max-samples 10000
# Use your own dataset
python src/train.py \
--model teknium/OpenHermes-2.5-Mistral-7B \
--dataset data/raw/your_dataset.jsonl \
--epochs 3

# Interactive chat mode
python src/inference.py --model models/openhermes-chat
# Single prompt
python src/inference.py --model models/openhermes-chat --prompt "Explain quantum computing"

# Launch Gradio chat interface (opens at http://localhost:7860)
python src/app.py
# With custom options
python src/app.py --model models/openhermes-chat --port 7860 --share

The web UI provides:
- Clean chat interface with conversation history
- Adjustable generation parameters (temperature, max tokens, etc.)
- Custom system prompts
- Retry/regenerate functionality
local-llm-project/
├── src/
│ ├── train.py # QLoRA fine-tuning script
│ ├── inference.py # Model inference and chat
│ └── app.py # Gradio web UI
├── data/
│ ├── raw/ # Training datasets
│ └── processed/ # Preprocessed data
├── models/
│ └── checkpoints/ # Saved model weights
├── config/ # Training configurations
├── requirements.txt # Python dependencies
└── README.md
| Model | Parameters | VRAM Required | Recommended For |
|---|---|---|---|
| Mistral 7B | 7B | ~16GB | General tasks, instruction-following |
| Llama 3.1 8B | 8B | ~18GB | General purpose, large ecosystem |
| CodeLlama 7B | 7B | ~16GB | Code generation |
| Qwen 2.5 Coder 7B | 7B | ~16GB | Code + reasoning |
Training data should be in JSONL format with instruction-response pairs:
{"instruction": "What is machine learning?", "response": "Machine learning is a subset of artificial intelligence..."}
{"instruction": "Write a Python function to sort a list", "response": "def sort_list(lst):\n return sorted(lst)"}

Or use HuggingFace datasets with a text field directly.
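A minimal stdlib-only sketch for checking that a custom dataset file matches this schema before training; the field names follow the examples above:

```python
import json
from pathlib import Path

def validate_jsonl(path: str) -> int:
    """Check each line is valid JSON with non-empty instruction/response fields.

    Returns the number of valid records; raises on a malformed line.
    """
    count = 0
    for lineno, line in enumerate(Path(path).read_text(encoding="utf-8").splitlines(), 1):
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)  # raises json.JSONDecodeError on invalid JSON
        for field in ("instruction", "response"):
            if not str(record.get(field, "")).strip():
                raise ValueError(f"line {lineno}: missing or empty {field!r}")
        count += 1
    return count
```

Running this over `data/raw/your_dataset.jsonl` before launching training surfaces formatting problems early instead of partway through a long run.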
| Parameter | Default | Description |
|---|---|---|
| --model | teknium/OpenHermes-2.5-Mistral-7B | Base model from HuggingFace |
| --dataset | HuggingFaceH4/ultrachat_200k | Path to JSONL or HuggingFace dataset |
| --epochs | 1 | Number of training epochs |
| --batch-size | 2 | Per-device batch size |
| --lr | 2e-5 | Learning rate |
| --max-samples | 10000 | Max training samples (0 for all) |
| --output | models/openhermes-chat | Output directory |
- LoRA Rank (r): 16
- LoRA Alpha: 16
- Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Quantization: 4-bit NF4 with double quantization
- Compute dtype: BFloat16
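The configuration above maps onto the standard `peft` and `transformers` objects roughly as follows. This is a sketch, not the project's actual code: the argument names are those of `transformers.BitsAndBytesConfig` and `peft.LoraConfig`, and the dropout value is an illustrative assumption not stated in this README:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization with double quantization, bf16 compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapters on all attention and MLP projections
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,   # assumption: not specified in this README
    bias="none",
    task_type="CAUSAL_LM",
)
```

`bnb_config` is passed to the model's `from_pretrained` call and `lora_config` to the trainer; the base weights stay frozen in 4-bit while only the LoRA adapters train in bf16.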
- Transformers: Model loading and tokenization
- PEFT: Parameter-efficient fine-tuning (LoRA)
- BitsAndBytes: 4-bit quantization
- TRL: Supervised fine-tuning trainer
- Datasets: Data loading and processing
- NVIDIA GPU with 16GB VRAM (RTX 4080, 3090, etc.)
- 32GB System RAM
- 50GB Storage
- NVIDIA RTX 5090 (32GB VRAM)
- 64GB System RAM
- 100GB+ NVMe Storage
The RTX 5090 uses Blackwell architecture (sm_120) which requires:
- PyTorch nightly build with CUDA 12.8
- Latest NVIDIA drivers (560+)
Fine-tuning Mistral 7B on 10K instruction pairs:
| Metric | Before | After |
|---|---|---|
| Training Loss | 2.1 | 0.8 |
| Perplexity | 8.2 | 2.2 |
Results vary based on dataset quality and training duration.
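The two rows are internally consistent: perplexity is simply the exponential of the cross-entropy training loss, which is easy to verify:

```python
import math

# Perplexity = exp(cross-entropy loss), so the table's loss and
# perplexity columns should agree up to rounding.
for loss in (2.1, 0.8):
    print(f"loss {loss} -> perplexity {math.exp(loss):.1f}")
# loss 2.1 -> perplexity 8.2
# loss 0.8 -> perplexity 2.2
```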
- Add DPO (Direct Preference Optimization) support
- Implement evaluation benchmarks
- Add multi-GPU training with DeepSpeed
- Create web UI for inference
- Support for vision-language models
MIT License - see LICENSE for details.
Contributions are welcome! Please feel free to submit a Pull Request.
Built for the AI/ML community. Star this repo if you find it useful!