choosemoon/opc-codex

OPC-Codex: Distilling Verified OPC Judgment into a Fine-Tuned LLM

A complete, reproducible LoRA fine-tuning pipeline that distills the decision-making patterns of verified One-Person Company (OPC) practitioners into a deployable AI advisor. Built with Qwen3-14B on Apple Silicon using MLX.

License: MIT | Python 3.13 | Model: Qwen3-14B | Framework: MLX | Deploy: Ollama



Motivation

Solo entrepreneurs (One-Person Companies, or OPCs) face a set of challenges that general-purpose LLMs are poorly equipped to handle. General-purpose assistants tend to give agreeable, generic answers and lack the specific judgment frameworks that experienced OPC practitioners rely on daily.

OPC-Codex addresses this gap by fine-tuning a large language model on high-quality conversational data from verified OPC practitioners. The goal is not to replace human judgment, but to create an AI advisor that internalizes the thinking patterns, frameworks, and decision-making heuristics of those who have already succeeded in the OPC space.

Why Fine-Tuning Instead of RAG or Prompt Engineering?

| Approach | Strengths | Weaknesses |
|---|---|---|
| RAG (Retrieval-Augmented Generation) | Accurate factual recall | Style inconsistency; retrieval failures expose base model |
| Prompt Engineering / Skills | Easy to iterate | Template-like responses; limited depth |
| Fine-Tuning (this project) | Consistent style; transferable reasoning | Higher deployment cost; requires quality data |

Each approach has its place. Fine-tuning excels when you need consistent persona, transferable reasoning, and style that doesn't degrade over long conversations. RAG and Skills excel when you need factual accuracy and easy iteration. The ideal production system combines all three.


Architecture

┌─────────────┐     ┌──────────────┐     ┌─────────────┐     ┌──────────────┐
│  Raw Data   │────▶│  Data Eng    │────▶│  LoRA Train │────▶│  Merge &     │
│  Collection │     │  & QC        │     │  (MLX)      │     │  Dequantize  │
└─────────────┘     └──────────────┘     └─────────────┘     └──────┬───────┘
                                                                    │
                                                                    ▼
┌─────────────┐     ┌──────────────┐     ┌─────────────┐     ┌──────────────┐
│   Ollama    │◀────│  Quantize    │◀────│  Convert    │◀────│  GGUF F16    │
│   Deploy    │     │  (Q4_K_M)    │     │  (llama.cpp)│     │  Export      │
└─────────────┘     └──────────────┘     └─────────────┘     └──────────────┘

Tech Stack

| Component | Technology | Version / Details |
|---|---|---|
| Base Model | Qwen3-14B | 14.8B params |
| Quantized Base | mlx-community/Qwen3-14B-4bit | 4-bit MLX |
| Training Framework | MLX | 0.31.3 |
| Fine-Tuning Method | LoRA (rank=32, alpha=64) | 8 trainable layers |
| GGUF Conversion | llama.cpp | latest |
| Quantization | Q4_K_M | ~8.5GB final model |
| Deployment | Ollama | latest |
| Hardware | Apple M4 Max (36GB) | macOS |

Quick Start

Prerequisites

  • macOS with Apple Silicon (M1/M2/M3/M4)
  • 32GB+ unified memory
  • Python 3.13+
  • Ollama installed

1. Clone & Setup

git clone https://github.com/YOUR_USERNAME/opc-codex.git
cd opc-codex
pip install mlx-lm --break-system-packages

2. Train (Optional — pre-trained weights available)

chmod +x scripts/retrain_v2.sh
./scripts/retrain_v2.sh

Training takes ~1-2 hours on M4 Max. Peak memory usage: ~10.6GB.

3. Convert to GGUF & Deploy

chmod +x scripts/convert_v2_to_gguf.sh
./scripts/convert_v2_to_gguf.sh

4. Run

ollama run opc-codex

Dataset

Overview

| Metric | Value |
|---|---|
| Total samples | 221 |
| Format | JSONL (OpenAI chat format) |
| Language | Chinese |
| Quality score | 88+/100 (5-layer review) |
| Train/Val split | 209 / 12 (95%/5%) |

Data Collection Pipeline

  1. Source identification: Verified OPC practitioners with trackable results
  2. Content extraction: Video transcripts, articles, podcasts, social media posts
  3. Conversation formatting: Structured as multi-turn dialogues (system + user + assistant)
  4. Quality scoring: 5-dimension rubric (relevance, specificity, framework usage, uniqueness, actionability)
  5. Iterative refinement: V1 → V2 → V3 → V4, progressively improving quality
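
The quality-scoring step can be sketched as a simple rubric check. This README does not say how the five dimensions are weighted, so the equal 0-20 scale per dimension, the `quality_score` and `keep_sample` names, and the use of 88 as a cut-off (taken from the "88+/100" figure) are all assumptions made for illustration.

```python
# Hypothetical rubric: five dimensions, each scored 0-20, summing to 100.
DIMENSIONS = (
    "relevance",
    "specificity",
    "framework_usage",
    "uniqueness",
    "actionability",
)
MAX_PER_DIMENSION = 20  # assumption: equal weights


def quality_score(scores: dict[str, int]) -> int:
    """Sum the per-dimension scores after checking that all are present and in range."""
    missing = set(DIMENSIONS) - scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    for name in DIMENSIONS:
        if not 0 <= scores[name] <= MAX_PER_DIMENSION:
            raise ValueError(f"{name} must be between 0 and {MAX_PER_DIMENSION}")
    return sum(scores[name] for name in DIMENSIONS)


def keep_sample(scores: dict[str, int], threshold: int = 88) -> bool:
    """Keep a sample only if its total reaches the (assumed) quality threshold."""
    return quality_score(scores) >= threshold
```

A sample scoring 18 on every dimension totals 90 and is kept; dropping one dimension to 15 gives 87 and the sample is rejected.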

Data Format

{"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
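
Before training, it can help to check every line of the dataset against this shape. The following is a minimal sketch rather than the project's own tooling: `validate_record` is a hypothetical helper, and the rule that a conversation should end with an assistant turn is an assumption about what a usable training example looks like.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_record(line: str) -> list[str]:
    """Return a list of problems found in one JSONL line (empty list means valid)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return ["record is not a JSON object"]

    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["'messages' must be a non-empty list"]

    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i} is not an object")
            continue
        if msg.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i} has unknown role {msg.get('role')!r}")
        content = msg.get("content")
        if not isinstance(content, str) or not content.strip():
            problems.append(f"message {i} has empty or non-string content")

    # Assumption: each training example should end with the assistant's reply.
    last = messages[-1]
    if isinstance(last, dict) and last.get("role") != "assistant":
        problems.append("last message should come from the assistant")
    return problems
```

Running it over each line of a JSONL file and printing the non-empty results gives a quick report of malformed samples.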

Sample Data

See data/sample_data.jsonl for 5 anonymized examples.


Training

Hyperparameters

| Parameter | V1 (Failed) | V2 (Current) | Reason for Change |
|---|---|---|---|
| Base model | Qwen3-14B-8bit | Qwen3-14B-4bit | Avoid quantization loss |
| LoRA rank | 8 | 32 | Too small for 14B model |
| LoRA alpha | 16 | 64 | 2:1 ratio with rank |
| Trainable layers | 16 | 8 | OOM on 36GB Mac |
| Batch size | 4 | 1 | Memory constraint |
| Grad accumulation | 4 | 4 | Effective batch = 4 |
| Learning rate | 5e-5 | 2e-5 | Lower for higher rank |
| Max seq length | 2048 | 2048 | Sufficient for most responses |
| Gradient checkpoint | No | Yes | Critical for memory |
| Total iterations | 220 | 1000 | ~3 epochs over 209 samples |
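
The batch and epoch figures in this table follow from simple arithmetic, sketched below. The helper names are hypothetical, and the epoch estimate assumes that one iteration consumes exactly one batch of `samples_per_iteration` examples. A trainer that counts iterations differently (for example, per optimizer step after gradient accumulation) will give a different number, so treat any epoch figure as approximate.

```python
def effective_batch(batch_size: int, grad_accumulation_steps: int) -> int:
    """Number of samples contributing to each optimizer update."""
    return batch_size * grad_accumulation_steps


def epochs_seen(iterations: int, samples_per_iteration: int, train_samples: int) -> float:
    """Approximate passes over the training split, assuming one batch per iteration."""
    return iterations * samples_per_iteration / train_samples


# V2 settings from the table: batch size 1, accumulation 4, 1000 iterations, 209 samples.
v2_effective = effective_batch(batch_size=1, grad_accumulation_steps=4)
v2_epochs = epochs_seen(iterations=1000, samples_per_iteration=1, train_samples=209)
```

Note that a batch size of 2 with 2 accumulation steps yields the same effective batch of 4 as a batch size of 1 with 4 accumulation steps; only the peak memory per step differs.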

Training Configuration

# configs/lora_config_v2.yaml
lora_parameters:
  rank: 32
  alpha: 64
  dropout: 0.05
  scale: 20.0
batch_size: 2
grad_accumulation_steps: 2
learning_rate: 2.0e-5
optimizer: "adam"

Training Results

Iter    Train Loss    Val Loss    Tokens/sec    Peak Mem
  1      3.244         3.507      198           10.2 GB
100      1.803         2.151      160           10.6 GB
200      1.770         2.151      153           10.6 GB
300      1.765         2.119      165           10.6 GB
400      1.695         2.119      165           10.6 GB  ← Best val loss
500      1.203         2.209      165           10.6 GB
600      1.024         2.209      162           10.6 GB
700      0.925         2.543      162           10.6 GB
800      0.782         2.543      155           10.6 GB
900      0.395         2.773      164           10.6 GB
1000     0.371         2.773      166           10.6 GB  ← Final

Key observation: Training loss fell steadily (3.24 → 0.37), but validation loss bottomed out at 2.119 (iters 300-400) and rose from iter 500 onward, which indicates overfitting. The best checkpoint is around iter 400.
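
Picking the checkpoint from the logged validation losses can be automated. The sketch below uses the validation losses from the results table; the `best_checkpoint` and `early_stop_iteration` helpers, the tie-breaking rule (prefer the later iteration), and the patience value are illustrative assumptions, not part of the training script.

```python
from typing import NamedTuple, Optional


class Eval(NamedTuple):
    iteration: int
    val_loss: float


# Validation losses copied from the training results table.
LOG = [
    Eval(1, 3.507), Eval(100, 2.151), Eval(200, 2.151), Eval(300, 2.119),
    Eval(400, 2.119), Eval(500, 2.209), Eval(600, 2.209), Eval(700, 2.543),
    Eval(800, 2.543), Eval(900, 2.773), Eval(1000, 2.773),
]


def best_checkpoint(log: list[Eval]) -> Eval:
    """Lowest validation loss wins; on a tie, prefer the later iteration."""
    return min(log, key=lambda e: (e.val_loss, -e.iteration))


def early_stop_iteration(log: list[Eval], patience: int = 2) -> Optional[int]:
    """Return the iteration at which training would stop, or None if it never does.

    Training stops once `patience` consecutive evaluations fail to improve
    on the best validation loss seen so far.
    """
    best = float("inf")
    stale = 0
    for entry in log:
        if entry.val_loss < best:
            best = entry.val_loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return entry.iteration
    return None
```

On this log, `best_checkpoint(LOG)` selects iter 400, and a patience of 2 would have halted training at iter 500, saving roughly half of the run.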


Results & Evaluation

Evaluation Framework

We designed a 9-question progressive test to evaluate persona fidelity:

| Level | Question Type | What It Tests |
|---|---|---|
| 1 | Style recognition | Does the model adopt the right tone? |
| 2 | Methodology activation | Does it use specific frameworks? |
| 3 | Deep reasoning transfer | Can it think in novel ways? |
| 4 | Stress test | Does it maintain persona under pressure? |

Results Summary

| Version | Score | Style | Content Recall | Repetition |
|---|---|---|---|---|
| Base (Qwen3-14B) | | Generic | N/A | None |
| V1 (rank=8) | D | Surface imitation | 0/9 | Severe |
| V2 (rank=32, final) | C+ | Improved | 1/9 | Moderate |
| Skills/Prompt baseline | B+ | Good | 5/9 | None |
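
The repetition ratings above are qualitative. One way to put a number on them is the share of character n-grams that repeat within a response; character n-grams are used here because the dataset is Chinese and has no whitespace word boundaries. This is a sketch of one possible metric, not the method used to produce the table, and `repeated_ngram_ratio` is a hypothetical helper.

```python
from collections import Counter


def repeated_ngram_ratio(text: str, n: int = 4) -> float:
    """Fraction of character n-grams that duplicate an earlier n-gram in the text.

    0.0 means every n-gram is unique; values near 1.0 mean the text loops heavily.
    """
    chars = [c for c in text if not c.isspace()]
    ngrams = ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    # Every occurrence beyond the first counts as a repeat.
    repeated = sum(count - 1 for count in counts.values())
    return repeated / len(ngrams)
```

Thresholds for "moderate" or "severe" would need to be calibrated against responses that have already been rated by hand.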

Honest Assessment

The fine-tuned model successfully captures the tone and attitude (direct, contrarian, confident) but struggles with content recall (specific case studies, named frameworks, exact methodologies). This is expected given:

  • Only 0.17% of parameters are trainable (LoRA rank=32 on 14B model)
  • 221 samples are too few to teach both style and content
  • Overfitting after iter 400 suggests the model memorizes surface patterns rather than developing a deeper understanding
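
The 0.17% figure can be reproduced with a short calculation. A LoRA adapter of rank r on a linear layer with shape (d_in, d_out) adds r × (d_in + d_out) parameters. The layer dimensions below are assumed from the published Qwen3-14B configuration and are not stated in this README, and the sketch further assumes that adapters are attached to all seven linear projections in each of the 8 trainable layers.

```python
# Assumed Qwen3-14B dimensions (not given in this README).
HIDDEN = 5120
INTERMEDIATE = 17408
Q_OUT = 40 * 128   # 40 attention heads x head_dim 128
KV_OUT = 8 * 128   # 8 key/value heads x head_dim 128

# (d_in, d_out) for each linear projection assumed to carry a LoRA adapter.
LINEAR_SHAPES = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(rank: int, num_layers: int) -> int:
    """Trainable parameters: rank * (d_in + d_out) per adapted matrix, per layer."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in LINEAR_SHAPES.values())
    return per_layer * num_layers


trainable = lora_params(rank=32, num_layers=8)
percent = 100 * trainable / 14.8e9  # 14.8B total parameters
```

Under these assumptions `trainable` comes to about 25.7 million parameters, or roughly 0.17% of the base model.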

For production use, we recommend a hybrid approach: fine-tuning for style + RAG for content accuracy + Skills for framework enforcement.


Failure Analysis & Lessons Learned

This is the most valuable section of this project. Every failure is documented to save you time.

V1: Five Root Causes of Failure

| # | Issue | Symptom | Fix |
|---|---|---|---|
| 1 | LoRA rank too low (8) | Model couldn't learn style or content | Increased to 32 |
| 2 | Insufficient epochs (~1) | Most training data never seen | Increased to ~3 |
| 3 | 8-bit quantized base model | Precision loss compounded during GGUF conversion | Switched to 4-bit MLX base |
| 4 | Wrong mlx-lm CLI syntax | Training failed to start | Updated to 0.31.x API |
| 5 | No gradient checkpointing | OOM on 36GB Mac | Added --grad-checkpoint |

V2: Three Remaining Issues

| # | Issue | Symptom | Potential Fix |
|---|---|---|---|
| 1 | Overfitting | Val loss increased after iter 400 | Use early stopping; best checkpoint at iter 400 |
| 2 | Content recall gap | Specific cases/frameworks not reproduced | Increase data to 500+ samples |
| 3 | Thinking mode leak | Qwen3 emits `<think>` reasoning blocks in its output | Add stop tokens in Modelfile |
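
Until the Modelfile fix is in place, leaked reasoning can also be removed on the client side. Qwen3 wraps its reasoning in `<think>...</think>` tags, so a regular expression is enough. This post-processing step is a different technique from the stop-token fix listed in the table, and `strip_thinking` is a hypothetical helper.

```python
import re

# Non-greedy match so that several separate blocks are each removed on their own.
_THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)
# An opening tag with no closing tag, e.g. when generation was cut off mid-thought.
_UNCLOSED_THINK = re.compile(r"<think>.*\Z", flags=re.DOTALL)


def strip_thinking(text: str) -> str:
    """Remove <think>...</think> blocks and return only the visible answer."""
    cleaned = _THINK_BLOCK.sub("", text)
    cleaned = _UNCLOSED_THINK.sub("", cleaned)
    return cleaned.strip()
```

Stripping after generation still spends tokens on the hidden reasoning, so it is a stopgap rather than a replacement for fixing the model configuration.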

Lessons Learned

  1. Always check mlx-lm version compatibility — CLI arguments change between minor versions
  2. Start with gradient checkpointing enabled — it's free insurance against OOM
  3. Monitor validation loss, not just training loss — divergence means overfitting
  4. Save checkpoints frequently — the best model may not be the final one
  5. GGUF conversion requires dequantization first — MLX quantized weights are not directly compatible with llama.cpp

Methodology: A Reproducible Framework

Based on our experience, here is a 5-step framework for fine-tuning a persona-specific LLM from small data:

Step 1: Define the Persona

  • Identify 3-5 core traits (e.g., "direct", "framework-driven", "contrarian")
  • List 5-10 signature frameworks/methodologies
  • Collect 10+ representative examples of desired output

Step 2: Collect & Curate Data

  • Minimum 200 high-quality samples (500+ recommended)
  • Use a structured quality rubric (5 dimensions)
  • Format as multi-turn conversations
  • Split 95/5 for train/validation
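
A reproducible split can be sketched in a few lines. The function name and the fixed seed are assumptions. Rounding the validation size up is a deliberate choice: for 221 samples at 5% it yields 12 validation and 209 training samples, matching the split reported in the Dataset section.

```python
import math
import random


def split_train_val(records: list, val_fraction: float = 0.05, seed: int = 42):
    """Shuffle deterministically, then hold out a fraction for validation."""
    shuffled = list(records)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, math.ceil(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


train, val = split_train_val(list(range(221)))
```

With only a dozen validation samples the validation loss is noisy, so a fixed seed matters: it keeps the held-out set identical across training runs and makes loss curves comparable.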

Step 3: Train with Conservative Parameters

  • Use 4-bit quantized base to fit on consumer hardware
  • LoRA rank = 32-64 (higher for smaller base models)
  • Learning rate = 1e-5 to 3e-5
  • Enable gradient checkpointing
  • Monitor validation loss for early stopping

Step 4: Evaluate with Progressive Tests

  • Design 9+ questions across 4 difficulty levels
  • Compare against base model and prompt-only baseline
  • Score on style, content recall, and repetition

Step 5: Deploy & Iterate

  • Convert to GGUF via dequantization → F16 → quantization
  • Deploy with Ollama for easy testing
  • Collect user feedback for next training iteration

Deployment

Option 1: Ollama (Recommended)

# After running convert_v2_to_gguf.sh
ollama create opc-codex -f Modelfile.opc_codex_v2
ollama run opc-codex
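
Once the model is registered, it can also be queried programmatically through Ollama's local HTTP API. The sketch below uses only the standard library and assumes Ollama is listening on its default port (11434); `build_request` and `ask` are hypothetical helper names.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumes the default local port


def build_request(prompt: str, model: str = "opc-codex") -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one JSON object instead of a stream of chunks
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, model: str = "opc-codex", timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text (requires a running server)."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Calling `ask("...")` with a question returns the model's reply as a string, which makes it straightforward to script the evaluation questions against the deployed model.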

Option 2: llama.cpp Server

./llama.cpp/build/bin/llama-server \
    -m opc_codex_v2_14b_q4_k_m.gguf \
    -c 4096 \
    --temp 0.7 \
    --top-p 0.8

Option 3: LM Studio / GPT4All

Import the .gguf file directly into any GGUF-compatible client.


Project Structure

opc-codex/
├── README.md                    # This file (English)
├── README_zh.md                 # Chinese documentation
├── LICENSE                      # MIT License
├── .gitignore                   # Git ignore rules
│
├── configs/
│   ├── lora_config_v2.yaml      # LoRA training configuration
│   └── training_params.md       # Hyperparameter documentation
│
├── data/
│   ├── README.md                # Data documentation
│   └── sample_data.jsonl        # 5 anonymized examples
│
├── scripts/
│   ├── retrain_v2.sh            # One-click training script
│   ├── convert_v2_to_gguf.sh    # MLX → GGUF conversion
│   ├── convert_to_gguf.sh       # V1 conversion (legacy)
│   ├── run_finetune.sh          # V1 training (legacy)
│   └── run_finetune_4b.sh       # Mobile variant (4B model)
│
├── docs/
│   ├── methodology.md           # 5-step fine-tuning framework
│   ├── failure_analysis.md      # Detailed failure analysis
│   └── evaluation.md            # Evaluation framework & results
│
├── Modelfile.opc_codex_v2      # Ollama deployment configuration
│
└── .github/
    └── ISSUE_TEMPLATE/
        └── bug_report.md

Roadmap

  • Data expansion: Increase to 500+ high-quality samples
  • Full fine-tuning: Experiment with DoRA or full-parameter tuning on cloud GPU
  • Hybrid architecture: Combine fine-tuning (style) + RAG (content) + Skills (frameworks)
  • Mobile variant: Optimize Qwen3-4B version for on-device deployment
  • Evaluation benchmark: Build automated persona fidelity scoring
  • Multi-persona support: Fine-tune multiple OPC practitioners as switchable personas
  • Hugging Face upload: Publish model with proper Model Card

License

This project is licensed under the MIT License. See LICENSE for details.


Acknowledgments

  • Qwen Team for the Qwen3 base model
  • MLX for the Apple Silicon training framework
  • llama.cpp for GGUF conversion tools
  • Ollama for easy local deployment
