
fix(finetune): patch missing LayerNorm tensors after merge_and_unload #540

Open
elasticdotventures wants to merge 3 commits into main from task/8-gguf-tensor-fix

@elasticdotventures

Owner

Bug

Stock llama.cpp server fails to load fine-tuned Qwen3.5 GGUF:

error loading model: missing tensor 'blk.24.attn_norm.weight'

Root Cause

Qwen3.5's attn_norm.weight tensors are stored as bnb.nn.Linear4bit in the 4-bit base model. merge_and_unload() preserves LoRA target modules (q_proj, k_proj, etc.) but the base-model LayerNorm params lose their state_dict keys during the merge.

Fix

After merge_and_unload(), scan for missing LayerNorm weights in the state dict. For each missing tensor, load the FP16 base model and copy the weight directly. Verified on blk.24.attn_norm.weight.
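
The scan-and-patch step described above could be sketched roughly as follows. This is a minimal illustration, not the code in scripts/finetune-b00t.py: the helper names `find_missing_norm_keys` and `patch_missing_norms` are hypothetical, and so is the assumption that norm weights can be recognised by the substring "norm" in a key ending in ".weight". The functions accept any mapping of parameter names to values, so lists stand in for tensors in the demo. The Hugging Face key used there is assumed to be the one the GGUF converter maps to blk.24.attn_norm.weight.

```python
from typing import Any, Dict, List, Mapping


def find_missing_norm_keys(merged: Mapping[str, Any],
                           base: Mapping[str, Any]) -> List[str]:
    """Return norm-weight keys present in `base` but absent from `merged`."""
    return sorted(
        key for key in base
        if key not in merged
        and key.endswith(".weight")
        and "norm" in key.lower()  # assumption: norm params contain "norm"
    )


def patch_missing_norms(merged: Dict[str, Any],
                        base: Mapping[str, Any]) -> List[str]:
    """Copy every missing norm weight from `base` into `merged`, in place."""
    missing = find_missing_norm_keys(merged, base)
    for key in missing:
        value = base[key]
        # torch tensors expose .clone(); other values are stored as-is.
        clone = getattr(value, "clone", None)
        merged[key] = clone() if callable(clone) else value
    return missing


# Toy demo: lists stand in for tensors, and the key names are assumed.
base_sd = {
    "model.layers.24.input_layernorm.weight": [1.0, 1.0],
    "model.layers.24.self_attn.q_proj.weight": [0.1, 0.2],
}
merged_sd = {"model.layers.24.self_attn.q_proj.weight": [0.3, 0.4]}

patched = patch_missing_norms(merged_sd, base_sd)
print(patched)
```

With real models, `base` would be the `state_dict()` of the freshly loaded FP16 base model and `merged` the dict of tensors about to be saved. Weights already present in `merged` are never overwritten, so the merged LoRA target modules stay untouched.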

Also: replaced unsloth.save_gguf with the llama.cpp convert_hf_to_gguf + llama-quantize pipeline, whose output is compatible with the stock ghcr.io/ggml-org/llama.cpp:server container.
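
As a rough sketch of that two-step pipeline, the commands can be assembled as below. The helper names `build_gguf_commands` and `run_pipeline` are hypothetical, and so are two assumptions: a llama.cpp checkout at `llama.cpp/`, and a `llama-quantize` binary on PATH. The intermediate F16 file name is also an illustrative choice. The merged-model and output paths are the ones listed in the commit message.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Tuple


def build_gguf_commands(merged_dir: Path, llama_cpp_dir: Path, out_gguf: Path,
                        quant: str = "Q4_K_M") -> Tuple[List[str], List[str]]:
    """Build the convert and quantize command lines without running them."""
    # Intermediate F16 GGUF written next to the final output (name is assumed).
    f16_gguf = out_gguf.with_name(out_gguf.stem + ".f16.gguf")
    convert = [
        sys.executable,
        str(llama_cpp_dir / "convert_hf_to_gguf.py"),
        str(merged_dir),
        "--outfile", str(f16_gguf),
        "--outtype", "f16",
    ]
    # Assumption: llama-quantize has been built and is on PATH.
    quantize = ["llama-quantize", str(f16_gguf), str(out_gguf), quant]
    return convert, quantize


def run_pipeline(merged_dir: Path, llama_cpp_dir: Path, out_gguf: Path) -> None:
    """Run both steps in order, raising if either command fails."""
    for cmd in build_gguf_commands(merged_dir, llama_cpp_dir, out_gguf):
        subprocess.run(cmd, check=True)


out_dir = Path.home() / ".b00t" / "training" / "output"
convert_cmd, quantize_cmd = build_gguf_commands(
    merged_dir=out_dir / "b00t-merged-fp16",
    llama_cpp_dir=Path("llama.cpp"),  # hypothetical checkout location
    out_gguf=out_dir / "b00t-finetuned.Q4_K_M.gguf",
)
print(" ".join(convert_cmd))
print(" ".join(quantize_cmd))
```

Building the commands separately from running them lets the exact invocation be inspected or logged before any large files are written.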

…→ GGUF

- scripts/generate-b00t-training-data.py: extracts 4474 instruction pairs
  from AGENTS.md, Rust doc comments, datum tomls, skills, justfile
- scripts/finetune-b00t.py: unsloth QLoRA (4-bit + LoRA r=16) on
  Qwen3.5-4B, ChatML format, SFTTrainer, merge + GGUF export
- _b00t_/b00t-finetune.cli.toml: deterministic command datum —
  generate → install deps → finetune → serve, with HF lookup cmds

Training completed: 200 steps, loss 3.26→2.09, 6.4M trainable params.
LoRA saved: ~/.b00t/training/output/b00t-lora/
FP16 merged: ~/.b00t/training/output/b00t-merged-fp16/
GGUF: ~/.b00t/training/output/b00t-finetuned.Q4_K_M.gguf (504MB)

⚠️ GGUF conversion dependency: unsloth's fork produces GGUF format
incompatible with stock ghcr.io/ggml-org/llama.cpp:server. Fix: build
llama-quantize from llama.cpp master, or use unsloth's bundled server.

Root cause: Qwen3.5's attn_norm.weight tensors are stored as
bnb.nn.Linear4bit in the 4-bit base model. merge_and_unload()
preserves LoRA target modules (q_proj, k_proj, etc.) but the
base-model LayerNorm params lose their state_dict keys.

Fix: after merge, scan for missing LayerNorm weights in state_dict.
Patch from a freshly loaded FP16 base model. Verified on
blk.24.attn_norm.weight — the tensor that blocked stock server loading.

Also: replaced unsloth save_gguf (format mismatch) with llama.cpp
convert_hf_to_gguf + llama-quantize pipeline (compatible with
ghcr.io/ggml-org/llama.cpp:server).