fix(finetune): patch missing LayerNorm tensors after merge_and_unload #540
Open
elasticdotventures wants to merge 3 commits into
…→ GGUF
- scripts/generate-b00t-training-data.py: extracts 4474 instruction pairs from AGENTS.md, Rust doc comments, datum tomls, skills, justfile
- scripts/finetune-b00t.py: unsloth QLoRA (4-bit + LoRA r=16) on Qwen3.5-4B, ChatML format, SFTTrainer, merge + GGUF export
- _b00t_/b00t-finetune.cli.toml: deterministic command datum: generate → install deps → finetune → serve, with HF lookup cmds
Training completed: 200 steps, loss 3.26→2.09, 6.4M trainable params.
LoRA saved: ~/.b00t/training/output/b00t-lora/
FP16 merged: ~/.b00t/training/output/b00t-merged-fp16/
GGUF: ~/.b00t/training/output/b00t-finetuned.Q4_K_M.gguf (504MB)
⚠️ GGUF conversion dependency: unsloth's fork produces a GGUF format incompatible with stock ghcr.io/ggml-org/llama.cpp:server. Fix: build llama-quantize from llama.cpp master, or use unsloth's bundled server.
Root cause: Qwen3.5's attn_norm.weight tensors are stored as bnb.nn.Linear4bit in the 4-bit base model. merge_and_unload() preserves LoRA target modules (q_proj, k_proj, etc.) but the base-model LayerNorm params lose their state_dict keys.

Fix: after merge, scan for missing LayerNorm weights in state_dict and patch them from a freshly loaded FP16 base model. Verified on blk.24.attn_norm.weight, the tensor that blocked stock server loading.

Also: replaced unsloth save_gguf (format mismatch) with the llama.cpp convert_hf_to_gguf + llama-quantize pipeline (compatible with ghcr.io/ggml-org/llama.cpp:server).
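The two-step conversion named in this commit message (convert the merged FP16 checkpoint to GGUF, then quantize it) can be sketched as a small command builder. This is an illustration only, not the code in scripts/finetune-b00t.py: the function name `build_gguf_commands`, the `llama_cpp_dir` checkout location, the `build/bin/llama-quantize` binary path, and the intermediate `.f16.gguf` file name are all assumptions. Only `convert_hf_to_gguf.py` with `--outfile`/`--outtype` and the positional `llama-quantize <in> <out> <type>` form are taken from llama.cpp itself.

```python
from pathlib import Path
from typing import List


def build_gguf_commands(
    merged_dir: str,
    out_dir: str,
    llama_cpp_dir: str,
    name: str = "b00t-finetuned",
    quant: str = "Q4_K_M",
) -> List[List[str]]:
    """Return the two commands (as argv lists) for HF -> F16 GGUF -> quantized GGUF.

    Nothing is executed here; the caller decides how to run the commands.
    """
    out = Path(out_dir)
    repo = Path(llama_cpp_dir)  # assumed: a local llama.cpp checkout built with CMake
    f16_path = out / f"{name}.f16.gguf"          # intermediate, unquantized GGUF
    quant_path = out / f"{name}.{quant}.gguf"    # final quantized GGUF

    convert_cmd = [
        "python",
        str(repo / "convert_hf_to_gguf.py"),
        str(Path(merged_dir)),
        "--outfile", str(f16_path),
        "--outtype", "f16",
    ]
    quantize_cmd = [
        str(repo / "build" / "bin" / "llama-quantize"),  # assumed build output path
        str(f16_path),
        str(quant_path),
        quant,
    ]
    return [convert_cmd, quantize_cmd]


if __name__ == "__main__":
    for cmd in build_gguf_commands("merged-fp16", "out", "llama.cpp"):
        print(" ".join(cmd))
```

Each returned list could be passed to `subprocess.run(cmd, check=True)` so that a failed conversion stops the pipeline before quantization.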
Bug
Stock llama.cpp server fails to load fine-tuned Qwen3.5 GGUF:
Root Cause
Qwen3.5's attn_norm.weight tensors are stored as bnb.nn.Linear4bit in the 4-bit base model. merge_and_unload() preserves LoRA target modules (q_proj, k_proj, etc.) but the base-model LayerNorm params lose their state_dict keys during the merge.
Fix
After merge_and_unload(), scan for missing LayerNorm weights in the state dict. For each missing tensor, load the FP16 base model and copy the weight directly. Verified on blk.24.attn_norm.weight.
Also: replaced unsloth.save_gguf with the llama.cpp convert_hf_to_gguf + llama-quantize pipeline, which is compatible with the stock ghcr.io/ggml-org/llama.cpp:server container.
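The scan-and-patch step described under Fix can be sketched in a framework-free way. This is a minimal illustration, not the code in this PR: plain dicts stand in for the mappings returned by `model.state_dict()`, the helper names `is_norm_weight` and `patch_missing_norms` are hypothetical, the key names in the demo are illustrative, and the "contains `norm` and ends in `.weight`" test is an assumed heuristic for spotting normalization weights.

```python
from typing import Any, Callable, List, Mapping, MutableMapping


def is_norm_weight(key: str) -> bool:
    """Assumed heuristic: a normalization weight has 'norm' in its name."""
    return key.endswith(".weight") and "norm" in key


def patch_missing_norms(
    merged_sd: MutableMapping[str, Any],
    base_sd: Mapping[str, Any],
    is_norm: Callable[[str], bool] = is_norm_weight,
) -> List[str]:
    """Copy norm weights that exist in base_sd but are absent from merged_sd.

    Mutates merged_sd in place and returns the sorted list of patched keys.
    Keys already present in merged_sd (e.g. merged LoRA targets) are untouched.
    """
    missing = sorted(k for k in base_sd if is_norm(k) and k not in merged_sd)
    for key in missing:
        merged_sd[key] = base_sd[key]
    return missing


if __name__ == "__main__":
    # Hypothetical key names; lists stand in for tensors.
    base = {
        "model.layers.24.input_layernorm.weight": [1.0, 1.0],
        "model.layers.24.self_attn.q_proj.weight": [0.1, 0.2],
    }
    merged = {"model.layers.24.self_attn.q_proj.weight": [0.3, 0.4]}
    print(patch_missing_norms(merged, base))
```

Returning the list of patched keys makes it easy to log exactly which tensors were restored, and an empty list signals that the merged model needed no patching.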