fix(finetune): patch missing LayerNorm tensors after merge_and_unload by elasticdotventures · Pull Request #540 · elasticdotventures/_b00t_

elasticdotventures · 2026-06-26T12:11:55Z

Bug

Stock llama.cpp server fails to load fine-tuned Qwen3.5 GGUF:

error loading model: missing tensor 'blk.24.attn_norm.weight'

Root Cause

Qwen3.5's attn_norm.weight tensors are stored as bnb.nn.Linear4bit in the 4-bit base model. merge_and_unload() preserves LoRA target modules (q_proj, k_proj, etc.) but the base-model LayerNorm params lose their state_dict keys during the merge.

Fix

After merge_and_unload(), scan for missing LayerNorm weights in the state dict. For each missing tensor, load the FP16 base model and copy the weight directly. Verified on blk.24.attn_norm.weight.

Also: replaced unsloth.save_gguf with llama.cpp convert_hf_to_gguf + llama-quantize pipeline — compatible with ghcr.io/ggml-org/llama.cpp:server stock container.

…→ GGUF - scripts/generate-b00t-training-data.py: extracts 4474 instruction pairs from AGENTS.md, Rust doc comments, datum tomls, skills, justfile - scripts/finetune-b00t.py: unsloth QLoRA (4-bit + LoRA r=16) on Qwen3.5-4B, ChatML format, SFTTrainer, merge + GGUF export - _b00t_/b00t-finetune.cli.toml: deterministic command datum — generate → install deps → finetune → serve, with HF lookup cmds

Training completed: 200 steps, loss 3.26→2.09, 6.4M trainable params. LoRA saved: ~/.b00t/training/output/b00t-lora/ FP16 merged: ~/.b00t/training/output/b00t-merged-fp16/ GGUF: ~/.b00t/training/output/b00t-finetuned.Q4_K_M.gguf (504MB) ⚠️ GGUF conversion dependency: unsloth's fork produces GGUF format incompatible with stock ghcr.io/ggml-org/llama.cpp:server. Fix: build llama-quantize from llama.cpp master, or use unsloth's bundled server.

Root cause: Qwen3.5's attn_norm.weight tensors are stored as bnb.nn.Linear4bit in the 4-bit base model. merge_and_unload() preserves LoRA target modules (q_proj, k_proj, etc.) but the base-model LayerNorm params lose their state_dict keys. Fix: after merge, scan for missing LayerNorm weights in state_dict. Patch from a freshly loaded FP16 base model. Verified on blk.24.attn_norm.weight — the tensor that blocked stock server loading. Also: replaced unsloth save_gguf (format mismatch) with llama.cpp convert_hf_to_gguf + llama-quantize pipeline (compatible with ghcr.io/ggml-org/llama.cpp:server).

elasticdotventures added 3 commits June 26, 2026 22:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(finetune): patch missing LayerNorm tensors after merge_and_unload#540

fix(finetune): patch missing LayerNorm tensors after merge_and_unload#540
elasticdotventures wants to merge 3 commits into
mainfrom
task/8-gguf-tensor-fix

elasticdotventures commented Jun 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

elasticdotventures commented Jun 26, 2026

Bug

Root Cause

Fix

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant