fix: cast eagle_acts to draft dtype before send to avoid bf16/fp16 bit reinterpretation #18
Open
WLLEGit wants to merge 1 commit into
…t reinterpretation in async prefill
Summary
Fix dtype mismatch in async draft prefill:
`eagle_acts` is sent from the target (bf16) without casting, then received into the draft's buffer (fp16). `dist.recv` just copies bytes, so the bf16 bit patterns are reinterpreted as fp16, corrupting the conditioning hidden states fed into the draft's prefill. This poisons every entry of the draft KV cache and degrades acceptance for the entire generation.
The speculate path at `speculator_async.py:175` already casts via `recovery_activations.to(self.draft_dtype)`. The prefill path at `speculator_async.py:89` was missing the same cast.
Fix
Cast `eagle_acts` to the draft dtype before the send in the prefill path, mirroring the cast the speculate path already performs.
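To illustrate why the cast has to happen before the send, here is a minimal standard-library sketch. It is not the code from this PR. It contrasts reading a bf16 bit pattern as fp16 (what a raw byte copy into an fp16 buffer amounts to) with a value-preserving conversion. The helper names are hypothetical and exist only for this illustration. bf16 is modeled as the top 16 bits of a float32, which is exact for the sample values used.

```python
import struct


def bf16_bits(x: float) -> int:
    """16-bit bf16 pattern for x, taken as the top half of its float32 encoding.

    Uses truncation rather than round-to-nearest; exact for the sample values below.
    """
    (f32_bits,) = struct.unpack("<I", struct.pack("<f", x))
    return f32_bits >> 16


def bf16_value(bits: int) -> float:
    """Decode a bf16 bit pattern back to a Python float."""
    (value,) = struct.unpack("<f", struct.pack("<I", bits << 16))
    return value


def read_as_fp16(bits: int) -> float:
    """Interpret the same 16 bits as fp16: the effect of a raw byte copy."""
    (value,) = struct.unpack("<e", struct.pack("<H", bits))
    return value


def cast_to_fp16(value: float) -> float:
    """Value-preserving conversion (round to the nearest fp16).

    struct raises OverflowError for values outside the fp16 range.
    """
    (out,) = struct.unpack("<e", struct.pack("<e", value))
    return out


if __name__ == "__main__":
    for x in (1.0, -0.5, 3.0):
        bits = bf16_bits(x)
        print(
            f"value={x:5} bf16_bits={bits:#06x} "
            f"raw_copy_as_fp16={read_as_fp16(bits)} "
            f"cast_to_fp16={cast_to_fp16(bf16_value(bits))}"
        )
```

With these inputs, 1.0 reads back as 1.875 and 3.0 as 2.125 when the bits are copied unchanged, while the cast path returns the original values. The layouts differ (bf16 has 8 exponent bits and 7 mantissa bits, fp16 has 5 and 10), so the size and sign of the error depend on the value.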
Diagnosis
Reproduced on Llama-3.1-8B (bf16) + yuhuili EAGLE3 (fp16), humaneval, K=5, B=8 async + JIT backup.
Layer trace at draft prefill last position (213), reading conditioning into `fc`:
Columns: `tok_emb` norm | `cond` (= `fc(eagle_acts[212])`) norm | `attn_out` norm at first JIT step
The conditioning passed into `fc` matched between SSD and the oracle (verified via the dumped `target_recovery_activations`), but the conditioning received by the draft for prefill positions was numerically off by ~15×. Independently computing `fc(eagle_acts[212])` in fp32 confirms the oracle's value (31.38) is correct.
Impact
humaneval, 8 prompts × 256 output tokens, K=5:
JIT-path pos0 accept now matches vLLM almost exactly (0.761 vs 0.764), confirming the per-step computation is correct once the conditioning is clean.