Reduce peak GPU memory in Eagle3 online target generation by avoiding an extra logits copy #528
Open
zijiexia wants to merge 2 commits into sgl-project:main from
Conversation
Contributor
Motivation
This PR fixes an out-of-memory issue in Eagle3 online training caused by an unnecessary full-tensor copy when shifting target logits.
Previously, `generate_eagle3_data()` accumulated per-sample logits into a list, concatenated them into a `[B, T, V]` tensor, and then called `padding(target_out, left=False)` to shift the logits left and append a zero row at the end. For large-vocab models, that final padding step materialized another full `[B, T, V]` allocation and could trigger multi-GB peak memory spikes.

This change pre-allocates the final `target_out` tensor once and writes the shifted logits directly into it:

```python
target_out[idx, :-1] = logits[..., 1:, :]
target_out[idx, -1] = 0
```

That preserves the original semantics while removing the extra full-size allocation.
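In code, the new flow looks roughly like the sketch below. This is a minimal sketch assuming per-sample logits of shape `[T, V]`; apart from `has_logits`, `logits_list`, and `target_out`, the names and shapes are illustrative rather than the exact SpecForge signatures:

```python
import torch

def build_shifted_target_out(logits_list):
    """Assemble the shifted [B, T, V] target logits without an extra full-size copy.

    `logits_list` holds one [T, V] logits tensor per sample, or None when the
    target model produced no logits (illustrative assumption).
    """
    has_logits = logits_list[0] is not None
    if not has_logits:
        return None

    first = logits_list[0]
    T, V = first.shape
    # Allocate the final tensor exactly once, on the logits' device and dtype.
    target_out = torch.empty(len(logits_list), T, V,
                             device=first.device, dtype=first.dtype)

    for idx, logits in enumerate(logits_list):
        # Shift left by one step: position t now holds the logits for token t+1.
        target_out[idx, :-1] = logits[1:, :]
        # The last position has no next-token logits, so it is zeroed in place.
        target_out[idx, -1] = 0
    return target_out
```

Because each sample's shifted logits are written straight into the pre-allocated buffer, only one `[B, T, V]` tensor ever exists at a time.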
Root Cause
The old implementation created peak memory pressure in two stages:
1. Concatenating the accumulated per-sample logits into a single `target_out` tensor.
2. Calling `padding(target_out, left=False)`, which internally builds a zero padding tensor and concatenates again, creating another full-sized `[B, T, V]` tensor.

For Eagle3 online training, `V` is the target model vocabulary size, so this copy is extremely expensive. In practice this showed up as OOM during `generate_eagle3_data()` even though steady-state memory usage was otherwise close to fitting.
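For reference, the previous flow described above looks roughly like this simplified reconstruction (it is not the exact `padding()` implementation). Both stages allocate a full-size tensor, so peak memory transiently holds about two `[B, T, V]` copies of the logits, each of which can be several GB for a large vocabulary:

```python
import torch

def old_build_shifted_target_out(logits_list):
    # Stage 1: concatenate the per-sample [T, V] logits into one full [B, T, V] tensor.
    target_out = torch.stack(logits_list, dim=0)

    # Stage 2: "padding(target_out, left=False)"-style shift -- build a zero row
    # and concatenate again, which materializes a second full [B, T, V] tensor
    # while the first one is still alive.
    zero_row = torch.zeros_like(target_out[:, :1, :])
    return torch.cat([target_out[:, 1:, :], zero_row], dim=1)
```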
Modifications

- Accumulate the per-sample logits tensors in a Python list instead of concatenating them into `target_out` immediately.
- Check whether logits are present via `has_logits = logits_list[0] is not None`.
- Pre-allocate `target_out` with shape `[B, T, V]` using the first logits tensor's device and dtype.
- Write each sample's shifted logits directly into the pre-allocated tensor and remove the `padding(target_out, left=False)` call entirely.

Related Issues
Accuracy Test
Benchmark & Profiling
Checklist