Drop float32 weight copy from BitLinear to cut resident memory ~80% per linear layer #24

Merged
sharpninja merged 4 commits into main from copilot/improve-memory-performance on Mar 21, 2026

Conversation

Copilot AI commented Mar 21, 2026

BitLinear allocated and retained a float[,] _fullPrecisionWeights array alongside sbyte[,] _ternaryWeights. The float copy was written once in QuantizeFromFullPrecision and never read again: Forward uses only the ternary weights and Gamma, and ToFullPrecision reconstructs from those same two. The dead copy took 4 bytes per weight, four times the 1 byte per weight that the ternary weights need at inference time.
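As a rough check of the ~80% figure in the title, the per-layer byte counts can be written out for a layer with N weights. The symbols N, B_before and B_after are notation introduced here for illustration and do not appear in the code.

```latex
\begin{aligned}
B_{\text{before}} &= 4N + N = 5N \\
B_{\text{after}} &= N + 4 \\
1 - \frac{B_{\text{after}}}{B_{\text{before}}} &= 1 - \frac{N + 4}{5N} = \frac{4}{5} - \frac{4}{5N} \approx 80\% \quad \text{for large } N
\end{aligned}
```

The 4-byte Gamma term only matters for very small layers, which is why the saving is quoted as approximately 80% rather than exactly 80%.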

Changes

  • BitLinear.cs — Remove _fullPrecisionWeights field, its constructor allocation, and the Buffer.BlockCopy in QuantizeFromFullPrecision. Update EstimateResidentParameterBytes to reflect actual resident state: ternary weights + one Gamma scalar.

    // Before: 5 bytes/weight (float32 + sbyte)
    public long EstimateResidentParameterBytes() =>
        ((long)_fullPrecisionWeights.Length * sizeof(float)) + ((long)_ternaryWeights.Length * sizeof(sbyte));

    // After: 1 byte/weight + 4 bytes Gamma
    public long EstimateResidentParameterBytes() =>
        ((long)_ternaryWeights.Length * sizeof(sbyte)) + sizeof(float);

  • BitNetPaperAudit.cs — Make the Memory audit Status data-driven: Passed when bitNetBytes <= traditionalBytes, Failed otherwise. The Requirement text is also conditional on the ratio so the audit is never misleading regardless of model configuration. Corrects the storage description to "ternary values encoded in int8 (sbyte)" (removing the inaccurate "1-bit" wording).

  • BitLinearTests.cs — Add EstimateResidentParameterBytes_CountsOnlyTernaryWeightsAndGamma to pin the new formula.

  • BitNetPaperAuditTests.cs / PaperAlignedRuntimeSteps.cs — Scope the "zero failures" assertions to non-Memory checks, since the Memory audit check now correctly reports Failed for the default configs where the BitNet transformer (dim=256, 4 layers) is larger than the tiny traditional comparison model (embeddingDim=48).
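The changes listed above can be illustrated with a minimal, self-contained sketch. It is not the repository's code: TernaryLayerSketch, AuditStatusSketch and MemoryAuditSketch are hypothetical names introduced here, and the reconstruction in ToFullPrecision assumes a plain multiply of each ternary value by Gamma. Only the byte formula and the `bitNetBytes <= traditionalBytes` rule are taken from the description.

```csharp
using System;

// Hypothetical stand-in for the layer after this PR:
// only the ternary weights and one Gamma scalar stay resident.
public sealed class TernaryLayerSketch
{
    private readonly sbyte[,] _ternaryWeights;

    // Scale factor; the default of 1 is an assumption made for this sketch.
    public float Gamma { get; set; } = 1f;

    public TernaryLayerSketch(int outputDim, int inputDim)
    {
        _ternaryWeights = new sbyte[outputDim, inputDim];
    }

    // 1 byte per weight plus 4 bytes for Gamma, matching the "After" formula.
    public long EstimateResidentParameterBytes() =>
        ((long)_ternaryWeights.Length * sizeof(sbyte)) + sizeof(float);

    // Rebuilds float weights on demand from the ternary values and Gamma,
    // so no float copy has to be kept around (assumed reconstruction rule).
    public float[,] ToFullPrecision()
    {
        int rows = _ternaryWeights.GetLength(0);
        int cols = _ternaryWeights.GetLength(1);
        var result = new float[rows, cols];
        for (int i = 0; i < rows; i++)
        {
            for (int j = 0; j < cols; j++)
            {
                result[i, j] = _ternaryWeights[i, j] * Gamma;
            }
        }
        return result;
    }
}

public enum AuditStatusSketch { Passed, Failed }

public static class MemoryAuditSketch
{
    // Data-driven status: Passed only when the BitNet estimate does not
    // exceed the estimate for the traditional comparison model.
    public static AuditStatusSketch Evaluate(long bitNetBytes, long traditionalBytes) =>
        bitNetBytes <= traditionalBytes ? AuditStatusSketch.Passed : AuditStatusSketch.Failed;
}
```

For a 4×8 layer this sketch estimates 32 + 4 = 36 bytes, against 32 × 5 = 160 bytes under the previous formula. Deriving the status from the two byte counts, rather than hard-coding it, is what lets the audit report Failed for the default configs described above.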

Original prompt

memory performance is unacceptable.


… per linear layer

Co-authored-by: sharpninja <16146732+sharpninja@users.noreply.github.com>
Agent-Logs-Url: https://github.com/sharpninja/BitNet-b1.58-Sharp/sessions/33e5259f-b437-45f9-a9c1-2f09803debae
Copilot AI changed the title from "[WIP] Fix unacceptable memory performance issues" to "Drop float32 weight copy from BitLinear to cut resident memory ~80% per linear layer" on Mar 21, 2026
Copilot AI requested a review from sharpninja March 21, 2026 14:44
@sharpninja sharpninja marked this pull request as ready for review March 21, 2026 15:46
Copilot AI review requested due to automatic review settings March 21, 2026 15:46

Copilot AI left a comment

Pull request overview

Reduces BitNet’s inference-time resident memory by removing the unused float32 weight copy from BitLinear, updating memory accounting, and adjusting the paper audit narrative to match the new storage model.

Changes:

  • Remove _fullPrecisionWeights storage/copy from BitLinear and update EstimateResidentParameterBytes.
  • Update BitNetPaperAudit memory audit narrative to reflect current resident storage.
  • Add a unit test to pin BitLinear.EstimateResidentParameterBytes to “ternary weights + gamma”.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| tests/BitNetSharp.Tests/BitLinearTests.cs | Adds a test to validate the updated resident-parameter byte estimate. |
| src/BitNetSharp.Core/Layers/BitLinear.cs | Drops the unused float32 weight array/copy and updates resident memory estimation. |
| src/BitNetSharp.Core/BitNetPaperAudit.cs | Updates the memory audit text to align with the new BitLinear storage model. |

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI requested a review from sharpninja March 21, 2026 15:57
@sharpninja sharpninja merged commit 1d326d0 into main Mar 21, 2026
2 checks passed
@sharpninja sharpninja deleted the copilot/improve-memory-performance branch March 21, 2026 16:07