Drop float32 weight copy from BitLinear to cut resident memory ~80% per linear layer by Copilot · Pull Request #24 · sharpninja/BitNet-b1.58-Sharp

Copilot · 2026-03-21T14:10:50Z

BitLinear allocated and retained a float[,] _fullPrecisionWeights array alongside sbyte[,] _ternaryWeights. The float copy was written once in QuantizeFromFullPrecision and never read again: Forward uses only the ternary weights and Gamma, and ToFullPrecision reconstructs from those same two. The dead copy took 4 bytes per weight, four times the 1 byte per weight of the ternary storage that inference actually reads.
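To illustrate why the float copy is redundant, here is a minimal sketch of a layer that keeps only ternary weights and one scale. It is not the repository's code: the local functions, the sample values, and the convention that a weight is approximated as ternary value times Gamma are assumptions made for this example, and the real Forward may do more work (for instance on activations).

```csharp
using System;

// Hypothetical stand-ins for BitLinear's resident state:
// ternary weights in {-1, 0, +1} and a single float scale.
sbyte[,] ternaryWeights = { { 1, 0, -1 }, { -1, 1, 0 } };
float gamma = 0.5f;

// Rebuilds an approximate float matrix on demand instead of keeping one resident.
// Assumes weight ~= ternary * gamma (an assumption of this sketch).
static float[,] ToFullPrecision(sbyte[,] ternary, float scale)
{
    int rows = ternary.GetLength(0), cols = ternary.GetLength(1);
    var result = new float[rows, cols];
    for (int r = 0; r < rows; r++)
        for (int c = 0; c < cols; c++)
            result[r, c] = ternary[r, c] * scale;
    return result;
}

// Matrix-vector product that reads only the ternary weights and the scale.
static float[] Forward(sbyte[,] ternary, float scale, float[] input)
{
    int rows = ternary.GetLength(0), cols = ternary.GetLength(1);
    var output = new float[rows];
    for (int r = 0; r < rows; r++)
    {
        float sum = 0f;
        for (int c = 0; c < cols; c++)
            sum += ternary[r, c] * input[c];
        output[r] = sum * scale; // apply the scale once per output row
    }
    return output;
}

float[,] dense = ToFullPrecision(ternaryWeights, gamma);
float[] y = Forward(ternaryWeights, gamma, new float[] { 2f, 4f, 6f });
Console.WriteLine($"{dense[0, 0]} {y[0]} {y[1]}");
```

Neither function touches a stored float matrix, which is the property the PR relies on when it deletes the field.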

Changes

BitLinear.cs — Remove _fullPrecisionWeights field, its constructor allocation, and the Buffer.BlockCopy in QuantizeFromFullPrecision. Update EstimateResidentParameterBytes to reflect actual resident state: ternary weights + one Gamma scalar.

// Before: 5 bytes/weight (float32 + sbyte)
public long EstimateResidentParameterBytes() =>
    ((long)_fullPrecisionWeights.Length * sizeof(float)) + ((long)_ternaryWeights.Length * sizeof(sbyte));

// After: 1 byte/weight + 4 bytes Gamma
public long EstimateResidentParameterBytes() =>
    ((long)_ternaryWeights.Length * sizeof(sbyte)) + sizeof(float);
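As a worked check of the "~80%" figure, the sketch below applies both formulas to a single layer. The 256×256 shape is a hypothetical example chosen for illustration, and the helper names are not from the repository.

```csharp
using System;

// Hypothetical layer shape, used only to make the arithmetic concrete.
long weightCount = 256L * 256L;

// Before: every weight is held as a float32 and as an sbyte (5 bytes per weight).
static long BytesBefore(long n) => (n * sizeof(float)) + (n * sizeof(sbyte));

// After: one sbyte per weight plus a single float32 Gamma for the layer.
static long BytesAfter(long n) => (n * sizeof(sbyte)) + sizeof(float);

long before = BytesBefore(weightCount);
long after = BytesAfter(weightCount);
double reduction = 1.0 - ((double)after / before);

Console.WriteLine($"before={before} bytes, after={after} bytes, reduction={reduction:P2}");
```

For this shape the estimate goes from 327,680 bytes to 65,540 bytes, a reduction just under 80%; the 4-byte Gamma term is negligible for any layer of realistic size.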

BitNetPaperAudit.cs — Make the Memory audit Status data-driven: Passed when bitNetBytes <= traditionalBytes, Failed otherwise. The Requirement text is also conditional on the ratio so the audit is never misleading regardless of model configuration. Corrects the storage description to "ternary values encoded in int8 (sbyte)" (removing the inaccurate "1-bit" wording).
BitLinearTests.cs — Add EstimateResidentParameterBytes_CountsOnlyTernaryWeightsAndGamma to pin the new formula.
BitNetPaperAuditTests.cs / PaperAlignedRuntimeSteps.cs — Scope the "zero failures" assertions to non-Memory checks, since the Memory audit check now correctly reports Failed for the default configs where the BitNet transformer (dim=256, 4 layers) is larger than the tiny traditional comparison model (embeddingDim=48).
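The data-driven Memory check described above could look roughly like the following sketch. Only the rule itself (Passed when bitNetBytes <= traditionalBytes, Failed otherwise) comes from this PR; the function names, the Requirement wording, and the byte counts are hypothetical.

```csharp
using System;
using System.Globalization;

// Status depends only on the measured byte counts, never on a hard-coded result.
static string MemoryStatus(long bitNetBytes, long traditionalBytes) =>
    bitNetBytes <= traditionalBytes ? "Passed" : "Failed";

// Requirement text follows the same comparison, so it cannot contradict the status.
// The wording here is illustrative, not the audit's actual text.
static string MemoryRequirement(long bitNetBytes, long traditionalBytes)
{
    double ratio = (double)bitNetBytes / traditionalBytes;
    string formatted = ratio.ToString("F2", CultureInfo.InvariantCulture);
    return bitNetBytes <= traditionalBytes
        ? $"BitNet resident weights are {formatted}x the traditional model's bytes."
        : $"BitNet resident weights exceed the traditional model's bytes ({formatted}x).";
}

// Hypothetical byte counts for two configurations.
string smallerBitNet = MemoryStatus(1_000, 4_000);
string largerBitNet = MemoryStatus(4_000, 1_000);

Console.WriteLine($"{smallerBitNet}: {MemoryRequirement(1_000, 4_000)}");
Console.WriteLine($"{largerBitNet}: {MemoryRequirement(4_000, 1_000)}");
```

The second case mirrors the default configs mentioned above, where the BitNet model is simply larger than the comparison model, which is why the "zero failures" assertions had to exclude the Memory check.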

Original prompt

memory performance is unacceptable.

Copilot

Pull request overview

Reduces BitNet’s inference-time resident memory by removing the unused float32 weight copy from BitLinear, updating memory accounting, and adjusting the paper audit narrative to match the new storage model.

Changes:

Remove _fullPrecisionWeights storage/copy from BitLinear and update EstimateResidentParameterBytes.
Update BitNetPaperAudit memory audit narrative to reflect current resident storage.
Add a unit test to pin BitLinear.EstimateResidentParameterBytes to “ternary weights + gamma”.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| tests/BitNetSharp.Tests/BitLinearTests.cs | Adds a test to validate the updated resident-parameter byte estimate. |
| src/BitNetSharp.Core/Layers/BitLinear.cs | Drops the unused float32 weight array/copy and updates resident memory estimation. |
| src/BitNetSharp.Core/BitNetPaperAudit.cs | Updates the memory audit text to align with the new `BitLinear` storage model. |

Initial plan

2066dfa

Copilot AI assigned Copilot and sharpninja Mar 21, 2026

Copilot started work on behalf of sharpninja March 21, 2026 14:10

Drop _fullPrecisionWeights from BitLinear to cut resident memory ~80%…

e4a2282

… per linear layer Co-authored-by: sharpninja <16146732+sharpninja@users.noreply.github.com> Agent-Logs-Url: https://github.com/sharpninja/BitNet-b1.58-Sharp/sessions/33e5259f-b437-45f9-a9c1-2f09803debae

Copilot AI changed the title from "[WIP] Fix unacceptable memory performance issues" to "Drop float32 weight copy from BitLinear to cut resident memory ~80% per linear layer" Mar 21, 2026

Copilot AI requested a review from sharpninja March 21, 2026 14:44

Copilot finished work on behalf of sharpninja March 21, 2026 14:44

sharpninja approved these changes Mar 21, 2026

sharpninja marked this pull request as ready for review March 21, 2026 15:46

Copilot AI review requested due to automatic review settings March 21, 2026 15:46

Copilot started reviewing on behalf of sharpninja March 21, 2026 15:46

Copilot AI reviewed Mar 21, 2026

src/BitNetSharp.Core/BitNetPaperAudit.cs (outdated, resolved)

src/BitNetSharp.Core/BitNetPaperAudit.cs (resolved)

Update src/BitNetSharp.Core/BitNetPaperAudit.cs

5746322

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot started work on behalf of sharpninja March 21, 2026 15:49

Make memory audit status conditional on bitNetBytes vs traditionalBytes

2ec3028

Co-authored-by: sharpninja <16146732+sharpninja@users.noreply.github.com> Agent-Logs-Url: https://github.com/sharpninja/BitNet-b1.58-Sharp/sessions/525ba1fe-aa3f-422b-b932-714260639d64

Copilot AI requested a review from sharpninja March 21, 2026 15:57

Copilot finished work on behalf of sharpninja March 21, 2026 15:57

sharpninja approved these changes Mar 21, 2026

sharpninja merged commit 1d326d0 into main Mar 21, 2026
2 checks passed

sharpninja deleted the copilot/improve-memory-performance branch March 21, 2026 16:07

sharpninja merged 4 commits into main from copilot/improve-memory-performance
