Drop float32 weight copy from BitLinear to cut resident memory ~80% per linear layer #24
Merged
sharpninja merged 4 commits into `main` on Mar 21, 2026
… per linear layer
Co-authored-by: sharpninja <16146732+sharpninja@users.noreply.github.com>
Agent-Logs-Url: https://github.com/sharpninja/BitNet-b1.58-Sharp/sessions/33e5259f-b437-45f9-a9c1-2f09803debae
Copilot (AI) changed the title from "[WIP] Fix unacceptable memory performance issues" to "Drop float32 weight copy from BitLinear to cut resident memory ~80% per linear layer" on Mar 21, 2026
sharpninja approved these changes on Mar 21, 2026
Pull request overview
Reduces BitNet’s inference-time resident memory by removing the unused float32 weight copy from BitLinear, updating memory accounting, and adjusting the paper audit narrative to match the new storage model.
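To illustrate why the float copy was unused, here is a minimal Python sketch of a layer that keeps only ternary weights and a single scale. It assumes the absmean quantizer described in the BitNet b1.58 paper (scale = mean absolute weight, values rounded and clipped to {-1, 0, 1}); `TernaryLinear` and its methods are hypothetical stand-ins for `BitLinear`, `QuantizeFromFullPrecision`, `Forward`, and `ToFullPrecision`, not the repository's actual code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TernaryLinear:
    """Hypothetical stand-in for BitLinear's post-PR state."""

    ternary: List[List[int]]  # values in {-1, 0, 1}; an sbyte[,] in the C# code
    gamma: float              # the single float scale ("Gamma")

    @staticmethod
    def quantize(weights: List[List[float]], eps: float = 1e-6) -> "TernaryLinear":
        # Absmean scale (assumed, per BitNet b1.58): mean of |w| over all weights.
        flat = [abs(w) for row in weights for w in row]
        if not flat:
            raise ValueError("weights must be non-empty")
        gamma = sum(flat) / len(flat)
        # Round w / gamma to the nearest integer and clip to [-1, 1].
        ternary = [
            [max(-1, min(1, round(w / (gamma + eps)))) for w in row]
            for row in weights
        ]
        # No float copy of `weights` is kept: only `ternary` and `gamma` survive.
        return TernaryLinear(ternary, gamma)

    def forward(self, x: List[float]) -> List[float]:
        # y = gamma * (W_ternary @ x), using only the ternary weights and gamma.
        return [self.gamma * sum(t * xi for t, xi in zip(row, x)) for row in self.ternary]

    def to_full_precision(self) -> List[List[float]]:
        # Approximate reconstruction from the same two pieces of state.
        return [[t * self.gamma for t in row] for row in self.ternary]


layer = TernaryLinear.quantize([[0.5, -0.2], [0.0, 1.5]])
print(layer.ternary)  # [[1, 0], [0, 1]]
```

Both `forward` and `to_full_precision` read only `ternary` and `gamma`, which is the property the PR relies on when it drops `_fullPrecisionWeights`.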
Changes:
- Remove the `_fullPrecisionWeights` storage/copy from `BitLinear` and update `EstimateResidentParameterBytes`.
- Update the `BitNetPaperAudit` memory audit narrative to reflect current resident storage.
- Add a unit test to pin `BitLinear.EstimateResidentParameterBytes` to "ternary weights + gamma".
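The "ternary weights + gamma" formula and the ~80% figure in the title follow from simple byte counting. The sketch below assumes one byte per `sbyte` weight, four bytes per `float`, a single float32 `Gamma` per layer, and ignores .NET array headers; the function names are hypothetical and are not the repository's `EstimateResidentParameterBytes`.

```python
FLOAT32_BYTES = 4  # size of a C# float (System.Single)
SBYTE_BYTES = 1    # size of a C# sbyte


def resident_bytes_before(out_features: int, in_features: int) -> int:
    """Pre-PR layout: float[,] copy + sbyte[,] ternary weights + one float Gamma."""
    n = out_features * in_features
    return n * FLOAT32_BYTES + n * SBYTE_BYTES + FLOAT32_BYTES


def resident_bytes_after(out_features: int, in_features: int) -> int:
    """Post-PR layout: sbyte[,] ternary weights + one float Gamma."""
    n = out_features * in_features
    return n * SBYTE_BYTES + FLOAT32_BYTES


def savings_fraction(out_features: int, in_features: int) -> float:
    """Share of the old resident bytes that removing the float copy frees."""
    before = resident_bytes_before(out_features, in_features)
    after = resident_bytes_after(out_features, in_features)
    return (before - after) / before


print(resident_bytes_before(256, 256), resident_bytes_after(256, 256))  # 327684 65540
```

With n weights the savings are 4n / (5n + 4), which approaches 4/5 = 80% as the layer grows; for a 256×256 layer the float copy accounted for 262,144 of 327,684 bytes.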
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/BitNetSharp.Tests/BitLinearTests.cs | Adds a test to validate the updated resident-parameter byte estimate. |
| src/BitNetSharp.Core/Layers/BitLinear.cs | Drops the unused float32 weight array/copy and updates resident memory estimation. |
| src/BitNetSharp.Core/BitNetPaperAudit.cs | Updates the memory audit text to align with the new BitLinear storage model. |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: sharpninja <16146732+sharpninja@users.noreply.github.com>
Agent-Logs-Url: https://github.com/sharpninja/BitNet-b1.58-Sharp/sessions/525ba1fe-aa3f-422b-b932-714260639d64
sharpninja approved these changes on Mar 21, 2026
`BitLinear` allocated and retained a `float[,] _fullPrecisionWeights` array alongside `sbyte[,] _ternaryWeights`. The float copy was written once in `QuantizeFromFullPrecision` and never read again: `Forward` uses only the ternary weights and `Gamma`, and `ToFullPrecision` reconstructs from those same two. Because each `float` takes 4 bytes and each `sbyte` takes 1, the dead copy consumed 4× as much memory as the ternary weights actually needed at inference time.

Changes
- `BitLinear.cs`: Remove the `_fullPrecisionWeights` field, its constructor allocation, and the `Buffer.BlockCopy` in `QuantizeFromFullPrecision`. Update `EstimateResidentParameterBytes` to reflect the actual resident state: ternary weights plus one `Gamma` scalar.
- `BitNetPaperAudit.cs`: Make the Memory audit `Status` data-driven, `Passed` when `bitNetBytes <= traditionalBytes` and `Failed` otherwise. The `Requirement` text is also conditional on the ratio, so the audit is never misleading regardless of model configuration. Correct the storage description to "ternary values encoded in int8 (sbyte)", removing the inaccurate "1-bit" wording.
- `BitLinearTests.cs`: Add `EstimateResidentParameterBytes_CountsOnlyTernaryWeightsAndGamma` to pin the new formula.
- `BitNetPaperAuditTests.cs` / `PaperAlignedRuntimeSteps.cs`: Scope the "zero failures" assertions to non-Memory checks, since the Memory audit check now correctly reports `Failed` for the default configs, where the BitNet transformer (dim=256, 4 layers) is larger than the tiny traditional comparison model (embeddingDim=48).
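The data-driven Memory check and the scoped "zero failures" assertion can be sketched as follows. The `AuditCheck` shape, the requirement wording, and the helper names are hypothetical; only the `Passed`/`Failed` rule (`bitNetBytes <= traditionalBytes`) and the exclusion of the Memory check come from the description above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AuditCheck:
    area: str         # e.g. "Memory"; hypothetical field name
    requirement: str  # human-readable requirement text
    status: str       # "Passed" or "Failed"


def memory_audit(bitnet_bytes: int, traditional_bytes: int) -> AuditCheck:
    if traditional_bytes <= 0:
        raise ValueError("traditional_bytes must be positive")
    ratio = bitnet_bytes / traditional_bytes
    # Status is derived from the data instead of being hard-coded to Passed.
    status = "Passed" if bitnet_bytes <= traditional_bytes else "Failed"
    if status == "Passed":
        requirement = (f"Ternary values encoded in int8 (sbyte) use {ratio:.2f}x "
                       "the traditional model's resident bytes.")
    else:
        requirement = (f"Ternary values encoded in int8 (sbyte) use {ratio:.2f}x "
                       "the traditional model's resident bytes; this configuration "
                       "is larger than the comparison model.")
    return AuditCheck("Memory", requirement, status)


def non_memory_failures(checks: List[AuditCheck]) -> List[AuditCheck]:
    # The "zero failures" assertion ignores the Memory check, which may
    # legitimately fail when the BitNet model is larger than the baseline.
    return [c for c in checks if c.area != "Memory" and c.status == "Failed"]


checks = [memory_audit(300, 200), AuditCheck("Quantization", "...", "Passed")]
print(checks[0].status)  # Failed
```

Keeping the status derived from the byte counts means a larger-than-baseline configuration shows up as `Failed` rather than being silently reported as a pass, while the test suites can still assert that every other check passes.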