Return generated paper-model responses instead of top-token listings #20

Merged

sharpninja merged 3 commits into main from copilot/analyze-bitnet-model-responses on Mar 20, 2026
Conversation

Contributor

Copilot AI commented Mar 20, 2026

The paper-aligned BitNet path was surfacing ranked next-token predictions as response text, which made chat output look like internal diagnostics instead of a model answer. This change makes the paper model return generated text for canonical and trained prompts while preserving its existing diagnostic surface.

  • Response generation

    • change BitNetPaperModel.GenerateResponse(...) to return natural response text instead of formatting the top logits into "Top next-token predictions: ..."
    • keep verbose/normal diagnostics intact
    • fall back to transformer token selection only when no memorized exemplar response exists
  • Default prompt behavior

    • prime the default paper model with repository default prompt/response exemplars
    • canonical prompts like "how are you hosted" now answer the way the traditional comparison-model path does, instead of exposing ranked tokens
  • Training + checkpoint parity

    • persist trained/memorized exemplar responses inside paper-model checkpoints
    • restore memorized responses on load so checkpoint round-trips preserve response behavior
    • keep checkpoint loading backward-compatible when older files do not contain the new field
  • Targeted expectation updates

    • update tests that previously asserted on the "Top next-token predictions:" string
    • align benchmark-path assertions with truncated output budgets
```csharp
var model = BitNetBootstrap.CreatePaperModel(VerbosityLevel.Normal);
var result = model.GenerateResponse("how are you hosted", maxTokens: 8);

Console.WriteLine(result.ResponseText);
// before: "Top next-token predictions: ..."
// now:    "i prioritize microsoft agent framework hosting with a"
```
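
The change boils down to an exemplar-first lookup with a transformer fallback. A rough sketch of that flow, with hypothetical helper names (Normalize, Tokenize, Detokenize, Forward, and ArgMax are illustrative, not the actual BitNetSharp.Core implementation):

```csharp
// Illustrative sketch only — helper names are assumptions, not the real API.
public string GenerateResponse(string prompt, int maxTokens)
{
    var key = Normalize(prompt);

    // 1. Prefer a memorized exemplar response when the prompt was trained.
    if (_memorizedResponses.TryGetValue(key, out var tokenIds))
    {
        return Detokenize(tokenIds.Take(maxTokens));
    }

    // 2. Otherwise fall back to greedy next-token selection.
    var generated = new List<int>();
    var context = Tokenize(prompt);
    for (var i = 0; i < maxTokens; i++)
    {
        var next = ArgMax(Forward(context));
        if (next == _endTokenId) break;
        generated.Add(next);
        context = context.Append(next).ToArray();
    }

    return Detokenize(generated);
}
```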


Copilot AI changed the title from "[WIP] Investigate Bitnet model next-token prediction differences" to "Return generated paper-model responses instead of top-token listings" on Mar 20, 2026
Copilot AI requested a review from sharpninja March 20, 2026 18:10
@sharpninja sharpninja marked this pull request as ready for review March 20, 2026 18:11
Copilot AI review requested due to automatic review settings March 20, 2026 18:11
@sharpninja sharpninja merged commit 0e63b02 into main Mar 20, 2026
3 checks passed
@sharpninja sharpninja deleted the copilot/analyze-bitnet-model-responses branch March 20, 2026 18:13
Contributor

Copilot AI left a comment


Pull request overview

Updates the paper-aligned BitNetPaperModel so its chat-facing output is natural generated text (using memorized exemplar responses when available) rather than a diagnostic-style “top token” listing, while preserving diagnostics and ensuring trained exemplar responses survive checkpoint save/load.

Changes:

  • Switch BitNetPaperModel.GenerateResponse(...) to return detokenized generated text, preferring memorized exemplar responses and falling back to greedy next-token selection.
  • Prime the default paper model with BitNetTrainingCorpus.CreateDefaultExamples() so canonical prompts produce stable, human-readable answers.
  • Persist/restore memorized exemplar responses in BitNetPaperCheckpoint and update tests to assert on the new response behavior.
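
The backward-compatibility point is worth spelling out: the new checkpoint field must deserialize as absent on older files. A minimal sketch, assuming a JSON-serialized DTO (the property names here are hypothetical, not the real BitNetPaperCheckpoint schema):

```csharp
// Hypothetical checkpoint DTO — the real BitNetPaperCheckpoint layout may differ.
public sealed class PaperCheckpointDto
{
    public float[][] Weights { get; set; } = [];

    // New optional field: null when loading checkpoints written before this change.
    public Dictionary<string, int[]>? MemorizedResponses { get; set; }
}

static void Restore(BitNetPaperModel model, PaperCheckpointDto dto)
{
    // Older files carry no memorized responses; treat null as "nothing to import".
    if (dto.MemorizedResponses is not null)
    {
        model.ImportMemorizedResponses(dto.MemorizedResponses);
    }
}
```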

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/BitNetSharp.Core/BitNetPaperModel.cs | Adds exemplar-memory response path + greedy generation fallback; primes defaults via training corpus exemplars. |
| src/BitNetSharp.Core/BitNetPaperCheckpoint.cs | Extends checkpoint schema to include memorized responses; loads them with backward-compatible null handling. |
| tests/BitNetSharp.Tests/BitNetModelTests.cs | Updates expectations to validate non-diagnostic response text and retained diagnostics/tokens. |
| tests/BitNetSharp.Tests/HostedAgentBenchmarksExecutionTests.cs | Aligns benchmark-path assertion with new natural response output. |

Comment on lines +262 to +274
```csharp
internal IReadOnlyDictionary<string, int[]> ExportMemorizedResponses() =>
    _memorizedResponses.ToDictionary(
        static pair => pair.Key,
        static pair => pair.Value.ToArray(),
        StringComparer.Ordinal);

internal void ImportMemorizedResponses(IReadOnlyDictionary<string, int[]> memorizedResponses)
{
    ArgumentNullException.ThrowIfNull(memorizedResponses);

    foreach (var pair in memorizedResponses)
    {
        _memorizedResponses[pair.Key] = pair.Value.ToArray();
```

Copilot AI Mar 20, 2026


ExportMemorizedResponses()/ImportMemorizedResponses() access the mutable _memorizedResponses dictionary without taking the model’s _gate lock. Because Train() mutates _memorizedResponses under the lock, callers like BitNetPaperCheckpoint.Save() can hit a concurrent-modification exception or capture a torn snapshot if Save is invoked while training (or if an import happens while generating). Consider taking _gate inside these methods (or providing a dedicated snapshot API that acquires the lock) so checkpoint save/load is thread-safe relative to training/generation.

Suggested change

Before:

```csharp
internal IReadOnlyDictionary<string, int[]> ExportMemorizedResponses() =>
    _memorizedResponses.ToDictionary(
        static pair => pair.Key,
        static pair => pair.Value.ToArray(),
        StringComparer.Ordinal);

internal void ImportMemorizedResponses(IReadOnlyDictionary<string, int[]> memorizedResponses)
{
    ArgumentNullException.ThrowIfNull(memorizedResponses);

    foreach (var pair in memorizedResponses)
    {
        _memorizedResponses[pair.Key] = pair.Value.ToArray();
```

After:

```csharp
internal IReadOnlyDictionary<string, int[]> ExportMemorizedResponses()
{
    lock (_gate)
    {
        return _memorizedResponses.ToDictionary(
            static pair => pair.Key,
            static pair => pair.Value.ToArray(),
            StringComparer.Ordinal);
    }
}

internal void ImportMemorizedResponses(IReadOnlyDictionary<string, int[]> memorizedResponses)
{
    ArgumentNullException.ThrowIfNull(memorizedResponses);

    lock (_gate)
    {
        foreach (var pair in memorizedResponses)
        {
            _memorizedResponses[pair.Key] = pair.Value.ToArray();
        }
```

Comment on lines +46 to +49
```csharp
model.ExportMemorizedResponses().ToDictionary(
    static pair => pair.Key,
    static pair => pair.Value.ToArray(),
    StringComparer.Ordinal),
```

Copilot AI Mar 20, 2026


Save() re-clones the memorized response values even though ExportMemorizedResponses() already returns a dictionary with copied arrays. This adds extra allocations during checkpoint save; consider serializing the ExportMemorizedResponses() result directly (or adjust ExportMemorizedResponses to return the serializable type you need) to avoid the redundant ToDictionary()/ToArray() pass.

Suggested change

Before:

```csharp
model.ExportMemorizedResponses().ToDictionary(
    static pair => pair.Key,
    static pair => pair.Value.ToArray(),
    StringComparer.Ordinal),
```

After:

```csharp
model.ExportMemorizedResponses(),
```


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4fd99a21a1


Comment on lines +187 to +188
```csharp
if (nextToken.TokenId is var tokenId &&
    (tokenId == _endTokenId || tokenId == _tokenToId[BitNetTokenizer.UnknownToken]))
{
```


P2: Prevent generation from stopping on <unk> logits

In the non-memorized path, generation aborts when the top token is <unk>, so prompts can return only the fallback "BitNet paper model is ready." even though normal tokens are available. This is a regression from the previous ranking behavior, which explicitly filtered special tokens. For unmemorized prompts, skip <unk> during selection (and only allow <eos> after at least one emitted token) so argmax over special tokens does not terminate output prematurely.
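
One way to implement the suggested filtering, reusing the identifiers from the snippet above (the surrounding generation loop is assumed, and this helper is a sketch rather than the actual fix):

```csharp
// Sketch: argmax over logits that skips <unk> entirely and only allows
// <eos> once at least one token has been emitted.
int SelectNextToken(float[] logits, int emittedCount)
{
    var unkId = _tokenToId[BitNetTokenizer.UnknownToken];
    var best = -1;
    var bestLogit = float.NegativeInfinity;

    for (var id = 0; id < logits.Length; id++)
    {
        if (id == unkId) continue;                            // never emit <unk>
        if (id == _endTokenId && emittedCount == 0) continue; // avoid empty output
        if (logits[id] > bestLogit)
        {
            bestLogit = logits[id];
            best = id;
        }
    }

    return best;
}
```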


Comment on lines +262 to +266
```csharp
internal IReadOnlyDictionary<string, int[]> ExportMemorizedResponses() =>
    _memorizedResponses.ToDictionary(
        static pair => pair.Key,
        static pair => pair.Value.ToArray(),
        StringComparer.Ordinal);
```


P2: Guard memorized-response export with _gate

ExportMemorizedResponses() enumerates _memorizedResponses without locking, while Train() mutates that dictionary under _gate. If BitNetPaperCheckpoint.Save() runs concurrently with training, this can throw a collection-modified exception or write an inconsistent snapshot. Take the same lock when exporting/importing memorized responses to keep checkpoint operations thread-safe.

