feat: add logprobs support to /chat/completions by eloe · Pull Request #17 · eloe/mlx-vlm

eloe · 2026-04-06T16:16:29Z

Summary

Per-token log probabilities when logprobs=True, in the OpenAI format. 4 tests.

Return per-token log probabilities when logprobs=True is set in the request. Each token entry includes the decoded text, its log probability, and its UTF-8 byte representation, matching the OpenAI format. When logprobs is requested in non-streaming mode, the handler uses stream_generate internally to collect per-token probabilities.

Adds:

- logprobs/top_logprobs fields on ChatRequest
- TokenLogprob and ChoiceLogprobs models
- logprobs on ChatChoice/ChatStreamChoice

4 new tests: present when requested, absent by default, format validation, streaming logprobs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
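The sketch below is a rough illustration of the entry format described above. It uses plain Python to build one logprobs.content list from (token text, log probability) pairs. The helper names token_logprob_entry and choice_logprobs are hypothetical. Plain dicts stand in for the PR's TokenLogprob and ChoiceLogprobs models, whose full definitions are not shown here.

```python
import math
from typing import Dict, List, Sequence, Tuple


def token_logprob_entry(token_text: str, logprob: float) -> Dict[str, object]:
    """Build one OpenAI-style logprobs.content entry (hypothetical helper)."""
    return {
        "token": token_text,
        "logprob": float(logprob),
        # UTF-8 encoding of the decoded token text, as a list of ints.
        "bytes": list(token_text.encode("utf-8")),
    }


def choice_logprobs(pairs: Sequence[Tuple[str, float]]) -> Dict[str, List[Dict[str, object]]]:
    """Wrap per-token entries in the {"content": [...]} shape used on a choice."""
    return {"content": [token_logprob_entry(text, lp) for text, lp in pairs]}


if __name__ == "__main__":
    result = choice_logprobs([("Hi", -0.1), ("é", -2.3)])
    for entry in result["content"]:
        # exp(logprob) recovers the token's probability.
        print(entry["token"], entry["bytes"], round(math.exp(entry["logprob"]), 3))
```

A non-ASCII token such as "é" encodes to two bytes. This is why the bytes field is a list rather than a single value.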

Copilot

Pull request overview

Adds OpenAI-style per-token log probability support to the /chat/completions endpoint when logprobs=True, including streaming behavior and accompanying tests.

Changes:

- Extend /chat/completions request/response schemas to include logprobs (and top_logprobs in the request).
- Emit per-token logprobs.content entries in both streaming and non-streaming responses.
- Add new server tests covering presence/absence, format, and streaming logprobs.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `mlx_vlm/server.py` | Adds request/response models for logprobs and emits per-token logprob objects in streaming and non-streaming `/chat/completions`. |
| `mlx_vlm/tests/test_server.py` | Adds tests validating logprobs are returned when requested, omitted by default, and present in streaming chunks. |


Copilot · 2026-04-06T16:35:42Z

+    top_logprobs: Optional[int] = Field(
+        None,
+        ge=0,
+        le=20,
+        description="Number of most likely tokens to return at each position.",
+    )


top_logprobs is added to ChatRequest but is never read in the handler, so the API will accept it and silently ignore it. Either implement top_logprobs in the /chat/completions response (include the top-N alternatives per token) or remove the field until it’s supported; also consider validating that top_logprobs can only be set when logprobs=True to match the OpenAI request contract.

Suggested change

    top_logprobs: Optional[int] = Field(
        None,
        ge=0,
        le=20,
        description="Number of most likely tokens to return at each position.",
    )
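Both options from the comment above are small. The sketch below shows each of them under stated assumptions:

- A plain dataclass (ChatRequestSketch, hypothetical) stands in for the pydantic ChatRequest. Because it has no Field constraints, it checks by hand the 0 to 20 range that Field(ge=0, le=20) already enforces on the real model.
- A plain list of floats stands in for the per-token log-prob vector over the vocabulary.

With pydantic v2, the cross-field check would typically go in a model_validator(mode="after") on ChatRequest.

```python
import heapq
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class ChatRequestSketch:
    """Hypothetical stand-in for ChatRequest, reduced to the two logprobs fields."""

    logprobs: bool = False
    top_logprobs: Optional[int] = None

    def __post_init__(self) -> None:
        if self.top_logprobs is None:
            return
        # Mirrors Field(ge=0, le=20) on the real model.
        if not 0 <= self.top_logprobs <= 20:
            raise ValueError("top_logprobs must be between 0 and 20")
        # OpenAI contract: top_logprobs is only meaningful when logprobs=True.
        if not self.logprobs:
            raise ValueError("top_logprobs requires logprobs=True")


def top_alternatives(logprobs: Sequence[float], n: int) -> List[Tuple[int, float]]:
    """Return the n (token_id, logprob) pairs with the highest log probability."""
    return heapq.nlargest(n, enumerate(logprobs), key=lambda pair: pair[1])


if __name__ == "__main__":
    request = ChatRequestSketch(logprobs=True, top_logprobs=2)
    vocab_logprobs = [-3.0, -0.5, -1.2, -4.0]  # toy 4-token vocabulary
    print(top_alternatives(vocab_logprobs, request.top_logprobs))
    try:
        ChatRequestSketch(logprobs=False, top_logprobs=2)
    except ValueError as err:
        print("rejected:", err)
```

Each returned (token_id, logprob) pair would still need to be decoded into the same token/logprob/bytes shape as the chosen token. heapq.nlargest avoids fully sorting a vocabulary-sized vector when only a handful of alternatives are needed.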

Copilot · 2026-04-06T16:35:43Z

+                        chunk_logprobs = None
+                        if want_logprobs and chunk.token is not None and chunk.logprobs is not None:
+                            token_text = tokenizer.decode([chunk.token])
+                            chosen_logprob = float(chunk.logprobs[chunk.token])
+                            chunk_logprobs = ChoiceLogprobs(
+                                content=[
+                                    TokenLogprob(
+                                        token=token_text,
+                                        logprob=chosen_logprob,
+                                        bytes=list(token_text.encode("utf-8")),
+                                    )
+                                ]


stream_generate() always yields a final “flush” chunk after detokenizer.finalize() that carries the last token/logprobs even when chunk.text is empty. The current logic will attach logprobs for that final chunk as well, which can leak EOS/special-token logprobs into the streamed response. Consider skipping logprob emission for special tokens (e.g., tokens in tokenizer.all_special_ids / eos_token_id) so the logprobs.content aligns with actual assistant content tokens.
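The sketch below is a minimal version of the suggested filter for the streaming path. It assumes each chunk exposes text, token, and a logprobs vector indexed by token id, as the snippet above does with chunk.logprobs[chunk.token]. FakeChunk, stream_deltas, and the toy decode function are hypothetical. In the server, special_ids would be built from something like tokenizer.all_special_ids plus the EOS id.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Set


@dataclass
class FakeChunk:
    """Hypothetical stand-in for one stream_generate() chunk."""

    text: str
    token: Optional[int]
    logprobs: Optional[List[float]]  # log-probs over the whole vocabulary


def stream_deltas(
    chunks: Iterable[FakeChunk],
    special_ids: Set[int],
    decode: Callable[[int], str],
) -> Iterator[Dict[str, object]]:
    """Yield one streamed choice per chunk, omitting logprobs for special tokens."""
    for chunk in chunks:
        logprobs = None
        if (
            chunk.token is not None
            and chunk.logprobs is not None
            and chunk.token not in special_ids
        ):
            token_text = decode(chunk.token)
            logprobs = {
                "content": [
                    {
                        "token": token_text,
                        "logprob": float(chunk.logprobs[chunk.token]),
                        "bytes": list(token_text.encode("utf-8")),
                    }
                ]
            }
        yield {"delta": {"content": chunk.text}, "logprobs": logprobs}


if __name__ == "__main__":
    vocab = ["<eos>", "Hel", "lo"]  # toy vocabulary; id 0 is EOS
    chunks = [
        FakeChunk("Hel", 1, [-5.0, -0.1, -3.0]),
        FakeChunk("lo", 2, [-4.0, -2.0, -0.2]),
        FakeChunk("", 0, [-0.05, -6.0, -7.0]),  # final flush chunk carrying EOS
    ]
    for delta in stream_deltas(chunks, special_ids={0}, decode=lambda t: vocab[t]):
        print(delta)
```

Filtering by token id, rather than dropping the last chunk unconditionally, keeps the final chunk's logprob whenever that chunk carries an ordinary token.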

Copilot · 2026-04-06T16:35:43Z

+                        full_text += chunk.text
+                        if chunk.token is not None and chunk.logprobs is not None:
+                            token_text = tokenizer.decode([chunk.token])
+                            chosen_logprob = float(chunk.logprobs[chunk.token])
+                            token_logprobs.append(
+                                TokenLogprob(
+                                    token=token_text,
+                                    logprob=chosen_logprob,
+                                    bytes=list(token_text.encode("utf-8")),
+                                )
+                            )


When collecting non-streaming token_logprobs, this will also include the final “flush” chunk from stream_generate() (post-finalize()), which may correspond to EOS/special tokens rather than user-visible text. Filter out special tokens (or otherwise exclude the terminal flush chunk) so returned logprobs.content matches the tokens that form message.content.
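In the non-streaming path, filtering special tokens as the comment suggests keeps logprobs.content aligned with message.content. The sketch below accumulates both and checks that alignment. The names are hypothetical, chunks are assumed to be (text, token, logprobs) tuples, and decode is a toy function. Per-token decoding is not guaranteed to concatenate back to the detokenized text for every tokenizer, so the check holds for this toy vocabulary rather than in general.

```python
from typing import Callable, Dict, List, Optional, Sequence, Set, Tuple

# Hypothetical chunk shape: (text, token id, log-probs over the vocabulary).
Chunk = Tuple[str, Optional[int], Optional[Sequence[float]]]


def collect_choice(
    chunks: Sequence[Chunk],
    special_ids: Set[int],
    decode: Callable[[int], str],
) -> Dict[str, object]:
    """Accumulate message text and per-token logprobs, skipping special tokens."""
    full_text = ""
    token_logprobs: List[Dict[str, object]] = []
    for text, token, logprobs in chunks:
        full_text += text
        if token is None or logprobs is None or token in special_ids:
            continue
        token_text = decode(token)
        token_logprobs.append(
            {
                "token": token_text,
                "logprob": float(logprobs[token]),
                "bytes": list(token_text.encode("utf-8")),
            }
        )
    return {
        "message": {"content": full_text},
        "logprobs": {"content": token_logprobs},
    }


if __name__ == "__main__":
    vocab = ["<eos>", "Hel", "lo"]  # toy vocabulary; id 0 is EOS
    chunks = [
        ("Hel", 1, [-5.0, -0.1, -3.0]),
        ("lo", 2, [-4.0, -2.0, -0.2]),
        ("", 0, [-0.05, -6.0, -7.0]),  # final flush chunk carrying EOS
    ]
    choice = collect_choice(chunks, {0}, lambda t: vocab[t])
    tokens = "".join(e["token"] for e in choice["logprobs"]["content"])
    assert tokens == choice["message"]["content"]  # holds for this toy vocabulary
    print(choice)
```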

eloe requested a review from Copilot on April 6, 2026, 16:28.

Copilot started reviewing on behalf of eloe on April 6, 2026, 16:28.

Copilot AI reviewed on Apr 6, 2026.

eloe mentioned this pull request on Apr 8, 2026, in "Combined server enhancements: OpenAI API compliance, prompt caching, concurrency" #21 (Merged, 6 tasks).


eloe wants to merge 1 commit into main from feature/logprobs.