[Issue/Question] Separation of reasoning content in API responses #30

@GY0330

macMLX Version

latest version

Apple Silicon Chip

M4

macOS Version

macOS(M5)

Bug Description

Hello,

I am a user who is very excited about the macMLX project. I am writing to report a specific behavior regarding how reasoning/thinking content is handled in API responses.

Issue Description:
When connecting macMLX to an external agent (such as Hermes Agent) via API, the model's internal reasoning process is being exposed within the main content. Initially, I suspected an issue with the agent, but according to feedback from Claude Opus, it appears that macMLX might not be strictly separating the reasoning tokens from the content output.

Observations:

When running models via macMLX, the opening `<think>` tag is often missing, while the reasoning text and the closing `</think>` tag remain visible.

This prevents the agent from correctly filtering out the `<think>`~`</think>` block, causing the internal "chain of thought" to be displayed to the user.
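
To illustrate the behavior described above, here is a minimal client-side sketch of how an agent could separate reasoning from the answer even when only the closing tag arrives. It assumes the model uses `<think>`/`</think>` delimiters (as Qwen-style reasoning models typically do); `split_reasoning` is a hypothetical helper for illustration, not part of macMLX or Hermes Agent, and it only covers complete (non-streamed) responses.

```python
# Assumed delimiters; adjust if the model uses different reasoning tags.
THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a complete model response.

    Tolerates a missing opening tag: everything before the first
    closing tag is treated as reasoning.
    """
    head, sep, tail = text.partition(THINK_CLOSE)
    if not sep:
        # No closing tag found: treat the whole text as the answer.
        return "", text.strip()
    # Remove the opening tag if it happens to be present.
    reasoning = head.replace(THINK_OPEN, "", 1).strip()
    return reasoning, tail.strip()


# Output shaped like the one reported here: no opening tag, closing tag present.
reasoning, answer = split_reasoning("The user greets me.\n</think>\nHello!")
print(repr(reasoning), repr(answer))
```

This is only a workaround on the agent side; the comparison below suggests that the separation would more properly happen in the server before the response is returned.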

Comparison:

MLX-LM / LM Studio: When using the exact same model and file in these environments, the separation works as expected, and the reasoning tags are handled correctly.

macMLX: The issue persists only in this environment.

Question:
Is there a specific configuration or setting I should adjust to ensure that reasoning content is properly flagged or separated in the API response?

I would appreciate any guidance or insights you could provide.

Thank you for your hard work on this project!

Steps to Reproduce

Connect a reasoning model (e.g., Qwen 3.6-35B-A3B) to Hermes Agent via API.

Logs (optional)

Labels: bug