macMLX Version
latest version
Apple Silicon Chip
M4
macOS Version
macOS(M5)
Bug Description
Hello,
I am a user who is very excited about the macMLX project. I am writing to report a specific behavior regarding how reasoning/thinking content is handled in API responses.
Issue Description:
When connecting macMLX to an external agent (such as Hermes Agent) via API, the model's internal reasoning process is being exposed within the main content. Initially, I suspected an issue with the agent, but according to feedback from Claude Opus, it appears that macMLX might not be strictly separating the reasoning tokens from the content output.
Observations:
When running models via macMLX, the opening <think> tag is often missing, while the reasoning text and the closing </think> tag remain visible.
This prevents the agent from correctly filtering out the <think>...</think> block, causing the internal "chain of thought" to be displayed to the user.
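For reference, here is a minimal client-side sketch of the separation I would expect, written so that it still works when the opening tag is missing. It assumes the tags are <think> and </think> (as Qwen-style reasoning models emit) and that reasoning always comes before the answer. The function name split_reasoning is hypothetical and is not part of macMLX or Hermes Agent.

```python
THINK_OPEN = "<think>"    # assumed opening tag
THINK_CLOSE = "</think>"  # assumed closing tag


def split_reasoning(text: str) -> tuple[str, str]:
    """Split raw model output into (reasoning, content).

    Everything before the first closing tag is treated as reasoning,
    whether or not the opening tag is present.
    """
    head, sep, tail = text.partition(THINK_CLOSE)
    if not sep:
        # No closing tag found: nothing to separate, all of it is content.
        return "", text.strip()
    reasoning = head.strip()
    if reasoning.startswith(THINK_OPEN):
        # Drop the opening tag when the server did emit it.
        reasoning = reasoning[len(THINK_OPEN):].strip()
    return reasoning, tail.strip()


# The case described above: opening tag missing, closing tag present.
reasoning, content = split_reasoning("Let me check.</think>\n\nThe answer is 4.")
print(repr(reasoning))
print(repr(content))
```

This is only a stopgap on the agent side. Ideally the server would perform this split itself so that clients never see the reasoning text in the main content.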
Comparison:
MLX-LM / LM Studio: When using the exact same model and file in these environments, the separation works as expected, and the reasoning tags are handled correctly.
macMLX: The issue occurs only in this environment.
Question:
Is there a specific configuration or setting I should adjust to ensure that reasoning content is properly flagged or separated in the API response?
I would appreciate any guidance or insights you could provide.
Thank you for your hard work on this project!
Steps to Reproduce
Connect a reasoning model (e.g., Qwen 3.6-35B-A3B) to Hermes Agent via API.
Logs (optional)