Fixes Step 3.7 Flash, Nemotron 3 Ultra and Kimi k2.6 (and other models) silently returning empty responses in Continue.
Root cause: Step 3.7 Flash, Nemotron 3 Ultra and Kimi k2.6 on NVIDIA NIM run with speculative decoding and include a usage field on every streaming chunk. Continue's OpenAI provider interprets any chunk containing usage as the final chunk and stops, silently discarding all content without showing an error.
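To make the failure mode concrete, here is a simplified model of it. This is not Continue's actual code: naive_collect is a hypothetical consumer that treats any chunk carrying a usage field as the end of the stream, and the sample chunks are invented for illustration, assuming the OpenAI-style streaming layout (choices[].delta).

```python
import json

# Invented sample stream: every chunk carries a usage field, as described above.
SAMPLE_STREAM = [
    'data: {"choices":[{"delta":{"content":"Hello"}}],"usage":{"completion_tokens":1}}',
    'data: {"choices":[{"delta":{"content":" world"}}],"usage":{"completion_tokens":2}}',
    "data: [DONE]",
]

def naive_collect(lines):
    """Hypothetical consumer: stops at the first chunk that contains usage."""
    text = ""
    for line in lines:
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        if chunk.get("usage") is not None:
            break  # treated as the final chunk, so its content is never read
        for choice in chunk.get("choices", []):
            text += choice.get("delta", {}).get("content") or ""
    return text

print(repr(naive_collect(SAMPLE_STREAM)))  # ''
```

Under these assumptions the very first chunk already ends the stream, so the collected reply is empty even though the model produced text.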
The proxy sits between Continue and NIM and applies these fixes to each request:
- Strips min_p from outgoing requests (causes silent HTTP 400)
- Strips usage from content chunks in the streaming response (causes silent empty reply)
- Strips reasoning/reasoning_content chunks (they have empty content)
- Preserves tool_calls chunks so Continue can execute tools
- Forwards the stream in near real time
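The per-request fixes listed above could look roughly like the following. This is a minimal sketch and not the code in nim_proxy.py: the helper names clean_request and clean_sse_line are hypothetical, and the chunk layout is assumed to follow the OpenAI streaming format (choices[].delta).

```python
import json

def clean_request(body: dict) -> dict:
    """Return a copy of the outgoing request body without min_p."""
    return {key: value for key, value in body.items() if key != "min_p"}

def clean_sse_line(line: str):
    """Clean one SSE line; return None when the chunk should be dropped."""
    if not line.startswith("data: "):
        return line  # blank separators and other non-data lines pass through
    payload = line[len("data: "):].strip()
    if payload == "[DONE]":
        return line
    try:
        chunk = json.loads(payload)
    except json.JSONDecodeError:
        return line  # not JSON: forward unchanged
    if not isinstance(chunk, dict):
        return line
    choices = chunk.get("choices") or []
    if not choices:
        return line  # e.g. a trailing usage-only chunk: forward unchanged
    keep = False
    for choice in choices:
        delta = choice.get("delta") or {}
        had_reasoning = "reasoning" in delta or "reasoning_content" in delta
        delta.pop("reasoning", None)
        delta.pop("reasoning_content", None)
        useful = (
            delta.get("content")
            or delta.get("tool_calls")
            or choice.get("finish_reason")
        )
        if useful or not had_reasoning:
            keep = True
    if not keep:
        return None  # reasoning-only chunk with nothing else to forward
    chunk.pop("usage", None)  # so the client does not treat this chunk as final
    return "data: " + json.dumps(chunk)
```

In this sketch each line is cleaned and forwarded as soon as it arrives, which is what keeps the stream close to real time; buffering the whole response before rewriting it would work too, but the reply would only appear once generation has finished.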
- Python 3.x (standard library only; tested with Python 3.14)
- NVIDIA NIM API key
0. Download the proxy
nim_proxy.py
1. Set up the port you want to use
Open nim_proxy.py and change the port by replacing the default LISTEN_PORT = 7606 with whichever port you want to use (make sure it is not occupied by something else).
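One way to check whether a port is free before starting the proxy is to try binding to it. The sketch below uses only the standard library; port_is_free is a hypothetical helper that is not part of nim_proxy.py, and it assumes the proxy listens on localhost (127.0.0.1).

```python
import socket

def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is currently bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False  # bind failed, typically because the port is in use
        return True

if __name__ == "__main__":
    # 7606 is the default LISTEN_PORT mentioned above.
    print(port_is_free(7606))
```

If this prints False, pick a different value for LISTEN_PORT and use the same port in apiBase.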
2. Run the proxy (keep this terminal open while using Step 3.7 Flash, Nemotron 3 Ultra and Kimi k2.6 in Continue)
# source your venv if you use one, then run
python nim_proxy.py
3. Point Continue to the proxy in your config.yaml. Here is an example configuration (pay attention to apiBase):
models:
  - name: Step-3.7-Flash
    provider: openai
    model: stepfun-ai/step-3.7-flash
    apiBase: http://localhost:7606 # important: proxy instead of https://integrate.api.nvidia.com/v1
    apiKey: your-nim-key-here
    roles: [chat, edit, apply, summarize]
    capabilities: [tool_use, image_input]
    defaultCompletionOptions:
      temperature: 0.7
      top_p: 0.95
      top_k: 35
      contextLength: 262144
      maxTokens: 16384
    chatOptions:
      baseSystemMessage: |
        You are an expert ... # enter your system prompt here
      baseAgentSystemMessage: |
        You are an expert ... # enter your system prompt here
      basePlanSystemMessage: |
        You are an expert ... # enter your system prompt here

- Developer: Johannes Faber, fais.udder466@passinbox.com
- Hub-Website: https://fai-solutions.github.io/
- Issues: https://github.com/FAI-Solutions/Continue-NIM-Proxy/issues
This repository contains a practical workaround for the following NIM / Continue integration issues: empty responses from Step 3.7 Flash, Nemotron 3 Ultra and Kimi k2.6; no reply from Continue in VSCode; stepfun-ai/step-3.7-flash not working in Continue; and min_p with speculative decoding causing HTTP 400 on NIM. nim_proxy.py acts as a proxy between NIM and Continue, rewriting the model's stream into a Continue-compatible format so the model works again in VSCode / VSCodium. Keep the proxy running while using Step 3.7 Flash, Nemotron 3 Ultra and Kimi k2.6; it can remain active alongside other models.