FAI-Solutions/Continue-NIM-Proxy

NIM Proxy prevents empty responses in Continue + VSCode/VSCodium

Fixes Step 3.7 Flash, Nemotron 3 Ultra and Kimi k2.6 (and other models) silently returning empty responses in Continue.

Root Cause: On NVIDIA NIM, Step 3.7 Flash, Nemotron 3 Ultra and Kimi k2.6 run with speculative decoding and include a usage field on every streaming chunk. Continue's OpenAI provider interprets any chunk containing usage as the final chunk and stops reading, so all content is silently discarded and no error is shown.
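The failure mode described above can be reproduced with a toy reader. This is a simplified model, not Continue's actual code: the function names and the sample chunks are hypothetical, and the chunks are only shaped like an OpenAI-style stream in which every chunk carries usage.

```python
import json

# Hypothetical chunks: every one carries "usage", not only the last one.
SAMPLE_CHUNKS = [
    '{"choices": [{"delta": {"content": "Hel"}}], "usage": {"total_tokens": 1}}',
    '{"choices": [{"delta": {"content": "lo"}}], "usage": {"total_tokens": 2}}',
    '{"choices": [{"delta": {}, "finish_reason": "stop"}], "usage": {"total_tokens": 2}}',
]


def naive_collect(chunks):
    """Toy reader that treats any chunk containing usage as the final chunk."""
    text = ""
    for raw in chunks:
        chunk = json.loads(raw)
        if chunk.get("usage"):
            break  # stops here, so the content in this chunk is never read
        for choice in chunk.get("choices", []):
            text += choice.get("delta", {}).get("content") or ""
    return text


def tolerant_collect(chunks):
    """Same loop, but the presence of usage does not end the stream."""
    text = ""
    for raw in chunks:
        chunk = json.loads(raw)
        for choice in chunk.get("choices", []):
            text += choice.get("delta", {}).get("content") or ""
    return text


if __name__ == "__main__":
    print(repr(naive_collect(SAMPLE_CHUNKS)))
    print(repr(tolerant_collect(SAMPLE_CHUNKS)))
```

The naive reader breaks on the very first chunk and returns an empty string, which matches the silent empty reply seen in the editor.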


Overview of the Proxy:

The proxy sits between Continue and NIM and applies the following fixes to each request:

  1. Strips min_p from outgoing requests (it causes a silent HTTP 400)
  2. Strips usage from content chunks in the streaming response (it causes a silent empty reply)
  3. Strips reasoning/reasoning_content chunks (their content is empty)
  4. Preserves tool_calls chunks so Continue can execute tools
  5. Forwards the stream in near real time
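The rewriting rules listed above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code in nim_proxy.py: the names clean_request and clean_sse_line are hypothetical, and keeping usage on the chunk that carries finish_reason is a guess about where it is safe to leave it.

```python
import json
from typing import Optional


def clean_request(body: bytes) -> bytes:
    """Remove min_p from an outgoing JSON request body."""
    payload = json.loads(body)
    payload.pop("min_p", None)
    return json.dumps(payload).encode("utf-8")


def clean_sse_line(line: str) -> Optional[str]:
    """Return the SSE line to forward, or None if the chunk should be dropped."""
    if not line.startswith("data:"):
        return line  # comments, blank keep-alives, etc. pass through
    data = line[len("data:"):].strip()
    if data == "[DONE]":
        return line
    try:
        chunk = json.loads(data)
    except json.JSONDecodeError:
        return line  # not JSON: forward untouched
    if not isinstance(chunk, dict):
        return line

    keep = False           # chunk carries content or tool_calls
    had_reasoning = False  # chunk carried reasoning fields
    final = False          # chunk carries a finish_reason
    for choice in chunk.get("choices") or []:
        delta = choice.get("delta") or {}
        for key in ("reasoning", "reasoning_content"):
            if key in delta:
                had_reasoning = True
                del delta[key]
        if delta.get("content") or delta.get("tool_calls"):
            keep = True
        if choice.get("finish_reason"):
            final = True

    # Reasoning-only chunks have nothing left to show, so drop them.
    if had_reasoning and not keep and not final:
        return None
    # Assumption: usage is only safe on the final chunk, so strip it elsewhere.
    if chunk.get("choices") and not final:
        chunk.pop("usage", None)
    return "data: " + json.dumps(chunk)
```

Each line is processed and forwarded on its own, without buffering the whole response, which is what keeps the stream close to real time.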

Requirements

  • Python 3.x (standard library only; tested with Python 3.14)
  • NVIDIA NIM API key

Setup

0. Download the proxy

nim_proxy.py

1. Set the port you want to use

Open nim_proxy.py and replace the default LISTEN_PORT = 7606 with the port you want to use (make sure nothing else is using it).
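One way to check that a port is available before starting the proxy is to try binding to it. The helper below is a small sketch that is not part of nim_proxy.py; the name port_is_free is hypothetical, and it only checks the loopback address by default.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False  # typically "address already in use"
    return True


if __name__ == "__main__":
    # 7606 is the default LISTEN_PORT mentioned above.
    print("port 7606 free:", port_is_free(7606))
```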

2. Run the proxy (keep this terminal open while using Step 3.7 Flash, Nemotron 3 Ultra and Kimi k2.6 in Continue)

# source your venv if you use one, then run
python nim_proxy.py

3. Point Continue to the proxy in your config.yaml. Here is an example configuration (pay attention to apiBase):

models:
  - name: Step-3.7-Flash
    provider: openai
    model: stepfun-ai/step-3.7-flash
    apiBase: http://localhost:7606   # important: proxy instead of https://integrate.api.nvidia.com/v1
    apiKey: your-nim-key-here
    roles: [chat, edit, apply, summarize]
    capabilities: [tool_use, image_input]
    defaultCompletionOptions:
      temperature: 0.7
      top_p: 0.95
      top_k: 35
      contextLength: 262144
      maxTokens: 16384
    chatOptions:
      baseSystemMessage: |
        You are an expert ... # enter your system prompt here
      baseAgentSystemMessage: |
        You are an expert ... # enter your system prompt here
      basePlanSystemMessage: |
        You are an expert ... # enter your system prompt here

Contact

License

MIT

Summary

This repository contains a practical workaround for Step 3.7 Flash, Nemotron 3 Ultra and Kimi k2.6 returning empty responses (or no reply at all) in Continue for VSCode / VSCodium, and for the silent HTTP 400 caused by min_p on NIM. nim_proxy.py acts as a proxy between NIM and Continue, rewriting the model's stream into a Continue-compatible format so the model works again. Keep the proxy running while using Step 3.7 Flash, Nemotron 3 Ultra and Kimi k2.6; it can remain active alongside other models.
