
fix(handler): return error response when SGLang requests fail with non-200 status #32

Open
sainarne15 wants to merge 1 commit into
runpod-workers:main from
sainarne15:fix/handler-http-status-check

@sainarne15

Summary

async_handler has three request paths: (1) an explicit OpenAI route, (2) the
chat/completions shorthand, and (3) native /generate. Case 3 checks for a
non-200 response before yielding; Cases 1 and 2 do not, and pass whatever
SGLang returns directly into process_response or iter_lines.

In failure scenarios (a misconfigured MODEL_NAME, an OOM crash, unsupported
parameters), SGLang's error payload is streamed back as if it were valid
response chunks. The caller receives no error key and no status indication, so
at the application layer the failure is indistinguishable from legitimate model
output.
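
To make the failure mode concrete, here is a minimal sketch of an unguarded streaming path. `StubResponse` and `unguarded_stream` are hypothetical names, not the handler's code, and the JSON error body is an invented example rather than actual SGLang output. The only point is that the error text reaches the caller as an ordinary chunk.

```python
import json
from dataclasses import dataclass


@dataclass
class StubResponse:
    """Hypothetical stand-in for a requests-style HTTP response."""
    status_code: int
    body: str

    def iter_lines(self):
        # Yield the body line by line as bytes.
        for line in self.body.splitlines():
            yield line.encode("utf-8")


def unguarded_stream(response):
    """Forward every non-empty line without looking at status_code."""
    for line in response.iter_lines():
        if line:
            yield line.decode("utf-8")


# Invented error body; real SGLang error payloads may look different.
failed = StubResponse(400, json.dumps({"message": "model not found"}))
chunks = list(unguarded_stream(failed))

# No chunk is a structured error, so nothing marks this as a failure.
looks_like_error = any(isinstance(c, dict) and "error" in c for c in chunks)
```

Here `chunks` holds the serialized error body as plain text and `looks_like_error` is false, which is the ambiguity described above.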

Changes

Added to Cases 1 and 2 the same status_code guard that Case 3 already uses.
Error messages are intentionally scoped per route ("Request failed..." for OpenAI
routes vs "Generate request failed..." for /generate) to preserve endpoint
context in the error response.
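
The guard can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: `FakeResponse`, `guarded_stream`, and `collect` are hypothetical names, the response is assumed to expose `status_code`, `text`, and `iter_lines()` in the style of `requests`, and the message wording after the route prefix is invented for the example.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeResponse:
    """Hypothetical stand-in for a requests-style HTTP response."""
    status_code: int
    text: str

    def iter_lines(self):
        # Yield the body line by line as bytes.
        for line in self.text.splitlines():
            yield line.encode("utf-8")


async def guarded_stream(response, prefix="Request failed"):
    """Yield decoded lines on HTTP 200, otherwise one structured error."""
    if response.status_code != 200:
        # Wording beyond the prefix is an assumption made for this sketch.
        yield {"error": f"{prefix} with status {response.status_code}: {response.text}"}
        return
    for line in response.iter_lines():
        if line:
            yield line.decode("utf-8")


async def collect(response, **kwargs):
    # Drain the async generator into a list for inspection.
    return [chunk async for chunk in guarded_stream(response, **kwargs)]


ok_chunks = asyncio.run(collect(FakeResponse(200, "data: hello\ndata: [DONE]")))
bad_chunks = asyncio.run(
    collect(FakeResponse(500, "CUDA out of memory"), prefix="Generate request failed")
)
```

Passing the prefix as a parameter is one way to keep the per-route wording while sharing a single check. With this shape, a caller can detect failure by testing for an `error` key instead of parsing chunk text.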

Testing

  • docker build --platform linux/amd64 -t worker-sglang-test . passes
  • Syntax validated inside the built container
  • Reproduced the failure mode by simulating non-200 responses: without the
    guard, the raw error payload is surfaced as model output; with it, a
    structured error is returned
