Context / Background
I am using the Qwen3.5-Omni-Flash-Realtime model over a WebSocket connection. I reviewed the sample implementation provided in your repository at:
https://github.com/aliyun/alibabacloud-bailian-speech-demo/blob/master/samples/conversation/omni/python/run_server_vad.py
Problem 1: Invalid session.finish event & Missing Graceful Shutdown
When attempting to shut down the WebSocket connection gracefully, I tried sending a session.finish event, the same event type that the SDK and the Realtime API specification use:
{
  "event_id": "...",
  "type": "session.finish"
}
However, the server returns an error stating that this is an invalid event type.
Looking closely at your sample code (run_server_vad.py), it only calls conversation.close(), which internally just executes ws.close(). This terminates the WebSocket connection abruptly, without any application-level handshake or session termination event.
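A minimal sketch of the shutdown order described above: send the session.finish event first, then close the socket. The send_text and close callables are hypothetical stand-ins for whatever the WebSocket client provides (for example ws.send and ws.close), and the event_id format is an assumption, since the real value is elided above.

```python
import json
import uuid
from typing import Callable


def build_session_finish_event() -> str:
    """Serialize the session.finish event shown above.

    The "event_<hex>" event_id format is an assumption made for
    illustration; the actual value is elided in the report.
    """
    return json.dumps({
        "event_id": f"event_{uuid.uuid4().hex}",
        "type": "session.finish",
    })


def graceful_shutdown(send_text: Callable[[str], None],
                      close: Callable[[], None]) -> None:
    """Send session.finish, then close the connection.

    send_text and close are hypothetical stand-ins for the WebSocket
    client's own send and close methods.
    """
    try:
        send_text(build_session_finish_event())
    finally:
        # Always release the connection, even if the send raises.
        close()
```

This sketch does not wait for a server acknowledgment before closing, because it is not known which event, if any, the server would send in response.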
Problem 2: Dashboard showing "Failed" sessions & Token usage tracking
Because the connection is closed abruptly via ws.close(), every one of my Realtime API sessions is marked as "Failed" in my Qwen Cloud Console / DashScope Dashboard. This makes it very difficult to track token usage and session history accurately, or to rely on the analytics.
- Is the rejection of session.finish expected behavior, or is there another protocol-compliant event to gracefully close a session?
- Is there a bug on the server side that records abrupt ws.close() actions as session failures?
Question 3: VAD Token Consumption
Additionally, I have a question regarding the Voice Activity Detection (VAD) mode.
When VAD is enabled, does the server consume any tokens during the idle state, i.e., while the client is streaming background silence or noise before any actual speech is detected?
Environment
- Model: Qwen3.5-Omni-Flash-Realtime
Any clarification or guidance on how to properly perform a graceful shutdown and understand VAD token mechanics would be greatly appreciated. Thank you!