WebSocket Protocol

scarecr0w12 edited this page Jun 20, 2026 · 3 revisions

The CortexPrism WebSocket provides real-time streaming chat, audio communication, file upload, and tool call reasoning inspection.

Connection

ws://127.0.0.1:3000/ws          # Client WebSocket
ws://127.0.0.1:3000/ws/node     # Node WebSocket (Hub ↔ Node)

Authentication: when webAuth.requireAuth is enabled, the /ws endpoint checks session cookies before upgrading connections.

Client → Server Messages

{ "type": "chat", "message": "Hello", "sessionId": "sess_abc123", "files": [...] }
{ "type": "ping" }
{ "type": "new_session" }
{ "type": "select_agent", "agentId": "agent-1" }
{ "type": "audio_chunk", "data": "<base64>" }
{ "type": "audio_end" }
{ "type": "speak", "text": "Hello world" }

Chat Message Fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | "chat" | Yes | Message type |
| message | string | Yes | User message text |
| sessionId | string | No | Resume existing session |
| files | array | No | Uploaded files: [{filename, mimeType, data (base64)}] |
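
The field table above can be mirrored in client code as a typed message. This is a minimal sketch, not the project's own client: the names `UploadedFile`, `ChatMessage`, and `buildChatMessage` are hypothetical, and only the field names and types come from the table.

```typescript
// Shape of one entry in the "files" array, as listed in the field table.
interface UploadedFile {
  filename: string;
  mimeType: string;
  data: string; // base64-encoded file content
}

// A chat message: type and message are required, the rest are optional.
interface ChatMessage {
  type: "chat";
  message: string;
  sessionId?: string;
  files?: UploadedFile[];
}

// Hypothetical helper: builds a chat message and omits unset optional fields
// so they never appear in the serialized JSON.
function buildChatMessage(
  message: string,
  sessionId?: string,
  files?: UploadedFile[],
): ChatMessage {
  const msg: ChatMessage = { type: "chat", message };
  if (sessionId !== undefined) msg.sessionId = sessionId;
  if (files !== undefined && files.length > 0) msg.files = files;
  return msg;
}
```

A client would typically send the result with `socket.send(JSON.stringify(buildChatMessage("Hello")))`.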

File Upload

Files are sent base64-encoded in the files array of a chat message. The server saves each file to both the working directory and the agent workspace so that tools can access it. Text is extracted automatically from PDFs. Images are included as multimodal content blocks for providers that support them (Anthropic, Google Gemini). For text-only providers, a note is appended suggesting a provider switch.
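
To illustrate the client side of an upload, the sketch below turns raw bytes into an entry for the files array. The helper names `toBase64` and `attachFile` are hypothetical. The hand-written encoder only keeps the example dependency-free; in practice a browser's `btoa` or Node's `Buffer` would normally do this job.

```typescript
const B64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Standard base64 encoding (RFC 4648) of a byte array, with "=" padding.
function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i] ?? 0;
    const b1 = bytes[i + 1] ?? 0;
    const b2 = bytes[i + 2] ?? 0;
    const triple = (b0 << 16) | (b1 << 8) | b2;
    out += B64_ALPHABET.charAt((triple >> 18) & 63);
    out += B64_ALPHABET.charAt((triple >> 12) & 63);
    // Pad with "=" when the final group has fewer than three bytes.
    out += i + 1 < bytes.length ? B64_ALPHABET.charAt((triple >> 6) & 63) : "=";
    out += i + 2 < bytes.length ? B64_ALPHABET.charAt(triple & 63) : "=";
  }
  return out;
}

// Hypothetical helper: builds one entry for the "files" array of a chat message.
function attachFile(filename: string, mimeType: string, bytes: Uint8Array) {
  return { filename, mimeType, data: toBase64(bytes) };
}
```

The returned object has the `{filename, mimeType, data}` shape described for the files field.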

Server → Client Messages

{ "type": "connected" }
{ "type": "session", "sessionId": "sess_abc123" }
{ "type": "start" }
{ "type": "chunk", "delta": "Hello" }
{ "type": "reasoning", "content": "Agent is considering..." }
{ "type": "tool_call", "tool": "web_search", "args": {"query": "..."} }
{ "type": "tool_result", "tool": "web_search", "result": "..." }
{ "type": "done", "tokensIn": 100, "tokensOut": 50, "costUsd": 0.001, "durationMs": 800 }
{ "type": "error", "error": "Something went wrong" }
{ "type": "pong" }
{ "type": "audio", "data": "<base64 mp3>", "format": "mp3" }
{ "type": "voice_state", "listening": true, "enabled": true }
{ "type": "file_change", "path": "/workspace/file.ts" }

Done Message Fields

| Field | Type | Description |
| --- | --- | --- |
| tokensIn | number | Input tokens used |
| tokensOut | number | Output tokens generated |
| costUsd | number | Estimated cost in USD |
| durationMs | number | Total turn duration |
| modelMode | 'manual' \| 'auto' | Model selection mode for this turn (v0.46+) |
| requestedModelMode | string | Requested mode (matches modelMode) |
| resolvedProvider | string | LLM provider used (v0.46+) |
| resolvedModel | string | LLM model used (v0.46+) |
| autoFallback | boolean | Whether Auto mode fell back to heuristic (v0.46+) |
| autoFallbackReason | string | Reason for fallback: 'mqm_disabled', 'low_confidence', etc. (v0.46+) |
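
One way a client might consume the stream is to fold each incoming frame into a per-turn state. The sketch below is an illustration under assumptions: it assumes every frame is a JSON object, that `start` opens a turn and `done` or `error` closes it, and the names `TurnState` and `applyServerMessage` are hypothetical. Only the message types and the four basic done fields are taken from the listings above.

```typescript
interface TurnUsage {
  tokensIn: number;
  tokensOut: number;
  costUsd: number;
  durationMs: number;
}

interface TurnState {
  status: "idle" | "streaming" | "done" | "error";
  text: string; // accumulated chunk deltas
  usage?: TurnUsage;
  error?: string;
}

const initialTurn: TurnState = { status: "idle", text: "" };

function asNumber(value: unknown): number {
  return typeof value === "number" ? value : 0;
}

// Folds one raw WebSocket frame into the turn state. Unknown message types
// (reasoning, tool_call, pong, ...) are ignored by this sketch.
function applyServerMessage(turn: TurnState, raw: string): TurnState {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null) return turn;
  const msg = parsed as Record<string, unknown>;
  const delta = msg["delta"];
  switch (msg["type"]) {
    case "start":
      return { status: "streaming", text: "" };
    case "chunk":
      return { ...turn, text: turn.text + (typeof delta === "string" ? delta : "") };
    case "done":
      return {
        ...turn,
        status: "done",
        usage: {
          tokensIn: asNumber(msg["tokensIn"]),
          tokensOut: asNumber(msg["tokensOut"]),
          costUsd: asNumber(msg["costUsd"]),
          durationMs: asNumber(msg["durationMs"]),
        },
      };
    case "error":
      return { ...turn, status: "error", error: String(msg["error"]) };
    default:
      return turn;
  }
}
```

Returning a new state object on every frame keeps the function easy to test and to plug into a UI store.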

Reasoning Message

The reasoning message type delivers the agent's internal decision-making process (tool selection rationale, task assessment) as a separate stream. In the Web UI, this appears in a collapsible panel toggled by a 🔬 Reasoning button that shows during tool use and auto-hides when the response completes.

Voice/Audio Messages

Client → Server:

{ "type": "audio_chunk", "data": "<base64>" }
{ "type": "audio_end" }
{ "type": "speak", "text": "...", "voice": "alloy" }

Server → Client:

{ "type": "audio", "data": "<base64>", "format": "mp3" }
{ "type": "voice_state", "listening": true, "enabled": true }

Transcribed speech is dispatched directly into the agent loop as a user message. Auto-TTS synthesizes agent responses to audio before the done signal.
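
As a sketch of the client side of voice input, the helper below frames already-encoded audio as a run of audio_chunk messages closed by a single audio_end. It assumes that audio_end marks the end of an utterance and that each chunk arrives already base64-encoded. The name `buildAudioMessages` is hypothetical.

```typescript
type AudioClientMessage =
  | { type: "audio_chunk"; data: string }
  | { type: "audio_end" };

// Wraps base64 chunks in audio_chunk messages, in order, then appends audio_end.
function buildAudioMessages(base64Chunks: string[]): AudioClientMessage[] {
  const messages = base64Chunks.map(
    (data): AudioClientMessage => ({ type: "audio_chunk", data }),
  );
  messages.push({ type: "audio_end" });
  return messages;
}
```

Each element would then be serialized with `JSON.stringify` and sent as its own WebSocket frame.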

Session Resume

Include an existing sessionId in a chat message to resume across WebSocket reconnects and page reloads:

{ "type": "chat", "message": "Continue our conversation", "sessionId": "sess_abc123" }

The server reopens the per-session database, reactivates the session, and loads previous messages via loadHistory(). Session titles are displayed in the chat header.
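
On the client, resuming comes down to remembering the last sessionId the server announced and attaching it to later chat messages. The sketch below assumes the session message shown earlier carries that ID. `SessionTracker` and its methods are hypothetical names, not part of CortexPrism.

```typescript
interface ResumableChat {
  type: "chat";
  message: string;
  sessionId?: string;
}

class SessionTracker {
  private sessionId: string | undefined = undefined;

  // Call with every raw server frame; only "session" messages are used.
  observe(raw: string): void {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== "object" || parsed === null) return;
    const msg = parsed as Record<string, unknown>;
    const id = msg["sessionId"];
    if (msg["type"] === "session" && typeof id === "string") {
      this.sessionId = id;
    }
  }

  // Builds a chat message, attaching the known sessionId when there is one.
  chat(message: string): ResumableChat {
    return this.sessionId === undefined
      ? { type: "chat", message }
      : { type: "chat", message, sessionId: this.sessionId };
  }

  // Forget the session, for example after sending new_session.
  reset(): void {
    this.sessionId = undefined;
  }
}
```

To survive page reloads as well as reconnects, the ID would also need to be persisted somewhere such as browser storage, which this sketch leaves out.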

Protocol Notes

  • Tool call XML (<tool_call>) and bare JSON are stripped from chunks by a brace-depth walker on both the server and the client
  • Streaming is buffered internally when tools are registered; only clean prose reaches the client
  • Tool calls split across multiple WebSocket chunks are properly buffered and stripped
  • The file_change event broadcasts on file edits, renames, and deletes
  • WebSocket connections are upgraded from standard HTTP at /ws
  • The Node WebSocket at /ws/node uses token-based registration with a heartbeat/ACK protocol
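
The stripping and buffering behaviour in the notes above can be sketched as a small streaming filter. This is an illustration of the brace-depth idea, not the project's implementation. It assumes that a balanced `{...}` span counts as bare JSON only when it parses with `JSON.parse`, and the names `stripToolCalls` and `ToolCallStripper` are hypothetical.

```typescript
const OPEN_TAG = "<tool_call>";
const CLOSE_TAG = "</tool_call>";

// Index just past the "}" matching the "{" at `start`, or -1 if not closed yet.
// Braces inside JSON string literals are not counted.
function findObjectEnd(text: string, start: number): number {
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < text.length; i++) {
    const ch = text.charAt(i);
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}" && --depth === 0) return i + 1;
  }
  return -1;
}

function parsesAsJson(candidate: string): boolean {
  try {
    JSON.parse(candidate);
    return true;
  } catch {
    return false;
  }
}

// Returns prose that is safe to show, plus a tail held back because it may be
// the start of a tool call that continues in the next chunk.
function stripToolCalls(text: string): { clean: string; pending: string } {
  let clean = "";
  let i = 0;
  while (i < text.length) {
    const rest = text.slice(i);
    if (rest.startsWith(OPEN_TAG)) {
      const end = text.indexOf(CLOSE_TAG, i + OPEN_TAG.length);
      if (end === -1) return { clean, pending: rest }; // wait for the close tag
      i = end + CLOSE_TAG.length;
    } else if (text.charAt(i) === "<" && OPEN_TAG.startsWith(rest)) {
      return { clean, pending: rest }; // partial "<tool_call>" at chunk end
    } else if (text.charAt(i) === "{") {
      const end = findObjectEnd(text, i);
      if (end === -1) return { clean, pending: rest }; // object not closed yet
      const candidate = text.slice(i, end);
      if (!parsesAsJson(candidate)) clean += candidate; // ordinary braces stay
      i = end;
    } else {
      clean += text.charAt(i);
      i++;
    }
  }
  return { clean, pending: "" };
}

// Streaming wrapper: feed chunks in order, display only what push() returns.
class ToolCallStripper {
  private pending = "";

  push(chunk: string): string {
    const result = stripToolCalls(this.pending + chunk);
    this.pending = result.pending;
    return result.clean;
  }

  // Call at end of stream to release text that never became a tool call.
  flush(): string {
    const rest = this.pending;
    this.pending = "";
    return rest;
  }
}
```

One trade-off of this approach is that an unclosed `{` in ordinary prose holds back everything after it until `flush()` is called, so a production filter would likely apply stricter checks before buffering.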
