Commit 4adc633

feat(spec): add real-time job status extension specification
Define SSE and WebSocket bindings for real-time job status notifications. Includes per-job and per-queue subscriptions, event format for state changes and progress updates, connection management with heartbeats and graceful shutdown, and security considerations. Implementations MUST support SSE; WebSocket is RECOMMENDED.
1 parent 8f17ee3 commit 4adc633

1 file changed

Lines changed: 343 additions & 0 deletions

spec/ojs-realtime.md

@@ -0,0 +1,343 @@
1+
# OJS Real-Time Status — Extension Specification
2+
3+
| Field | Value |
4+
|-------------|---------------------------------------------|
5+
| **Title** | OJS Real-Time Job Status Updates |
6+
| **Version** | 0.1.0 |
7+
| **Status** | Experimental (Stage 0) |
8+
| **Maturity** | Experimental |
9+
| **Date** | 2025-07-15 |
10+
| **Layer** | 3 — Protocol Binding |
11+
| **Depends On** | ojs-core.md, ojs-events.md |
12+
13+
---
14+
15+
## Abstract
16+
17+
This extension defines how clients subscribe to real-time job status changes via **Server-Sent Events (SSE)** and **WebSocket** protocols. It eliminates the need for polling by pushing state-change notifications directly to connected clients.
18+
19+
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt).
20+
21+
---
22+
23+
## Table of Contents
24+
25+
1. [Introduction and Motivation](#1-introduction-and-motivation)
26+
2. [Server-Sent Events (SSE) Binding](#2-server-sent-events-sse-binding)
27+
3. [WebSocket Binding](#3-websocket-binding)
28+
4. [Event Format](#4-event-format)
29+
5. [Connection Management](#5-connection-management)
30+
6. [Security Considerations](#6-security-considerations)
31+
7. [Conformance Requirements](#7-conformance-requirements)
32+
8. [Examples](#8-examples)
33+
34+
---
35+
36+
## 1. Introduction and Motivation
37+
38+
Polling-based status checks create unnecessary load on the server and introduce latency between state transitions and client awareness. Real-time push notifications solve both problems by delivering events to subscribed clients the moment a job's state changes.
39+
40+
**Rationale:** Background job systems frequently power user-facing workflows (file uploads, report generation, payment processing). Users expect immediate feedback when their job completes or fails. Without a standardized real-time mechanism, every OJS implementation invents its own, fragmenting the ecosystem and preventing portable dashboards and monitoring tools.
41+
42+
This extension provides two complementary transport bindings:
43+
44+
- **SSE** — Simple, HTTP-based, unidirectional streaming. Ideal for browser-based dashboards and monitoring tools. Built on standard HTTP infrastructure with automatic reconnection.
45+
- **WebSocket** — Full-duplex communication. Ideal for interactive applications that need to subscribe/unsubscribe dynamically and receive events with minimal overhead.
46+
47+
An implementation that advertises real-time support in its manifest MUST support the SSE binding and SHOULD also support the WebSocket binding (see Section 7).
48+
49+
---
50+
51+
## 2. Server-Sent Events (SSE) Binding
52+
53+
### 2.1 Subscribe to Job Updates
54+
55+
```
56+
GET /ojs/v1/jobs/{id}/events
57+
Accept: text/event-stream
58+
```
59+
60+
The server MUST respond with `Content-Type: text/event-stream` and begin streaming events for the specified job.
61+
62+
**Rationale:** Per-job subscriptions enable lightweight, targeted monitoring. A client tracking a single job should not receive the full event firehose.
63+
64+
If the job does not exist, the server MUST respond with HTTP 404 and a standard OJS error body (not an SSE stream).
65+
66+
If the job is in a terminal state (`completed`, `cancelled`, `discarded`), the server SHOULD send a single `job.state_changed` event reflecting the current state and then close the stream.
67+
68+
**Rationale:** Terminal jobs will never produce further events. Sending the current state and closing prevents clients from holding idle connections.
69+
70+
### 2.2 Subscribe to Queue Events
71+
72+
```
73+
GET /ojs/v1/queues/{name}/events
74+
Accept: text/event-stream
75+
```
76+
77+
The server MUST respond with `Content-Type: text/event-stream` and begin streaming events for all jobs in the specified queue.
78+
79+
If the queue does not exist, the server MUST respond with HTTP 404.
80+
81+
### 2.3 SSE Protocol Requirements
82+
83+
The server MUST comply with the [Server-Sent Events section of the WHATWG HTML Living Standard](https://html.spec.whatwg.org/multipage/server-sent-events.html). In addition:
84+
85+
- Each event MUST include an `event` field indicating the event type.
86+
- Each event MUST include a `data` field containing the JSON-encoded event payload.
87+
- Each event MUST include an `id` field containing a monotonically increasing event identifier.
88+
- Events SHOULD include a `retry` field (in milliseconds) to advise the client on reconnection interval. The RECOMMENDED default is `3000`.
89+
90+
**Rationale:** The `id` field enables automatic reconnection via the `Last-Event-ID` header, preventing event loss during transient disconnections.
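
As an illustration of the field requirements above, the following minimal sketch serializes one event into an SSE frame. The function name `format_sse_event` and its parameters are hypothetical and not part of this specification; the sketch assumes the payload is serialized as compact JSON so that it fits on a single `data:` line.

```python
import json
from typing import Any, Dict, Optional


def format_sse_event(event: str, data: Dict[str, Any], event_id: str,
                     retry_ms: Optional[int] = None) -> str:
    """Serialize one event as an SSE frame with `id`, `event`, and `data` fields."""
    lines = []
    if retry_ms is not None:
        # Optional reconnection hint, in milliseconds.
        lines.append(f"retry: {retry_ms}")
    lines.append(f"id: {event_id}")
    lines.append(f"event: {event}")
    # json.dumps escapes newlines inside strings, so one `data:` line suffices.
    payload = json.dumps(data, separators=(",", ":"))
    lines.append(f"data: {payload}")
    # A blank line terminates the frame.
    return "\n".join(lines) + "\n\n"
```

A server would write the returned string to the response body and flush it immediately, since buffering would defeat the purpose of streaming.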
91+
92+
### 2.4 Heartbeat
93+
94+
The server MUST send a comment line (`:heartbeat`) at least every **15 seconds** when no events are pending.
95+
96+
**Rationale:** SSE connections traverse proxies and load balancers that may close idle connections. Regular heartbeats prevent premature termination.
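
One possible way to interleave heartbeats with events is sketched below using `asyncio`. The generator name `stream_with_heartbeat`, the queue-based design, and the `None` end-of-stream sentinel are assumptions made for illustration only.

```python
import asyncio
from typing import AsyncIterator

HEARTBEAT_INTERVAL_S = 15.0  # upper bound required by this section


async def stream_with_heartbeat(
    frames: "asyncio.Queue",
    interval_s: float = HEARTBEAT_INTERVAL_S,
) -> AsyncIterator[str]:
    """Yield queued SSE frames; emit a comment line whenever the queue stays idle."""
    while True:
        try:
            frame = await asyncio.wait_for(frames.get(), timeout=interval_s)
        except asyncio.TimeoutError:
            # No event arrived within the interval: keep the connection alive.
            yield ":heartbeat\n\n"
            continue
        if frame is None:  # hypothetical sentinel meaning "stream finished"
            return
        yield frame
```

In practice an interval somewhat shorter than 15 seconds leaves margin for scheduling and network delays.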
97+
98+
### 2.5 Reconnection
99+
100+
When a client reconnects with a `Last-Event-ID` header, the server SHOULD replay any events that occurred after the specified ID. If the server cannot replay (e.g., events were not retained), it MUST resume from the current point without error.
101+
102+
**Rationale:** At-least-once delivery during reconnection is critical for reliable monitoring. However, implementations are not required to maintain unbounded event history.
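
The replay behaviour described above might be backed by a bounded per-channel buffer. The sketch below is one such design; the class name `ReplayBuffer` and its methods are hypothetical, and the default capacity of 100 mirrors the recommendation in Section 5.3.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


class ReplayBuffer:
    """Bounded per-channel history of (event_id, frame) pairs."""

    def __init__(self, max_events: int = 100) -> None:
        # Oldest entries are evicted automatically once the buffer is full.
        self._events: Deque[Tuple[str, str]] = deque(maxlen=max_events)

    def append(self, event_id: str, frame: str) -> None:
        self._events.append((event_id, frame))

    def replay_after(self, last_event_id: Optional[str]) -> List[str]:
        """Return frames newer than `last_event_id`, or [] if it cannot be located."""
        if last_event_id is None:
            return []
        ids = [eid for eid, _ in self._events]
        if last_event_id not in ids:
            # The ID was evicted or never existed: resume from the current point.
            return []
        start = ids.index(last_event_id) + 1
        return [frame for _, frame in list(self._events)[start:]]
```

Returning an empty list for an unknown ID corresponds to resuming from the current point without error.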
103+
104+
---
105+
106+
## 3. WebSocket Binding
107+
108+
### 3.1 Connection Endpoint
109+
110+
```
111+
WS /ojs/v1/ws
112+
```
113+
114+
The server MUST accept WebSocket upgrade requests at this path. The server MUST select the `ojs.v1` WebSocket subprotocol when the client offers it.
115+
116+
### 3.2 Subscribe Message
117+
118+
After connecting, a client subscribes to event channels by sending:
119+
120+
```json
121+
{
122+
"action": "subscribe",
123+
"channel": "job:{id}"
124+
}
125+
```
126+
127+
Supported channel patterns:
128+
129+
| Pattern | Description |
130+
|-----------------|--------------------------------------|
131+
| `job:{id}` | Events for a specific job |
132+
| `queue:{name}` | Events for all jobs in a queue |
133+
| `all` | All events (global firehose) |
134+
135+
The server MUST respond with an acknowledgment:
136+
137+
```json
138+
{
139+
"type": "subscribed",
140+
"channel": "job:01926f5e-..."
141+
}
142+
```
143+
144+
If the referenced job or queue does not exist, the server MUST respond with an error message:
145+
146+
```json
147+
{
148+
"type": "error",
149+
"code": "not_found",
150+
"message": "Job not found."
151+
}
152+
```
153+
154+
### 3.3 Unsubscribe Message
155+
156+
```json
157+
{
158+
"action": "unsubscribe",
159+
"channel": "job:{id}"
160+
}
161+
```
162+
163+
The server MUST respond with:
164+
165+
```json
166+
{
167+
"type": "unsubscribed",
168+
"channel": "job:{id}"
169+
}
170+
```
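
The subscribe and unsubscribe exchanges in Sections 3.2 and 3.3 could be handled along the following lines. This is a sketch only: the function names, the `channel_exists` callback, the `invalid_message` error code, and the "Queue not found." message text are assumptions; this specification defines only the `not_found` code and the "Job not found." example.

```python
import json
from typing import Callable, Set


def is_valid_channel(channel: object) -> bool:
    """Accept `all`, `job:{id}`, and `queue:{name}` with a non-empty suffix."""
    if channel == "all":
        return True
    if not isinstance(channel, str):
        return False
    kind, sep, name = channel.partition(":")
    return kind in ("job", "queue") and sep == ":" and name != ""


def handle_client_message(raw: str, subscriptions: Set[str],
                          channel_exists: Callable[[str], bool]) -> dict:
    """Apply one client message to `subscriptions` and return the server reply."""
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError:
        msg = None
    action = msg.get("action") if isinstance(msg, dict) else None
    channel = msg.get("channel") if isinstance(msg, dict) else None
    if action not in ("subscribe", "unsubscribe") or not is_valid_channel(channel):
        # Hypothetical error code for malformed or unsupported messages.
        return {"type": "error", "code": "invalid_message",
                "message": "Unsupported action or channel."}
    if action == "subscribe":
        if channel != "all" and not channel_exists(channel):
            kind = "Job" if channel.startswith("job:") else "Queue"
            return {"type": "error", "code": "not_found",
                    "message": f"{kind} not found."}
        subscriptions.add(channel)
        return {"type": "subscribed", "channel": channel}
    subscriptions.discard(channel)
    return {"type": "unsubscribed", "channel": channel}
```

Keeping the handler free of I/O makes it easy to test independently of any particular WebSocket library.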
171+
172+
### 3.4 Event Messages
173+
174+
The server pushes events to subscribed clients in an envelope whose `data` field carries the same payload as the SSE `data` field (Section 4):
175+
176+
```json
177+
{
178+
"type": "event",
179+
"channel": "job:01926f5e-...",
180+
"event": "job.state_changed",
181+
"data": { ... },
182+
"id": "evt_0042",
183+
"timestamp": "2025-07-15T10:30:00.000Z"
184+
}
185+
```
186+
187+
### 3.5 Connection Health
188+
189+
The server MUST send WebSocket Ping frames at least every **30 seconds**. If a client fails to respond with a Pong within **10 seconds**, the server SHOULD close the connection.
190+
191+
**Rationale:** Ping/Pong ensures dead connections are detected promptly, freeing server resources.
192+
193+
---
194+
195+
## 4. Event Format
196+
197+
The payload of each real-time event is a JSON object whose structure depends on the event type:
198+
199+
### 4.1 `job.state_changed`
200+
201+
Emitted when a job transitions between lifecycle states.
202+
203+
```json
204+
{
205+
"job_id": "01926f5e-7a3c-7def-8000-111111111111",
206+
"queue": "default",
207+
"type": "email.send",
208+
"from": "active",
209+
"to": "completed",
210+
"timestamp": "2025-07-15T10:30:00.000Z"
211+
}
212+
```
213+
214+
| Field | Type | Required | Description |
215+
|-------------|--------|----------|---------------------------------------------------|
216+
| `job_id` | string | Yes | UUIDv7 of the job |
217+
| `queue` | string | Yes | Queue the job belongs to |
218+
| `type` | string | Yes | Job type |
219+
| `from` | string | Yes | Previous state (one of the 8 OJS lifecycle states) |
220+
| `to` | string | Yes | New state |
221+
| `timestamp` | string | Yes | RFC 3339 timestamp of the transition |
222+
223+
### 4.2 `job.progress`
224+
225+
Emitted when a worker reports progress on an active job.
226+
227+
```json
228+
{
229+
"job_id": "01926f5e-7a3c-7def-8000-111111111111",
230+
"progress": 75,
231+
"message": "Processing page 3 of 4",
232+
"timestamp": "2025-07-15T10:29:55.000Z"
233+
}
234+
```
235+
236+
| Field | Type | Required | Description |
237+
|-------------|---------|----------|----------------------------------------|
238+
| `job_id` | string | Yes | UUIDv7 of the job |
239+
| `progress` | integer | Yes | Percentage complete (0–100) |
240+
| `message` | string | No | Human-readable progress description |
241+
| `timestamp` | string | Yes | RFC 3339 timestamp |
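
The field tables in Sections 4.1 and 4.2 translate fairly directly into a payload check. The sketch below is illustrative: `validate_event` and `REQUIRED_FIELDS` are hypothetical names, and `datetime.fromisoformat` is used only as an approximate stand-in for full RFC 3339 validation.

```python
from datetime import datetime
from typing import Any, Dict, List

# Required fields and their JSON types, taken from the tables above.
REQUIRED_FIELDS = {
    "job.state_changed": {"job_id": str, "queue": str, "type": str,
                          "from": str, "to": str, "timestamp": str},
    "job.progress": {"job_id": str, "progress": int, "timestamp": str},
}


def validate_event(event: str, data: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    spec = REQUIRED_FIELDS.get(event)
    if spec is None:
        return [f"unknown event type: {event}"]
    problems = []
    for name, expected in spec.items():
        value = data.get(name)
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, expected) or isinstance(value, bool):
            problems.append(f"{name}: expected {expected.__name__}")
    if event == "job.progress":
        progress = data.get("progress")
        if (isinstance(progress, int) and not isinstance(progress, bool)
                and not 0 <= progress <= 100):
            problems.append("progress: must be between 0 and 100")
        if "message" in data and not isinstance(data["message"], str):
            problems.append("message: expected str")
    timestamp = data.get("timestamp")
    if isinstance(timestamp, str):
        try:
            # Approximate check: accepts the RFC 3339 forms used in this document.
            datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
        except ValueError:
            problems.append("timestamp: not an RFC 3339 date-time")
    return problems
```

The check does not verify that `from` and `to` name valid lifecycle states, because those are defined in ojs-core.md rather than here.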
242+
243+
---
244+
245+
## 5. Connection Management
246+
247+
### 5.1 Graceful Shutdown
248+
249+
When the server is shutting down, it MUST:
250+
251+
1. Stop accepting new SSE and WebSocket connections.
252+
2. Send a `server.shutdown` event to all connected clients.
253+
3. Close all connections within a RECOMMENDED grace period of **5 seconds**.
254+
255+
**Rationale:** Graceful shutdown prevents clients from hanging on dead connections and allows them to reconnect to another server instance.
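
The three shutdown steps can be sketched with `asyncio` as follows. The `RealtimeServer` class and the connection interface (`send` and `close` coroutines) are hypothetical, and because this document does not define a payload for `server.shutdown`, the sketch passes only the event name.

```python
import asyncio


class RealtimeServer:
    """Tracks open real-time connections and shuts them down gracefully."""

    def __init__(self) -> None:
        self.accepting = True
        self.connections = set()

    def register(self, conn) -> bool:
        """Admit a new connection unless shutdown has begun."""
        if not self.accepting:
            return False
        self.connections.add(conn)
        return True

    async def shutdown(self, grace_s: float = 5.0) -> None:
        # Step 1: stop accepting new SSE and WebSocket connections.
        self.accepting = False
        conns = list(self.connections)

        async def notify_and_close(conn) -> None:
            # Step 2: tell the client the server is going away.
            await conn.send("server.shutdown")
            # Step 3: close the connection.
            await conn.close()

        pending = asyncio.gather(*(notify_and_close(c) for c in conns),
                                 return_exceptions=True)
        try:
            # Bound the whole sequence by the grace period.
            await asyncio.wait_for(pending, timeout=grace_s)
        except asyncio.TimeoutError:
            pass  # a real server would forcibly drop the remaining sockets here
        self.connections.clear()
```

Running notification and close per connection concurrently keeps one slow client from consuming the grace period of the others.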
256+
257+
### 5.2 Client Limits
258+
259+
The server SHOULD enforce a maximum number of concurrent real-time connections per client (identified by IP address or authentication token). The RECOMMENDED default limit is **100 connections**.
260+
261+
**Rationale:** Without connection limits, a single misbehaving client could exhaust server resources.
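
A per-client limit can be enforced with a simple counter, as in the minimal sketch below. `ConnectionLimiter` is a hypothetical name, and the client key is assumed to be whatever identifier the server uses (IP address or authentication token).

```python
from collections import Counter


class ConnectionLimiter:
    """Counts open real-time connections per client key."""

    def __init__(self, max_per_client: int = 100) -> None:
        self.max_per_client = max_per_client
        self._open = Counter()

    def try_acquire(self, client_key: str) -> bool:
        """Reserve a slot for `client_key`; return False if the limit is reached."""
        if self._open[client_key] >= self.max_per_client:
            return False
        self._open[client_key] += 1
        return True

    def release(self, client_key: str) -> None:
        """Free a slot when a connection closes."""
        if self._open[client_key] > 0:
            self._open[client_key] -= 1
        if self._open[client_key] == 0:
            # Drop idle clients so the counter does not grow without bound.
            del self._open[client_key]
```

A server would call `try_acquire` before upgrading or streaming and `release` in a cleanup handler, so that slots are returned even when a connection ends abnormally.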
262+
263+
### 5.3 Event Buffering
264+
265+
The server MAY buffer recent events (RECOMMENDED: last 100 events per channel) to support SSE reconnection via `Last-Event-ID`.
266+
267+
---
268+
269+
## 6. Security Considerations
270+
271+
- Real-time endpoints SHOULD be subject to the same authentication and authorization mechanisms as other OJS API endpoints.
272+
- The server MUST NOT leak job data to unauthorized subscribers. If a client is not authorized to view a job, subscribe requests MUST be rejected with HTTP 403 (SSE) or an error message (WebSocket).
273+
- WebSocket connections SHOULD validate the `Origin` header to prevent cross-site WebSocket hijacking.
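
As one illustration of the `Origin` check, the sketch below compares the header against an allow-list. The function name, the allow-list parameter, and the default of rejecting requests without an `Origin` header are assumptions; deployments that serve non-browser clients may prefer a different policy.

```python
from typing import Iterable, Optional
from urllib.parse import urlsplit


def is_allowed_origin(origin: Optional[str], allowed: Iterable[str],
                      allow_missing: bool = False) -> bool:
    """Return True if the `Origin` header value matches an allowed origin."""
    if not origin:
        # Non-browser clients usually omit Origin; the policy is configurable.
        return allow_missing
    parts = urlsplit(origin)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        # Rejects opaque origins such as the literal string "null".
        return False
    normalized = f"{parts.scheme}://{parts.netloc}".lower()
    return normalized in {a.lower().rstrip("/") for a in allowed}
```

The check would run during the WebSocket handshake, before the upgrade is accepted.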
274+
275+
---
276+
277+
## 7. Conformance Requirements
278+
279+
An implementation claiming conformance to this extension:
280+
281+
- MUST support the SSE binding (Section 2).
282+
- MUST emit `job.state_changed` events for all lifecycle transitions.
283+
- SHOULD support the WebSocket binding (Section 3).
284+
- MAY support the `job.progress` event type.
285+
- MUST implement heartbeats as specified (Section 2.4, and Section 3.5 when the WebSocket binding is supported).
286+
- MUST handle graceful shutdown as specified (Section 5.1).
287+
288+
---
289+
290+
## 8. Examples
291+
292+
### 8.1 SSE — Monitoring a Single Job
293+
294+
**Request:**
295+
```http
296+
GET /ojs/v1/jobs/01926f5e-7a3c-7def-8000-111111111111/events HTTP/1.1
297+
Accept: text/event-stream
298+
```
299+
300+
**Response stream:**
301+
```
302+
retry: 3000
303+
304+
:heartbeat
305+
306+
id: evt_0001
307+
event: job.state_changed
308+
data: {"job_id":"01926f5e-7a3c-7def-8000-111111111111","queue":"default","type":"email.send","from":"available","to":"active","timestamp":"2025-07-15T10:30:00.000Z"}
309+
310+
:heartbeat
311+
312+
id: evt_0002
313+
event: job.state_changed
314+
data: {"job_id":"01926f5e-7a3c-7def-8000-111111111111","queue":"default","type":"email.send","from":"active","to":"completed","timestamp":"2025-07-15T10:30:05.000Z"}
315+
316+
```
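
A client that cannot rely on a built-in SSE implementation could parse a stream like the one above roughly as follows. This is a simplified sketch: `parse_sse` is a hypothetical helper that operates on a complete string, and it does not track the last event ID or reconnection time across events as a full client would.

```python
from typing import Dict, Iterator, List


def parse_sse(stream: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per dispatched event from a complete SSE text stream."""
    fields: Dict[str, str] = {}
    data_lines: List[str] = []
    # Normalize the three line endings that SSE permits.
    text = stream.replace("\r\n", "\n").replace("\r", "\n")
    for line in text.split("\n"):
        if line == "":
            # Blank line: dispatch the buffered event, if it carried data.
            if data_lines:
                fields["data"] = "\n".join(data_lines)
                yield fields
            fields, data_lines = {}, []
            continue
        if line.startswith(":"):
            continue  # comment line such as ":heartbeat"
        name, _, value = line.partition(":")
        # A single leading space after the colon is not part of the value.
        value = value[1:] if value.startswith(" ") else value
        if name == "data":
            data_lines.append(value)
        elif name in ("event", "id", "retry"):
            fields[name] = value
```

Applied to the response stream above, the parser would skip the `retry`-only block and both heartbeats and yield the two `job.state_changed` events.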
317+
318+
### 8.2 WebSocket — Subscribe and Receive Events
319+
320+
**Client sends:**
321+
```json
322+
{"action":"subscribe","channel":"queue:default"}
323+
```
324+
325+
**Server responds:**
326+
```json
327+
{"type":"subscribed","channel":"queue:default"}
328+
```
329+
330+
**Server pushes event:**
331+
```json
332+
{"type":"event","channel":"queue:default","event":"job.state_changed","data":{"job_id":"01926f5e-7a3c-7def-8000-111111111111","queue":"default","type":"email.send","from":"available","to":"active","timestamp":"2025-07-15T10:30:00.000Z"},"id":"evt_0001","timestamp":"2025-07-15T10:30:00.000Z"}
333+
```
334+
335+
**Client unsubscribes:**
336+
```json
337+
{"action":"unsubscribe","channel":"queue:default"}
338+
```
339+
340+
**Server confirms:**
341+
```json
342+
{"type":"unsubscribed","channel":"queue:default"}
343+
```
