feat: add epoll gateway architecture prototype by div0rce · Pull Request #98 · div0rce/quant-systems-lab

div0rce · 2026-06-02T17:25:29Z

Summary

Adds EpollServer, a Linux-only event-driven TCP gateway transport using one epoll loop, nonblocking accept4/read/write, per-client outbound buffers, and one existing deterministic Session per connection.
Adds qsl-gateway <port> --epoll as an explicit Linux opt-in; the portable blocking TcpServer remains the default.
Adds platform-scoped epoll tests: unsupported-platform check on non-Linux, plus Linux loopback coverage for invalid bind hosts and two simultaneous clients served by one event loop.
Updates socket docs and adds ADR 0010, distinguishing M34 architecture validation from M35 load/socket-pressure evidence.

Scope

M34 only. This does not start M35, does not add load/capacity claims, and does not change matching, risk, protocol codec, or deterministic Session semantics.

Verification

make check - 190/190 passed locally
make asan - 190/190 passed locally
git diff --check
Docker Ubuntu Linux: built qsl-gateway and test_epoll_gateway; test_epoll_gateway passed 3 tests / 36 assertions

Notes

epoll remains Linux-only. The M30 socket-profile artifacts still describe the blocking gateway path; M35 owns multi-client load and socket-pressure measurements.

div0rce · 2026-06-02T17:26:12Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 22784d9cd2

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Addresses the two Codex P2 reviews on PR #98.

1. epoll backpressure / unbounded memory: a client that kept sending valid frames but stopped reading left EPOLLIN armed, so the loop kept reading and appended responses to an uncapped outbuf -> unbounded per-connection memory. EpollServerOptions gains a max_outbuf_bytes high-water mark (default 1 MiB); the read loop stops at the mark and the re-arm logic drops EPOLLIN while the backlog is over it (keeping EPOLLOUT), resuming reads once it drains. This bounds per-connection memory and never drops or reorders a response. Adds a backpressure test.

2. --epoll without a port: `qsl-gateway --epoll` parsed `--epoll` via std::stoul and aborted with std::invalid_argument. The arg parser now recognizes flags first; the first non-flag arg is the port (default 9009), and a non-numeric port gives a clean usage error instead of an uncaught exception.

docs/socket_gateway.md documents the high-water backpressure and the flag-or-port parsing.

make check 190/190 (macOS); the Linux epoll paths are exercised by the Ubuntu CI build/tests.

div0rce · 2026-06-03T13:45:36Z

Fixed both P2s in 29d5af8.

Backpressure (epoll_server.cpp): added EpollServerOptions::max_outbuf_bytes (default 1 MiB). The read loop stops at the mark and the re-arm logic drops EPOLLIN while the backlog is over it (keeping EPOLLOUT), resuming reads once it drains — so a client that keeps sending but stops reading can no longer grow the outbound buffer unbounded. Added a backpressure test.
--epoll parsing (qsl-gateway/main.cpp): flag-aware arg loop — --epoll may precede or replace the port; the first non-flag arg is the port (default 9009); a non-numeric port yields a clean usage error instead of an uncaught std::invalid_argument.

docs/socket_gateway.md updated for both. make check 190/190 (macOS); Linux epoll paths covered by Ubuntu CI.
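
The re-arm rule described above can be sketched as a pure function. This is an illustration under stated assumptions, not the PR's code: `interest_mask`, `kWantRead`, and `kWantWrite` are hypothetical names, and the two flags stand in for EPOLLIN and EPOLLOUT so the snippet compiles on any platform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-ins for EPOLLIN / EPOLLOUT (assumption: real code would use the
// constants from <sys/epoll.h>).
constexpr std::uint32_t kWantRead = 1u << 0;
constexpr std::uint32_t kWantWrite = 1u << 1;

// Hypothetical helper: choose the interest mask for one connection.
//   pending   - bytes queued for the client but not yet sent
//   soft_mark - high-water mark (max_outbuf_bytes in the PR, default 1 MiB)
inline std::uint32_t interest_mask(std::size_t pending, std::size_t soft_mark) {
    std::uint32_t mask = 0;
    if (pending > 0) {
        mask |= kWantWrite;  // keep flushing while anything is queued
    }
    if (pending <= soft_mark) {
        mask |= kWantRead;  // read new requests only while at or under the mark
    }
    assert(mask != 0);  // a connection always keeps at least one interest
    return mask;
}
```

With an empty backlog only reads are armed; over the mark only writes are armed, so reads resume once the backlog drains back to the mark.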

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 29d5af83c8

Addresses the second round of Codex P2 reviews on PR #98.

1. Port parsing (qsl-gateway/main.cpp): std::stoul accepted prefixes and the uint16_t cast truncated, so `9009x` silently bound 9009 and `70000` bound 4464. The parser now requires the whole token to parse (checks the consumed length) and the value to fit in uint16_t, otherwise it prints a usage error and exits 2.

2. epoll high-water mark (epoll_server.cpp): the mark was checked only after appending a full on_bytes response, so a single crossing/market order against a deep book (one Fill per resting maker) could push the buffer well past the mark before EPOLLIN was dropped. The check now runs before reading/processing the next request, bounding the overshoot to the request already in flight. A single response is still buffered whole -- capping one response would mean capping matching output, which is out of scope for the transport prototype -- and the header comment and docs/socket_gateway.md now state this honestly: the mark bounds how many further requests are read, not the size of one response.

Verified: make check 190/190 (macOS); Docker Ubuntu built test_epoll_gateway + qsl-gateway, 4 epoll tests / 180 assertions pass, and arg smokes confirm 9009x / 70000 / abc are rejected while --epoll and a bare port start cleanly.
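
A minimal sketch of whole-token, range-checked port parsing follows. It swaps in std::from_chars (C++17) instead of the std::stoul-plus-consumed-length check the commit describes, and `parse_port` is a hypothetical name rather than the function in main.cpp.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: accept a token only if every character is consumed
// and the value fits in a 16-bit port.
inline std::optional<std::uint16_t> parse_port(std::string_view token) {
    unsigned long value = 0;
    const char* first = token.data();
    const char* last = first + token.size();
    const auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc{} || ptr != last) {
        return std::nullopt;  // empty, non-numeric, or trailing junk ("9009x")
    }
    if (value > 65535UL) {
        return std::nullopt;  // a blind cast would wrap 70000 to 4464
    }
    return static_cast<std::uint16_t>(value);
}

// The inputs from the thread's arg smokes.
inline void parse_port_examples() {
    assert(parse_port("9009") == std::uint16_t{9009});
    assert(!parse_port("9009x"));
    assert(!parse_port("70000"));
    assert(!parse_port("abc"));
}
```

One design difference worth noting: std::from_chars does not skip leading whitespace or accept a sign, so tokens such as `-1` or ` 9009` are rejected without extra checks.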

…connections

Addresses the re-raised Codex P2 on PR #98.

The soft high-water mark only gated reading further requests, so a single high-fanout response -- a market order sweeping a deep book returns one Fill per resting maker -- was appended whole before any check, meaning the 1 MiB limit was not actually a per-connection cap.

Adds a hard cap (EpollServerOptions::max_outbuf_hard_bytes, default 8 MiB). Before appending a response, if it would push the connection's pending bytes past the hard cap, the connection is dropped instead of buffered, so sustained per-connection memory never exceeds the cap -- including on the high-fanout single-response path. The soft mark (max_outbuf_bytes, 1 MiB) still applies polite read-backpressure for the many-small-requests case; a client that reads its responses keeps the backlog near zero and trips neither threshold. Only a peer that stops reading and then induces an over-cap response is disconnected.

Adds a test (deep book + one crossing order without reading + a tiny hard cap) asserting the server drops the connection (clean EOF) rather than buffering the over-cap response. Updated the header comments and docs/socket_gateway.md to the two-threshold design.

Verified: make check 190/190 (macOS); Docker Ubuntu test_epoll_gateway 5 tests / 226 assertions.

div0rce · 2026-06-03T14:25:58Z

Re-raised P2 (high-water not a real per-connection cap) addressed in b897bfa.

The soft mark only gated reading more requests, so one high-fanout response (a market order sweeping a deep book → one Fill per maker) was appended whole. Added a hard cap (EpollServerOptions::max_outbuf_hard_bytes, default 8 MiB): before appending, if a response would push the connection past the hard cap it is dropped instead of buffered, so sustained per-connection memory never exceeds the cap — including the high-fanout single-response path. The soft mark (1 MiB) still does polite read-backpressure for many-small-requests; a reading client trips neither. New test builds a deep book, fires one crossing order without reading, and asserts the server drops the connection (clean EOF) rather than buffering the over-cap response.
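
The hard-cap decision can be sketched as below. `append_or_drop` and `AppendResult` are hypothetical names; the sketch tracks only byte counts and assumes the caller closes the connection when it gets `Dropped`.

```cpp
#include <cassert>
#include <cstddef>

enum class AppendResult { Buffered, Dropped };

// Hypothetical helper: account for a response of `response` bytes on a
// connection that already holds `pending` unsent bytes.
// The comparison is written as a subtraction so pending + response cannot
// overflow std::size_t.
inline AppendResult append_or_drop(std::size_t& pending, std::size_t response,
                                   std::size_t hard_cap) {
    if (pending > hard_cap || response > hard_cap - pending) {
        return AppendResult::Dropped;  // would exceed the cap: drop, do not buffer
    }
    pending += response;
    assert(pending <= hard_cap);  // retained bytes never exceed the cap
    return AppendResult::Buffered;
}
```

A response that lands exactly on the cap is still buffered; one byte more drops the connection.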

Verified on Linux (Docker): test_epoll_gateway 5 tests / 226 assertions pass; make check 190/190.

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b897bfadea

chatgpt-codex-connector · 2026-06-03T14:33:43Z

+        if (n < 0 && is_would_block()) {
+            return true;
+        }
+        return false;


Retry sends that are interrupted by signals

In environments where the process receives a signal while flushing a client's queued responses, send() can fail with EINTR; this falls through to false, closes the connection, and drops whatever remains in outbuf. The accept and read paths already treat EINTR as retryable, so the write path should do the same instead of disconnecting clients because of an unrelated signal.
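
One way to apply this suggestion is sketched below. `flush_some` and `FlushResult` are hypothetical names, not the PR's code; the calls are plain POSIX send(2), and MSG_NOSIGNAL is used only where the platform defines it.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <sys/socket.h>
#include <sys/types.h>

enum class FlushResult { Drained, WouldBlock, Error };

#ifdef MSG_NOSIGNAL
constexpr int kSendFlags = MSG_NOSIGNAL;  // avoid SIGPIPE on a closed peer
#else
constexpr int kSendFlags = 0;
#endif

// Hypothetical helper: send buf[offset, len) and advance `offset`.
inline FlushResult flush_some(int fd, const char* buf, std::size_t len,
                              std::size_t& offset) {
    assert(offset <= len);
    while (offset < len) {
        const ssize_t n = ::send(fd, buf + offset, len - offset, kSendFlags);
        if (n > 0) {
            offset += static_cast<std::size_t>(n);
            continue;
        }
        if (n < 0 && errno == EINTR) {
            continue;  // interrupted by a signal: retry, keep the connection
        }
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            return FlushResult::WouldBlock;  // wait for the next writable event
        }
        return FlushResult::Error;  // anything else: caller closes
    }
    return FlushResult::Drained;
}
```

Only `Error` should lead to a close; `WouldBlock` leaves the unsent tail queued for the next writable notification.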


chatgpt-codex-connector · 2026-06-03T14:33:43Z

+            std::cerr << "usage: qsl-gateway [port] [--epoll]   (port is 0-65535)\n";
+            return 2;
+        }
+        port = static_cast<std::uint16_t>(value);


Reject duplicate port arguments

When the command line contains more than one numeric non-flag token, the loop silently overwrites the earlier port even though the usage and comment describe a single optional port and the first non-flag argument as the port. For example, qsl-gateway 9009 9010 --epoll binds 9010 instead of rejecting the malformed invocation, which can make local experiments listen on an unintended port.
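
A flag-aware loop that rejects a second positional token could look like the sketch below. `GatewayArgs` and `parse_args` are hypothetical names, and tolerating a repeated `--epoll` is an assumption, not something the thread specifies.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

struct GatewayArgs {
    std::uint16_t port = 9009;  // default port named in the thread
    bool use_epoll = false;
};

// Hypothetical parser: returns nullopt for any malformed invocation so the
// caller can print the usage line and exit.
inline std::optional<GatewayArgs> parse_args(const std::vector<std::string>& args) {
    GatewayArgs out;
    bool port_seen = false;
    for (const std::string& arg : args) {
        if (arg == "--epoll") {
            out.use_epoll = true;
            continue;
        }
        if (port_seen || arg.empty()) {
            return std::nullopt;  // second positional token, or an empty one
        }
        unsigned long value = 0;
        for (const char c : arg) {
            if (c < '0' || c > '9') {
                return std::nullopt;  // unknown flag or non-numeric port
            }
            value = value * 10 + static_cast<unsigned long>(c - '0');
            if (value > 65535UL) {
                return std::nullopt;  // out of range; also prevents overflow
            }
        }
        out.port = static_cast<std::uint16_t>(value);
        port_seen = true;
    }
    return out;
}

inline void parse_args_examples() {
    assert(!parse_args({"9009", "9010", "--epoll"}));
    const auto ok = parse_args({"--epoll", "9010"});
    assert(ok && ok->port == 9010 && ok->use_epoll);
}
```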


Addresses the round-4 Codex review on PR #98 (transport-layer findings).

- Keep serving after transient accept4 errors (ECONNABORTED and the pending network errors Linux reports through accept) instead of returning false and tearing down the whole event loop for all clients (accept(2)).
- Flush output with a write offset instead of erasing from the front of outbuf after every send, which was O(n^2) for large buffers and could stall the single loop; reclaim the sent prefix once per append (amortized). Also retry send() on EINTR rather than dropping the connection, matching the accept/read paths.
- Reject duplicate/extra port arguments (e.g. `qsl-gateway 9009 9010`) instead of silently binding the last one.

The remaining review point -- one request's response is materialized in full by Session::on_bytes before the hard cap is checked, so the *transient* allocation on a deep-book sweep is unbounded -- needs streaming/byte-budgeted response generation through the shared Session/gateway API (used by the blocking TCP transport too). That is a larger architectural change beyond this epoll-transport prototype; it is tracked as issue #99 and documented in docs/socket_gateway.md. The hard cap still bounds the *retained* per-connection buffer.

Verified: make check 190/190 (macOS); Docker Ubuntu test_epoll_gateway 5 tests / 226 assertions; arg smokes confirm 9009 9010 / 9009x / 70000 rejected and --epoll starts cleanly.
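
The transient-versus-fatal split for accept errors might be classified as in this sketch. `accept_error_is_transient` is a hypothetical name, and the list is an assumption drawn from the errors accept(2) documents as already-pending network errors; the Linux-only EHOSTDOWN and ENONET are left out to keep the snippet portable, and the PR's exact list may differ.

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical helper: true for errors after which the accept loop can
// simply try again. Would-block is handled separately by the caller because
// it means "no more pending connections", not "retry now".
inline bool accept_error_is_transient(int err) {
    switch (err) {
        case EINTR:         // interrupted by a signal
        case ECONNABORTED:  // peer gave up before accept returned
        case ENETDOWN:
        case EPROTO:
        case ENOPROTOOPT:
        case EHOSTUNREACH:
        case EOPNOTSUPP:
        case ENETUNREACH:
            return true;
        default:
            return false;  // e.g. EMFILE / ENFILE / EBADF: treat as fatal
    }
}

inline void accept_error_examples() {
    assert(accept_error_is_transient(ECONNABORTED));
    assert(!accept_error_is_transient(EMFILE));
}
```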

div0rce · 2026-06-03T14:44:58Z

Addressed the round-4 review in 84cb486.

Fixed (transport layer):

Aborted accepts: accept4 transient errors (ECONNABORTED + the pending network errors Linux surfaces via accept) now continue instead of tearing down the whole event loop.
O(n²) flush: output is drained with a write offset (reclaiming the sent prefix once per append) instead of erase(begin) per send.
EINTR on send: send() retries on EINTR instead of dropping the connection (matching the accept/read paths).
Duplicate port (P3): 9009 9010 is now rejected instead of silently binding the last token.
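
The write-offset idea from the flush fix can be sketched with a small buffer type. `OutBuf` and its members are hypothetical names; the real outbuf in epoll_server.cpp may be laid out differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical outbound buffer: sends advance an offset instead of erasing
// from the front, and the sent prefix is reclaimed once per append.
class OutBuf {
public:
    std::size_t pending() const { return data_.size() - sent_; }

    // View of the bytes still waiting to be sent.
    std::string_view unsent() const {
        return std::string_view(data_).substr(sent_);
    }

    // Record that the first n unsent bytes went out. O(1), no memmove.
    void consume(std::size_t n) {
        assert(n <= pending());
        sent_ += n;
    }

    // Queue more bytes, dropping the already-sent prefix first.
    void append(std::string_view bytes) {
        if (sent_ > 0) {
            data_.erase(0, sent_);
            sent_ = 0;
        }
        data_.append(bytes);
    }

private:
    std::string data_;
    std::size_t sent_ = 0;
};
```

Many small partial sends against one large buffer then cost one offset bump each, rather than shifting the remaining bytes on every send.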

Deliberately backlogged — issue #99: the transient allocation point. Session::on_bytes materializes a single request's whole response before the hard cap is checked, so a deep-book sweep still transiently allocates O(fan-out). Truly bounding that requires streaming/byte-budgeting response generation through the shared Session/gateway API (the blocking TCP transport uses it too) — a milestone-sized change beyond this epoll-transport prototype, not a transport fix. The hard cap still bounds the retained per-connection buffer. Documented in docs/socket_gateway.md.

Verified on Linux (Docker): test_epoll_gateway 5 tests / 226 assertions; arg smokes confirm the rejections; make check 190/190.

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 84cb486b6e

Addresses the round-5 Codex review on PR #98.

- Drain a client's writable backlog (EPOLLOUT) before its read branch, so the hard-cap check sees an up-to-date pending() and a client that has resumed reading is not falsely dropped when pending + response would momentarily cross a cap that draining first would relieve.
- Close on EPOLLERR/EPOLLHUP regardless of queued output: a dead socket's pending bytes can never be sent, so the previous `!want_write` condition kept re-arming it and woke the event loop repeatedly with an unsendable buffer. EPOLLRDHUP stays flush-then-close, since a half-closed peer can still receive queued responses.
- PROGRESS.md: corrected the stale Docker epoll verification count (3 tests / 36 -> 5 / 226) and recorded the Codex hardening rounds + the issue #99 backlog.

Verified: make check 190/190 (macOS); Docker Ubuntu test_epoll_gateway 5 tests / 226 assertions.

div0rce · 2026-06-03T15:03:49Z

Round-5 review addressed in 9ff6c56.

Drain before read: the EPOLLOUT flush now runs before the read branch, so the hard-cap check sees current pending() — a client that resumed reading is no longer falsely dropped when pending + response momentarily crosses a cap that draining would relieve.
Close on hard error/hangup: EPOLLERR/EPOLLHUP now close regardless of queued output (a dead socket's bytes can't be sent; the old !want_write guard re-armed it and spun the loop). EPOLLRDHUP stays flush-then-close (a half-closed peer can still receive responses).
P3: corrected the stale PROGRESS epoll count (3/36 → 5/226).
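
Why the drain order matters for the hard-cap check can be shown with byte counts alone. Everything here is a hypothetical model: `Conn`, `on_ready`, and the `sendable` parameter (how many bytes the socket would accept right now) are illustrative and not part of the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

struct Conn {
    std::size_t pending = 0;  // unsent bytes queued for this client
};

enum class Outcome { Buffered, Dropped };

// Hypothetical model of one wakeup where the socket is both writable and
// readable, and the next request yields a response of `response` bytes.
inline Outcome on_ready(Conn& conn, std::size_t sendable, std::size_t response,
                        std::size_t hard_cap, bool drain_first) {
    if (drain_first) {
        conn.pending -= std::min(conn.pending, sendable);  // flush before reading
    }
    if (conn.pending > hard_cap || response > hard_cap - conn.pending) {
        return Outcome::Dropped;  // hard cap would be crossed
    }
    conn.pending += response;
    if (!drain_first) {
        conn.pending -= std::min(conn.pending, sendable);  // flush after reading
    }
    return Outcome::Buffered;
}
```

With 6 bytes pending, 4 sendable, a 3-byte response, and a cap of 8, draining first leaves 2 + 3 = 5 bytes queued, while reading first sees 6 + 3 = 9 and drops a client that was in fact reading.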

Verified on Linux (Docker): test_epoll_gateway 5 tests / 226 assertions; make check 190/190.

The transient single-response allocation remains tracked as #99 (streaming response generation through the shared Session API — a separate milestone).

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9ff6c5614b

chatgpt-codex-connector · 2026-06-03T15:09:45Z

+                        errno == ENETUNREACH) {
+                        continue;
+                    }
+                    return false; // genuinely fatal listener error


Close active clients before fatal accept failure

If accept4 hits a fatal listener error after clients are already connected (for example EMFILE/ENFILE, which are not in the transient-error list), this return false exits serve_listen_socket without closing any descriptors stored in clients. Closing the epoll fd only unregisters interest; it does not close the client sockets, so a caller that logs the failure or restarts the server in-process leaks those connections and can worsen the fd-exhaustion condition that caused the failure.
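
One common way to make every exit path close the client sockets is to store them in an owning wrapper, so that destroying the container closes them. The sketch below assumes that approach; `UniqueFd` and `fd_is_open` are hypothetical names and the PR may solve this differently (for example with an explicit cleanup loop before `return false`).

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>
#include <unordered_map>
#include <utility>

// Hypothetical move-only owner of a file descriptor.
class UniqueFd {
public:
    explicit UniqueFd(int fd = -1) noexcept : fd_(fd) {}
    UniqueFd(const UniqueFd&) = delete;
    UniqueFd& operator=(const UniqueFd&) = delete;
    UniqueFd(UniqueFd&& other) noexcept : fd_(std::exchange(other.fd_, -1)) {}
    UniqueFd& operator=(UniqueFd&& other) noexcept {
        if (this != &other) {
            reset();
            fd_ = std::exchange(other.fd_, -1);
        }
        return *this;
    }
    ~UniqueFd() { reset(); }

    int get() const noexcept { return fd_; }

    void reset() noexcept {
        if (fd_ >= 0) {
            ::close(fd_);
            fd_ = -1;
        }
    }

private:
    int fd_;
};

// True while `fd` refers to an open descriptor in this process.
inline bool fd_is_open(int fd) { return ::fcntl(fd, F_GETFD) != -1; }

// Demonstration: descriptors held in a map are closed when the map goes out
// of scope, which is what an early return from the serve loop would trigger.
inline void scope_exit_closes_clients() {
    int p[2];
    assert(::pipe(p) == 0);
    {
        std::unordered_map<int, UniqueFd> clients;
        clients.emplace(p[0], UniqueFd(p[0]));
        clients.emplace(p[1], UniqueFd(p[1]));
        assert(fd_is_open(p[0]) && fd_is_open(p[1]));
    }
    assert(!fd_is_open(p[0]) && !fd_is_open(p[1]));
}
```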


div0rce · 2026-06-03T15:42:37Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 36e4182ea2

chatgpt-codex-connector · 2026-06-03T15:49:14Z

+            const bool hard_socket_error = (ev & (EPOLLERR | EPOLLHUP)) != 0U;
+
+            if (hard_socket_error) {
+                close_now = true;


Drain readable data before honoring hangups

When a peer writes a complete request and then closes the socket, Linux can report EPOLLIN together with EPOLLHUP; this branch treats EPOLLHUP as an immediate hard error, so the event loop closes the fd before the read path below can process the already-delivered bytes. That makes epoll mode drop orders/heartbeats in a disconnect-after-write case that the existing blocking TcpServer::serve_connection would process on its final positive read(). Please drain EPOLLIN data first and only close after the read buffer is exhausted/EOF is observed.
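
One reading of this suggestion is to plan each wakeup from the event mask, reading first whenever readable data is reported. The sketch is hypothetical: `EventPlan` and `plan_for` are illustrative names, and the four flags stand in for EPOLLIN, EPOLLERR, EPOLLHUP, and EPOLLRDHUP so the snippet compiles off Linux.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for the <sys/epoll.h> constants (assumption).
constexpr std::uint32_t kIn = 1u << 0;
constexpr std::uint32_t kErr = 1u << 1;
constexpr std::uint32_t kHup = 1u << 2;
constexpr std::uint32_t kRdHup = 1u << 3;

struct EventPlan {
    bool read_first = false;          // consume already-delivered bytes
    bool flush_before_close = false;  // peer can still receive responses
    bool close = false;               // connection ends after this wakeup
};

// Hypothetical planner for one ready event.
inline EventPlan plan_for(std::uint32_t ev) {
    const bool hard = (ev & (kErr | kHup)) != 0;
    const bool half_closed = (ev & kRdHup) != 0;
    EventPlan plan;
    plan.read_first = (ev & kIn) != 0;  // honored even when a hangup is set
    plan.close = hard || half_closed;
    plan.flush_before_close = half_closed && !hard;
    assert(!plan.flush_before_close || plan.close);
    return plan;
}
```

For a peer that writes a request and disconnects, a mask with both the readable and hangup bits yields read-then-close, so the final request is processed before the descriptor is released.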


chatgpt-codex-connector · 2026-06-03T15:49:14Z

+                        if (status == SessionStatus::OutputLimitExceeded) {
+                            client.close_after_flush = true;
+                            break;


Close over-cap clients instead of waiting for a flush

When OutputLimitExceeded fires after earlier responses are already buffered, setting close_after_flush means the later re-arm keeps the fd registered for EPOLLOUT until those pending bytes are sent. For a client that has stopped reading, EPOLLOUT may never arrive, so the connection is not actually dropped after tripping the hard cap and can keep pinning its per-client buffer; treat the hard-cap path as an immediate close/discard instead of waiting for a flush.


chatgpt-codex-connector · 2026-06-03T15:49:14Z

+        if (ensure_new_order_budget(gateway_, request.value, out, max_output_bytes) !=
+            SessionStatus::Ok) {
+            logged_out_ = true;
+            return SessionStatus::OutputLimitExceeded;
        }


Avoid previewing orders when no output cap is active

The legacy unbounded Session::on_bytes(input) now routes through this path with max_output_bytes == size_t::max, but every NewOrder still calls ensure_new_order_budget, which previews/fill-counts the same matching liquidity before gateway_.new_limit/new_market walks it again. On the default TCP/session path, large sweeps over many maker orders now do two full book traversals even though an unlimited caller cannot exceed an output cap; skip the preview when the cap is disabled/unlimited.
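
The suggested shortcut can be illustrated with a toy model that counts book traversals. All names here (`Book`, `submit`, `bytes_per_fill`, `kUnlimited`) are hypothetical, and the fixed per-fill size is an assumption made only to keep the arithmetic simple.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

constexpr std::size_t kUnlimited = std::numeric_limits<std::size_t>::max();

enum class Status { Ok, OutputLimitExceeded };

// Toy book: counts how often the matching liquidity is walked.
struct Book {
    std::size_t resting_makers = 0;
    int traversals = 0;
    bool matched = false;

    std::size_t preview_fill_count() {
        ++traversals;  // read-only walk to count fills
        return resting_makers;
    }
    void match() {
        ++traversals;  // the walk that actually mutates the book
        matched = true;
    }
};

// Hypothetical order path: preview only when a real cap is in force.
inline Status submit(Book& book, std::size_t bytes_per_fill,
                     std::size_t max_output_bytes) {
    assert(bytes_per_fill > 0);
    if (max_output_bytes != kUnlimited) {
        const std::size_t fills = book.preview_fill_count();
        if (fills > max_output_bytes / bytes_per_fill) {
            return Status::OutputLimitExceeded;  // rejected before mutation
        }
    }
    book.match();
    return Status::Ok;
}
```

An unlimited caller pays for one traversal, a capped caller within budget pays for two, and an over-budget order is rejected after the preview without touching the book.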


div0rce added 2 commits June 2, 2026 13:25

feat: add epoll gateway architecture prototype

01653a5

chore: update m34 pr state

22784d9

chatgpt-codex-connector Bot reviewed Jun 2, 2026

Comment thread src/gateway/epoll_server.cpp Outdated

Comment thread apps/qsl-gateway/main.cpp Outdated

chatgpt-codex-connector Bot reviewed Jun 3, 2026

Comment thread apps/qsl-gateway/main.cpp Outdated

Comment thread src/gateway/epoll_server.cpp Outdated

div0rce added 2 commits June 3, 2026 10:09

chatgpt-codex-connector Bot reviewed Jun 3, 2026

div0rce mentioned this pull request Jun 3, 2026

Bound/stream gateway response generation (Session::on_bytes materializes whole response) #99

Open

chatgpt-codex-connector Bot reviewed Jun 3, 2026

Comment thread src/gateway/epoll_server.cpp Outdated

Comment thread PROGRESS.md Outdated

Comment thread src/gateway/epoll_server.cpp Outdated

chatgpt-codex-connector Bot reviewed Jun 3, 2026

fix: bound epoll response fanout before mutation

36e4182

chatgpt-codex-connector Bot reviewed Jun 3, 2026
