Skip to content

feat: add epoll gateway architecture prototype#98

Draft
div0rce wants to merge 8 commits into
mainfrom
feat/m34-epoll-gateway-architecture
Draft

feat: add epoll gateway architecture prototype#98
div0rce wants to merge 8 commits into
mainfrom
feat/m34-epoll-gateway-architecture

Conversation

@div0rce
Copy link
Copy Markdown
Owner

@div0rce div0rce commented Jun 2, 2026

Summary

  • Adds EpollServer, a Linux-only event-driven TCP gateway transport using one epoll loop, nonblocking accept4/read/write, per-client outbound buffers, and one existing deterministic Session per connection.
  • Adds qsl-gateway <port> --epoll as an explicit Linux opt-in; the portable blocking TcpServer remains the default.
  • Adds platform-scoped epoll tests: unsupported-platform check on non-Linux, plus Linux loopback coverage for invalid bind hosts and two simultaneous clients served by one event loop.
  • Updates socket docs and adds ADR 0010, distinguishing M34 architecture validation from M35 load/socket-pressure evidence.

Scope

M34 only. This does not start M35, does not add load/capacity claims, and does not change matching, risk, protocol codec, or deterministic Session semantics.

Verification

  • make check - 190/190 passed locally
  • make asan - 190/190 passed locally
  • git diff --check
  • Docker Ubuntu Linux: built qsl-gateway and test_epoll_gateway; test_epoll_gateway passed 3 tests / 36 assertions

Notes

epoll remains Linux-only. The M30 socket-profile artifacts still describe the blocking gateway path; M35 owns multi-client load and socket-pressure measurements.

@div0rce
Copy link
Copy Markdown
Owner Author

div0rce commented Jun 2, 2026

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 22784d9cd2

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/gateway/epoll_server.cpp Outdated
Comment thread apps/qsl-gateway/main.cpp Outdated
Addresses the two Codex P2 reviews on PR #98.

1. epoll backpressure / unbounded memory: a client that kept sending valid frames but stopped
   reading left EPOLLIN armed, so the loop kept reading and appended responses to an uncapped
   outbuf -> unbounded per-connection memory. EpollServerOptions gains a max_outbuf_bytes
   high-water mark (default 1 MiB); the read loop stops at the mark and the re-arm logic drops
   EPOLLIN while the backlog is over it (keeping EPOLLOUT), resuming reads once it drains. This
   bounds per-connection memory and never drops or reorders a response. Adds a backpressure test.

2. --epoll without a port: `qsl-gateway --epoll` parsed `--epoll` via std::stoul and aborted with
   std::invalid_argument. The arg parser now recognizes flags first; the first non-flag arg is the
   port (default 9009), and a non-numeric port gives a clean usage error instead of an uncaught
   exception.

docs/socket_gateway.md documents the high-water backpressure and the flag-or-port parsing.
make check 190/190 (macOS); the Linux epoll paths are exercised by the Ubuntu CI build/tests.
@div0rce
Copy link
Copy Markdown
Owner Author

div0rce commented Jun 3, 2026

Fixed both P2s in 29d5af8.

  1. Backpressure (epoll_server.cpp): added EpollServerOptions::max_outbuf_bytes (default 1 MiB). The read loop stops at the mark and the re-arm logic drops EPOLLIN while the backlog is over it (keeping EPOLLOUT), resuming reads once it drains — so a client that keeps sending but stops reading can no longer grow the outbound buffer unbounded. Added a backpressure test.
  2. --epoll parsing (qsl-gateway/main.cpp): flag-aware arg loop — --epoll may precede or replace the port; the first non-flag arg is the port (default 9009); a non-numeric port yields a clean usage error instead of an uncaught std::invalid_argument.

docs/socket_gateway.md updated for both. make check 190/190 (macOS); Linux epoll paths covered by Ubuntu CI.

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 29d5af83c8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread apps/qsl-gateway/main.cpp Outdated
Comment thread src/gateway/epoll_server.cpp Outdated
div0rce added 2 commits June 3, 2026 10:09
Addresses the second round of Codex P2 reviews on PR #98.

1. Port parsing (qsl-gateway/main.cpp): std::stoul accepted prefixes and the uint16_t cast
   truncated, so `9009x` silently bound 9009 and `70000` bound 4464. The parser now requires the
   whole token to parse (checks the consumed length) and the value to fit in uint16_t, otherwise it
   prints a usage error and exits 2.

2. epoll high-water mark (epoll_server.cpp): the mark was checked only after appending a full
   on_bytes response, so a single crossing/market order against a deep book (one Fill per resting
   maker) could push the buffer well past the mark before EPOLLIN was dropped. The check now runs
   before reading/processing the next request, bounding the overshoot to the request already in
   flight. A single response is still buffered whole -- capping one response would mean capping
   matching output, which is out of scope for the transport prototype -- and the header comment and
   docs/socket_gateway.md now state this honestly: the mark bounds how many further requests are
   read, not the size of one response.

Verified: make check 190/190 (macOS); Docker Ubuntu built test_epoll_gateway + qsl-gateway, 4 epoll
tests / 180 assertions pass, and arg smokes confirm 9009x / 70000 / abc are rejected while --epoll
and a bare port start cleanly.
…connections

Addresses the re-raised Codex P2 on PR #98. The soft high-water mark only gated reading further
requests, so a single high-fanout response -- a market order sweeping a deep book returns one Fill
per resting maker -- was appended whole before any check, meaning the 1 MiB limit was not actually
a per-connection cap.

Adds a hard cap (EpollServerOptions::max_outbuf_hard_bytes, default 8 MiB). Before appending a
response, if it would push the connection's pending bytes past the hard cap, the connection is
dropped instead of buffered, so sustained per-connection memory never exceeds the cap -- including
on the high-fanout single-response path. The soft mark (max_outbuf_bytes, 1 MiB) still applies
polite read-backpressure for the many-small-requests case; a client that reads its responses keeps
the backlog near zero and trips neither threshold. Only a peer that stops reading and then induces
an over-cap response is disconnected.

Adds a test (deep book + one crossing order without reading + a tiny hard cap) asserting the server
drops the connection (clean EOF) rather than buffering the over-cap response. Updated the header
comments and docs/socket_gateway.md to the two-threshold design.

Verified: make check 190/190 (macOS); Docker Ubuntu test_epoll_gateway 5 tests / 226 assertions.
@div0rce
Copy link
Copy Markdown
Owner Author

div0rce commented Jun 3, 2026

Re-raised P2 (high-water not a real per-connection cap) addressed in b897bfa.

The soft mark only gated reading more requests, so one high-fanout response (a market order sweeping a deep book → one Fill per maker) was appended whole. Added a hard cap (EpollServerOptions::max_outbuf_hard_bytes, default 8 MiB): before appending, if a response would push the connection past the hard cap it is dropped instead of buffered, so sustained per-connection memory never exceeds the cap — including the high-fanout single-response path. The soft mark (1 MiB) still does polite read-backpressure for many-small-requests; a reading client trips neither. New test builds a deep book, fires one crossing order without reading, and asserts the server drops the connection (clean EOF) rather than buffering the over-cap response.

Verified on Linux (Docker): test_epoll_gateway 5 tests / 226 assertions pass; make check 190/190.

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b897bfadea

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/gateway/epoll_server.cpp Outdated
Comment thread src/gateway/epoll_server.cpp Outdated
Comment thread src/gateway/epoll_server.cpp Outdated
if (n < 0 && is_would_block()) {
return true;
}
return false;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Retry sends that are interrupted by signals

In environments where the process receives a signal while flushing a client's queued responses, send() can fail with EINTR; this falls through to false, closes the connection, and drops whatever remains in outbuf. The accept and read paths already treat EINTR as retryable, so the write path should do the same instead of disconnecting clients because of an unrelated signal.

Useful? React with 👍 / 👎.

Comment thread apps/qsl-gateway/main.cpp
std::cerr << "usage: qsl-gateway [port] [--epoll] (port is 0-65535)\n";
return 2;
}
port = static_cast<std::uint16_t>(value);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3 Badge Reject duplicate port arguments

When the command line contains more than one numeric non-flag token, the loop silently overwrites the earlier port even though the usage and comment describe a single optional port and the first non-flag argument as the port. For example, qsl-gateway 9009 9010 --epoll binds 9010 instead of rejecting the malformed invocation, which can make local experiments listen on an unintended port.

Useful? React with 👍 / 👎.

Addresses the round-4 Codex review on PR #98 (transport-layer findings).

- Keep serving after transient accept4 errors (ECONNABORTED and the pending network errors Linux
  reports through accept) instead of returning false and tearing down the whole event loop for all
  clients (accept(2)).
- Flush output with a write offset instead of erasing from the front of outbuf after every send,
  which was O(n^2) for large buffers and could stall the single loop; reclaim the sent prefix once
  per append (amortized). Also retry send() on EINTR rather than dropping the connection, matching
  the accept/read paths.
- Reject duplicate/extra port arguments (e.g. `qsl-gateway 9009 9010`) instead of silently binding
  the last one.

The remaining review point -- one request's response is materialized in full by Session::on_bytes
before the hard cap is checked, so the *transient* allocation on a deep-book sweep is unbounded --
needs streaming/byte-budgeted response generation through the shared Session/gateway API (used by
the blocking TCP transport too). That is a larger architectural change beyond this epoll-transport
prototype; it is tracked as issue #99 and documented in docs/socket_gateway.md. The hard cap still
bounds the *retained* per-connection buffer.

Verified: make check 190/190 (macOS); Docker Ubuntu test_epoll_gateway 5 tests / 226 assertions;
arg smokes confirm 9009 9010 / 9009x / 70000 rejected and --epoll starts cleanly.
@div0rce
Copy link
Copy Markdown
Owner Author

div0rce commented Jun 3, 2026

Addressed the round-4 review in 84cb486.

Fixed (transport layer):

  • Aborted accepts: accept4 transient errors (ECONNABORTED + the pending network errors Linux surfaces via accept) now continue instead of tearing down the whole event loop.
  • O(n²) flush: output is drained with a write offset (reclaiming the sent prefix once per append) instead of erase(begin) per send.
  • EINTR on send: send() retries on EINTR instead of dropping the connection (matching the accept/read paths).
  • Duplicate port (P3): 9009 9010 is now rejected instead of silently binding the last token.

Deliberately backlogged — issue #99: the transient allocation point. Session::on_bytes materializes a single request's whole response before the hard cap is checked, so a deep-book sweep still transiently allocates O(fan-out). Truly bounding that requires streaming/byte-budgeting response generation through the shared Session/gateway API (the blocking TCP transport uses it too) — a milestone-sized change beyond this epoll-transport prototype, not a transport fix. The hard cap still bounds the retained per-connection buffer. Documented in docs/socket_gateway.md.

Verified on Linux (Docker): test_epoll_gateway 5 tests / 226 assertions; arg smokes confirm the rejections; make check 190/190.

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 84cb486b6e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/gateway/epoll_server.cpp Outdated
Comment thread PROGRESS.md Outdated
Comment thread src/gateway/epoll_server.cpp Outdated
Addresses the round-5 Codex review on PR #98.

- Drain a client's writable backlog (EPOLLOUT) before its read branch, so the hard-cap check sees
  an up-to-date pending() and a client that has resumed reading is not falsely dropped when
  pending + response would momentarily cross a cap that draining first would relieve.
- Close on EPOLLERR/EPOLLHUP regardless of queued output: a dead socket's pending bytes can never
  be sent, so the previous `!want_write` condition kept re-arming it and woke the event loop
  repeatedly with an unsendable buffer. EPOLLRDHUP stays flush-then-close, since a half-closed peer
  can still receive queued responses.
- PROGRESS.md: corrected the stale Docker epoll verification count (3 tests / 36 -> 5 / 226) and
  recorded the Codex hardening rounds + the issue #99 backlog.

Verified: make check 190/190 (macOS); Docker Ubuntu test_epoll_gateway 5 tests / 226 assertions.
@div0rce
Copy link
Copy Markdown
Owner Author

div0rce commented Jun 3, 2026

Round-5 review addressed in 9ff6c56.

  • Drain before read: the EPOLLOUT flush now runs before the read branch, so the hard-cap check sees current pending() — a client that resumed reading is no longer falsely dropped when pending + response momentarily crosses a cap that draining would relieve.
  • Close on hard error/hangup: EPOLLERR/EPOLLHUP now close regardless of queued output (a dead socket's bytes can't be sent; the old !want_write guard re-armed it and spun the loop). EPOLLRDHUP stays flush-then-close (a half-closed peer can still receive responses).
  • P3: corrected the stale PROGRESS epoll count (3/36 → 5/226).

Verified on Linux (Docker): test_epoll_gateway 5 tests / 226 assertions; make check 190/190.

The transient single-response allocation remains tracked as #99 (streaming response generation through the shared Session API — a separate milestone).

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9ff6c5614b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

errno == ENETUNREACH) {
continue;
}
return false; // genuinely fatal listener error
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Close active clients before fatal accept failure

If accept4 hits a fatal listener error after clients are already connected (for example EMFILE/ENFILE, which are not in the transient-error list), this return false exits serve_listen_socket without closing any descriptors stored in clients. Closing the epoll fd only unregisters interest; it does not close the client sockets, so a caller that logs the failure or restarts the server in-process leaks those connections and can worsen the fd-exhaustion condition that caused the failure.

Useful? React with 👍 / 👎.

Comment thread src/gateway/epoll_server.cpp Outdated
Comment thread src/gateway/epoll_server.cpp Outdated
@div0rce
Copy link
Copy Markdown
Owner Author

div0rce commented Jun 3, 2026

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 36e4182ea2

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +201 to +204
const bool hard_socket_error = (ev & (EPOLLERR | EPOLLHUP)) != 0U;

if (hard_socket_error) {
close_now = true;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Drain readable data before honoring hangups

When a peer writes a complete request and then closes the socket, Linux can report EPOLLIN together with EPOLLHUP; this branch treats EPOLLHUP as an immediate hard error, so the event loop closes the fd before the read path below can process the already-delivered bytes. That makes epoll mode drop orders/heartbeats in a disconnect-after-write case that the existing blocking TcpServer::serve_connection would process on its final positive read(). Please drain EPOLLIN data first and only close after the read buffer is exhausted/EOF is observed.

Useful? React with 👍 / 👎.

Comment on lines +232 to +234
if (status == SessionStatus::OutputLimitExceeded) {
client.close_after_flush = true;
break;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Close over-cap clients instead of waiting for a flush

When OutputLimitExceeded fires after earlier responses are already buffered, setting close_after_flush means the later re-arm keeps the fd registered for EPOLLOUT until those pending bytes are sent. For a client that has stopped reading, EPOLLOUT may never arrive, so the connection is not actually dropped after tripping the hard cap and can keep pinning its per-client buffer; treat the hard-cap path as an immediate close/discard instead of waiting for a flush.

Useful? React with 👍 / 👎.

Comment thread src/gateway/session.cpp
Comment on lines +111 to 115
if (ensure_new_order_budget(gateway_, request.value, out, max_output_bytes) !=
SessionStatus::Ok) {
logged_out_ = true;
return SessionStatus::OutputLimitExceeded;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid previewing orders when no output cap is active

The legacy unbounded Session::on_bytes(input) now routes through this path with max_output_bytes == size_t::max, but every NewOrder still calls ensure_new_order_budget, which previews/fill-counts the same matching liquidity before gateway_.new_limit/new_market walks it again. On the default TCP/session path, large sweeps over many maker orders now do two full book traversals even though an unlimited caller cannot exceed an output cap; skip the preview when the cap is disabled/unlimited.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant