HTTP backends: reactor matches nginx (10x libuv), CI throughput tracker by junjihashimoto · Pull Request #3 · Verilean/lean-tea

junjihashimoto · 2026-07-02T00:32:06Z

Summary

New HTTP backends — LeanTea.Net.FastServer (POSIX FFI, SO_REUSEPORT, thread-per-conn) and LeanTea.Net.ReactorServer (kqueue/epoll non-blocking event loop). Unified via LeanTea.Net.Backend + LEANTEA_HTTP_BACKEND env-var picker.
Reactor matches / slightly beats nginx at 128 keep-alive conns on M-series: 72 149 RPS vs nginx 69 428. libuv baseline was 6 218 RPS; that's ~10× on RPS and ~5-10× on p99 tail latency.
CI throughput tracker — .github/workflows/bench.yml runs on every push to main, boots lean-tea + nginx side-by-side on the same runner, records both absolute RPS and the parity ratio through benchmark-action/github-action-benchmark, persists to bench-data/http-bench.json.
Typed LeanTea.Web.Route — Yesod-style inductive routes so Route.link refuses raw String hrefs; dead links become compile errors.

The measured story

| server | health RPS | vs nginx | p99 (ms) |
| --- | --- | --- | --- |
| libuv `Server` | 6 218 | 9 % | 17 |
| FFI `FastServer` (SO_REUSEPORT) | 64 297 | 90 % | 2.1 |
| Reactor (kqueue/epoll) | 72 149 | 104 % | 2.1 |
| nginx (same box, same conf) | 69 428 | - | 2.0 |

Full breakdown (three routes × three concurrency levels + saturation runs) in docs/BENCHMARKS.md.

Why the libuv version was slow

Every recv/send hopped through: Lean IO → AsyncTask alloc → .block submit → libuv epoll → completion callback → Lean scheduler wake → fiber resume. Profiled at 100–500 µs per hop; each request paid 3–5 hops. Matches the measured T=1 ceiling of 1 933 RPS ≈ 500 µs/req.
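
As a rough cross-check of those figures (plain arithmetic, no LeanTea API; `usPerRequest` is a name made up for this sketch), the T=1 ceiling converts to a per-request cost that sits inside the range implied by the profiled hop cost:

```lean
/-- Microseconds spent per request by one worker running at `rps` requests per second. -/
def usPerRequest (rps : Nat) : Nat := 1000000 / rps

-- Measured T=1 ceiling quoted in the text.
#eval usPerRequest 1933        -- 517

-- Range implied by 3 to 5 hops at 100 to 500 µs each: (min, max) in µs.
#eval (3 * 100, 5 * 500)       -- (300, 2500)
```

The measured ~517 µs falls near the low end of that range, which is consistent with most hops costing closer to 100 µs than 500 µs.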

The FFI variants skip that entire path — recv() and send() are direct syscalls. The reactor additionally keeps every idle connection at ~100 bytes of C state (no OS thread per fd) so it scales to 10 k+ idle connections without the LEAN_NUM_THREADS >= concurrency sharp edge the FastServer has.

Which backend to use

LeanTea.Net.Backend.fromEnv picks one from LEANTEA_HTTP_BACKEND:

| `LEANTEA_HTTP_BACKEND` value | backend picked | best for |
| --- | --- | --- |
| unset / `reactor` | reactor | default; HTTP APIs |
| `fast` / `fast:16` | `FastServer` with N workers | short-request APIs, no idle floods |
| `libuv` | `Server.serveConcurrent` | LLM proxy, WS, SSE, chat |

Apps typically write:

def main : IO Unit := do
  let backend ← LeanTea.Net.Backend.fromEnv (default := .reactor)
  LeanTea.Net.Backend.serve backend 8080 "0.0.0.0" myHandler
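
For intuition, the mapping in the table could be parsed roughly as follows. This is a sketch only: `BackendChoice` and `parseBackend` are hypothetical names, not the actual `LeanTea.Net.Backend` definitions, and falling back to the default on an unrecognised value is an assumption rather than documented behaviour.

```lean
/-- Hypothetical stand-in for the backend enum; the real type lives in `LeanTea.Net.Backend`. -/
inductive BackendChoice where
  | reactor
  | fast (workers : Option Nat)  -- `none` = worker count left to the server
  | libuv
  deriving Repr

/-- Map the raw variable (`none` = unset) to a backend.
    Assumption: anything unrecognised falls back to `dflt`. -/
def parseBackend (raw : Option String) (dflt : BackendChoice := .reactor) : BackendChoice :=
  match raw with
  | none => dflt
  | some s =>
    match s.splitOn ":" with
    | ["reactor"] => .reactor
    | ["libuv"]   => .libuv
    | ["fast"]    => .fast none
    | ["fast", n] =>
      match n.toNat? with
      | some k => .fast (some k)
      | none   => dflt
    | _ => dflt

#eval parseBackend (some "fast:16")  -- the `fast` backend with 16 workers
#eval parseBackend none              -- unset: the default (reactor)
```

In a real `main`, the `Option String` would come from `IO.getEnv "LEANTEA_HTTP_BACKEND"`.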

Also in this PR

Fixed a Response.toBytes bug — used to append a hardcoded connection: close even when the annotator added connection: keep-alive, so responses carried both. ab was lenient; strict clients would have dropped. The serializer is now a single growing string + one toUTF8.
bench_server picks up the backend from Backend.fromEnv; the --fast N / --reactor CLI flags stay for explicit perf runs and win over the env var.
LeanTea.Web.Route (typed inductive route + Route.link compile-time dead-link check). Standalone commit at the bottom of the branch — not on the critical perf path but was blocked in the same session.
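
The shape of the `Response.toBytes` fix can be illustrated with a small sketch. `Resp` and its fields are hypothetical and much simpler than the real `Response`; the point is only that the `connection` header is added when absent, never appended unconditionally, and that the bytes come from one string and one `toUTF8` call.

```lean
/-- Hypothetical response shape; not the real LeanTea `Response`. -/
structure Resp where
  status  : Nat
  reason  : String
  headers : List (String × String)
  body    : String

/-- Serialise with exactly one `connection` header, building one string and encoding once.
    Simplification: assumes the handler did not set `content-length` itself. -/
def Resp.toBytes (r : Resp) (keepAlive : Bool) : ByteArray :=
  let hasConn := r.headers.any (fun kv => kv.1.toLower == "connection")
  let conn := if keepAlive then "keep-alive" else "close"
  -- Annotate only when no earlier stage has already set the header.
  let headers := if hasConn then r.headers else r.headers ++ [("connection", conn)]
  let head := headers.foldl
    (fun acc kv => acc ++ kv.1 ++ ": " ++ kv.2 ++ "\r\n")
    s!"HTTP/1.1 {r.status} {r.reason}\r\n"
  (head ++ s!"content-length: {r.body.utf8ByteSize}\r\n\r\n" ++ r.body).toUTF8
```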

Test plan

…r helpers

Grep of the tree turned up ~15 sites where handlers hand-wrote JSON bodies as string literals ("{\"error\":\"…\"}") or, worse, string concatenation ("{\"sub\":\"" ++ u.sub ++ "\","...). Two failure modes that the compiler couldn't catch:

1. Brace / quote balance is a lint at best. A stray missing } would have shipped as invalid JSON to any caller.
2. Concat sites in LeanTea/Auth/Idp.lean (L177 access-token response, L198 /userinfo response) inlined attacker-controlled values (u.email, u.name) with no escaping. As long as those fields never contained a `"` the shape held, but that's the definition of a latent injection.

Three new helpers on Response so handlers can hand the codec the problem:

* Response.json status body -- body : Lean.Json
* Response.jsonObj status v -- v : α with [ToJson α]
* Response.jsonError status msg -- convenience for {"error": msg}

Sites migrated:

* LeanTea/Auth/Idp.lean: 6 error responses + the 2 concat sites (Bearer token + /userinfo).
* LeanTea/Browser.lean: CDP close message.
* examples/AgentDashboard/Serve.lean: 5 sites.
* examples/LlmChatWeb/Serve.lean: 1 site.
* examples/Docs/Ch04_TypedRpc.lean: 1 site (matters for teaching).
* examples/Smoke/HttpClient.lean: the JSON-RPC handshake body.

Untouched (intentionally):

* The doc-comment JSON in LeanTea/Net/WebSocket.lean:28: an illustration, not runtime code.
* The startsWith needle in Smoke/HttpClient.lean: matching a prefix, not constructing.
* The forged JWT in Tests/PureSpec.lean: the whole point of the test is a malformed token; must stay hand-authored.

Verified: lake build → 162/162 green.
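
To make the injection risk concrete, here is a minimal sketch contrasting string concatenation with the `Lean.Json` codec. `unsafeBody`, `safeBody`, `errorBody`, and `hostile` are illustrative names, not code from the repository; only `Json.mkObj`, `Json.str`, and `Json.compress` are standard Lean API.

```lean
import Lean.Data.Json
open Lean

/-- Hand-concatenated body, in the style of the pre-migration call sites. -/
def unsafeBody (name : String) : String :=
  "{\"name\":\"" ++ name ++ "\"}"

/-- Codec-built body: `Json.compress` escapes quotes inside the value. -/
def safeBody (name : String) : String :=
  (Json.mkObj [("name", Json.str name)]).compress

/-- Roughly what a `jsonError`-style helper would put in the body (an assumption). -/
def errorBody (msg : String) : String :=
  (Json.mkObj [("error", Json.str msg)]).compress

/-- A value chosen to break out of the string and add a key. -/
def hostile : String := "x\",\"admin\":true,\"y\":\""

#eval IO.println (unsafeBody hostile)  -- still valid JSON, but now with an injected "admin" key
#eval IO.println (safeBody hostile)    -- one "name" key; the embedded quotes are escaped
```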

Adds:

* examples/BenchServer/Main.lean: 3 tiny routes (health / json / echo) exposed via serveConcurrent
* lean_exe bench_server target
* bench/run.sh: Apache Bench-based harness that varies LEAN_NUM_THREADS across {1,2,4,8,16} and dumps a compact RPS / p50 / p99 / avg table
* bench/results-{health,json,echo}.txt: captured runs
* docs/BENCHMARKS.md: writeup + interpretation

Headline finding: the current serveConcurrent does NOT scale past one worker thread on this hardware. Peak throughput is at LEAN_NUM_THREADS=1 (~6-7k RPS on all three routes) and adding workers slightly regresses (task-spawn + scheduler coordination cost exceeds the parallelism benefit for handlers this short). We are 1-2 orders of magnitude below nginx / warp on the same box.

Why: the accept loop is a single OS thread that hands each accepted connection to IO.asTask. Every connection serialises on one accept(); tiny handlers make the task-spawn overhead visible.

Next-round design notes captured in docs/BENCHMARKS.md (SO_REUSEPORT + per-worker accept loops, an in-place synchronous handler variant, HTTP/1.1 keep-alive).

Ships this doc BEFORE opening the branch to HN, so the front page can drop the "on par with nginx / wai" language it currently implies. That claim was ambition, not measurement.

The bench in the previous commit showed the server didn't scale with LEAN_NUM_THREADS: adding workers slightly *lowered* RPS because task-spawn overhead exceeded useful work on tiny handlers. Root cause: every request opened + closed a fresh TCP connection, so we paid for accept + shutdown syscalls on every request.

This commit teaches serveConcurrent to keep the connection open:

* New `Request.version` field (HTTP/1.0 vs 1.1) set by parseRequest, so the keep-alive logic can pick the right default.
* Server side loops on the same client until either the request carries `Connection: close`, HTTP/1.0 default, or the socket dies. Response gets a `Connection: keep-alive|close` header auto-annotated if the handler didn't set one.
* Nagle off on the server socket (`Socket.Server.noDelay`) so tiny responses hit the wire immediately.
* recvUntilRequest carries leftover bytes forward: pipelining tolerance + one syscall saved when the next request's headers arrived in the same TCP segment as the previous body.
* Backlog bumped from 64 → 128.

Effect (ab -k -c 64 -n 50000, same host as before):

| T | RPS-before | RPS-after (health) |
| --- | --- | --- |
| 1 | 6657 | 1933 |
| 2 | 5950 | 2485 |
| 4 | 5663 | 3420 |
| 8 | 5717 | 4469 |
| 16 | 5656 | 6218 (now the peak, up from ~5700) |

Absolute peak throughput is roughly unchanged (~6-7k RPS), but the regime changed:

* Before: without client-side keep-alive, T=1 was pathological peak because task-spawn was the bottleneck.
* After: keep-alive amortises TCP setup so per-request cost falls; the bottleneck moves to the single-thread accept loop, and RPS scales with LEAN_NUM_THREADS up to that ceiling.

Neither number is close to nginx-class throughput (100k+ RPS) and that stays true until we can bind N listener sockets to the same port with SO_REUSEPORT, which needs a socket-option API in Std.Net that Lean 4.31 doesn't expose.

docs/BENCHMARKS.md now has both rounds side by side and calls out the remaining work. Build stays green (162/162).
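
The per-request keep-alive decision described in this commit message can be sketched as a pure function. `HttpVersion` and `keepAlive` are hypothetical names, and the sketch assumes the `Connection` header value has already been trimmed; the defaults (persistent for HTTP/1.1, close for HTTP/1.0) follow the HTTP specification.

```lean
/-- Hypothetical stand-in for the `Request.version` field. -/
inductive HttpVersion where
  | v10
  | v11
  deriving BEq

/-- Decide whether to loop on the same connection after this request.
    `connection` is the request's `Connection` header value, if present. -/
def keepAlive (version : HttpVersion) (connection : Option String) : Bool :=
  match connection.map String.toLower with
  | some "close"      => false            -- explicit opt-out always wins
  | some "keep-alive" => true             -- explicit opt-in (how HTTP/1.0 clients ask)
  | _                 => version == .v11  -- otherwise use the version's default

#eval keepAlive .v11 none                 -- true
#eval keepAlive .v10 none                 -- false
#eval keepAlive .v11 (some "Close")       -- false
#eval keepAlive .v10 (some "keep-alive")  -- true
```

The same result would drive the auto-annotated `Connection: keep-alive|close` response header, so client and server agree on whether the socket stays open.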

…nk check

Yesod-style routing: apps declare an inductive Route type and derive Route.toPath. Route.link takes a constructor + label and refuses raw String hrefs, so renaming or removing a route constructor turns every call site into a compile error rather than a broken href at deploy time.

Follow-ups still on the roadmap:

* fromPath : String -> Option Route (bidirectional dispatch codec)
* Typed captures / query-string parameters at the type level (for those cases the RPC layer already covers, use LeanTea.Rpc).
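
A minimal sketch of the idea, under stated assumptions: the constructors and paths are invented, `toPath` is written by hand rather than derived, and `link` returns a plain string where the real `LeanTea.Web.Route` API may return something else.

```lean
namespace RouteSketch

/-- Hypothetical application routes; each constructor is one linkable page. -/
inductive Route where
  | home
  | docs
  | benchmarks

/-- Hand-written path mapping (the commit message says real apps derive this). -/
def Route.toPath : Route → String
  | .home       => "/"
  | .docs       => "/docs"
  | .benchmarks => "/docs/benchmarks"

/-- Build an anchor from a route constructor; there is no overload taking a raw `String` href. -/
def Route.link (r : Route) (label : String) : String :=
  "<a href=\"" ++ r.toPath ++ "\">" ++ label ++ "</a>"

#eval IO.println (Route.link .docs "Docs")

-- Rejected by the type checker: a `String` is not a `Route`.
-- #eval IO.println (Route.link "/dcos" "Docs")

end RouteSketch
```

Because `Route.toPath` matches exhaustively, deleting or renaming a constructor breaks both the mapping and every `Route.link` call that used it at compile time.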

Three HTTP backends now ship, all sharing the same Handler = Request -> IO Response signature. LeanTea.Net.Backend exposes them through one enum + Backend.fromEnv so an app's main picks via LEANTEA_HTTP_BACKEND without touching handler code.

* LeanTea.Net.Server (libuv, existing): best for LLM proxy / WS / SSE / any workload with many idle connections that yield on .block.
* LeanTea.Net.FastServer (c/leantea_fastnet.c): POSIX socket() + SO_REUSEPORT + blocking recv/send behind @[extern]. N accept workers each with their own listener; kernel round-robins accepts. Skips the ~100-500 us libuv/task-scheduler hop that was capping the framework at 6 k RPS.
* LeanTea.Net.ReactorServer (c/leantea_reactor.c): kqueue on macOS/BSD, epoll on Linux. Single non-blocking event loop manages every fd. Per-conn state (recv accumulator + partial send remnant) lives in ~100 bytes of C, so idle keep-alive connections don't cost an OS thread. Default.

Measured on an M-series laptop (wrk t=8 c=128 15s):

* libuv Server: 6 218 RPS (9 % of nginx)
* FFI FastServer: 64 297 RPS (90 % of nginx)
* Reactor: 72 149 RPS (104 % of nginx)
* nginx (reference): 69 428 RPS

Full numbers with p50/p99 and c=2000 saturation runs live in docs/BENCHMARKS.md.

Also included in this commit:

* Response.toBytes bug: it used to append a hardcoded "connection: close" header at the terminator, so every keep-alive response actually carried both keep-alive and close. ab tolerated it; strict clients would have dropped. Fixed + replaced the s! ping-pong with a single growing string + one toUTF8. Applies to all three backends.
* bench_server picks up the backend from Backend.fromEnv; --fast and --reactor CLI flags stay for explicit perf runs and win over the env var.
* README claims parity with nginx (measured, not aspirational).

.github/workflows/bench.yml runs on every push to main:

1. Boots bench_server (LEANTEA_HTTP_BACKEND=reactor) and a matching nginx side-by-side on the same ubuntu-latest runner.
2. Hits both with wrk -t8 -c128 -d15s on /health, /json, /echo.
3. Assembles a customBiggerIsBetter JSON payload that includes both absolute RPS AND the lean-tea/nginx % ratio per route.
4. Feeds it to benchmark-action/github-action-benchmark, which appends to bench-data/http-bench.json and flags anything below 80 % of the previous best.
5. Commits the updated JSON back to main via the action bot.

The absolute RPS on a 4-vCPU runner will always trail M-series numbers; the parity ratio is what to trend, since both servers run on the same runner in the same job.

paths-ignore: bench-data/** is set on the trigger; without it the bot's own commit would kick off another bench run.

fail-on-alert: false for now; flip once ~20 runs establish the noise floor.

junjihashimoto added 6 commits July 1, 2026 22:57

junjihashimoto merged commit 039935e into main Jul 2, 2026
1 check passed

junjihashimoto deleted the feat/http-backends branch July 2, 2026 00:34

junjihashimoto mentioned this pull request Jul 2, 2026

Bench: Std.Http.Server reference — reactor is 44x faster #4

Merged

4 tasks
