feat(support): per-key and per-session/IP rate limit on widget surface#42
Merged
Conversation
…(US-076) Wire the US-075 RateLimiter seam onto the public widget surface as its first call site, so the surface (which drives paid retrieval + LLM draft/judge calls) cannot be turned into a cost-amplification DoS. _enforce_widget_rate_limits charges two sliding windows via the seam on every request - a per-key bucket (caps aggregate abuse of one key across every session/IP) AND a per-session/IP bucket (caps one caller hammering across keys) - and refuses with a 429 + Retry-After if either is over, having done no costly work (invoked after the cheap shape guard but before the DB resolve / retrieval / LLM). The 429 throttle is distinct from US-077's 200 deferral; the tripped scope is logged for ops, never returned. Session identity prefers the left-most X-Forwarded-For hop (defense-in-depth, spoofable; edge/WAF is the documented production complement, P5). v1 accounting is coarse (requests-per-window, cost-weighted; precise token/$ metering is the F3 future refinement). The limiter is built once at startup via the US-075 factory only when support is configured (fails closed on a misconfigured backend); when unconfigured it is None and enforcement is a clean no-op (the endpoints already 503 before reaching anything costly). Wired onto POST /widget/keys/resolve now; US-078's message path reuses the helper with a higher cost. Test: backend/test_us076_widget_rate_limit.py - unit two-window decision logic plus a FastAPI-TestClient integration layer encoding the PRD validation test (per-session throttle while the per-key window has headroom; per-key throttle from a fresh session; throttle short-circuits the DB resolve) and a helper layer.
…esolve, fail-open on limiter error
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Intent
The developer wanted to implement user story US-075 from their phase-2 PRD: a swappable RateLimiter seam for the public support-widget surface, mirroring the repo's existing provider-factory patterns (reranking, web_search, parsing) with an env-selected backend and a durable Postgres default. They specified that this story ship only the seam plus its migration with no call site yet (later stories US-076/077 would wire it), that the Postgres backend use service-role SECURITY DEFINER RPCs over PostgREST with deny-all RLS (matching US-071's posture), and that there be no in-memory backend since durable cross-instance counting is required for an abuse/cost-DoS guard. After building it, they ran the change through the no-mistakes validation pipeline and, at the human-decision gates, chose to tighten window_seconds from float to int, fix the Postgres bucket identity to key on (key, window) plus add a Redis negative-cost guard, and approve/defer the missing ADR-0008 doc as out of scope. Finally they asked to merge PR #41 (squash) and delete the feature branch. Throughout they wanted live verification against a local Supabase and treated the pipeline's review findings as real gaps to fix before merge.
What Changed
RateLimiterseam onto the public widget surface as its first call site: every/widget/keys/resolverequest is drawn down against two sliding windows via_charge_widget_window/_enforce_widget_session_limit/_enforce_widget_key_limitinbackend/main.py. The per-session/IP window (ip:<session>, keyed off the left-mostX-Forwarded-Forhop) is charged first, before the DB resolve; the per-key window (key:<public_key>) is charged only after the resolve proves the key real, so a fake/rotating key mints no permanent counter row. A breach returns a hard 429 withRetry-Afterhaving done no retrieval/LLM work, while a limiter-backend error fails open (logs and allows) so the counter store is not a request-path SPOF. The limiter is built once at startup only when support is configured, and is a clean no-op otherwise.RATE_LIMITER,REDIS_URL,WIDGET_RATE_LIMIT_WINDOW_SECONDS,WIDGET_RATE_LIMIT_PER_KEY,WIDGET_RATE_LIMIT_PER_SESSION) and documented them inREADME.mdandbackend/.env.example, including the/widget/keys/resolveendpoint's new rate-limit behavior.backend/test_us076_widget_rate_limit.pycovering the two-window decision logic, the realPOST /widget/keys/resolvepath through a TestClient (session-first short-circuit, per-key enforcement from a fresh session, 429-with-Retry-After, fail-open on a raisinghit()), and the_widget_client_ip/ no-op-when-unconfigured helpers.Risk Assessment
Testing
Ran the bundled US-076 test (all unit/integration/helper layers pass, exit 0) and, because unit tests alone are not sufficient evidence, drove the real
POST /widget/keys/resolveendpoint end-to-end against a running local Supabase with the real durable Postgres rate-limiter and a genuine active widget key — capturing an HTTP transcript that shows the buyer widget being throttled (200×3 → 429 per-session with Retry-After → 429 per-key from a fresh session → 404 fake key) plus the persisted rate_limit_counters DB state proving durable cross-instance counting, independent windows, and the storage-amplification guard. Fail-open (429 vs 500) is covered by the test's raising-limiter assertion; this is a backend API change with no UI surface, so the reviewer-visible evidence is the CLI/HTTP transcript and persisted DB state rather than a screenshot. All checks pass; transient DB rows and temp logs were cleaned up and the worktree is clean.Evidence: Live end-to-end HTTP transcript (real app, real durable Postgres limiter, real widget key)
rate-limiter backend in use : 'postgres' limits : per_session=3 per_key=3 window=60s Step 1: 3x POST /widget/keys/resolve XFF=1.1.1.1 -> 200 {"active": true} Step 2: 4th from S XFF=1.1.1.1 -> 429 Retry-After=60 (per-SESSION window) Step 3: fresh session S2 XFF=2.2.2.2 -> 429 Retry-After=60 (per-KEY window enforces across sessions) Step 4: fake key XFF=8.8.8.8 -> 404 (unknown key, mints NO per-key counter row)Evidence: Persisted durable rate_limit_counters state + storage-amplification guard check
bucket_key | window_seconds | count ip:1.1.1.1 | 60 | 4 (3 ok + 1 throttled; blocked hit still counts) ip:2.2.2.2 | 60 | 1 ip:8.8.8.8 | 60 | 1 key:wk_pk_e2e_us076_demo_key_0001 | 60 | 4 (session-throttled 4th did NOT charge per-key) 0 per-key counter rows for the fake key (expected: 0) persisted per-key count read back via rate_limit_count RPC = 4 (durable, not in-memory)Evidence: End-to-end driver script used for live verification
Pipeline
Updates from git push no-mistakes
✅ **intent** - passed
✅ No issues found.
✅ **Rebase** - passed
✅ No issues found.
backend/main.py:3158- Per-session storage amplification reachable pre-auth. The per-session window is charged before the key resolves (_enforce_widget_session_limit(request)at line 3158), keyed on_widget_client_ip, which prefers the spoofable left-mostX-Forwarded-Forhop (line 2836). The only gate ahead of it is the cheapis_widget_public_keyshape check, so an unauthenticated caller sending a shape-valid-but-fakewk_pk_...key with a rotating X-Forwarded-For mints a permanentip:<spoofed>row per unique value. Therate_limit_hitRPC prune (migration 20260624170000) is scoped to one(bucket_key, window_seconds)and only reclaims older windows of that same key; distinct orphanedip:keys are never reclaimed and there is no global sweeper in-repo, sorate_limit_countersgrows unbounded. The per-key amplification was fixed by charging per-key only after resolve, but the per-session axis remains mintable with no valid key. This is documented as an accepted residual deferred to edge/WAF (P5) and a future US-075 sweeper; flagging for explicit acceptance since it ships open with no in-repo mitigation (a global TTL sweep on window_start < now() - 2*window would close it).✅ **Test** - passed
✅ No issues found.
python -m backend.test_us076_widget_rate_limit— all layers pass (exit 0): unit two-window decision logic, integration through realPOST /widget/keys/resolve(mocked resolve + fake limiter), helper layer (_widget_client_ip, no-op-when-unconfigured, fail-open)Live end-to-end drivere2e_us076_driver.pyagainst running local Supabase: real FastAPI app + real durable Postgres rate-limiter (RATE_LIMITER=postgres) + real active widget key in DB — captured the 200→200→200→429(session)→429(per-key)→404(fake key) HTTP transcript withRetry-After: 60Inspected persistedpublic.rate_limit_countersvia psql +rate_limit_countRPC: confirmed durable rows, independent ip:/key: buckets, blocked-hits-count, and zero counter rows for the fake key (storage-amplification guard)python -m backend.test_us075_rate_limiteragainst live Supabase — seam dependency intact, durable-across-restart integration assertions passVerified short-circuit in app logs: per-session 429 fires nowidget_keysresolve; fake key fires no per-keyrate_limit_hit✅ **Document** - passed
✅ No issues found.
✅ **Lint** - passed
✅ No issues found.
✅ **Push** - passed
✅ No issues found.