feat(support): per-key and per-session/IP rate limit on widget surface by hcho22 · Pull Request #42 · hcho22/Agentic_RAG

hcho22 · 2026-06-26T18:49:18Z

Intent

The developer wanted to implement user story US-075 from their phase-2 PRD: a swappable RateLimiter seam for the public support-widget surface, mirroring the repo's existing provider-factory patterns (reranking, web_search, parsing) with an env-selected backend and a durable Postgres default. They specified that this story ship only the seam plus its migration with no call site yet (later stories US-076/077 would wire it), that the Postgres backend use service-role SECURITY DEFINER RPCs over PostgREST with deny-all RLS (matching US-071's posture), and that there be no in-memory backend since durable cross-instance counting is required for an abuse/cost-DoS guard. After building it, they ran the change through the no-mistakes validation pipeline and, at the human-decision gates, chose to tighten window_seconds from float to int, fix the Postgres bucket identity to key on (key, window) plus add a Redis negative-cost guard, and approve/defer the missing ADR-0008 doc as out of scope. Finally they asked to merge PR #41 (squash) and delete the feature branch. Throughout they wanted live verification against a local Supabase and treated the pipeline's review findings as real gaps to fix before merge.

What Changed

Wired the US-075 RateLimiter seam onto the public widget surface as its first call site: every /widget/keys/resolve request is charged against two sliding windows via _charge_widget_window / _enforce_widget_session_limit / _enforce_widget_key_limit in backend/main.py. The per-session/IP window (ip:<session>, keyed off the left-most X-Forwarded-For hop) is charged first, before the DB resolve; the per-key window (key:<public_key>) is charged only after the resolve proves the key is real, so a fake or rotating key mints no permanent per-key counter row. A breach returns a hard 429 with Retry-After before any retrieval/LLM work is done, while a limiter-backend error fails open (logs and allows) so the counter store is not a request-path single point of failure (SPOF). The limiter is built once at startup, only when support is configured, and enforcement is a clean no-op otherwise.
Added new config knobs (RATE_LIMITER, REDIS_URL, WIDGET_RATE_LIMIT_WINDOW_SECONDS, WIDGET_RATE_LIMIT_PER_KEY, WIDGET_RATE_LIMIT_PER_SESSION) and documented them in README.md and backend/.env.example, including the /widget/keys/resolve endpoint's new rate-limit behavior.
Added backend/test_us076_widget_rate_limit.py covering the two-window decision logic, the real POST /widget/keys/resolve path through a TestClient (session-first short-circuit, per-key enforcement from a fresh session, 429-with-Retry-After, fail-open on a raising hit()), and the _widget_client_ip / no-op-when-unconfigured helpers.
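The charging order described above can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions, not the code in backend/main.py: the `RateLimiter.hit(bucket, limit, window_seconds, cost)` shape, `CountingLimiter`, and `resolve_with_limits` are hypothetical names, and the toy counter replays a single window with no expiry and no durability (the real backends are Postgres or Redis). The replay at the bottom feeds it the same six requests as the live end-to-end run described under Testing.

```python
from __future__ import annotations

import logging
from dataclasses import dataclass, field
from typing import Callable, Optional, Protocol

log = logging.getLogger("widget_rate_limit_sketch")


class RateLimiter(Protocol):
    """Hypothetical seam shape; the real US-075 interface may differ."""

    def hit(self, bucket: str, limit: int, window_seconds: int, cost: int = 1) -> tuple[bool, int]:
        """Charge `cost` to `bucket` and return (allowed, retry_after_seconds)."""
        ...


@dataclass
class CountingLimiter:
    """Toy counter that replays ONE window: it never expires anything and is not durable."""

    counts: dict[tuple[str, int], int] = field(default_factory=dict)

    def hit(self, bucket: str, limit: int, window_seconds: int, cost: int = 1) -> tuple[bool, int]:
        ident = (bucket, window_seconds)  # bucket identity is (key, window)
        self.counts[ident] = self.counts.get(ident, 0) + cost  # blocked hits still count
        allowed = self.counts[ident] <= limit
        # Toy choice: report the whole window as Retry-After when throttled.
        return allowed, (0 if allowed else window_seconds)


def _charge(limiter: RateLimiter, bucket: str, limit: int, window: int, cost: int) -> tuple[bool, int]:
    try:
        return limiter.hit(bucket, limit, window, cost)
    except Exception:
        # Fail open: log and allow, so the counter store is not a request-path SPOF.
        log.warning("rate limiter error on %s; allowing request", bucket, exc_info=True)
        return True, 0


def resolve_with_limits(
    limiter: Optional[RateLimiter],
    client_ip: str,
    public_key: str,
    key_exists: Callable[[str], bool],
    *,
    per_session: int,
    per_key: int,
    window: int,
    cost: int = 1,
) -> tuple[int, Optional[int]]:
    """Return (http_status, retry_after): session first, key only after a successful resolve."""
    if cost < 1:
        # Validate before charging: a non-positive cost would refund the bucket,
        # and the fail-open path must never swallow that mistake.
        raise ValueError("cost must be a positive integer")
    if limiter is not None:  # None means unconfigured: enforcement is a no-op
        ok, retry = _charge(limiter, f"ip:{client_ip}", per_session, window, cost)
        if not ok:
            return 429, retry  # throttled before the DB resolve
    if not key_exists(public_key):
        return 404, None  # an unknown key never touches (or mints) a key: bucket
    if limiter is not None:
        ok, retry = _charge(limiter, f"key:{public_key}", per_key, window, cost)
        if not ok:
            return 429, retry
    return 200, None


# Replay of the live run: per_session=3, per_key=3, window=60s.
PK = "wk_pk_e2e_us076_demo_key_0001"
FAKE = "wk_pk_this_key_does_not_exist_999"
limiter = CountingLimiter()
known = {PK}.__contains__
steps = [("1.1.1.1", PK)] * 4 + [("2.2.2.2", PK), ("8.8.8.8", FAKE)]
statuses = [
    resolve_with_limits(limiter, ip, key, known, per_session=3, per_key=3, window=60)[0]
    for ip, key in steps
]
```

With these limits the replay yields 200, 200, 200, 429 (session), 429 (key), 404, and the toy counts (ip:1.1.1.1 = 4, ip:2.2.2.2 = 1, ip:8.8.8.8 = 1, key bucket = 4) line up with the persisted rows reported in the evidence, including that the session-throttled 4th request never charges the key bucket.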

Risk Assessment

⚠️ Medium: The change is well-bounded, cleanly tested, and wires an existing seam onto one endpoint, but it introduces a pre-auth unbounded-table-growth vector on the per-session axis that currently has no in-repo mitigation (sweeper deferred, edge/WAF is deployment-dependent), so it is safe to merge with that concern tracked as a follow-up.

Testing

Ran the bundled US-076 test (all unit/integration/helper layers pass, exit 0) and, because unit tests alone are not sufficient evidence, drove the real POST /widget/keys/resolve endpoint end-to-end against a running local Supabase with the real durable Postgres rate-limiter and a genuine active widget key — capturing an HTTP transcript that shows the buyer widget being throttled (200×3 → 429 per-session with Retry-After → 429 per-key from a fresh session → 404 fake key) plus the persisted rate_limit_counters DB state proving durable cross-instance counting, independent windows, and the storage-amplification guard. Fail-open (429 vs 500) is covered by the test's raising-limiter assertion; this is a backend API change with no UI surface, so the reviewer-visible evidence is the CLI/HTTP transcript and persisted DB state rather than a screenshot. All checks pass; transient DB rows and temp logs were cleaned up and the worktree is clean.

Evidence: Live end-to-end HTTP transcript (real app, real durable Postgres limiter, real widget key)


```text
==============================================================================
US-076 END-TO-END  —  real FastAPI app, real durable Postgres limiter
==============================================================================
rate-limiter backend in use : 'postgres'  (None would mean unconfigured)
limits                      : per_session=3  per_key=3  window=60s
active widget key (real DB) : wk_pk_e2e_us076_demo_key_0001

--- Step 1: 3 requests from session S (X-Forwarded-For 1.1.1.1) --------------
  POST /widget/keys/resolve  XFF=1.1.1.1  -> 200  body={"active": true}
  POST /widget/keys/resolve  XFF=1.1.1.1  -> 200  body={"active": true}
  POST /widget/keys/resolve  XFF=1.1.1.1  -> 200  body={"active": true}
  => buyer widget resolves the key: active=true, within the per-session limit

--- Step 2: a 4th request from the SAME session S ---------------------------
  POST /widget/keys/resolve  XFF=1.1.1.1  -> 429  Retry-After=60  body={"detail": "rate limit exceeded \u2014 too many requests, please retry shortly"}
  => per-SESSION window throttles the hammering caller with HTTP 429 + Retry-After

--- Step 3: a FRESH session S2 (X-Forwarded-For 2.2.2.2), same key ----------
  POST /widget/keys/resolve  XFF=2.2.2.2  -> 429  Retry-After=60  body={"detail": "rate limit exceeded \u2014 too many requests, please retry shortly"}
  => per-KEY window throttles even a brand-new session: aggregate key abuse is capped

--- Step 4: storage-amplification guard — a fake/unknown key ----------------
  POST /widget/keys/resolve  XFF=8.8.8.8  key=wk_pk_this_key_does_not_exist_999  -> 404  body={"detail": "unknown or inactive widget key"}
  => unknown key 404s after the cheap session charge; it must NOT mint a per-key row

(app shut down cleanly — startup built and shutdown released the real limiter)
```

Evidence: Persisted durable rate_limit_counters state + storage-amplification guard check

Counts per bucket (window_seconds = 60 for every row):
- ip:1.1.1.1 → 4 (3 ok + 1 throttled; a blocked hit still counts)
- ip:2.2.2.2 → 1
- ip:8.8.8.8 → 1
- key:wk_pk_e2e_us076_demo_key_0001 → 4 (the session-throttled 4th request did NOT charge per-key)
- 0 per-key counter rows for the fake key (expected: 0)
- per-key count read back via the rate_limit_count RPC = 4 (durable, not in-memory)

```text
            bucket_key             | window_seconds | count |      window_start      
-----------------------------------+----------------+-------+------------------------
 ip:1.1.1.1                        |             60 |     4 | 2026-06-26 18:39:00+00
 ip:2.2.2.2                        |             60 |     1 | 2026-06-26 18:39:00+00
 ip:8.8.8.8                        |             60 |     1 | 2026-06-26 18:39:00+00
 key:wk_pk_e2e_us076_demo_key_0001 |             60 |     4 | 2026-06-26 18:39:00+00
(4 rows)

0 per-key counter rows for the fake key  (expected: 0)
persisted per-key count for the real key = 4
```

Evidence: End-to-end driver script used for live verification

```python
"""US-076 end-to-end driver: REAL app, REAL durable Postgres rate-limiter, REAL key.

No mocks. Drives `POST /widget/keys/resolve` exactly as the buyer's widget would,
against a live local Supabase. Demonstrates, as an end user experiences it:

  * the per-SESSION/IP sliding window throttling one caller (429 + Retry-After),
  * the per-KEY sliding window throttling aggregate abuse even from a fresh session,
  * the storage-amplification guard (a fake key mints NO per-key counter row),
  * that the counts are DURABLE rows in public.rate_limit_counters (not in-memory).

Expects SB_ANON, SB_SERVICE and TRANSCRIPT_OUT in the environment, and must run from
the repo root so that `backend/` is importable.
"""
from __future__ import annotations

import json
import os
import sys

# --- env MUST be set before importing main (module-level + startup reads) ------
ENV = {
    "SUPABASE_URL": "http://127.0.0.1:54321",
    "SUPABASE_ANON_KEY": os.environ["SB_ANON"],
    "SUPABASE_SERVICE_ROLE_KEY": os.environ["SB_SERVICE"],
    "OPENAI_API_KEY": "sk-test-dummy",
    "RATE_LIMITER": "postgres",                      # the durable default backend
    "WIDGET_RATE_LIMIT_PER_KEY": "3",
    "WIDGET_RATE_LIMIT_PER_SESSION": "3",
    "WIDGET_RATE_LIMIT_WINDOW_SECONDS": "60",
}
os.environ.update(ENV)
sys.path.insert(0, "backend")  # relative to the current working directory

import main  # noqa: E402
from fastapi.testclient import TestClient  # noqa: E402

PK = "wk_pk_e2e_us076_demo_key_0001"
FAKE = "wk_pk_this_key_does_not_exist_999"
ORIGIN = "https://client.example"

transcript: list[str] = []


def log(line: str = "") -> None:
    print(line)
    transcript.append(line)


def req(client: TestClient, ip: str, key: str = PK) -> tuple[int, dict, str | None]:
    """POST one resolve call as the widget would; return (status, JSON body, Retry-After)."""
    r = client.post(
        "/widget/keys/resolve",
        json={"public_key": key},
        headers={"Origin": ORIGIN, "X-Forwarded-For": ip},
    )
    return r.status_code, (r.json() if r.content else {}), r.headers.get("Retry-After")


# TestClient as a context manager fires the FastAPI startup event -> builds the
# REAL rate limiter via the US-075 factory against the live Supabase.
with TestClient(main.app) as client:
    limiter = main._RATE_LIMITER  # None when support is unconfigured
    backend_name = limiter.name if limiter is not None else None
    log("=" * 78)
    log("US-076 END-TO-END  —  real FastAPI app, real durable Postgres limiter")
    log("=" * 78)
    log(f"rate-limiter backend in use : {backend_name!r}  "
        f"(None would mean unconfigured)")
    log(f"limits                      : per_session="
        f"{main.WIDGET_RATE_LIMIT_PER_SESSION}  per_key="
        f"{main.WIDGET_RATE_LIMIT_PER_KEY}  window="
        f"{main.WIDGET_RATE_LIMIT_WINDOW_SECONDS}s")
    log(f"active widget key (real DB) : {PK}")
    log("")

    log("--- Step 1: 3 requests from session S (X-Forwarded-For 1.1.1.1) --------------")
    for _ in range(3):
        code, body, _ = req(client, "1.1.1.1")
        log(f"  POST /widget/keys/resolve  XFF=1.1.1.1  -> {code}  "
            f"body={json.dumps(body)}")
    log("  => buyer widget resolves the key: active=true, within the per-session limit")
    log("")

    log("--- Step 2: a 4th request from the SAME session S ---------------------------")
    code, body, ra = req(client, "1.1.1.1")
    log(f"  POST /widget/keys/resolve  XFF=1.1.1.1  -> {code}  "
        f"Retry-After={ra}  body={json.dumps(body)}")
    log("  => per-SESSION window throttles the hammering caller with HTTP 429 + Retry-After")
    log("")

    log("--- Step 3: a FRESH session S2 (X-Forwarded-For 2.2.2.2), same key ----------")
    code, body, ra = req(client, "2.2.2.2")
    log(f"  POST /widget/keys/resolve  XFF=2.2.2.2  -> {code}  "
        f"Retry-After={ra}  body={json.dumps(body)}")
    log("  => per-KEY window throttles even a brand-new session: aggregate key abuse is capped")
    log("")

    log("--- Step 4: storage-amplification guard — a fake/unknown key ----------------")
    code, body, ra = req(client, "8.8.8.8", key=FAKE)
    log(f"  POST /widget/keys/resolve  XFF=8.8.8.8  key={FAKE}  -> {code}  "
        f"body={json.dumps(body)}")
    log("  => unknown key 404s after the cheap session charge; it must NOT mint a per-key row")

log("")
log("(app shut down cleanly — startup built and shutdown released the real limiter)")

with open(os.environ["TRANSCRIPT_OUT"], "w") as f:
    f.write("\n".join(transcript) + "\n")
```
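The driver configures the limiter purely through environment variables set before `main` is imported. A minimal sketch of how such settings could be parsed and validated at startup, under assumptions: `load_widget_rate_limit_settings`, `WidgetRateLimitSettings`, and `RateLimitConfigError` are hypothetical names; the accepted `RATE_LIMITER` values (`postgres`, `redis`) are inferred from the PR's two backends and its deliberate lack of an in-memory one; the integer-only window mirrors the float-to-int tightening mentioned in the Intent; and the three limit knobs are treated as required for simplicity, whereas the real code may apply defaults.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: only durable backends are accepted; there is no in-memory option.
ALLOWED_BACKENDS = {"postgres", "redis"}


class RateLimitConfigError(ValueError):
    """Hypothetical: raised at startup so a misconfigured limiter fails closed."""


@dataclass(frozen=True)
class WidgetRateLimitSettings:
    backend: str
    redis_url: Optional[str]
    window_seconds: int
    per_key: int
    per_session: int


def _positive_int(env: Mapping[str, str], name: str) -> int:
    raw = env.get(name)
    if raw is None:
        raise RateLimitConfigError(f"{name} is required")
    try:
        value = int(raw)  # int("60.5") raises, so fractional values are rejected
    except ValueError:
        raise RateLimitConfigError(f"{name} must be an integer, got {raw!r}") from None
    if value < 1:
        raise RateLimitConfigError(f"{name} must be >= 1, got {value}")
    return value


def load_widget_rate_limit_settings(env: Mapping[str, str]) -> WidgetRateLimitSettings:
    # Postgres is the durable default backend.
    backend = env.get("RATE_LIMITER", "postgres").strip().lower()
    if backend not in ALLOWED_BACKENDS:
        raise RateLimitConfigError(f"unsupported RATE_LIMITER {backend!r}")
    redis_url = env.get("REDIS_URL") or None
    if backend == "redis" and redis_url is None:
        raise RateLimitConfigError("REDIS_URL is required when RATE_LIMITER=redis")
    return WidgetRateLimitSettings(
        backend=backend,
        redis_url=redis_url,
        window_seconds=_positive_int(env, "WIDGET_RATE_LIMIT_WINDOW_SECONDS"),
        per_key=_positive_int(env, "WIDGET_RATE_LIMIT_PER_KEY"),
        per_session=_positive_int(env, "WIDGET_RATE_LIMIT_PER_SESSION"),
    )
```

Calling `load_widget_rate_limit_settings(os.environ)` once at startup keeps a bad value from surfacing as a per-request error later.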

Pipeline

Updates from git push no-mistakes

✅ **intent** - passed

✅ No issues found.

✅ **Rebase** - passed

✅ No issues found.

⚠️ **Review** - 1 warning

⚠️ backend/main.py:3158 - Per-session storage amplification reachable pre-auth. The per-session window is charged before the key resolves (_enforce_widget_session_limit(request) at line 3158), keyed on _widget_client_ip, which prefers the spoofable left-most X-Forwarded-For hop (line 2836). The only gate ahead of it is the cheap is_widget_public_key shape check, so an unauthenticated caller sending a shape-valid-but-fake wk_pk_... key with a rotating X-Forwarded-For mints a permanent ip:<spoofed> row per unique value. The rate_limit_hit RPC prune (migration 20260624170000) is scoped to one (bucket_key, window_seconds) and only reclaims older windows of that same key; distinct orphaned ip: keys are never reclaimed and there is no global sweeper in-repo, so rate_limit_counters grows unbounded. The per-key amplification was fixed by charging per-key only after resolve, but the per-session axis remains mintable with no valid key. This is documented as an accepted residual deferred to edge/WAF (P5) and a future US-075 sweeper; flagging for explicit acceptance since it ships open with no in-repo mitigation (a global TTL sweep on window_start < now() - 2*window would close it).
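The warning's suggested mitigation, a global TTL sweep on `window_start < now() - 2*window`, reduces to the predicate below. This is a Python sketch of that predicate applied to counter rows, assuming rows shaped like the persisted `rate_limit_counters` columns shown in the evidence; `CounterRow`, `is_stale`, and `sweep` are hypothetical names, and an in-repo sweeper would more likely run the same condition as one scheduled SQL `DELETE` outside the request path.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class CounterRow:
    bucket_key: str
    window_seconds: int
    count: int
    window_start: datetime


def is_stale(row: CounterRow, now: datetime) -> bool:
    """Reclaimable once the row's window started more than two of its own windows ago."""
    return row.window_start < now - timedelta(seconds=2 * row.window_seconds)


def sweep(rows: list[CounterRow], now: datetime) -> tuple[list[CounterRow], list[CounterRow]]:
    """Split rows into (kept, deleted).

    Unlike the per-bucket prune inside rate_limit_hit, this scans every bucket,
    so orphaned ip:<spoofed> rows that are never hit again are also reclaimed.
    """
    kept = [r for r in rows if not is_stale(r, now)]
    deleted = [r for r in rows if is_stale(r, now)]
    return kept, deleted
```

The factor of two is the reviewer's suggestion; it presumably keeps the previous window available to any sliding-window estimate that still reads it, at the cost of holding rows one window longer than strictly necessary.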

✅ **Test** - passed

✅ No issues found.

python -m backend.test_us076_widget_rate_limit — all layers pass (exit 0): unit two-window decision logic, integration through real POST /widget/keys/resolve (mocked resolve + fake limiter), helper layer (_widget_client_ip, no-op-when-unconfigured, fail-open)
Live end-to-end driver e2e_us076_driver.py against running local Supabase: real FastAPI app + real durable Postgres rate-limiter (RATE_LIMITER=postgres) + real active widget key in DB — captured the 200→200→200→429(session)→429(per-key)→404(fake key) HTTP transcript with Retry-After: 60
Inspected persisted public.rate_limit_counters via psql + rate_limit_count RPC: confirmed durable rows, independent ip:/key: buckets, blocked-hits-count, and zero counter rows for the fake key (storage-amplification guard)
python -m backend.test_us075_rate_limiter against live Supabase — seam dependency intact, durable-across-restart integration assertions pass
Verified short-circuit in app logs: per-session 429 fires no widget_keys resolve; fake key fires no per-key rate_limit_hit

✅ **Document** - passed

✅ No issues found.

✅ **Lint** - passed

✅ No issues found.

✅ **Push** - passed

✅ No issues found.

…(US-076) Wire the US-075 RateLimiter seam onto the public widget surface as its first call site, so the surface (which drives paid retrieval + LLM draft/judge calls) cannot be turned into a cost-amplification DoS.

_enforce_widget_rate_limits charges two sliding windows via the seam on every request - a per-key bucket (caps aggregate abuse of one key across every session/IP) AND a per-session/IP bucket (caps one caller hammering across keys) - and refuses with a 429 + Retry-After if either is over, having done no costly work (invoked after the cheap shape guard but before the DB resolve / retrieval / LLM). The 429 throttle is distinct from US-077's 200 deferral; the tripped scope is logged for ops, never returned.

Session identity prefers the left-most X-Forwarded-For hop (defense-in-depth, spoofable; edge/WAF is the documented production complement, P5). v1 accounting is coarse (requests-per-window, cost-weighted; precise token/$ metering is the F3 future refinement).

The limiter is built once at startup via the US-075 factory only when support is configured (fails closed on a misconfigured backend); when unconfigured it is None and enforcement is a clean no-op (the endpoints already 503 before reaching anything costly). Wired onto POST /widget/keys/resolve now; US-078's message path reuses the helper with a higher cost.

Test: backend/test_us076_widget_rate_limit.py - unit two-window decision logic plus a FastAPI-TestClient integration layer encoding the PRD validation test (per-session throttle while the per-key window has headroom; per-key throttle from a fresh session; throttle short-circuits the DB resolve) and a helper layer.
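Session identity prefers the left-most X-Forwarded-For hop. A hypothetical helper in that spirit is sketched below (the real `_widget_client_ip` may differ, for example in header normalization or fallback value); it returns the first comma-separated hop, falls back to the socket peer address when the header is absent or blank, and uses a placeholder `"unknown"` when neither exists. Because the left-most hop is written by the client, this identifies a session for throttling only, never for authentication.

```python
from __future__ import annotations

from typing import Optional


def client_ip_from_headers(x_forwarded_for: Optional[str], peer_host: Optional[str]) -> str:
    """Return the left-most X-Forwarded-For hop, else the peer address, else 'unknown'.

    Hypothetical helper: the left-most hop is caller-controlled and freely
    rotatable, which is why edge/WAF limits remain the production complement.
    """
    if x_forwarded_for:
        first = x_forwarded_for.split(",")[0].strip()
        if first:
            return first
    # Assumption: every caller without any address shares one "unknown" bucket.
    return peer_host or "unknown"
```

In a FastAPI/Starlette handler the inputs would come from `request.headers.get("x-forwarded-for")` and `request.client.host`, guarding for `request.client` being `None`.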

vercel · 2026-06-26T18:49:25Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
agentic-rag	Ready	Preview, Comment	Jun 26, 2026 6:49pm

hcho22 added 3 commits June 26, 2026 09:44

no-mistakes(review): split widget rate-limit helpers: per-key after resolve, fail-open on limiter error

d15fd7f

no-mistakes(document): document widget rate-limit env vars in README and .env.example

a5bd011

hcho22 merged commit a1564ee into main Jun 26, 2026
2 checks passed

hcho22 deleted the feat/us076-widget-rate-limit branch June 26, 2026 20:38

hcho22 merged 3 commits into main from feat/us076-widget-rate-limit
