feat(support): swappable RateLimiter seam with durable Postgres default (US-075)#41
Merged
Conversation
…US-075) Add backend/rate_limiting.py: a RateLimiter ABC + env-selected factory mirroring the reranking/web_search/parsing seams. RATE_LIMITER=postgres (default) | redis; no in-memory backend (per-replica under-count + restart reset would silently disable the cost-DoS guard, ADR-0008). PostgresRateLimiter reaches counters via two service-role SECURITY DEFINER RPCs over PostgREST (same posture as US-071's resume_conversation); RedisRateLimiter is an optional scale adapter (lazy import, redis not in requirements). Both use a bounded two-bucket sliding-window counter. Migration 20260624170000_rate_limit_counters.sql adds the deny-all RLS counter table + rate_limit_hit/rate_limit_count RPCs, granted to service_role only. Ships only the seam + migration (no call site yet by design); US-076 wires it onto the widget endpoints, US-077 the per-workspace circuit breaker. Test: python -m backend.test_us075_rate_limiter (unit always-on + integration skip-clean). Verified against local Supabase: migration applies, all assertions pass, deny-all/service-role-only boundary holds at the live PostgREST layer.
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
Contributor
Retrieval eval — PR vs
|
| Mode | recall@5 | MRR | nDCG@5 |
|---|---|---|---|
| vector | 0.860 (±0.000) | 0.772 (±0.000) | 0.779 (±0.000) |
| keyword | 0.110 (±0.000) | 0.120 (±0.000) | 0.112 (±0.000) |
| hybrid | 0.860 (±0.000) | 0.759 (±0.000) | 0.769 (±0.000) |
Per-category recall@5
| Mode | single_chunk | multi_hop | adversarial | paraphrase |
|---|---|---|---|---|
| vector | 0.900 (±0.000) | 0.933 (±0.000) | 0.600 (±0.000) | 1.000 (±0.000) |
| keyword | 0.250 (±0.000) | 0.033 (±0.000) | 0.000 (±0.000) | 0.000 (±0.000) |
| hybrid | 0.900 (±0.000) | 0.933 (±0.000) | 0.600 (±0.000) | 1.000 (±0.000) |
Comment is updated in place on each push by .github/workflows/retrieval-eval.yml (US-035). Comment-only — never blocks the build.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Intent
The developer asked the agent to implement user story US-075 from their phase-2 implementation PRD (.claude/agent/tasks/prd-phase2-implementation.md), which calls for a swappable RateLimiter seam for the public support widget's abuse controls. The story required mirroring the repo's existing provider-factory patterns (reranking, web_search, parsing) with a RateLimiter ABC plus an env-selected factory, shipping only the seam and its migration with no call site yet (later stories wire it onto endpoints), using a durable Postgres default backend and deliberately providing no in-memory option. After implementing it, the developer invoked the /no-mistakes gate on US-075 to validate the change through automated review, tests, lint, docs, push, PR, and CI before it reaches the configured push target.
What Changed
backend/rate_limiting.py: aRateLimiterABC (hit/count/aclose) returning aRateLimitDecision, plus an env-selectedbuild_rate_limiter()factory (RATE_LIMITER=postgres|redis, defaultpostgres) that mirrors the repo's existing provider factories and fails closed at build time on missing config. There is deliberately no in-memory backend (it under-counts per replica and resets on restart), and this ships the seam only - no call sites yet, which US-076/077 will wire onto the widget endpoints.20260624170000_rate_limit_counters.sql: arate_limit_counterstable (RLS enabled, zero policies - anon/authenticated denied wholesale) keyed by(bucket_key, window_seconds, window_start), plus service-role-only SECURITY DEFINER RPCsrate_limit_hit/rate_limit_countimplementing a two-bucket sliding-window counter with an in-RPC prune bounded to ≤2 live rows per (key, window).window_secondsfrom float to int, scoped both Postgres and Redis counters by (key, window) so multi-window callers don't conflate or prune each other, guarded the Redis backend against negative cost for parity with the RPC, and addedbackend/test_us075_rate_limiter.pyplus AGENTS.md / PRD doc sync.Risk Assessment
✅ Low: A well-bounded, no-call-site seam whose two substantive issues were already resolved in prior review rounds; only a minor stale-doc phrasing inconsistency remains.
Testing
Ran the full US-075 test module (unit + integration) - both layers pass, with the integration layer executing the PRD Validation Test live against the running local Supabase (durable counter survives a simulated restart at value 5; limit decision flips allowed True->False). I then drove the rate_limit_hit/rate_limit_count RPCs directly over the live PostgREST (the exact path PostgresRateLimiter uses) to capture reviewer-visible product evidence: durable sliding-window counting, peek-does-not-increment, the limit flip with blocked hits still counting, the deny-all/service-role-only boundary (anon refused with 42501 on both RPCs and 0 rows on a direct table read, privilege table FALSE for anon/authenticated and TRUE for service_role), the bounded persisted rate_limit_counters rows, and the per-(key,window) scoping fix from the review commits. No UI surface exists for this change (it's a backend seam with no call site yet, by design), so evidence is a CLI transcript + persisted DB state rather than a screenshot. Working tree left clean; evidence written to the dedicated evidence directory.Evidence: Live PostgREST rate-limiter transcript (durability, limit flip, deny-all boundary, per-(key,window) scoping, persisted rows)
[1] Durable counting: turns 1..5 -> current_count 1..5, allowed:true [2] Peek twice -> 5, 5 (no increment) [3] turn 6 -> allowed:false count 6; turn 7 -> allowed:false count 7 (still counting while blocked) [4] anon rate_limit_hit -> HTTP 401 {"code":"42501",...permission denied for function rate_limit_hit}; anon rate_limit_count -> HTTP 401 42501; anon GET /rate_limit_counters -> HTTP 200 [] (deny-all RLS) [5] same key window=60s cost=2 -> count 2 (independent); window=3600s still reads 7 (untouched) [6] persisted rows: (key,60,count=2) and (key,3600,count=7) -> 2 rows, bounded [7] has_function_privilege: rate_limit_hit/rate_limit_count anon=f authenticated=f service_role=tEvidence: Evidence driver script (live RPCs against local Supabase PostgREST)
Pipeline
Updates from git push no-mistakes
✅ **intent** - passed
✅ No issues found.
✅ **Rebase** - passed
✅ No issues found.
backend/rate_limiting.py:162- The seam typeswindow_seconds: float(RateLimitDecision and every ABC/impl method), but neither backend actually honors non-integer windows. PostgresRateLimiter sendsint(window_seconds)(rate_limiting.py:162,182), so a sub-second window (e.g. 0.5) truncates to 0 and the RPC raisesp_window_seconds must be > 0, and a fractional window like 1.5 silently becomes 1. RedisRateLimiter uses the float for idx/weight math butint(window_seconds)for the key namespace (rate_limiting.py:229), so two fractional windows that floor to the same integer (1.4 and 1.6) would share buckets while computing different weights. The module docstring promises behavior is 'byte-identical no matter which backend is configured', which holds only for integer windows. No current impact (no call sites; US-076/077 use integer-second windows), but the float type advertises a capability neither backend supports - worth constraining the contract to int or documenting integer-only before US-076 wires it.supabase/migrations/20260624170000_rate_limit_counters.sql:116- Under the default Postgres backend, everyhiton a key performs INSERT...ON CONFLICT DO UPDATE + SELECT(prev) + DELETE(prune) against the same bucket row, so a focused flood on one bucket_key serializes on that row's lock and generates a sustained write+dead-tuple/autovacuum stream on the very abuse path the limiter exists to absorb. This is the documented Postgres-default tradeoff (Redis is the named scale path), and it fails safe (the limiter throttles rather than letting requests through), but flag it for US-076 capacity planning: a single hot key is a DB-throughput chokepoint, not a free counter.🔧 Fix: tighten rate limiter window_seconds contract from float to int
2 issues (1 warning, 1 info) still open:
supabase/migrations/20260624170000_rate_limit_counters.sql:107- The Postgres counter row is identified by (bucket_key, window_start) with bucket_key = p_key only - the window size is NOT part of the identity. The Redis backend, by contrast, namespaces buckets by window (rate_limiting.py:229, f"{prefix}:{key}:{window_seconds}"). So reusing one key string with two different window_seconds diverges across backends: Redis isolates them, Postgres conflates them. Worse, the window-relative prune (delete window_start < to_timestamp(v_prev_start), lines 120-121) means a short-window hit deletes a longer-window's live bucket for the same key (a 60s hit prunes everything older than ~1 minute, wiping a concurrent 3600s bucket). No live impact yet (no call sites), but US-076 (per-key + per-session/IP) and US-077 (per-workspace) are the multi-window callers that can trip this. Fix: compose the stored bucket_key as p_key || ':' || p_window_seconds inside the RPC to mirror Redis, or document+enforce a one-window-per-key invariant on the seam.backend/rate_limiting.py:232- PostgresRateLimiter's RPC rejects negative cost (migration rate_limit_hit, p_cost < 0 -> exception), but RedisRateLimiter.hit passes cost straight to INCRBY, so a negative cost would silently decrement the Redis counter instead of raising. cost is server-composed and always >= 1 today, so this is a backend-parity/robustness nit rather than a live risk; a guard mirroring the RPC (reject cost < 0) keeps the two backends symmetric.🔧 Fix: scope rate-limit counters by (key,window); guard Redis cost
1 info still open:
backend/rate_limiting.py:36- After the round-2 fix made (bucket_key, window_seconds, window_start) the Postgres counter identity, two doc spots still describe the prune bound as "per key", which now understates it: a single key can hold up to 2 live rows per DISTINCT window size used with it, not 2 total. The module docstring (rate_limiting.py:35-36, "bounded to <=2 live buckets per key") and the AGENTS.md algorithm bullet (AGENTS.md:176, "keeps it bounded to ≤2 live rows per key") both say "per key", while the migration comments (20260624170000_rate_limit_counters.sql:29) and the PRD Status line (prd-phase2-implementation.md:950) were correctly updated to "per (key, window)". Align the two stale spots to "per (key, window)" so the bound is described consistently.✅ **Test** - passed
✅ No issues found.
python -m backend.test_us075_rate_limiter- unit layer (factory default=postgres/selection/normalization, no-in-memory rejection, fail-closed config, Redis lazy-import RuntimeError, RateLimitDecision shape + ABC abstractness, Postgres backend over httpx MockTransport, Redis backend over a fake client) AND integration layer ran live against local Supabase (PRD Validation Test: 5 hits survive a simulated restart read back at 5; limit decision flips allowed True->False over the window)Live PostgREST evidence scriptdrive_live_rate_limiter.shdriving rate_limit_hit/rate_limit_count under the service role exactly as PostgresRateLimiter does: durable counting 1..5, peek-doesn't-increment, 6th/7th hit allowed:false while still countingSecurity boundary live: anon rate_limit_hit and rate_limit_count -> HTTP 401 code 42501 permission denied; anon direct table read -> 0 rows (deny-all RLS); psql has_function_privilege confirms anon/authenticated FALSE, service_role TRUEPer-(key,window) scoping (review commit 46a895b) live: same key under a 60s window kept an independent count while the 3600s window read 7 untouched; persisted rate_limit_counters showed exactly 2 bounded rowsdocs/adr/- docs/adr/ has no 0008-*.md file, yet US-075's docs (AGENTS.md, PRD) cite "ADR-0008" repeatedly as the support-surface decision record - as do US-066 through US-074. This gap is pre-existing (it predates US-075; every ADR-0008-era story references a non-existent ADR), and authoring the ADR is a substantial human-judgment task about scope and content, not a docs-sync edit. Flagging it because US-075's documentation leans on it; not fixed because it is out of scope for this change and needs an author decision.✅ **Lint** - passed
✅ No issues found.
✅ **Push** - passed
✅ No issues found.