feat(support): swappable RateLimiter seam with durable Postgres default (US-075) #41

Merged
hcho22 merged 5 commits into main from feat/us075-rate-limiter-seam
Jun 26, 2026

hcho22 commented Jun 26, 2026

Owner

Intent

The developer asked the agent to implement user story US-075 from their phase-2 implementation PRD (.claude/agent/tasks/prd-phase2-implementation.md), which calls for a swappable RateLimiter seam for the public support widget's abuse controls. The story required mirroring the repo's existing provider-factory patterns (reranking, web_search, parsing) with a RateLimiter ABC plus an env-selected factory, shipping only the seam and its migration with no call site yet (later stories wire it onto endpoints), using a durable Postgres default backend and deliberately providing no in-memory option. After implementing it, the developer invoked the /no-mistakes gate on US-075 to validate the change through automated review, tests, lint, docs, push, PR, and CI before it reaches the configured push target.

What Changed

  • Added backend/rate_limiting.py: a RateLimiter ABC (hit/count/aclose) returning a RateLimitDecision, plus an env-selected build_rate_limiter() factory (RATE_LIMITER=postgres|redis, default postgres) that mirrors the repo's existing provider factories and fails closed at build time on missing config. There is deliberately no in-memory backend (it under-counts per replica and resets on restart), and this ships the seam only - no call sites yet, which US-076/077 will wire onto the widget endpoints.
  • Added migration 20260624170000_rate_limit_counters.sql: a rate_limit_counters table (RLS enabled, zero policies - anon/authenticated denied wholesale) keyed by (bucket_key, window_seconds, window_start), plus service-role-only SECURITY DEFINER RPCs rate_limit_hit/rate_limit_count implementing a two-bucket sliding-window counter with an in-RPC prune bounded to ≤2 live rows per (key, window).
  • Tightened and hardened the seam over the branch: constrained window_seconds from float to int, scoped both Postgres and Redis counters by (key, window) so multi-window callers don't conflate or prune each other, guarded the Redis backend against negative cost for parity with the RPC, and added backend/test_us075_rate_limiter.py plus AGENTS.md / PRD doc sync.
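
The shape of the seam described above can be sketched as follows. This is a minimal illustration, not the contents of backend/rate_limiting.py: the method signatures are inferred from the RPC parameters (p_key, p_limit, p_window_seconds, p_cost), the `_PlaceholderLimiter` class stands in for the real Postgres and Redis adapters, and the config variable names (SUPABASE_URL, SUPABASE_SERVICE_ROLE_KEY, REDIS_URL) are assumptions.

```python
from __future__ import annotations

import os
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RateLimitDecision:
    # Field names follow the RPC rows shown in the PR's evidence transcript.
    allowed: bool
    current_count: int
    limit_value: int
    window_seconds: int


class RateLimiter(ABC):
    """Seam: callers depend on this interface, never on a concrete backend."""

    @abstractmethod
    async def hit(self, key: str, limit: int, window_seconds: int, cost: int = 1) -> RateLimitDecision: ...

    @abstractmethod
    async def count(self, key: str, window_seconds: int) -> int: ...

    @abstractmethod
    async def aclose(self) -> None: ...


class _PlaceholderLimiter(RateLimiter):
    """Hypothetical stand-in for the real adapters; it only records the backend name."""

    def __init__(self, backend: str) -> None:
        self.backend = backend

    async def hit(self, key, limit, window_seconds, cost=1):
        raise NotImplementedError("placeholder only")

    async def count(self, key, window_seconds):
        raise NotImplementedError("placeholder only")

    async def aclose(self):
        return None


def build_rate_limiter(env: Optional[Mapping[str, str]] = None) -> RateLimiter:
    """Select a backend from RATE_LIMITER, failing closed at build time."""
    env = os.environ if env is None else env
    name = (env.get("RATE_LIMITER") or "postgres").strip().lower()
    if name == "postgres":
        # Assumed config names; missing config is an error, never a silent no-op limiter.
        if not env.get("SUPABASE_URL") or not env.get("SUPABASE_SERVICE_ROLE_KEY"):
            raise RuntimeError("postgres rate limiter requires URL and service-role key")
        return _PlaceholderLimiter("postgres")
    if name == "redis":
        if not env.get("REDIS_URL"):
            raise RuntimeError("redis rate limiter requires REDIS_URL")
        return _PlaceholderLimiter("redis")
    # Deliberately no in-memory option: unknown names are rejected outright.
    raise ValueError(f"unknown RATE_LIMITER={name!r}; expected 'postgres' or 'redis'")
```

Raising at build time means a misconfigured deployment fails on startup rather than running with abuse controls silently disabled.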

Risk Assessment

✅ Low: A well-bounded, no-call-site seam whose two substantive issues were already resolved in prior review rounds; only a minor stale-doc phrasing inconsistency remains.

Testing

Ran the full US-075 test module (unit + integration) - both layers pass, with the integration layer executing the PRD Validation Test live against the running local Supabase (durable counter survives a simulated restart at value 5; limit decision flips allowed True->False). I then drove the rate_limit_hit/rate_limit_count RPCs directly over the live PostgREST (the exact path PostgresRateLimiter uses) to capture reviewer-visible product evidence: durable sliding-window counting, peek-does-not-increment, the limit flip with blocked hits still counting, the deny-all/service-role-only boundary (anon refused with 42501 on both RPCs and 0 rows on a direct table read, privilege table FALSE for anon/authenticated and TRUE for service_role), the bounded persisted rate_limit_counters rows, and the per-(key,window) scoping fix from the review commits. No UI surface exists for this change (it's a backend seam with no call site yet, by design), so evidence is a CLI transcript + persisted DB state rather than a screenshot. Working tree left clean; evidence written to the dedicated evidence directory.
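
The counting behavior exercised above (a two-bucket sliding window, a limit flip, and blocked hits that still count) can be modeled in a few lines. The sketch below only illustrates the arithmetic with an in-process dict; it is not a proposed backend, since the PR deliberately ships none in memory. Flooring the weighted previous-bucket contribution and comparing with `<=` are assumptions, although the `<=` comparison matches the transcript (5th hit allowed, 6th denied at limit=5).

```python
import math
from typing import Dict, Tuple

# (key, window_seconds, window_start) -> count, mirroring the table's identity columns.
Buckets = Dict[Tuple[str, int, int], int]


def estimate(buckets: Buckets, key: str, window_seconds: int, now: float) -> int:
    """Current bucket plus the previous bucket weighted by how much of it still overlaps."""
    curr_start = int(now // window_seconds) * window_seconds
    prev_start = curr_start - window_seconds
    elapsed = (now - curr_start) / window_seconds  # fraction of the current bucket used
    curr = buckets.get((key, window_seconds, curr_start), 0)
    prev = buckets.get((key, window_seconds, prev_start), 0)
    return curr + math.floor(prev * (1.0 - elapsed))  # rounding rule is an assumption


def hit(buckets: Buckets, key: str, limit: int, window_seconds: int,
        now: float, cost: int = 1) -> dict:
    if window_seconds <= 0:
        raise ValueError("window_seconds must be > 0")
    if cost < 0:
        raise ValueError("cost must be >= 0")
    curr_start = int(now // window_seconds) * window_seconds
    prev_start = curr_start - window_seconds
    slot = (key, window_seconds, curr_start)
    # The hit is recorded before the decision, so blocked hits keep counting.
    buckets[slot] = buckets.get(slot, 0) + cost
    # Prune anything older than the previous bucket, scoped to this (key, window).
    stale = [k for k in buckets
             if k[0] == key and k[1] == window_seconds and k[2] < prev_start]
    for k in stale:
        del buckets[k]
    current = estimate(buckets, key, window_seconds, now)
    return {"allowed": current <= limit, "current_count": current,
            "limit_value": limit, "window_seconds": window_seconds}
```

When the previous bucket is empty, as in the 3600s run of the evidence script, the estimate equals the raw count of the current bucket.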

Evidence: Live PostgREST rate-limiter transcript (durability, limit flip, deny-all boundary, per-(key,window) scoping, persisted rows)

[1] Durable counting: turns 1..5 -> current_count 1..5, allowed:true
[2] Peek twice -> 5, 5 (no increment)
[3] turn 6 -> allowed:false count 6; turn 7 -> allowed:false count 7 (still counting while blocked)
[4] anon rate_limit_hit -> HTTP 401 {"code":"42501",...permission denied for function rate_limit_hit}; anon rate_limit_count -> HTTP 401 42501; anon GET /rate_limit_counters -> HTTP 200 [] (deny-all RLS)
[5] same key window=60s cost=2 -> count 2 (independent); window=3600s still reads 7 (untouched)
[6] persisted rows: (key,60,count=2) and (key,3600,count=7) -> 2 rows, bounded
[7] has_function_privilege: rate_limit_hit/rate_limit_count anon=f authenticated=f service_role=t

============================================================
 US-075 RateLimiter seam - LIVE against local Supabase PostgREST
 key=ip:203.0.113.7-24460  window=3600s  limit=5
============================================================

[1] Durable sliding-window counting (each customer turn = one hit):
  turn 1  rate_limit_hit -> [{"allowed":true,"current_count":1,"limit_value":5,"window_seconds":3600}]
  turn 2  rate_limit_hit -> [{"allowed":true,"current_count":2,"limit_value":5,"window_seconds":3600}]
  turn 3  rate_limit_hit -> [{"allowed":true,"current_count":3,"limit_value":5,"window_seconds":3600}]
  turn 4  rate_limit_hit -> [{"allowed":true,"current_count":4,"limit_value":5,"window_seconds":3600}]
  turn 5  rate_limit_hit -> [{"allowed":true,"current_count":5,"limit_value":5,"window_seconds":3600}]

[2] Read-only peek does NOT increment (rate_limit_count twice):
  peek -> 5
  peek -> 5   (unchanged: a peek records no hit)

[3] Limit decision flips once the window is exceeded (6th hit, limit=5):
  turn 6  rate_limit_hit -> [{"allowed":false,"current_count":6,"limit_value":5,"window_seconds":3600}]   <-- allowed:false
  turn 7  rate_limit_hit -> [{"allowed":false,"current_count":7,"limit_value":5,"window_seconds":3600}]   <-- still counting while blocked

[4] Security boundary - the abuse counters are deny-all RLS, service-role only.
    anon must be REFUSED at the live PostgREST layer (42501 permission denied):
  anon rate_limit_hit   -> HTTP 401  body={"code":"42501","details":null,"hint":null,"message":"permission denied for function rate_limit_hit"}
  anon rate_limit_count -> HTTP 401  body={"code":"42501","details":null,"hint":null,"message":"permission denied for function rate_limit_count"}
  anon GET /rate_limit_counters -> HTTP 200  body=[]  (deny-all RLS: 0 rows visible)

[5] Per-(key,window) scoping (review commit 46a895b): the SAME key under a
    DIFFERENT window size is an INDEPENDENT counter - not conflated, and the
    short-window prune never deletes the long-window's live bucket.
  hit  window=60s cost=2 -> [{"allowed":true,"current_count":2,"limit_value":100,"window_seconds":60}]
  peek window=60s        -> 2   (independent 60s bucket)
  peek window=3600s      -> 7   (the 3600s window still reads 7, untouched)

[6] Persisted DB state - the actual bounded rows in public.rate_limit_counters
    for this key (<=2 live buckets per (key,window), no per-hit row growth):
      bucket_key      | window_seconds |      window_start      | count 
----------------------+----------------+------------------------+-------
 ip:203.0.113.7-24460 |             60 | 2026-06-26 13:37:00+00 |     2
 ip:203.0.113.7-24460 |           3600 | 2026-06-26 13:00:00+00 |     7
(2 rows)


[7] RPC execute privileges in the DB (anon/authenticated FALSE, service_role TRUE):
       rpc        | anon | authenticated | service_role 
------------------+------+---------------+--------------
 rate_limit_hit   | f    | f             | t
 rate_limit_count | f    | f             | t
(2 rows)


[cleanup] removing this run's counter rows...
done.
Evidence: Evidence driver script (live RPCs against local Supabase PostgREST)
#!/usr/bin/env bash
# US-075 live evidence: drive the rate_limit_hit / rate_limit_count RPCs against
# the running local Supabase PostgREST exactly as backend/rate_limiting.py's
# PostgresRateLimiter does, and show (a) durable counting, (b) the limit decision
# flip, (c) the deny-all/service-role-only boundary, (d) per-(key,window) scoping,
# (e) the persisted DB rows.
set -euo pipefail

BASE="http://127.0.0.1:54321"
SVC="eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJzdXBhYmFzZS1kZW1vIiwicm9sZSI6InNlcnZpY2Vfcm9sZSIsImV4cCI6MTk4MzgxMjk5Nn0.EGIM96RAZx35lJzdJsyH-qQwv8Hdp7fsn3W0YpN81IU"
ANON="eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJzdXBhYmFzZS1kZW1vIiwicm9sZSI6ImFub24iLCJleHAiOjE5ODM4MTI5OTZ9.CRXP1A7WOeoJeXxjNni43kdQwgnWNReilDMblYTn_I0"

KEY="ip:203.0.113.7-$$"          # unique per run
WIN=3600                          # huge window -> prev bucket empty, estimate == raw count

hit()  { # $1=cost $2=limit $3=window  -> service-role rate_limit_hit
  curl -s -X POST "$BASE/rest/v1/rpc/rate_limit_hit" \
    -H "apikey: $SVC" -H "Authorization: Bearer $SVC" -H "Content-Type: application/json" \
    -d "{\"p_key\":\"$KEY\",\"p_limit\":$2,\"p_window_seconds\":$3,\"p_cost\":$1}"
}
peek() { # $1=window  -> service-role rate_limit_count (read-only)
  curl -s -X POST "$BASE/rest/v1/rpc/rate_limit_count" \
    -H "apikey: $SVC" -H "Authorization: Bearer $SVC" -H "Content-Type: application/json" \
    -d "{\"p_key\":\"$KEY\",\"p_window_seconds\":$1}"
}

echo "============================================================"
echo " US-075 RateLimiter seam - LIVE against local Supabase PostgREST"
echo " key=$KEY  window=${WIN}s  limit=5"
echo "============================================================"

echo
echo "[1] Durable sliding-window counting (each customer turn = one hit):"
for i in 1 2 3 4 5; do
  printf "  turn %d  rate_limit_hit -> %s\n" "$i" "$(hit 1 5 "$WIN")"
done

echo
echo "[2] Read-only peek does NOT increment (rate_limit_count twice):"
printf "  peek -> %s\n" "$(peek "$WIN")"
printf "  peek -> %s   (unchanged: a peek records no hit)\n" "$(peek "$WIN")"

echo
echo "[3] Limit decision flips once the window is exceeded (6th hit, limit=5):"
printf "  turn 6  rate_limit_hit -> %s   <-- allowed:false\n" "$(hit 1 5 "$WIN")"
printf "  turn 7  rate_limit_hit -> %s   <-- still counting while blocked\n" "$(hit 1 5 "$WIN")"

echo
echo "[4] Security boundary - the abuse counters are deny-all RLS, service-role only."
echo "    anon must be REFUSED at the live PostgREST layer (42501 permission denied):"
CODE=$(curl -s -o /tmp/us075_anon_body.$$ -w "%{http_code}" -X POST "$BASE/rest/v1/rpc/rate_limit_hit" \
  -H "apikey: $ANON" -H "Authorization: Bearer $ANON" -H "Content-Type: application/json" \
  -d "{\"p_key\":\"$KEY\",\"p_limit\":5,\"p_window_seconds\":$WIN,\"p_cost\":1}")
printf "  anon rate_limit_hit   -> HTTP %s  body=%s\n" "$CODE" "$(cat /tmp/us075_anon_body.$$)"
CODE2=$(curl -s -o /tmp/us075_anon_body2.$$ -w "%{http_code}" -X POST "$BASE/rest/v1/rpc/rate_limit_count" \
  -H "apikey: $ANON" -H "Authorization: Bearer $ANON" -H "Content-Type: application/json" \
  -d "{\"p_key\":\"$KEY\",\"p_window_seconds\":$WIN}")
printf "  anon rate_limit_count -> HTTP %s  body=%s\n" "$CODE2" "$(cat /tmp/us075_anon_body2.$$)"
rm -f /tmp/us075_anon_body.$$ /tmp/us075_anon_body2.$$
CODE3=$(curl -s -o /tmp/us075_tbl.$$ -w "%{http_code}" \
  -H "apikey: $ANON" -H "Authorization: Bearer $ANON" \
  "$BASE/rest/v1/rate_limit_counters?select=*&bucket_key=eq.$KEY")
printf "  anon GET /rate_limit_counters -> HTTP %s  body=%s  (deny-all RLS: 0 rows visible)\n" "$CODE3" "$(cat /tmp/us075_tbl.$$)"
rm -f /tmp/us075_tbl.$$

echo
echo "[5] Per-(key,window) scoping (review commit 46a895b): the SAME key under a"
echo "    DIFFERENT window size is an INDEPENDENT counter - not conflated, and the"
echo "    short-window prune never deletes the long-window's live bucket."
SHORTWIN=60
printf "  hit  window=%ss cost=2 -> %s\n" "$SHORTWIN" "$(hit 2 100 "$SHORTWIN")"
printf "  peek window=%ss        -> %s   (independent 60s bucket)\n"   "$SHORTWIN" "$(peek "$SHORTWIN")"
printf "  peek window=%ss      -> %s   (the 3600s window still reads 7, untouched)\n" "$WIN" "$(peek "$WIN")"

echo
echo "[6] Persisted DB state - the actual bounded rows in public.rate_limit_counters"
echo "    for this key (<=2 live buckets per (key,window), no per-hit row growth):"
PGURL="postgresql://postgres:postgres@127.0.0.1:54322/postgres"
psql "$PGURL" -P pager=off -c \
  "select bucket_key, window_seconds, window_start, count from public.rate_limit_counters where bucket_key = '$KEY' order by window_seconds, window_start;"

echo
echo "[7] RPC execute privileges in the DB (anon/authenticated FALSE, service_role TRUE):"
psql "$PGURL" -P pager=off -c \
  "select 'rate_limit_hit' as rpc,
          has_function_privilege('anon','public.rate_limit_hit(text,integer,integer,integer)','EXECUTE') as anon,
          has_function_privilege('authenticated','public.rate_limit_hit(text,integer,integer,integer)','EXECUTE') as authenticated,
          has_function_privilege('service_role','public.rate_limit_hit(text,integer,integer,integer)','EXECUTE') as service_role
   union all
   select 'rate_limit_count',
          has_function_privilege('anon','public.rate_limit_count(text,integer)','EXECUTE'),
          has_function_privilege('authenticated','public.rate_limit_count(text,integer)','EXECUTE'),
          has_function_privilege('service_role','public.rate_limit_count(text,integer)','EXECUTE');"

echo
echo "[cleanup] removing this run's counter rows..."
curl -s -o /dev/null -X DELETE "$BASE/rest/v1/rate_limit_counters?bucket_key=eq.$KEY" \
  -H "apikey: $SVC" -H "Authorization: Bearer $SVC"
echo "done."
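
For readers who prefer Python to curl, the service-role `rate_limit_hit` call made by the script's `hit()` helper can be sketched as below. The PR's tests indicate that PostgresRateLimiter uses httpx; this sketch uses the standard library's urllib only to stay self-contained, and the function names `build_hit_request` and `parse_hit_response` are hypothetical. It builds the request without sending it.

```python
import json
import urllib.request


def build_hit_request(base_url: str, service_key: str, key: str, limit: int,
                      window_seconds: int, cost: int = 1) -> urllib.request.Request:
    """Build (but do not send) the POST that the script's hit() helper issues."""
    body = json.dumps({
        "p_key": key,
        "p_limit": limit,
        "p_window_seconds": window_seconds,
        "p_cost": cost,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/rest/v1/rpc/rate_limit_hit",
        data=body,
        headers={
            "apikey": service_key,
            "Authorization": f"Bearer {service_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_hit_response(raw: bytes) -> dict:
    """The RPC returns a one-row JSON array; treating an empty result as an error
    is a choice made for this sketch."""
    rows = json.loads(raw)
    if not rows:
        raise ValueError("rate_limit_hit returned no rows")
    return rows[0]
```

Passing the resulting request to `urllib.request.urlopen` against a running local Supabase would perform the same call as the script's `hit 1 5 "$WIN"`.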

Pipeline

Updates from git push no-mistakes

✅ **intent** - passed

✅ No issues found.

✅ **Rebase** - passed

✅ No issues found.

⚠️ **Review** - 1 info
  • ℹ️ backend/rate_limiting.py:162 - The seam types window_seconds: float (RateLimitDecision and every ABC/impl method), but neither backend actually honors non-integer windows. PostgresRateLimiter sends int(window_seconds) (rate_limiting.py:162,182), so a sub-second window (e.g. 0.5) truncates to 0 and the RPC raises p_window_seconds must be > 0, and a fractional window like 1.5 silently becomes 1. RedisRateLimiter uses the float for idx/weight math but int(window_seconds) for the key namespace (rate_limiting.py:229), so two fractional windows that floor to the same integer (1.4 and 1.6) would share buckets while computing different weights. The module docstring promises behavior is 'byte-identical no matter which backend is configured', which holds only for integer windows. No current impact (no call sites; US-076/077 use integer-second windows), but the float type advertises a capability neither backend supports - worth constraining the contract to int or documenting integer-only before US-076 wires it.
  • ℹ️ supabase/migrations/20260624170000_rate_limit_counters.sql:116 - Under the default Postgres backend, every hit on a key performs INSERT...ON CONFLICT DO UPDATE + SELECT(prev) + DELETE(prune) against the same bucket row, so a focused flood on one bucket_key serializes on that row's lock and generates a sustained write+dead-tuple/autovacuum stream on the very abuse path the limiter exists to absorb. This is the documented Postgres-default tradeoff (Redis is the named scale path), and it fails safe (the limiter throttles rather than letting requests through), but flag it for US-076 capacity planning: a single hot key is a DB-throughput chokepoint, not a free counter.
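
The truncation hazard in the first finding is easy to reproduce in isolation. The helpers below are hypothetical reductions of the two behaviors the reviewer cites (`int(window_seconds)` for the Postgres parameter and for the Redis key namespace), plus one possible integer-only guard; the prefix `rl` and all function names are assumptions.

```python
def postgres_window_param(window_seconds: float) -> int:
    # What the reviewer describes: the float is truncated before it reaches the RPC.
    return int(window_seconds)


def redis_bucket_namespace(prefix: str, key: str, window_seconds: float) -> str:
    # Key layout quoted in the review: f"{prefix}:{key}:{window_seconds}", floored to int.
    return f"{prefix}:{key}:{int(window_seconds)}"


def require_int_window(window_seconds: object) -> int:
    """One way to enforce an integer-only contract at the seam boundary."""
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(window_seconds, bool) or not isinstance(window_seconds, int):
        raise TypeError("window_seconds must be an int number of seconds")
    if window_seconds <= 0:
        raise ValueError("window_seconds must be > 0")
    return window_seconds
```

With these definitions, `postgres_window_param(0.5)` yields 0 (which the RPC rejects), and windows of 1.4 and 1.6 seconds map to the same Redis namespace.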

🔧 Fix: tighten rate limiter window_seconds contract from float to int
2 issues (1 warning, 1 info) still open:

  • ⚠️ supabase/migrations/20260624170000_rate_limit_counters.sql:107 - The Postgres counter row is identified by (bucket_key, window_start) with bucket_key = p_key only - the window size is NOT part of the identity. The Redis backend, by contrast, namespaces buckets by window (rate_limiting.py:229, f"{prefix}:{key}:{window_seconds}"). So reusing one key string with two different window_seconds diverges across backends: Redis isolates them, Postgres conflates them. Worse, the window-relative prune (delete window_start < to_timestamp(v_prev_start), lines 120-121) means a short-window hit deletes a longer-window's live bucket for the same key (a 60s hit prunes everything older than ~1 minute, wiping a concurrent 3600s bucket). No live impact yet (no call sites), but US-076 (per-key + per-session/IP) and US-077 (per-workspace) are the multi-window callers that can trip this. Fix: compose the stored bucket_key as p_key || ':' || p_window_seconds inside the RPC to mirror Redis, or document+enforce a one-window-per-key invariant on the seam.
  • ℹ️ backend/rate_limiting.py:232 - PostgresRateLimiter's RPC rejects negative cost (migration rate_limit_hit, p_cost < 0 -> exception), but RedisRateLimiter.hit passes cost straight to INCRBY, so a negative cost would silently decrement the Redis counter instead of raising. cost is server-composed and always >= 1 today, so this is a backend-parity/robustness nit rather than a live risk; a guard mirroring the RPC (reject cost < 0) keeps the two backends symmetric.
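
The cross-window prune described in the warning can be reproduced with a small in-process model. This is an illustrative simulation, not the migration's SQL: `hit_unscoped` models the pre-fix identity (bucket_key, window_start), and `hit_scoped` models the identity after the fix, (bucket_key, window_seconds, window_start), together with the negative-cost guard from the second finding. Both function names are hypothetical.

```python
from typing import Dict, Tuple


def hit_unscoped(rows: Dict[Tuple[str, int], int], key: str,
                 window_seconds: int, now: int, cost: int = 1) -> None:
    """Pre-fix model: the window size is not part of the row identity."""
    curr = (now // window_seconds) * window_seconds
    prev = curr - window_seconds
    rows[(key, curr)] = rows.get((key, curr), 0) + cost
    # Window-relative prune over every row for the key, whatever window wrote it.
    for k in [k for k in rows if k[0] == key and k[1] < prev]:
        del rows[k]


def hit_scoped(rows: Dict[Tuple[str, int, int], int], key: str,
               window_seconds: int, now: int, cost: int = 1) -> None:
    """Post-fix model: counters and prune are scoped by (key, window)."""
    if cost < 0:
        # Mirrors the RPC's rejection so a negative cost can never decrement a counter.
        raise ValueError("cost must be >= 0")
    curr = (now // window_seconds) * window_seconds
    prev = curr - window_seconds
    slot = (key, window_seconds, curr)
    rows[slot] = rows.get(slot, 0) + cost
    for k in [k for k in rows
              if k[0] == key and k[1] == window_seconds and k[2] < prev]:
        del rows[k]
```

Replaying the evidence run (seven hits at window=3600, then one cost=2 hit at window=60) through `hit_unscoped` leaves only the 60s row, while `hit_scoped` keeps both rows, matching the two persisted rows in the transcript.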

🔧 Fix: scope rate-limit counters by (key,window); guard Redis cost
1 info still open:

  • ℹ️ backend/rate_limiting.py:36 - After the round-2 fix made (bucket_key, window_seconds, window_start) the Postgres counter identity, two doc spots still describe the prune bound as "per key", which now understates it: a single key can hold up to 2 live rows per DISTINCT window size used with it, not 2 total. The module docstring (rate_limiting.py:35-36, "bounded to <=2 live buckets per key") and the AGENTS.md algorithm bullet (AGENTS.md:176, "keeps it bounded to ≤2 live rows per key") both say "per key", while the migration comments (20260624170000_rate_limit_counters.sql:29) and the PRD Status line (prd-phase2-implementation.md:950) were correctly updated to "per (key, window)". Align the two stale spots to "per (key, window)" so the bound is described consistently.
✅ **Test** - passed

✅ No issues found.

  • python -m backend.test_us075_rate_limiter - unit layer (factory default=postgres/selection/normalization, no-in-memory rejection, fail-closed config, Redis lazy-import RuntimeError, RateLimitDecision shape + ABC abstractness, Postgres backend over httpx MockTransport, Redis backend over a fake client) AND integration layer ran live against local Supabase (PRD Validation Test: 5 hits survive a simulated restart read back at 5; limit decision flips allowed True->False over the window)
  • Live PostgREST evidence script drive_live_rate_limiter.sh driving rate_limit_hit/rate_limit_count under the service role exactly as PostgresRateLimiter does: durable counting 1..5, peek-doesn't-increment, 6th/7th hit allowed:false while still counting
  • Security boundary live: anon rate_limit_hit and rate_limit_count -> HTTP 401 code 42501 permission denied; anon direct table read -> 0 rows (deny-all RLS); psql has_function_privilege confirms anon/authenticated FALSE, service_role TRUE
  • Per-(key,window) scoping (review commit 46a895b) live: same key under a 60s window kept an independent count while the 3600s window read 7 untouched; persisted rate_limit_counters showed exactly 2 bounded rows
⚠️ **Document** - 1 info
  • ℹ️ docs/adr/ - docs/adr/ has no 0008-*.md file, yet US-075's docs (AGENTS.md, PRD) cite "ADR-0008" repeatedly as the support-surface decision record - as do US-066 through US-074. This gap is pre-existing (it predates US-075; every ADR-0008-era story references a non-existent ADR), and authoring the ADR is a substantial human-judgment task about scope and content, not a docs-sync edit. Flagging it because US-075's documentation leans on it; not fixed because it is out of scope for this change and needs an author decision.
✅ **Lint** - passed

✅ No issues found.

✅ **Push** - passed

✅ No issues found.

hcho22 added 5 commits June 25, 2026 22:05
…US-075)

Add backend/rate_limiting.py: a RateLimiter ABC + env-selected factory mirroring
the reranking/web_search/parsing seams. RATE_LIMITER=postgres (default) | redis;
no in-memory backend (per-replica under-count + restart reset would silently
disable the cost-DoS guard, ADR-0008). PostgresRateLimiter reaches counters via
two service-role SECURITY DEFINER RPCs over PostgREST (same posture as US-071's
resume_conversation); RedisRateLimiter is an optional scale adapter (lazy import,
redis not in requirements). Both use a bounded two-bucket sliding-window counter.

Migration 20260624170000_rate_limit_counters.sql adds the deny-all RLS counter
table + rate_limit_hit/rate_limit_count RPCs, granted to service_role only.

Ships only the seam + migration (no call site yet by design); US-076 wires it
onto the widget endpoints, US-077 the per-workspace circuit breaker.

Test: python -m backend.test_us075_rate_limiter (unit always-on + integration
skip-clean). Verified against local Supabase: migration applies, all assertions
pass, deny-all/service-role-only boundary holds at the live PostgREST layer.

vercel Bot commented Jun 26, 2026

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| agentic-rag | Ready | Preview, Comment | Jun 26, 2026 2:06pm |

@github-actions

Contributor

Retrieval eval — PR vs main

n = 50 questions × 3 modes (vector, keyword, hybrid) on a 14-chunk corpus. PR ran in 69.59s; main in 74.3s.

Headline (each cell: PR value, Δ vs main)

| Mode | recall@5 | MRR | nDCG@5 |
| --- | --- | --- | --- |
| vector | 0.860 (±0.000) | 0.772 (±0.000) | 0.779 (±0.000) |
| keyword | 0.110 (±0.000) | 0.120 (±0.000) | 0.112 (±0.000) |
| hybrid | 0.860 (±0.000) | 0.759 (±0.000) | 0.769 (±0.000) |

Per-category recall@5

| Mode | single_chunk | multi_hop | adversarial | paraphrase |
| --- | --- | --- | --- | --- |
| vector | 0.900 (±0.000) | 0.933 (±0.000) | 0.600 (±0.000) | 1.000 (±0.000) |
| keyword | 0.250 (±0.000) | 0.033 (±0.000) | 0.000 (±0.000) | 0.000 (±0.000) |
| hybrid | 0.900 (±0.000) | 0.933 (±0.000) | 0.600 (±0.000) | 1.000 (±0.000) |
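
For reference, the three headline metrics can be computed per question as sketched below, using the common binary-relevance definitions. The exact definitions used by retrieval-eval.yml are not shown in this comment, so treat these as assumptions; the function names are hypothetical, and the reported figures would be means over the 50 questions.

```python
import math
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """Fraction of the relevant chunks that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant chunk, or 0 if none was retrieved."""
    for rank, chunk in enumerate(ranked, start=1):
        if chunk in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """Binary-relevance nDCG: discounted gain divided by the ideal ordering's gain."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, chunk in enumerate(ranked[:k], start=1)
              if chunk in relevant)
    ideal = sum(1.0 / math.log2(rank + 1)
                for rank in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0
```

MRR is the mean of `reciprocal_rank` across questions, which is why it can differ between the vector and hybrid modes even when recall@5 is identical: the same chunks are found, but at different ranks.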

Comment is updated in place on each push by .github/workflows/retrieval-eval.yml (US-035). Comment-only — never blocks the build.

hcho22 merged commit 4d19b79 into main Jun 26, 2026
3 checks passed
hcho22 deleted the feat/us075-rate-limiter-seam branch June 26, 2026 14:22