feat(support): swappable RateLimiter seam with durable Postgres default (US-075) by hcho22 · Pull Request #41 · hcho22/Agentic_RAG

hcho22 · 2026-06-26T14:05:58Z

Intent

The developer asked the agent to implement user story US-075 from their phase-2 implementation PRD (.claude/agent/tasks/prd-phase2-implementation.md). The story calls for a swappable RateLimiter seam for the public support widget's abuse controls. It required three things: mirroring the repo's existing provider-factory patterns (reranking, web_search, parsing) with a RateLimiter ABC plus an env-selected factory; shipping only the seam and its migration, with no call site yet (later stories wire it onto endpoints); and using a durable Postgres default backend, deliberately with no in-memory option. After implementing it, the developer ran the /no-mistakes gate on US-075 to validate the change through automated review, tests, lint, docs, push, PR, and CI before it reaches the configured push target.

What Changed

Added backend/rate_limiting.py: a RateLimiter ABC (hit/count/aclose) returning a RateLimitDecision, plus an env-selected build_rate_limiter() factory (RATE_LIMITER=postgres|redis, default postgres) that mirrors the repo's existing provider factories and fails closed at build time on missing config. There is deliberately no in-memory backend (it under-counts per replica and resets on restart). This PR ships the seam only, with no call sites yet: US-076 wires it onto the widget endpoints and US-077 onto the per-workspace circuit breaker.
Added migration 20260624170000_rate_limit_counters.sql: a rate_limit_counters table (RLS enabled, zero policies - anon/authenticated denied wholesale) keyed by (bucket_key, window_seconds, window_start), plus service-role-only SECURITY DEFINER RPCs rate_limit_hit/rate_limit_count implementing a two-bucket sliding-window counter with an in-RPC prune bounded to ≤2 live rows per (key, window).
Tightened and hardened the seam in review follow-up commits on the branch: constrained window_seconds from float to int, scoped both the Postgres and Redis counters by (key, window) so multi-window callers neither conflate nor prune each other's buckets, guarded the Redis backend against negative cost for parity with the RPC, and added backend/test_us075_rate_limiter.py plus an AGENTS.md / PRD doc sync.
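The two-bucket sliding-window counter and the per-(key, window) prune described above can be modeled in a few lines. This is a minimal sketch, not the migration's SQL: Python's stdlib sqlite3 stands in for Postgres, RLS and SECURITY DEFINER are not modeled, and the bucket alignment (multiples of window_seconds), the linear weight on the previous bucket, the ceiling rounding, and the `allowed = count <= limit` rule are assumptions chosen to match the transcript's observable behavior (turn 6 against limit 5 is blocked, and blocked hits still count).

```python
import sqlite3

# Hypothetical model of public.rate_limit_counters; sqlite3 stands in for Postgres.
SCHEMA = """
CREATE TABLE IF NOT EXISTS rate_limit_counters (
    bucket_key     TEXT    NOT NULL,
    window_seconds INTEGER NOT NULL,
    window_start   INTEGER NOT NULL,  -- bucket start, epoch seconds
    count          INTEGER NOT NULL,
    PRIMARY KEY (bucket_key, window_seconds, window_start)
)
"""


def rate_limit_count(db: sqlite3.Connection, key: str, window: int, now: int) -> int:
    """Read-only peek: weighted estimate over the current and previous bucket."""
    cur_start = now - now % window
    prev_start = cur_start - window
    rows = dict(db.execute(
        "SELECT window_start, count FROM rate_limit_counters"
        " WHERE bucket_key = ? AND window_seconds = ? AND window_start IN (?, ?)",
        (key, window, cur_start, prev_start),
    ).fetchall())
    # The previous bucket counts in proportion to how much of it still overlaps
    # the sliding window; integer math avoids float rounding surprises.
    overlap = window - (now - cur_start)
    weighted = rows.get(prev_start, 0) * overlap + rows.get(cur_start, 0) * window
    return -(-weighted // window)  # ceiling division


def rate_limit_hit(db: sqlite3.Connection, key: str, limit: int, window: int,
                   cost: int, now: int) -> dict:
    """Record `cost` against (key, window), prune, and return the decision."""
    if window <= 0:
        raise ValueError("window must be > 0")
    if cost < 0:
        raise ValueError("cost must be >= 0")
    cur_start = now - now % window
    # Upsert the current bucket; blocked hits are still recorded.
    db.execute(
        "INSERT INTO rate_limit_counters VALUES (?, ?, ?, ?)"
        " ON CONFLICT (bucket_key, window_seconds, window_start)"
        " DO UPDATE SET count = count + excluded.count",
        (key, window, cur_start, cost),
    )
    # Prune only this (key, window): anything older than the previous bucket
    # goes, leaving at most 2 live rows per (key, window).
    db.execute(
        "DELETE FROM rate_limit_counters"
        " WHERE bucket_key = ? AND window_seconds = ? AND window_start < ?",
        (key, window, cur_start - window),
    )
    db.commit()
    current = rate_limit_count(db, key, window, now)
    return {"allowed": current <= limit, "current_count": current,
            "limit_value": limit, "window_seconds": window}
```

Because the prune filters on window_seconds as well as bucket_key, a 60 s hit cannot delete a live 3600 s bucket for the same key, which is the conflation the round-2 review fixed.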

Risk Assessment

✅ Low: A well-bounded, no-call-site seam whose two substantive issues were already resolved in prior review rounds; only a minor stale-doc phrasing inconsistency remains.

Testing

Ran the full US-075 test module (unit + integration); both layers pass. The integration layer executes the PRD Validation Test live against the running local Supabase: a durable counter survives a simulated restart at value 5, and the limit decision flips allowed True->False.

I then drove the rate_limit_hit/rate_limit_count RPCs directly over the live PostgREST (the exact path PostgresRateLimiter uses) to capture reviewer-visible evidence of: durable sliding-window counting; peek-does-not-increment; the limit flip, with blocked hits still counting; the deny-all/service-role-only boundary (anon refused with 42501 on both RPCs and 0 rows on a direct table read; privilege table FALSE for anon/authenticated and TRUE for service_role); the bounded persisted rate_limit_counters rows; and the per-(key,window) scoping fix from the review commits. This change has no UI surface (it is a backend seam with no call site yet, by design), so the evidence is a CLI transcript plus persisted DB state rather than a screenshot. The working tree was left clean, and the evidence was written to the dedicated evidence directory.

Evidence: Live PostgREST rate-limiter transcript (durability, limit flip, deny-all boundary, per-(key,window) scoping, persisted rows)

[1] Durable counting: turns 1..5 -> current_count 1..5, allowed:true
[2] Peek twice -> 5, 5 (no increment)
[3] Turn 6 -> allowed:false, count 6; turn 7 -> allowed:false, count 7 (still counting while blocked)
[4] anon rate_limit_hit -> HTTP 401 {"code":"42501",...permission denied for function rate_limit_hit}; anon rate_limit_count -> HTTP 401 42501; anon GET /rate_limit_counters -> HTTP 200 [] (deny-all RLS)
[5] Same key, window=60s, cost=2 -> count 2 (independent); window=3600s still reads 7 (untouched)
[6] Persisted rows: (key,60,count=2) and (key,3600,count=7) -> 2 rows, bounded
[7] has_function_privilege: rate_limit_hit/rate_limit_count anon=f authenticated=f service_role=t

============================================================
 US-075 RateLimiter seam - LIVE against local Supabase PostgREST
 key=ip:203.0.113.7-24460  window=3600s  limit=5
============================================================

[1] Durable sliding-window counting (each customer turn = one hit):
  turn 1  rate_limit_hit -> [{"allowed":true,"current_count":1,"limit_value":5,"window_seconds":3600}]
  turn 2  rate_limit_hit -> [{"allowed":true,"current_count":2,"limit_value":5,"window_seconds":3600}]
  turn 3  rate_limit_hit -> [{"allowed":true,"current_count":3,"limit_value":5,"window_seconds":3600}]
  turn 4  rate_limit_hit -> [{"allowed":true,"current_count":4,"limit_value":5,"window_seconds":3600}]
  turn 5  rate_limit_hit -> [{"allowed":true,"current_count":5,"limit_value":5,"window_seconds":3600}]

[2] Read-only peek does NOT increment (rate_limit_count twice):
  peek -> 5
  peek -> 5   (unchanged: a peek records no hit)

[3] Limit decision flips once the window is exceeded (6th hit, limit=5):
  turn 6  rate_limit_hit -> [{"allowed":false,"current_count":6,"limit_value":5,"window_seconds":3600}]   <-- allowed:false
  turn 7  rate_limit_hit -> [{"allowed":false,"current_count":7,"limit_value":5,"window_seconds":3600}]   <-- still counting while blocked

[4] Security boundary - the abuse counters are deny-all RLS, service-role only.
    anon must be REFUSED at the live PostgREST layer (42501 permission denied):
  anon rate_limit_hit   -> HTTP 401  body={"code":"42501","details":null,"hint":null,"message":"permission denied for function rate_limit_hit"}
  anon rate_limit_count -> HTTP 401  body={"code":"42501","details":null,"hint":null,"message":"permission denied for function rate_limit_count"}
  anon GET /rate_limit_counters -> HTTP 200  body=[]  (deny-all RLS: 0 rows visible)

[5] Per-(key,window) scoping (review commit 46a895b): the SAME key under a
    DIFFERENT window size is an INDEPENDENT counter - not conflated, and the
    short-window prune never deletes the long-window's live bucket.
  hit  window=60s cost=2 -> [{"allowed":true,"current_count":2,"limit_value":100,"window_seconds":60}]
  peek window=60s        -> 2   (independent 60s bucket)
  peek window=3600s      -> 7   (the 3600s window still reads 7, untouched)

[6] Persisted DB state - the actual bounded rows in public.rate_limit_counters
    for this key (<=2 live buckets per (key,window), no per-hit row growth):
      bucket_key      | window_seconds |      window_start      | count 
----------------------+----------------+------------------------+-------
 ip:203.0.113.7-24460 |             60 | 2026-06-26 13:37:00+00 |     2
 ip:203.0.113.7-24460 |           3600 | 2026-06-26 13:00:00+00 |     7
(2 rows)


[7] RPC execute privileges in the DB (anon/authenticated FALSE, service_role TRUE):
       rpc        | anon | authenticated | service_role 
------------------+------+---------------+--------------
 rate_limit_hit   | f    | f             | t
 rate_limit_count | f    | f             | t
(2 rows)


[cleanup] removing this run's counter rows...
done.

Evidence: Evidence driver script (live RPCs against local Supabase PostgREST)

#!/usr/bin/env bash
# US-075 live evidence: drive the rate_limit_hit / rate_limit_count RPCs against
# the running local Supabase PostgREST exactly as backend/rate_limiting.py's
# PostgresRateLimiter does, and show (a) durable counting, (b) the limit decision
# flip, (c) the deny-all/service-role-only boundary, (d) per-(key,window) scoping,
# (e) the persisted DB rows.
set -euo pipefail

BASE="http://127.0.0.1:54321"
SVC="eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJzdXBhYmFzZS1kZW1vIiwicm9sZSI6InNlcnZpY2Vfcm9sZSIsImV4cCI6MTk4MzgxMjk5Nn0.EGIM96RAZx35lJzdJsyH-qQwv8Hdp7fsn3W0YpN81IU"
ANON="eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJzdXBhYmFzZS1kZW1vIiwicm9sZSI6ImFub24iLCJleHAiOjE5ODM4MTI5OTZ9.CRXP1A7WOeoJeXxjNni43kdQwgnWNReilDMblYTn_I0"

KEY="ip:203.0.113.7-$$"          # unique per run
WIN=3600                          # huge window -> prev bucket empty, estimate == raw count

hit()  { # $1=cost $2=limit $3=window  -> service-role rate_limit_hit
  curl -s -X POST "$BASE/rest/v1/rpc/rate_limit_hit" \
    -H "apikey: $SVC" -H "Authorization: Bearer $SVC" -H "Content-Type: application/json" \
    -d "{\"p_key\":\"$KEY\",\"p_limit\":$2,\"p_window_seconds\":$3,\"p_cost\":$1}"
}
peek() { # $1=window  -> service-role rate_limit_count (read-only)
  curl -s -X POST "$BASE/rest/v1/rpc/rate_limit_count" \
    -H "apikey: $SVC" -H "Authorization: Bearer $SVC" -H "Content-Type: application/json" \
    -d "{\"p_key\":\"$KEY\",\"p_window_seconds\":$1}"
}

echo "============================================================"
echo " US-075 RateLimiter seam - LIVE against local Supabase PostgREST"
echo " key=$KEY  window=${WIN}s  limit=5"
echo "============================================================"

echo
echo "[1] Durable sliding-window counting (each customer turn = one hit):"
for i in 1 2 3 4 5; do
  printf "  turn %d  rate_limit_hit -> %s\n" "$i" "$(hit 1 5 "$WIN")"
done

echo
echo "[2] Read-only peek does NOT increment (rate_limit_count twice):"
printf "  peek -> %s\n" "$(peek "$WIN")"
printf "  peek -> %s   (unchanged: a peek records no hit)\n" "$(peek "$WIN")"

echo
echo "[3] Limit decision flips once the window is exceeded (6th hit, limit=5):"
printf "  turn 6  rate_limit_hit -> %s   <-- allowed:false\n" "$(hit 1 5 "$WIN")"
printf "  turn 7  rate_limit_hit -> %s   <-- still counting while blocked\n" "$(hit 1 5 "$WIN")"

echo
echo "[4] Security boundary - the abuse counters are deny-all RLS, service-role only."
echo "    anon must be REFUSED at the live PostgREST layer (42501 permission denied):"
CODE=$(curl -s -o /tmp/us075_anon_body.$$ -w "%{http_code}" -X POST "$BASE/rest/v1/rpc/rate_limit_hit" \
  -H "apikey: $ANON" -H "Authorization: Bearer $ANON" -H "Content-Type: application/json" \
  -d "{\"p_key\":\"$KEY\",\"p_limit\":5,\"p_window_seconds\":$WIN,\"p_cost\":1}")
printf "  anon rate_limit_hit   -> HTTP %s  body=%s\n" "$CODE" "$(cat /tmp/us075_anon_body.$$)"
CODE2=$(curl -s -o /tmp/us075_anon_body2.$$ -w "%{http_code}" -X POST "$BASE/rest/v1/rpc/rate_limit_count" \
  -H "apikey: $ANON" -H "Authorization: Bearer $ANON" -H "Content-Type: application/json" \
  -d "{\"p_key\":\"$KEY\",\"p_window_seconds\":$WIN}")
printf "  anon rate_limit_count -> HTTP %s  body=%s\n" "$CODE2" "$(cat /tmp/us075_anon_body2.$$)"
rm -f /tmp/us075_anon_body.$$ /tmp/us075_anon_body2.$$
CODE3=$(curl -s -o /tmp/us075_tbl.$$ -w "%{http_code}" \
  -H "apikey: $ANON" -H "Authorization: Bearer $ANON" \
  "$BASE/rest/v1/rate_limit_counters?select=*&bucket_key=eq.$KEY")
printf "  anon GET /rate_limit_counters -> HTTP %s  body=%s  (deny-all RLS: 0 rows visible)\n" "$CODE3" "$(cat /tmp/us075_tbl.$$)"
rm -f /tmp/us075_tbl.$$

echo
echo "[5] Per-(key,window) scoping (review commit 46a895b): the SAME key under a"
echo "    DIFFERENT window size is an INDEPENDENT counter - not conflated, and the"
echo "    short-window prune never deletes the long-window's live bucket."
SHORTWIN=60
printf "  hit  window=%ss cost=2 -> %s\n" "$SHORTWIN" "$(hit 2 100 "$SHORTWIN")"
printf "  peek window=%ss        -> %s   (independent 60s bucket)\n"   "$SHORTWIN" "$(peek "$SHORTWIN")"
printf "  peek window=%ss      -> %s   (the 3600s window still reads 7, untouched)\n" "$WIN" "$(peek "$WIN")"

echo
echo "[6] Persisted DB state - the actual bounded rows in public.rate_limit_counters"
echo "    for this key (<=2 live buckets per (key,window), no per-hit row growth):"
PGURL="postgresql://postgres:postgres@127.0.0.1:54322/postgres"
psql "$PGURL" -P pager=off -c \
  "select bucket_key, window_seconds, window_start, count from public.rate_limit_counters where bucket_key = '$KEY' order by window_seconds, window_start;"

echo
echo "[7] RPC execute privileges in the DB (anon/authenticated FALSE, service_role TRUE):"
psql "$PGURL" -P pager=off -c \
  "select 'rate_limit_hit' as rpc,
          has_function_privilege('anon','public.rate_limit_hit(text,integer,integer,integer)','EXECUTE') as anon,
          has_function_privilege('authenticated','public.rate_limit_hit(text,integer,integer,integer)','EXECUTE') as authenticated,
          has_function_privilege('service_role','public.rate_limit_hit(text,integer,integer,integer)','EXECUTE') as service_role
   union all
   select 'rate_limit_count',
          has_function_privilege('anon','public.rate_limit_count(text,integer)','EXECUTE'),
          has_function_privilege('authenticated','public.rate_limit_count(text,integer)','EXECUTE'),
          has_function_privilege('service_role','public.rate_limit_count(text,integer)','EXECUTE');"

echo
echo "[cleanup] removing this run's counter rows..."
curl -s -o /dev/null -X DELETE "$BASE/rest/v1/rate_limit_counters?bucket_key=eq.$KEY" \
  -H "apikey: $SVC" -H "Authorization: Bearer $SVC"
echo "done."
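The script's hit() and peek() helpers correspond to the following Python request builders. This sketch uses the stdlib urllib.request only to show the PostgREST RPC shape (URL, apikey/Bearer headers, JSON parameters) and never sends anything; the helper names are hypothetical, and the real PostgresRateLimiter uses its own HTTP client (its unit layer is exercised over httpx MockTransport). Unlike the script's hand-built JSON strings, json.dumps escapes the key, which matters once keys are derived from request data.

```python
import json
import urllib.request


def build_rpc_request(base_url: str, api_key: str, fn: str,
                      params: dict) -> urllib.request.Request:
    """Build (but do not send) a PostgREST RPC POST, like the script's curl calls."""
    return urllib.request.Request(
        f"{base_url}/rest/v1/rpc/{fn}",
        data=json.dumps(params).encode("utf-8"),
        headers={
            "apikey": api_key,
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def hit_request(base_url: str, service_key: str, key: str, limit: int,
                window_seconds: int, cost: int = 1) -> urllib.request.Request:
    # Same parameter names the script sends to rate_limit_hit.
    return build_rpc_request(base_url, service_key, "rate_limit_hit", {
        "p_key": key, "p_limit": limit,
        "p_window_seconds": window_seconds, "p_cost": cost,
    })


def peek_request(base_url: str, service_key: str, key: str,
                 window_seconds: int) -> urllib.request.Request:
    # Read-only peek: rate_limit_count takes no limit or cost.
    return build_rpc_request(base_url, service_key, "rate_limit_count", {
        "p_key": key, "p_window_seconds": window_seconds,
    })
```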

Pipeline

Updates from git push no-mistakes

✅ **intent** - passed

✅ No issues found.

✅ **Rebase** - passed

✅ No issues found.

⚠️ **Review** - 1 info

ℹ️ backend/rate_limiting.py:162 - The seam types window_seconds: float (RateLimitDecision and every ABC/impl method), but neither backend actually honors non-integer windows. PostgresRateLimiter sends int(window_seconds) (rate_limiting.py:162,182), so a sub-second window (e.g. 0.5) truncates to 0 and the RPC raises p_window_seconds must be > 0, and a fractional window like 1.5 silently becomes 1. RedisRateLimiter uses the float for idx/weight math but int(window_seconds) for the key namespace (rate_limiting.py:229), so two fractional windows that floor to the same integer (1.4 and 1.6) would share buckets while computing different weights. The module docstring promises behavior is 'byte-identical no matter which backend is configured', which holds only for integer windows. No current impact (no call sites; US-076/077 use integer-second windows), but the float type advertises a capability neither backend supports - worth constraining the contract to int or documenting integer-only before US-076 wires it.
ℹ️ supabase/migrations/20260624170000_rate_limit_counters.sql:116 - Under the default Postgres backend, every hit on a key performs INSERT...ON CONFLICT DO UPDATE + SELECT(prev) + DELETE(prune) against the same bucket row, so a focused flood on one bucket_key serializes on that row's lock and generates a sustained write+dead-tuple/autovacuum stream on the very abuse path the limiter exists to absorb. This is the documented Postgres-default tradeoff (Redis is the named scale path), and it fails safe (the limiter throttles rather than letting requests through), but flag it for US-076 capacity planning: a single hot key is a DB-throughput chokepoint, not a free counter.
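The truncation hazard in the first finding is easy to reproduce, and the int-only contract that commit 8dab1de moved to can be enforced with a small guard. The guard below is a hypothetical sketch (normalize_window_seconds is not a name from the repo); it rejects bool explicitly because bool is a subclass of int in Python.

```python
def normalize_window_seconds(window_seconds: object) -> int:
    """Hypothetical guard for the int-only contract: positive ints only."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(window_seconds, bool) or not isinstance(window_seconds, int):
        raise TypeError(
            f"window_seconds must be an int, got {type(window_seconds).__name__}")
    if window_seconds <= 0:
        raise ValueError("window_seconds must be > 0")
    return window_seconds


# What the old float contract allowed: int() truncates toward zero.
assert int(0.5) == 0              # the RPC then rejects p_window_seconds = 0
assert int(1.5) == 1              # a 1.5 s window silently becomes 1 s
assert int(1.4) == int(1.6) == 1  # Redis key namespaces collide for 1.4 and 1.6
```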

🔧 Fix: tighten rate limiter window_seconds contract from float to int
2 issues (1 warning, 1 info) still open:

⚠️ supabase/migrations/20260624170000_rate_limit_counters.sql:107 - The Postgres counter row is identified by (bucket_key, window_start) with bucket_key = p_key only - the window size is NOT part of the identity. The Redis backend, by contrast, namespaces buckets by window (rate_limiting.py:229, f"{prefix}:{key}:{window_seconds}"). So reusing one key string with two different window_seconds diverges across backends: Redis isolates them, Postgres conflates them. Worse, the window-relative prune (delete window_start < to_timestamp(v_prev_start), lines 120-121) means a short-window hit deletes a longer-window's live bucket for the same key (a 60s hit prunes everything older than ~1 minute, wiping a concurrent 3600s bucket). No live impact yet (no call sites), but US-076 (per-key + per-session/IP) and US-077 (per-workspace) are the multi-window callers that can trip this. Fix: compose the stored bucket_key as p_key || ':' || p_window_seconds inside the RPC to mirror Redis, or document+enforce a one-window-per-key invariant on the seam.
ℹ️ backend/rate_limiting.py:232 - PostgresRateLimiter's RPC rejects negative cost (migration rate_limit_hit, p_cost < 0 -> exception), but RedisRateLimiter.hit passes cost straight to INCRBY, so a negative cost would silently decrement the Redis counter instead of raising. cost is server-composed and always >= 1 today, so this is a backend-parity/robustness nit rather than a live risk; a guard mirroring the RPC (reject cost < 0) keeps the two backends symmetric.
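Both fixes that followed (commit 46a895b) can be illustrated on the Redis side with a dict-backed stand-in for the client. This is a sketch under assumptions: the review quotes the namespace as f"{prefix}:{key}:{window_seconds}", and appending the bucket start to it is this example's choice; FakeRedis mimics only redis-py's incrby(name, amount), and redis_hit returns the raw current-bucket count rather than the weighted sliding-window estimate.

```python
from typing import Dict


class FakeRedis:
    """Dict-backed stand-in that mimics only redis-py's incrby(name, amount)."""

    def __init__(self) -> None:
        self.data: Dict[str, int] = {}

    def incrby(self, name: str, amount: int = 1) -> int:
        self.data[name] = self.data.get(name, 0) + amount
        return self.data[name]


def bucket_name(prefix: str, key: str, window_seconds: int, bucket_start: int) -> str:
    # The window size is part of the name, so one key used with two windows
    # gets two independent counters (the bucket_start suffix is an assumption).
    return f"{prefix}:{key}:{window_seconds}:{bucket_start}"


def redis_hit(client: FakeRedis, prefix: str, key: str, window_seconds: int,
              cost: int, now: int) -> int:
    """Increment the current bucket and return its raw count (no prev weighting)."""
    if cost < 0:
        # Mirror the RPC's p_cost check: INCRBY would otherwise silently decrement.
        raise ValueError("cost must be >= 0")
    bucket_start = now - now % window_seconds
    return client.incrby(bucket_name(prefix, key, window_seconds, bucket_start), cost)
```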

🔧 Fix: scope rate-limit counters by (key,window); guard Redis cost
1 info still open:

ℹ️ backend/rate_limiting.py:36 - After the round-2 fix made (bucket_key, window_seconds, window_start) the Postgres counter identity, two doc spots still describe the prune bound as "per key", which now understates it: a single key can hold up to 2 live rows per DISTINCT window size used with it, not 2 total. The module docstring (rate_limiting.py:35-36, "bounded to <=2 live buckets per key") and the AGENTS.md algorithm bullet (AGENTS.md:176, "keeps it bounded to ≤2 live rows per key") both say "per key", while the migration comments (20260624170000_rate_limit_counters.sql:29) and the PRD Status line (prd-phase2-implementation.md:950) were correctly updated to "per (key, window)". Align the two stale spots to "per (key, window)" so the bound is described consistently.

✅ **Test** - passed

✅ No issues found.

python -m backend.test_us075_rate_limiter - unit layer (factory default=postgres/selection/normalization, no-in-memory rejection, fail-closed config, Redis lazy-import RuntimeError, RateLimitDecision shape + ABC abstractness, Postgres backend over httpx MockTransport, Redis backend over a fake client) AND integration layer ran live against local Supabase (PRD Validation Test: 5 hits survive a simulated restart read back at 5; limit decision flips allowed True->False over the window)
Live PostgREST evidence script drive_live_rate_limiter.sh driving rate_limit_hit/rate_limit_count under the service role exactly as PostgresRateLimiter does: durable counting 1..5, peek-doesn't-increment, 6th/7th hit allowed:false while still counting
Security boundary live: anon rate_limit_hit and rate_limit_count -> HTTP 401 code 42501 permission denied; anon direct table read -> 0 rows (deny-all RLS); psql has_function_privilege confirms anon/authenticated FALSE, service_role TRUE
Per-(key,window) scoping (review commit 46a895b) live: same key under a 60s window kept an independent count while the 3600s window read 7 untouched; persisted rate_limit_counters showed exactly 2 bounded rows
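The "Redis lazy-import RuntimeError" case in the unit layer follows from redis being optional (the commit body notes it is not in requirements). A minimal sketch of that pattern, assuming the client comes from redis-py's redis.asyncio module; load_optional_redis and its module_name parameter are hypothetical, the parameter existing only so the failure path can be exercised without uninstalling anything:

```python
import importlib
from types import ModuleType


def load_optional_redis(module_name: str = "redis.asyncio") -> ModuleType:
    """Import the optional Redis client only when RATE_LIMITER=redis is selected."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:  # includes ModuleNotFoundError
        # Turn a missing optional dependency into a clear configuration error.
        raise RuntimeError(
            "RATE_LIMITER=redis requires the 'redis' package; "
            "install it or use RATE_LIMITER=postgres"
        ) from exc
```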

⚠️ **Document** - 1 info

ℹ️ docs/adr/ - docs/adr/ has no 0008-*.md file, yet US-075's docs (AGENTS.md, PRD) cite "ADR-0008" repeatedly as the support-surface decision record - as do US-066 through US-074. This gap is pre-existing (it predates US-075; every ADR-0008-era story references a non-existent ADR), and authoring the ADR is a substantial human-judgment task about scope and content, not a docs-sync edit. Flagging it because US-075's documentation leans on it; not fixed because it is out of scope for this change and needs an author decision.

✅ **Lint** - passed

✅ No issues found.

✅ **Push** - passed

✅ No issues found.

…US-075) Add backend/rate_limiting.py: a RateLimiter ABC + env-selected factory mirroring the reranking/web_search/parsing seams. RATE_LIMITER=postgres (default) | redis; no in-memory backend (per-replica under-count + restart reset would silently disable the cost-DoS guard, ADR-0008). PostgresRateLimiter reaches counters via two service-role SECURITY DEFINER RPCs over PostgREST (same posture as US-071's resume_conversation); RedisRateLimiter is an optional scale adapter (lazy import, redis not in requirements). Both use a bounded two-bucket sliding-window counter. Migration 20260624170000_rate_limit_counters.sql adds the deny-all RLS counter table + rate_limit_hit/rate_limit_count RPCs, granted to service_role only. Ships only the seam + migration (no call site yet by design); US-076 wires it onto the widget endpoints, US-077 the per-workspace circuit breaker. Test: python -m backend.test_us075_rate_limiter (unit always-on + integration skip-clean). Verified against local Supabase: migration applies, all assertions pass, deny-all/service-role-only boundary holds at the live PostgREST layer.
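The seam this commit describes (a RateLimiter ABC with hit/count/aclose returning a RateLimitDecision, and an env-selected factory that defaults to postgres, refuses an in-memory backend, and fails closed on missing config) can be sketched as follows. The method signatures, the config variable names (SUPABASE_URL, SUPABASE_SERVICE_ROLE_KEY, REDIS_URL), and the stub backends are hypothetical; the real backend/rate_limiting.py may differ.

```python
from __future__ import annotations

import abc
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RateLimitDecision:
    # Fields mirror the RPC's JSON shown in the evidence transcript.
    allowed: bool
    current_count: int
    limit_value: int
    window_seconds: int


class RateLimiter(abc.ABC):
    """Backend-agnostic seam; these signatures are illustrative assumptions."""

    @abc.abstractmethod
    async def hit(self, key: str, *, limit: int, window_seconds: int,
                  cost: int = 1) -> RateLimitDecision: ...

    @abc.abstractmethod
    async def count(self, key: str, *, window_seconds: int) -> int: ...

    @abc.abstractmethod
    async def aclose(self) -> None: ...


class _StubBackend(RateLimiter):
    """Placeholder so the factory runs; real backends call PostgREST or Redis."""

    def __init__(self, **config: str) -> None:
        self.config = config

    async def hit(self, key, *, limit, window_seconds, cost=1):
        raise NotImplementedError

    async def count(self, key, *, window_seconds):
        raise NotImplementedError

    async def aclose(self):
        return None


class PostgresRateLimiter(_StubBackend):
    pass


class RedisRateLimiter(_StubBackend):
    pass


# Hypothetical env var names each backend needs.
_REQUIRED = {
    "postgres": ("SUPABASE_URL", "SUPABASE_SERVICE_ROLE_KEY"),
    "redis": ("REDIS_URL",),
}
_BACKENDS = {"postgres": PostgresRateLimiter, "redis": RedisRateLimiter}


def build_rate_limiter(env: Optional[Mapping[str, str]] = None) -> RateLimiter:
    env = os.environ if env is None else env
    name = (env.get("RATE_LIMITER") or "postgres").strip().lower()
    if name not in _BACKENDS:
        # No in-memory option on purpose: it under-counts per replica and
        # resets on restart, silently disabling the guard.
        raise ValueError(f"unsupported RATE_LIMITER={name!r}; use postgres or redis")
    missing = [var for var in _REQUIRED[name] if not env.get(var)]
    if missing:
        # Fail closed at build time instead of running unprotected.
        raise RuntimeError(f"RATE_LIMITER={name} requires {', '.join(missing)}")
    return _BACKENDS[name](**{var: env[var] for var in _REQUIRED[name]})
```

Keeping both backends behind one factory means the US-076/077 call sites depend only on RateLimiter, the same shape as the reranking, web_search, and parsing seams.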

vercel · 2026-06-26T14:06:05Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
agentic-rag	Ready	Preview, Comment	Jun 26, 2026 2:06pm

github-actions · 2026-06-26T14:12:19Z

Retrieval eval — PR vs `main`

n = 50 questions × 3 modes (vector, keyword, hybrid) on a 14-chunk corpus. PR ran in 69.59s; main in 74.3s.

Headline (each cell: PR value, Δ vs `main`)

Mode	recall@5	MRR	nDCG@5
vector	0.860 (±0.000)	0.772 (±0.000)	0.779 (±0.000)
keyword	0.110 (±0.000)	0.120 (±0.000)	0.112 (±0.000)
hybrid	0.860 (±0.000)	0.759 (±0.000)	0.769 (±0.000)

Per-category recall@5

Mode	single_chunk	multi_hop	adversarial	paraphrase
vector	0.900 (±0.000)	0.933 (±0.000)	0.600 (±0.000)	1.000 (±0.000)
keyword	0.250 (±0.000)	0.033 (±0.000)	0.000 (±0.000)	0.000 (±0.000)
hybrid	0.900 (±0.000)	0.933 (±0.000)	0.600 (±0.000)	1.000 (±0.000)

Comment is updated in place on each push by .github/workflows/retrieval-eval.yml (US-035). Comment-only: it never blocks the build.
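The three headline metrics are standard ranking measures. The exact definitions used by retrieval-eval.yml are not shown here, so the binary-relevance versions below are assumptions: recall@5 as the fraction of a question's relevant chunks found in the top 5, MRR as the mean over questions of 1/rank of the first relevant chunk, and nDCG@5 with a log2 position discount.

```python
import math
from typing import Iterable, Sequence, Set, Tuple


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """Fraction of the relevant chunks that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1/rank of the first relevant chunk, or 0 if none is retrieved."""
    for rank, chunk in enumerate(ranked, start=1):
        if chunk in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average reciprocal rank over (ranked results, relevant set) pairs."""
    scores = [reciprocal_rank(ranked, relevant) for ranked, relevant in runs]
    return sum(scores) / len(scores) if scores else 0.0


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """Binary-relevance nDCG with a log2(rank + 1) discount."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, chunk in enumerate(ranked[:k], start=1) if chunk in relevant)
    ideal = sum(1.0 / math.log2(rank + 1)
                for rank in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0
```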

hcho22 added 5 commits June 25, 2026 22:05

no-mistakes(review): tighten rate limiter window_seconds contract from float to int (8dab1de)
no-mistakes(review): scope rate-limit counters by (key,window); guard Redis cost (46a895b)
no-mistakes(document): sync US-075 rate-limiter docs to (key,window)-scoped counter (83aaae8)
no-mistakes(lint): fix mypy int/float mismatch in US-075 rate-limiter test (7575e35)

hcho22 merged commit 4d19b79 into main Jun 26, 2026
3 checks passed

hcho22 deleted the feat/us075-rate-limiter-seam branch June 26, 2026 14:22

hcho22 mentioned this pull request Jun 26, 2026

feat(support): per-key and per-session/IP rate limit on widget surface #42

Merged
