You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Nearly every service and infrastructure dependency is wrapped in @lru_cache. This creates several production risks:
No graceful restart on config change — if Redis or Postgres connection details change (e.g., failover, credential rotation), the process must be restarted. There's no mechanism to invalidate the cache.
Redis client is cached forever — if the connection breaks and the client enters a bad state, it won't be recreated. asyncpg has pool_pre_ping but Redis doesn't get equivalent treatment.
get_settings() is cached — environment variable changes (for sidecar-injected secrets) won't take effect without process restart.
Refresh Token Stored as SHA-256 Hash — Vulnerable to Database Leaks
Files: sessions.py:560-562
Refresh tokens are hashed with plain sha256. Unlike bcrypt for passwords, SHA-256 is fast — if the sessions table is leaked, an attacker could brute-force refresh tokens (they're JWTs with known structure, making partial preimage attacks feasible). Consider HMAC with a server-side secret key, or at minimum use a keyed hash.
Email HTML Injection in Verification and Reset Emails
The engine is created with default pool settings (pool_size=5, max_overflow=10). For a centralized auth service handling traffic from multiple downstream services, this will bottleneck under production load. There's no configuration for:
pool_size / max_overflow
pool_timeout
pool_recycle (important for cloud databases that silently kill idle connections)
BaseHTTPMiddleware for Rate Limiting is a Known Anti-Pattern
File: rate_limit.py:69
Starlette's BaseHTTPMiddleware wraps each request in a StreamingResponse and holds the entire request body in memory. For an auth service this is somewhat tolerable (small payloads), but it also prevents proper response streaming and has known issues with background tasks. All your middleware uses this pattern. Consider migrating at least the rate limiter to a pure ASGI middleware.
Rate Limiting is Per-Path, Per-IP Only — No Per-User Rate Limiting
File: rate_limit.py:168-171
Rate limiting keys are rate_limit:{path}:{client_ip}. This means:
A single corporate NAT can be rate-limited across all its users
An attacker behind many IPs (botnet) bypasses the rate limit entirely
Authenticated endpoints have no per-user rate limits for token refresh, API key operations, etc.
The brute force service partially compensates for login, but there's no per-user limiting on /auth/token or admin endpoints.
_issue_token_pair Uses inspect.signature on Every Call
File: auth.py:104-155
This function uses inspect.signature() on every token issuance to support "legacy test doubles". In production, this is pure overhead. The signature won't change at runtime. Cache the inspection result or remove the legacy shim.
The fallback branch has a TOCTOU race — two concurrent callbacks with the same state could both read the value before either deletes it, allowing a replay. The getdel path is correct, but the fallback should use a Lua script or pipeline for atomicity.
UserService is Re-instantiated Per Request (No Caching)
This creates a new CryptContext on every request. While passlib.CryptContext is lightweight, it's still unnecessary object churn at scale compared to a cached singleton.
No Graceful Shutdown / Lifespan Hook
File: main.py:18-63
There's no FastAPI lifespan handler to:
Close the SQLAlchemy engine pool on shutdown (there's a dispose_engine() but it's never called)
Close the Redis connection pool
Drain the webhook queue gracefully
This will cause connection leak warnings and potentially lost webhook deliveries during rolling deployments.
Audit Events Are Written in the Same Transaction as Business Logic
Files: auth.py:432-478, lifecycle.py:100-143
Audit records are awaited in the same request path as login/signup. If the audit service or DB is slow, it directly impacts authentication latency. Audit writes should be fire-and-forget or queued asynchronously (similar to how webhook emission already uses a separate session factory).
Missing Index on sessions.hashed_refresh_token
File: sessions.py:357-371
_fetch_session_by_refresh_hash queries Session.hashed_refresh_token with WITH FOR UPDATE, but there's no visible index on this column. On every token refresh, this becomes a sequential scan against the sessions table, which grows continuously. This will become a performance cliff as the user base scales.
Webhook SSRF DNS Rebinding Not Fully Mitigated
File: webhook_service.py:161-205
The DNS resolution happens at registration time and again at delivery time (_resolve_target), which is good. However, between resolution and the actual HTTP request (httpx.post), DNS could rebind to an internal IP. The host=connect_host substitution partially mitigates this, but httpx may still perform its own DNS resolution depending on transport configuration. Consider pinning via httpx.AsyncClient(transport=...) with explicit address binding.
Email Column Has unique=True, but Soft Deletes Create Ambiguity
File: user.py:42
The email column has a hard UNIQUE constraint, but soft-deleted users retain their email. This means:
A user who erases their account permanently blocks that email from re-registration
The composite index ix_users_email_deleted_at exists but the UNIQUE constraint doesn't leverage it
A 404 or 405 returning invalid_token is semantically wrong and will confuse clients. A 503 returning session_expired masks infrastructure failures. These defaults should be more accurate codes.
Internal server errors are tagged with invalid_token, which will mislead SDK consumers into treating infrastructure failures as authentication failures and triggering re-auth loops.
No Cookie max-age Set on Auth Cookies
File: browser_sessions.py:207-232
set_cookie() doesn't include a max_age parameter. Without it, cookies become session cookies (deleted when browser closes). This conflicts with the 7-day refresh token TTL — users will lose their session on browser close even though the refresh token is still valid.
Each container (auth-service, webhook-worker, webhook-scheduler) independently generates RSA keys if not provided. In a fresh docker compose up, each service will have different JWT keys, meaning tokens issued by the auth service won't verify on the worker and vice versa. The key generation should be centralized or the keys should be shared via a volume/secret.
No Mechanism for Session Listing / Active Session Management
Users have no way to:
List their active sessions
Revoke a specific session (only all sessions via password reset)
See where they're logged in from
This is a standard feature expectation for production auth systems.
Code Review Issues
lru_cache on Singletons Prevents Configuration Reload and Causes Stale State
Files:
config.py:346,jwt.py:203,sessions.py:577,signing_keys.py:261,token_service.py:131,lifecycle_service.py:764,otp_service.py:855,brute_force_service.py:284,webhook_service.py:825-853,rate_limit.py:62Nearly every service and infrastructure dependency is wrapped in
@lru_cache. This creates several production risks:pool_pre_pingbut Redis doesn't get equivalent treatment.get_settings()is cached — environment variable changes (for sidecar-injected secrets) won't take effect without process restart.Refresh Token Stored as SHA-256 Hash — Vulnerable to Database Leaks
Files:
sessions.py:560-562Refresh tokens are hashed with plain sha256. Unlike bcrypt for passwords, SHA-256 is fast — if the sessions table is leaked, an attacker could brute-force refresh tokens (they're JWTs with known structure, making partial preimage attacks feasible). Consider HMAC with a server-side secret key, or at minimum use a keyed hash.
Email HTML Injection in Verification and Reset Emails
Files:
lifecycle_service.py:107-109,lifecycle_service.py:119-122The
verification_linkis interpolated directly into HTML without escaping:f'<p><a href="{verification_link}">{verification_link}</a></p>'While the link is server-generated, a JWT token value containing
"could break out of the attribute. Use proper HTML escaping.No Database Connection Pool Tuning
File:
session.py:22The engine is created with default pool settings (
pool_size=5,max_overflow=10). For a centralized auth service handling traffic from multiple downstream services, this will bottleneck under production load. There's no configuration for:pool_size/max_overflowpool_timeoutpool_recycle(important for cloud databases that silently kill idle connections)BaseHTTPMiddleware for Rate Limiting is a Known Anti-Pattern
File:
rate_limit.py:69Starlette's
BaseHTTPMiddlewarewraps each request in aStreamingResponseand holds the entire request body in memory. For an auth service this is somewhat tolerable (small payloads), but it also prevents proper response streaming and has known issues with background tasks. All your middleware uses this pattern. Consider migrating at least the rate limiter to a pure ASGI middleware.Rate Limiting is Per-Path, Per-IP Only — No Per-User Rate Limiting
File:
rate_limit.py:168-171Rate limiting keys are
rate_limit:{path}:{client_ip}. This means:/auth/tokenor admin endpoints._issue_token_pair Uses inspect.signature on Every Call
File:
auth.py:104-155This function uses
inspect.signature()on every token issuance to support "legacy test doubles". In production, this is pure overhead. The signature won't change at runtime. Cache the inspection result or remove the legacy shim.OAuth State Not Atomic (get + delete Race)
File:
oauth_service.py:385-390The fallback branch has a TOCTOU race — two concurrent callbacks with the same state could both read the value before either deletes it, allowing a replay. The
getdelpath is correct, but the fallback should use a Lua script or pipeline for atomicity.UserService is Re-instantiated Per Request (No Caching)
File:
auth.py:71-72This creates a new
CryptContexton every request. Whilepasslib.CryptContextis lightweight, it's still unnecessary object churn at scale compared to a cached singleton.No Graceful Shutdown / Lifespan Hook
File:
main.py:18-63There's no FastAPI lifespan handler to:
dispose_engine()but it's never called)This will cause connection leak warnings and potentially lost webhook deliveries during rolling deployments.
Audit Events Are Written in the Same Transaction as Business Logic
Files:
auth.py:432-478,lifecycle.py:100-143Audit records are awaited in the same request path as login/signup. If the audit service or DB is slow, it directly impacts authentication latency. Audit writes should be fire-and-forget or queued asynchronously (similar to how webhook emission already uses a separate session factory).
Missing Index on sessions.hashed_refresh_token
File:
sessions.py:357-371_fetch_session_by_refresh_hashqueriesSession.hashed_refresh_tokenwithWITH FOR UPDATE, but there's no visible index on this column. On every token refresh, this becomes a sequential scan against the sessions table, which grows continuously. This will become a performance cliff as the user base scales.Webhook SSRF DNS Rebinding Not Fully Mitigated
File:
webhook_service.py:161-205The DNS resolution happens at registration time and again at delivery time (
_resolve_target), which is good. However, between resolution and the actual HTTP request (httpx.post), DNS could rebind to an internal IP. Thehost=connect_hostsubstitution partially mitigates this, but httpx may still perform its own DNS resolution depending on transport configuration. Consider pinning viahttpx.AsyncClient(transport=...)with explicit address binding.Email Column Has unique=True, but Soft Deletes Create Ambiguity
File:
user.py:42The email column has a hard
UNIQUEconstraint, but soft-deleted users retain their email. This means:ix_users_email_deleted_atexists but theUNIQUEconstraint doesn't leverage itError Code Defaults Are Misleading
File:
error_handlers.py:48-57A 404 or 405 returning
invalid_tokenis semantically wrong and will confuse clients. A 503 returningsession_expiredmasks infrastructure failures. These defaults should be more accurate codes.500 Errors Return code: "invalid_token"
File:
error_handlers.py:191Internal server errors are tagged with
invalid_token, which will mislead SDK consumers into treating infrastructure failures as authentication failures and triggering re-auth loops.No Cookie max-age Set on Auth Cookies
File:
browser_sessions.py:207-232set_cookie()doesn't include amax_ageparameter. Without it, cookies become session cookies (deleted when browser closes). This conflicts with the 7-day refresh token TTL — users will lose their session on browser close even though the refresh token is still valid.Docker Compose Services Generate Independent JWT Keys
File:
docker-compose.yml:111-119Each container (
auth-service,webhook-worker,webhook-scheduler) independently generates RSA keys if not provided. In a freshdocker compose up, each service will have different JWT keys, meaning tokens issued by the auth service won't verify on the worker and vice versa. The key generation should be centralized or the keys should be shared via a volume/secret.No Mechanism for Session Listing / Active Session Management
Users have no way to:
This is a standard feature expectation for production auth systems.