Context
After the production outage on March 22 and the issues with stale intelligence cards, broken odds loading, and frontend deployment gaps, I did a full 5-part system audit and built a hardening plan covering 40+ failure modes across infrastructure, backend, database, frontend, and scripts. You reviewed and approved the plan. I have now executed every item.
What to Review
Commits
- Main hardening commit (37 files, 1601 insertions): 0d21c95
- Hotfix (connection pool transaction fix): 089bf56
Detailed Review Checklist
I wrote a file-by-file review guide with specific things to validate and risks to watch for:
https://github.com/kevsilk597/cipher/blob/main/JARVIS_REVIEW_HARDENING_IMPLEMENTATION.md
Summary of Changes by Priority
P0 (Critical Fixes)
- Connection leak fix in _run_odds_inprocess() (server.py)
- rollback() before putconn() in PgConnection.close() (db.py)
- Path traversal prevention in static file serving (server.py)
- 3 bug fixes in poll_odds.py (wrong import, missing column, broken JOIN)
- Connection health check with SELECT 1 on pool checkout (db.py)
- INSERT OR REPLACE converted to proper ON CONFLICT DO UPDATE SET (db.py)
- Bounded caches with max-size eviction and TTL (server.py)
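
For reviewers validating the path traversal fix, here is a minimal sketch of the usual containment check to compare against. It is not the code in server.py; the function name `resolve_static_path` and its parameters are hypothetical, and it assumes the request path has already been URL-decoded.

```python
from pathlib import Path
from typing import Optional


def resolve_static_path(static_root: Path, request_path: str) -> Optional[Path]:
    """Return the resolved path under static_root, or None if it escapes the root.

    Hypothetical helper for illustration; assumes request_path is already URL-decoded.
    """
    root = static_root.resolve()
    # Strip leading slashes so the request path is joined relative to the root.
    candidate = (root / request_path.lstrip("/")).resolve()
    # After ".." segments and symlinks are resolved, the result must still sit inside root.
    if candidate != root and root not in candidate.parents:
        return None
    return candidate
```

Things worth checking in the real implementation: that the check runs after decoding and normalization (not on the raw string), and that a rejected path returns an error response rather than falling through to a default file.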
P1 (Reliability)
- Scores endpoint default date now uses Eastern time, not UTC
- Schema validation on startup for missing UNIQUE constraints
- Frontend: date-keyed card cache with 10-min TTL
- Frontend: stale the_thing cleared on error
- Frontend: top-level React Error Boundary
- Frontend: all API hooks use consistent API_BASE resolution
- Card validator import failure is now logged loudly and flagged
- V3 validation gate changed from soft to hard mode
- Health endpoint returns 503 for degraded/error status
- Logrotate config and backup script created
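
To illustrate why the scores default date matters, here is a rough sketch of computing "today" in Eastern time. The name `default_scores_date` is hypothetical and the actual endpoint code may differ; it assumes Python 3.9+ with `zoneinfo` and system time zone data available.

```python
from datetime import date, datetime, timezone
from typing import Optional
from zoneinfo import ZoneInfo

EASTERN = ZoneInfo("America/New_York")


def default_scores_date(now: Optional[datetime] = None) -> date:
    """Return the current calendar date in US Eastern time.

    Hypothetical helper; `now` must be timezone-aware and is injectable for testing.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    # Convert to Eastern before taking the date, so late-evening games
    # are not attributed to the next (UTC) day.
    return now.astimezone(EASTERN).date()
```

For example, 02:00 UTC on March 23, 2025 is still the evening of March 22 in Eastern time, so a UTC-based default would request the wrong day's scores.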
P2 (Hardening)
- Deploy script rewritten for systemd
- CORS restricted to known origins
- Health check expanded (frontend HTML, disk space, DB check)
- Frontend: 15-second fetch timeouts via fetchWithTimeout
- Frontend: AbortController cleanup in all useEffect hooks
- Infrastructure docs updated
- requests dependency bumped for CVE fix
- Mock mode turned off in useFeedData
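
As a reference point for the CORS item, an origin allowlist check typically looks something like the sketch below. The `ALLOWED_ORIGINS` set and the `cors_headers` function are assumptions for illustration; the real list of known origins lives in the commit and may differ.

```python
from typing import Dict, Optional

# Hypothetical allowlist; the actual set of known origins is defined in the commit.
ALLOWED_ORIGINS = frozenset({"http://cypher.178.156.223.137.nip.io"})


def cors_headers(origin: Optional[str]) -> Dict[str, str]:
    """Return CORS response headers for an allowed origin, or none otherwise."""
    if origin in ALLOWED_ORIGINS:
        # Echo the specific origin rather than "*", and tell caches the
        # response varies by the Origin request header.
        return {"Access-Control-Allow-Origin": origin, "Vary": "Origin"}
    return {}
```

When reviewing, it is worth confirming the match is exact (scheme, host, and port) rather than a substring or prefix test, since those can be bypassed with lookalike hostnames.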
What I Need
- Review both commits line by line
- Follow the checklist in JARVIS_REVIEW_HARDENING_IMPLEMENTATION.md; it flags specific risks and things to validate per file
- Test production at http://cypher.178.156.223.137.nip.io/ and http://cypher.178.156.223.137.nip.io/app/scores
- Run scripts/poll_odds.py locally to verify the odds pipeline fix
- Run scripts/health_check.sh on the server to verify monitoring
- Flag any edge cases, loopholes, or gaps not covered
The goal is 100% confidence that infrastructure, backend, and frontend are solid. If anything looks wrong, write it up here.