Pre-production checklist for VERONICA control-plane deployments. Complete all items before routing live traffic.
- Docker and Docker Compose installed on the target host
- Ports 8000 (API), 5432 (PostgreSQL), 9090 (Prometheus), 3000 (Grafana), 9464 (metrics) are available
- `docker compose up -d` completes without errors (`cd deploy/`)
- `GET /health` returns `{"status": "ok"}` with the expected version
- PostgreSQL data directory is on a persistent volume (not the container ephemeral layer)
- Redis deployed if distributed budget enforcement across processes is required (`VERONICA_REDIS_URL` set and reachable from the API container)
- Host has sufficient memory for PostgreSQL + Prometheus retention (minimum 2 GB recommended)
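
The port check above can be scripted before `docker compose up -d` is run. The following is a minimal sketch, not part of VERONICA itself: the helper names `port_is_free` and `busy_ports` are hypothetical, and it assumes the services will bind on the local host. A port counts as free when nothing accepts a TCP connection on it.

```python
import socket

# Ports named in the checklist above.
REQUIRED_PORTS = {
    8000: "API",
    5432: "PostgreSQL",
    9090: "Prometheus",
    3000: "Grafana",
    9464: "metrics",
}


def port_is_free(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True when no process accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success, meaning something is already listening.
        return sock.connect_ex((host, port)) != 0


def busy_ports(ports=REQUIRED_PORTS, host: str = "127.0.0.1") -> dict:
    """Return the subset of ports that are already taken, keyed by port number."""
    return {port: name for port, name in ports.items() if not port_is_free(port, host)}
```

Calling `busy_ports()` on the target host should return an empty dict before the stack is started; any entry names a service whose port is already in use.
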
- `VERONICA_API_KEY` set to a securely generated value (see `docs/key-management.md`)
- `VERONICA_AUTH_DISABLED` is unset (or explicitly `0`) -- never `1` in production
- `VERONICA_DEBUG` is unset -- never `1` in production
- `VERONICA_CORS_ORIGINS` set to an explicit origin list, not `*`, if the API is browser-accessible
- API server is behind a reverse proxy (nginx, Caddy) with TLS termination
- Grafana admin password changed from default (`veronica`) -- `GF_SECURITY_ADMIN_PASSWORD`
- `GF_AUTH_ANONYMOUS_ENABLED` disabled if Grafana is reachable beyond localhost
- PostgreSQL credentials (`POSTGRES_PASSWORD`, `POSTGRES_USER`) changed from defaults
- `.env` file excluded from version control (`.gitignore` entry present)
- API binds to `127.0.0.1` by default; confirm `VERONICA_HOST` is not set to `0.0.0.0` unless a reverse proxy is in front
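
Several of the environment checks above lend themselves to an automated gate in a deploy script. The sketch below is illustrative only: `check_production_env` is a hypothetical helper, the comma-separated format for `VERONICA_CORS_ORIGINS` is an assumption, and the rules are deliberately strict (any value of `VERONICA_DEBUG` is flagged, since the checklist asks for it to be unset).

```python
def check_production_env(env: dict) -> list:
    """Return human-readable problems found in a mapping of environment variables."""
    problems = []

    if not env.get("VERONICA_API_KEY"):
        problems.append("VERONICA_API_KEY is not set")

    # Only "unset" or an explicit "0" is accepted.
    if env.get("VERONICA_AUTH_DISABLED") not in (None, "0"):
        problems.append("VERONICA_AUTH_DISABLED must be unset or 0")

    if "VERONICA_DEBUG" in env:
        problems.append("VERONICA_DEBUG must be unset")

    # Assumption: origins are given as a comma-separated list.
    origins = [o.strip() for o in env.get("VERONICA_CORS_ORIGINS", "").split(",")]
    if "*" in origins:
        problems.append("VERONICA_CORS_ORIGINS must not contain *")

    if env.get("VERONICA_HOST") == "0.0.0.0":
        problems.append("VERONICA_HOST is 0.0.0.0; confirm a reverse proxy is in front")

    # "veronica" is the default Grafana admin password named in the checklist.
    if env.get("GF_SECURITY_ADMIN_PASSWORD") == "veronica":
        problems.append("GF_SECURITY_ADMIN_PASSWORD is still the default")

    return problems
```

A deploy script could call `check_production_env(dict(os.environ))` and refuse to continue when the returned list is non-empty. The remaining items (TLS termination, `.gitignore`, PostgreSQL credentials) still need a manual check.
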
- Metrics extra installed (`pip install veronica-cp[metrics]`)
- `GET http://127.0.0.1:9464/metrics` returns Prometheus-formatted output
- Prometheus is scraping `9464/metrics` -- verify in the Prometheus Targets UI (`/targets`)
- Grafana dashboard loads and shows data (open `http://127.0.0.1:3000`)
- Alerting rules configured for: cost ceiling breaches, HALT events, API error rate
- Log retention policy confirmed (Docker logging driver or external collector configured)
- `step_denied` metric baseline established before go-live (should be near zero initially)
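
One way to record the `step_denied` baseline is to read it from the metrics endpoint output. The sketch below parses Prometheus text exposition format and sums every sample of one metric across its label sets. It is an assumption that the metric is exported under exactly the name `step_denied`; the `chain_id` label in the usage note is hypothetical as well.

```python
def metric_total(exposition: str, name: str) -> float:
    """Sum all samples of `name` in Prometheus text-format output."""
    total = 0.0
    for raw in exposition.splitlines():
        line = raw.strip()
        # Skip blank lines and # HELP / # TYPE comments.
        if not line or line.startswith("#"):
            continue
        # A sample line looks like: name{label="v"} value [timestamp]
        sample_name = line.split("{", 1)[0].split(None, 1)[0]
        if sample_name != name:
            continue
        # The value is the first token after the optional label set.
        rest = line.rsplit("}", 1)[1] if "}" in line else line[len(sample_name):]
        total += float(rest.split()[0])
    return total
```

Given the body of `GET http://127.0.0.1:9464/metrics` as a string, `metric_total(body, "step_denied")` returns the combined count, for example across samples such as `step_denied{chain_id="a"} 2`. A metric that is absent yields `0.0`, so it is worth confirming the exported name in the raw output first.
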
- At least one policy defined via `PUT /policies/{chain_id}` before routing traffic
- Initial `ceiling_usd` set to at least 3x observed p95 spend (conservative start)
- `on_exceed` set to `degrade` for interactive agents; `halt` for batch/autonomous agents
- `step_limit` configured where unbounded agent loops are a risk
- Policy simulation run against representative historical traffic (if available)
- `GET /policies` returns all expected chain policies with correct versions
- Policy version conflict behavior tested: confirm `409 Conflict` on stale `current_version`
- Gradual rollout plan documented: simulation mode -> degrade -> halt
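
The "3x observed p95 spend" rule for the initial `ceiling_usd` can be worked out from a list of historical per-chain spend figures. This is a rough sketch under stated assumptions: it uses the nearest-rank definition of the 95th percentile (other definitions interpolate and give slightly different values), and the function names and sample data are hypothetical.

```python
def p95(values):
    """Nearest-rank 95th percentile of a non-empty sequence of numbers."""
    if not values:
        raise ValueError("need at least one spend sample")
    ordered = sorted(values)
    # Integer form of ceil(0.95 * n), avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def initial_ceiling_usd(spend_samples, multiplier: float = 3.0) -> float:
    """Return multiplier x p95 spend, rounded to cents."""
    return round(multiplier * p95(spend_samples), 2)
```

For example, with 100 samples of 1.00, 2.00, ..., 100.00 USD the nearest-rank p95 is 95.00, so the suggested starting ceiling is 285.00 USD. The result is a lower bound for `ceiling_usd`, not a target; the checklist asks for at least this much headroom.
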
- veronica-core kernel connected to the control-plane API
- `ShieldPipeline` and `BudgetEnforcer` initialized with the expected chain IDs
- Event flow verified: a test LLM call produces an event visible in Grafana
- HALT path tested end-to-end: trigger a ceiling breach and confirm the agent stops
- DEGRADE path tested if `on_exceed: degrade` is in use
- Redis budget synchronization tested under concurrent load (if Redis is configured)
- Adapter compatibility confirmed for all LLM providers in use (OpenAI, Anthropic, etc.)
- PostgreSQL backup schedule configured (daily minimum for production)
- Backup includes policy table and event store
- Restore procedure tested on a non-production host at least once
- Policy export via `GET /policies` scripted and stored alongside infrastructure backups
- Recovery time objective (RTO) documented and acceptable to stakeholders
- Smoke test: create a policy, make a test LLM call, verify spend recorded in Grafana
- On-call rotation established for HALT/DEGRADE alerts
- Escalation path documented: who to contact if the API is unreachable
- Rollback plan documented: steps to revert to previous policy versions
- Design partner contact confirmed for first 48 hours post-launch monitoring
- `VERONICA_DEBUG` confirmed absent from production environment one final time