Single-box AWS deployment: prod compose stack, Caddy edge, lifecycle scripts #32
Merged
output: 'standalone' in next.config.ts so the image ships only .next/standalone + .next/static and runs 'node server.js' (not 'next start'). API_BASE_URL stays a runtime env var (server-only, never inlined); NODE_ENV=production baked so the session cookie gets its Secure attribute. pnpm 9 pinned explicitly — package.json deliberately has no packageManager field (web/AGENTS.md).
Build context is the repo root because api/ imports db/embedding/retrieval/scripts.bootstrap_milvus from ../worker at runtime (the one allowed cross-package import). uv sync --frozen --no-dev from api/uv.lock keeps torch on the CPU index (no CUDA wheels). Models are NOT baked: they live in the shared hf-cache volume populated by the prewarm one-shot; runtime loads with HF_HUB_OFFLINE=1. Single uvicorn worker on purpose: each process lazily loads ~3.7GB of models. Root .dockerignore filters the api/worker contexts.
Same image serves the Celery worker and the migrate /
bootstrap-milvus one-shots (cwd must be /app/worker — the project
is non-packaged by design). pandoc + libmagic1 + libgomp1 installed;
WordNet baked at build and explicitly UNZIPPED — nltk cannot read
the corpus from the downloader's zips (verified empirically:
find('corpora/wordnet') fails against the zip, dedup.signature()
works after extraction). --concurrency=1 because each new-book
ingest loads BGE-Large up to twice (~3-4GB); --time-limit=3600 so
one poisoned upload can't pin the CPU.
Self-contained prod compose (NOT an overlay — compose merges ports:
additively, and 'only Caddy publishes a port' is the security
property this file guarantees). Postgres/Redis/etcd/MinIO/Milvus
(which has no auth)/api/web stay on the internal network; secrets
use ${VAR:?} so a missing one refuses to boot — the no-code stand-in
for the Phase 18 startup guard. Shared volumes carry the api→worker
upload handoff and the HF model cache (prewarm one-shot is the only
thing allowed to touch HuggingFace; runtime is offline).
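The ${VAR:?} form that compose interpolates has the same meaning in POSIX shell: expansion fails when the variable is unset or empty. A minimal sketch of that behaviour in bash follows. The helper name require_secret and the variable DEMO_DB_PASSWORD are hypothetical, chosen only for illustration.

```bash
# Hypothetical helper: succeeds only if the named variable is set AND non-empty.
require_secret() {
  local name="$1"
  # Expand inside a subshell so a failed ${...:?} ends the subshell, not the caller.
  ( : "${!name:?must be set}" ) 2>/dev/null
}

# Unset: the guard refuses.
unset DEMO_DB_PASSWORD
if require_secret DEMO_DB_PASSWORD; then guard_unset=ok; else guard_unset=refused; fi

# Set but empty: ':?' (unlike '?') still refuses.
DEMO_DB_PASSWORD=""
if require_secret DEMO_DB_PASSWORD; then guard_empty=ok; else guard_empty=refused; fi

# Set and non-empty: the guard passes.
DEMO_DB_PASSWORD="s3cret"
if require_secret DEMO_DB_PASSWORD; then guard_set=ok; else guard_set=refused; fi

echo "unset=$guard_unset empty=$guard_empty set=$guard_set"
```

The colon matters: ${VAR?} would accept an empty string, which is rarely what a secret check wants.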
Caddy 2.11.4 + mholt/caddy-ratelimit: per-IP limits on /api/auth/*
(10/min), /api/search-summary + /api/upload (6/min), 600/min
backstop — verified live (429 + Retry-After). default_sni is
load-bearing: browsers send no SNI for bare-IP URLs and the
handshake aborts without it (verified). Body caps nest, smallest
wins: 1MB JSON routes, 200MB global — deliberately below the api's
200MiB cap so the Next proxy never buffers a doomed upload.
Healthcheck traverses TLS→host-match→proxy→web, not just :80.
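The gap between the two 200-unit caps can be checked with plain arithmetic. This sketch assumes the edge's "200MB" is read as decimal megabytes and the api's "200MiB" as binary mebibytes; the variable names are illustrative only.

```bash
# Assumption: edge cap is decimal (10^6 bytes per MB), api cap is binary (2^20 bytes per MiB).
edge_cap=$((200 * 1000 * 1000))   # 200 MB
api_cap=$((200 * 1024 * 1024))    # 200 MiB
headroom=$((api_cap - edge_cap))  # bytes the edge rejects before the api would

echo "edge=$edge_cap api=$api_cap headroom=$headroom"
```

Under those assumptions the edge rejects an oversized body roughly 9.7 million bytes before the api's own limit would apply.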
Tag-based and re-runnable: provision.sh (Ubuntu 24.04 t3a.xlarge, 100GB gp3, IMDSv2 required, SG = 443/80 world + 22 admin-IP with self-healing ingress, Elastic IP), deploy.sh (clone → on-box secret generation with 0600 perms → build → migrate → bootstrap-milvus with cold-gRPC retry → model prewarm → up → outside-in smoke: fresh user signup 201 / login 200 / HttpOnly cookie / authed library 200 / bad-login 401; @example.com because the api 422s reserved TLDs), start/stop/status (stopped box bills ~$12/mo: EBS + EIP only), destroy (typed confirmation). LLM keys forward from the operator shell via ssh stdin and are written without touching argv; branch preflight fails fast if not pushed.
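One way to write a secret file with 0600 permissions without the value appearing in any process's argv is to stream it over stdin under a restrictive umask. This is a sketch, not the actual deploy.sh code: the function write_secret, the file name llm.env, and the key value are hypothetical, and stat -c assumes GNU coreutils (as on Ubuntu).

```bash
# Hypothetical helper: write stdin to a NEW file that only the owner can read or write.
write_secret() {
  local path="$1"
  # umask 077 in a subshell: the file is created as 0600 and the caller's umask is untouched.
  ( umask 077; cat >"$path" )
}

secret_dir=$(mktemp -d)
secret_file="$secret_dir/llm.env"

# printf is a shell builtin, so the value is never an argument of a separate process.
printf 'GOOGLE_API_KEY=%s\n' 'example-key' | write_secret "$secret_file"

mode=$(stat -c '%a' "$secret_file")   # GNU stat; prints the octal permission bits
saved=$(cat "$secret_file")
echo "mode=$mode"
rm -rf "$secret_dir"
```

The umask only affects newly created files, so a pre-existing file with looser permissions would need an explicit chmod 600 as well.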
Cost table (running/stopped/8h-day), what deploy.sh does, the security posture (mitigated vs accepted-v0 risks), day-2 ops, domain-later steps (SITE_HOST + drop tls internal for ACME), and deltas vs the Phase 29/30 plan.
…fecycle scripts in AGENTS.md
The remote script reaches the box via 'bash -s' reading ssh stdin; 'docker compose run' attaches the container's stdin by default, so the migrate one-shot consumed the remainder of the script off the stream — bash hit EOF and exited 0 right after migrate, skipping bootstrap-milvus/prewarm/up, and the smoke test then failed against a stack that was never started (first real deploy, curl exit 7). </dev/null on every compose run inside the heredoc.
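The failure mode can be reproduced locally without ssh or docker. In this sketch, cat stands in for any child that reads its inherited stdin (as 'docker compose run' does by default), and the echo lines stand in for the deploy steps.

```bash
# bash -s reads the script from stdin; a child that also reads stdin eats the rest of it.
unguarded=$(printf '%s\n' \
  'echo migrate' \
  'cat >/dev/null' \
  'echo bootstrap' \
  'echo up' | bash -s)
unguarded_status=$?

# Same script, but the stdin-reading step is pointed at /dev/null.
guarded=$(printf '%s\n' \
  'echo migrate' \
  'cat >/dev/null </dev/null' \
  'echo bootstrap' \
  'echo up' | bash -s)

printf 'unguarded (exit %s):\n%s\n' "$unguarded_status" "$unguarded"
printf 'guarded:\n%s\n' "$guarded"
```

The unguarded run stops after the first step yet still exits 0, which is why the skipped steps went unnoticed until the smoke test.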
The unquoted REMOTE delimiter command-substitutes backticks on the OPERATOR machine during expansion — a backticked 'bash -s' in a comment executed locally and hung the deploy blocking on stdin. Comment text now uses single quotes, with a warning for future editors.
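The difference between an unquoted and a quoted heredoc delimiter is easy to demonstrate in isolation. In this sketch the two function names are hypothetical and a harmless echo stands in for the backticked command inside the comment.

```bash
# Unquoted delimiter: the local shell expands backticks before the text goes anywhere.
render_unquoted() {
  cat <<REMOTE
# see `echo ran-on-operator`
REMOTE
}

# Quoted delimiter: the body is passed through literally, backticks included.
render_quoted() {
  cat <<'REMOTE'
# see `echo ran-on-operator`
REMOTE
}

unquoted=$(render_unquoted)
quoted=$(render_quoted)
printf 'unquoted: %s\nquoted:   %s\n' "$unquoted" "$quoted"
```

Quoting the delimiter also disables variable expansion, so a script that relies on interpolating operator-side values into the heredoc cannot simply switch to the quoted form.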
busybox ssl_client sends a literal-IP SNI; default_sni only covers ABSENT SNI, so the in-container TLS healthcheck found no cert for 127.0.0.1 and unhealthy-looped on the box (masked in rehearsal where SITE_HOST was 127.0.0.1). Healthcheck now probes a loopback-only :8081 plain-HTTP listener that still traverses the proxy→web chain; fallback_sni added so real clients sending unmatched SNI get the site cert; the TLS path stays covered by the external smoke test.
…as a directive
A bare-address single-site Caddyfile swallows any later site block as an unrecognized directive of the first site; caddy restart-looped on 'unrecognized directive: http://127.0.0.1:8081'. Main site now explicitly braced. (The earlier local validation masked this by piping caddy validate through tail, eating the exit code.)
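How a pipe through tail hides a validator's failure, and two ways to recover the real status in bash, can be sketched as follows. The function failing_validate is a hypothetical stand-in for a 'caddy validate' run that rejects the config.

```bash
# Hypothetical stand-in for a validator that prints an error and exits non-zero.
failing_validate() { echo "unrecognized directive"; return 1; }

set +o pipefail   # bash default: a pipeline reports only its LAST command's status

failing_validate | tail -n 1 >/dev/null
masked_status=$?                 # status of tail, so the failure is hidden

failing_validate | tail -n 1 >/dev/null
real_status=${PIPESTATUS[0]}     # status of the first command in the pipeline

set -o pipefail   # now any failing stage fails the whole pipeline
if failing_validate | tail -n 1 >/dev/null; then
  pipefail_status=0
else
  pipefail_status=$?
fi
set +o pipefail

echo "masked=$masked_status real=$real_status pipefail=$pipefail_status"
```

PIPESTATUS and pipefail are bash features; a plain POSIX sh script would need to capture the validator's output to a file and check its status before trimming.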
What
Operator tooling to run v0 on one EC2 box (pre-Phase-29/30; those phases own image-build CI and k8s later):
uv sync (CPU-only torch preserved), pandoc+libmagic+WordNet baked into the worker (zips extracted; nltk can't read them zipped), Next standalone bundle (output: "standalone" added to next.config.ts; dev/CI unaffected).
${VAR:?} guards refuse to boot on any missing secret (stand-in for the Phase 18 startup guard). Shared volumes: upload handoff (api→worker) + HF model cache (populated once by the prewarm one-shot; runtime is HF_HUB_OFFLINE=1).
default_sni so bare-IP browsers (no SNI) can handshake.
Verification
dedup.signature(), Celery task registration, alembic + bootstrap entrypoints validated in-container.
library_vectors created, no-SNI TLS handshake with IP-SAN cert, signup 201 → login 200 → HttpOnly sg_session → authed /library 200, anon 307, bad-login 401, HTTP→HTTPS 308, rate limiter returning 429 after the auth-zone threshold.
Notes
next.config.ts. Tenant-isolation surface untouched (no new DB/Milvus queries).
/search-summary needs GOOGLE_API_KEY or PPQ_API_KEY on the box; route 503s cleanly until set.