diff --git a/docs/plans/2026-05-31-large-upload-gcs-resumable-plan.md b/docs/plans/2026-05-31-large-upload-gcs-resumable-plan.md
index 3ebef94..ad22f5d 100644
--- a/docs/plans/2026-05-31-large-upload-gcs-resumable-plan.md
+++ b/docs/plans/2026-05-31-large-upload-gcs-resumable-plan.md
@@ -34,7 +34,7 @@ Updated 2026-06-01. Marks what has actually landed so a fresh agent can resume w
 - [x] **Phase 1 — Small-path fix & error clarity** (T1.1, T1.2) — merged
 - [x] **Phase 2 — Upload contract + state machine + schema** (T2.1–T2.3) — merged (#92)
 - [x] **Phase 3 — GCS resumable** (T3.1–T3.3) — merged (#93)
-- [ ] **Phase 4 — Async processing (Cloud Tasks)** (T4.1–T4.3) — **next**; detailed sub-plan: [2026-06-01-phase4-cloud-tasks-impl.md](2026-06-01-phase4-cloud-tasks-impl.md)
+- [x] **Phase 4 — Async processing (Cloud Tasks)** (T4.1–T4.3) — merged (PRs #99, #102, #101). Sub-plan: [2026-06-01-phase4-cloud-tasks-impl.md](2026-06-01-phase4-cloud-tasks-impl.md). **Prod activation pending** (set Cloud Run env): [2026-06-06-phase4-cloud-run-activation.md](2026-06-06-phase4-cloud-run-activation.md)
 - [ ] **Phase 5 — Handler registry + Tier 1 + safe ZIP** (T5.1–T5.3)
 - [ ] **Phase 6 — Dashboard large-upload UX** (T6.1, T6.2)
 - [ ] **Phase 7 — Cleanup, observability, deployment docs**
diff --git a/docs/plans/2026-06-06-phase4-cloud-run-activation.md b/docs/plans/2026-06-06-phase4-cloud-run-activation.md
new file mode 100644
index 0000000..4070d51
--- /dev/null
+++ b/docs/plans/2026-06-06-phase4-cloud-run-activation.md
@@ -0,0 +1,94 @@
+# Phase 4 — Cloud Run activation
+
+**Date:** 2026-06-06 · **Status:** Ready to run (maintainer step) · **Parent:**
+[infra/cloud-tasks/README.md](../../infra/cloud-tasks/README.md),
+[2026-06-01-phase4-cloud-tasks-impl.md](2026-06-01-phase4-cloud-tasks-impl.md)
+
+Phase 4 (async upload processing via Google Cloud Tasks) is merged to `main` and the
+image auto-deploys to Cloud Run (Cloud Build source-deploy on push to `main`). The
+feature is **inert** until four env vars are set on the service and the request timeout
+is raised. Everything else — the Cloud Tasks queue, the OIDC invoker SA, the GCS bucket,
+and all IAM bindings — is already provisioned by
+[infra/cloud-tasks/setup.sh](../../infra/cloud-tasks/setup.sh); do **not** re-run it.
+
+> Identifiers below are placeholders (this is a public repo). Resolve the real values
+> from gcloud at run time — the active gcloud project may differ from the target, so
+> always pass `--project` explicitly. Confirm your account with
+> `gcloud config get-value account`.
+
+## Resolve your values first
+
+```sh
+PROJECT= # GCP project id of the deployment
+REGION= # region of the Cloud Run service + queue + bucket
+SERVICE= # Cloud Run service name
+QUEUE=
+TASKS_SA=@${PROJECT}.iam.gserviceaccount.com
+SERVICE_URL=$(gcloud run services describe "$SERVICE" \
+  --project="$PROJECT" --region="$REGION" --format='value(status.url)')
+```
+
+Optional sanity checks (the only gap is the four env vars + the timeout):
+
+```sh
+# Already-deployed image (expect the current main commit SHA):
+gcloud run services describe "$SERVICE" --project="$PROJECT" --region="$REGION" \
+  --format='value(spec.template.spec.containers[0].image)'
+# Queue is RUNNING:
+gcloud tasks queues describe "$QUEUE" --project="$PROJECT" --location="$REGION" \
+  --format='value(state)'
+# Current request timeout (raise to 600 below):
+gcloud run services describe "$SERVICE" --project="$PROJECT" --region="$REGION" \
+  --format='value(spec.template.spec.timeoutSeconds)'
+```
+
+`GCS_UPLOAD_BUCKET` is expected to already be set on the service (Phase 3 storage); if
+it is not, add it to the `--update-env-vars` list below.
+
+## Activate (the only change)
+
+```sh
+gcloud run services update "$SERVICE" --project="$PROJECT" --region="$REGION" \
+  --update-env-vars="CLOUD_TASKS_QUEUE=${QUEUE},CLOUD_TASKS_LOCATION=${REGION},CLOUD_TASKS_SERVICE_ACCOUNT=${TASKS_SA},UPLOAD_PROCESS_URL=${SERVICE_URL}/api/upload/process" \
+  --timeout=600
+```
+
+- `UPLOAD_PROCESS_URL` is BOTH the Cloud Tasks target base (task → `/`)
+  AND the OIDC audience the app verifies. It must be the live `https://…` service URL;
+  the app rejects an invalid URL at startup (zod `.url()`).
+- All of `CLOUD_TASKS_QUEUE` + `UPLOAD_PROCESS_URL` + `CLOUD_TASKS_SERVICE_ACCOUNT` must
+  be set together, or the app falls back to the in-memory queue (logs a warning).
+- `--update-env-vars` only adds/overwrites the listed keys; other env is untouched.
+- This changes live production config (creates a new revision from the same already-
+  deployed image). It is outward-facing — confirm before running it.
+
+## Verify after (smoke test)
+
+See the checklist in
+[2026-05-31-large-upload-gcs-resumable-plan.md](2026-05-31-large-upload-gcs-resumable-plan.md)
+§13.
+
+1. Internal endpoint rejects non-OIDC callers — must never return 200:
+
+   ```sh
+   curl -s -o /dev/null -w '%{http_code}\n' -X POST \
+     "${SERVICE_URL}/api/upload/process/test-id" # expect 401 or 403
+   ```
+
+2. End-to-end: a large upload flows `/init` → resumable PUT to GCS → `/complete` (202)
+   → a task appears in the queue → `GET /api/upload/:id/status` goes
+   `queued → processing → completed` with a document created. Watch:
+
+   ```sh
+   gcloud run services logs read "$SERVICE" --project="$PROJECT" --region="$REGION" --limit=50
+   gcloud tasks queues describe "$QUEUE" --project="$PROJECT" --location="$REGION"
+   ```
+
+## Rollback
+
+Reverts to the in-memory queue (processing goes inert again):
+
+```sh
+gcloud run services update "$SERVICE" --project="$PROJECT" --region="$REGION" \
+  --remove-env-vars=CLOUD_TASKS_QUEUE,CLOUD_TASKS_LOCATION,CLOUD_TASKS_SERVICE_ACCOUNT,UPLOAD_PROCESS_URL
+```
diff --git a/infra/cloud-tasks/README.md b/infra/cloud-tasks/README.md
index 1e59ea9..292a597 100644
--- a/infra/cloud-tasks/README.md
+++ b/infra/cloud-tasks/README.md
@@ -40,6 +40,11 @@ re-run.
 
 ## Configure Cloud Run (do this when the T4.1–T4.3 code is deployed)
 
+> **The T4.1–T4.3 code is now merged and deployed.** For the remaining activation
+> steps — which env vars are still missing, how to resolve the live service URL,
+> and a copy-paste agent prompt — see
+> [docs/plans/2026-06-06-phase4-cloud-run-activation.md](../../docs/plans/2026-06-06-phase4-cloud-run-activation.md).
+
 `setup.sh` prints the exact values. The processing code reads these env vars; they are
 inert until that code ships, so set them at the same deploy: