From 6d83c23942618f1c15a134d87e3b5e09978e8590 Mon Sep 17 00:00:00 2001 From: Jeff Green Date: Wed, 10 Jun 2026 14:10:38 -0400 Subject: [PATCH] docs(plans): granular Phase 5 + Phase 6 sub-plans --- ...6-05-31-large-upload-gcs-resumable-plan.md | 4 +- ...-06-10-phase5-handler-registry-zip-impl.md | 89 +++++++++++++++++++ ...6-06-10-phase6-dashboard-upload-ux-impl.md | 70 +++++++++++++++ 3 files changed, 161 insertions(+), 2 deletions(-) create mode 100644 docs/plans/2026-06-10-phase5-handler-registry-zip-impl.md create mode 100644 docs/plans/2026-06-10-phase6-dashboard-upload-ux-impl.md diff --git a/docs/plans/2026-05-31-large-upload-gcs-resumable-plan.md b/docs/plans/2026-05-31-large-upload-gcs-resumable-plan.md index ad22f5d..2ea4be4 100644 --- a/docs/plans/2026-05-31-large-upload-gcs-resumable-plan.md +++ b/docs/plans/2026-05-31-large-upload-gcs-resumable-plan.md @@ -35,8 +35,8 @@ Updated 2026-06-01. Marks what has actually landed so a fresh agent can resume w - [x] **Phase 2 — Upload contract + state machine + schema** (T2.1–T2.3) — merged (#92) - [x] **Phase 3 — GCS resumable** (T3.1–T3.3) — merged (#93) - [x] **Phase 4 — Async processing (Cloud Tasks)** (T4.1–T4.3) — merged (PRs #99, #102, #101). Sub-plan: [2026-06-01-phase4-cloud-tasks-impl.md](2026-06-01-phase4-cloud-tasks-impl.md). 
**Prod activation pending** (set Cloud Run env): [2026-06-06-phase4-cloud-run-activation.md](2026-06-06-phase4-cloud-run-activation.md) -- [ ] **Phase 5 — Handler registry + Tier 1 + safe ZIP** (T5.1–T5.3) -- [ ] **Phase 6 — Dashboard large-upload UX** (T6.1, T6.2) +- [ ] **Phase 5 — Handler registry + Tier 1 + safe ZIP** (T5.1–T5.3) — granular sub-plan: [2026-06-10-phase5-handler-registry-zip-impl.md](2026-06-10-phase5-handler-registry-zip-impl.md) +- [ ] **Phase 6 — Dashboard large-upload UX** (T6.1, T6.2) — granular sub-plan: [2026-06-10-phase6-dashboard-upload-ux-impl.md](2026-06-10-phase6-dashboard-upload-ux-impl.md) - [ ] **Phase 7 — Cleanup, observability, deployment docs** Resolved design decisions (2026-06-01): diff --git a/docs/plans/2026-06-10-phase5-handler-registry-zip-impl.md b/docs/plans/2026-06-10-phase5-handler-registry-zip-impl.md new file mode 100644 index 0000000..0e76869 --- /dev/null +++ b/docs/plans/2026-06-10-phase5-handler-registry-zip-impl.md @@ -0,0 +1,89 @@ +# Phase 5 — Handler Registry + Tier 1 + Safe ZIP · Implementation Plan + +**Date:** 2026-06-10 · **Status:** Approved scope (parent §11), granular plan not started · **Parent:** [2026-05-31-large-upload-gcs-resumable-plan.md](2026-05-31-large-upload-gcs-resumable-plan.md) §8/§8a/§9/§11 + +## For an agent with no prior context + +Phase 4 is **merged and live in prod**: a queued single-file upload streams from GCS → SHA-256 verify → extract → chunk → embed → document → `completed`. But extraction only handles **pdf/docx/txt/md** (`SUPPORTED_TYPES` in `src/services/processor.ts`), and ZIP isn't processed at all — a queued ZIP fails with `UNSUPPORTED_TYPE`. Phase 5 (a) refactors extraction into a **handler registry** and widens Tier‑1 single‑file support, and (b) adds a **safe, streaming ZIP handler** that turns one ZIP into one document per supported entry. 
ZIP/large uploads already reach the server via the resumable flow (`/init` accepts `application/zip`); only **processing** rejects them today. + +Read parent §8 (registry + tiers), §8a (honest support matrix), §9 (ZIP safety), §7 (config), §4 (error codes) first. + +## Decisions + +**Resolved:** + +- **Partial-success policy = `partial`** (parent §9). Archive-level failures (traversal/bomb/limits/empty) fail the whole ZIP *before* any document is created; once extraction begins, per-entry failures are recorded and the upload ends `partial` (or `completed` if all entries succeed). `partial`/`failed` are already legal `processing→` transitions (`src/db/uploads.ts`). +- **ZIP reader = `unzipper.Open.buffer()`** (already a dep; no new lib). Phase 4 already buffers the whole object in memory (bounded by the verified `size_bytes` ≤ `MAX_UPLOAD_SIZE_MB`), so random-access central-directory reading is available and gives **accurate per-entry uncompressed sizes up front** — essential for enforcing the bomb/size limits *before* decompressing. (True forward-streaming extraction without buffering is a later refinement; out of scope.) + +**Open — confirm before T5.2 (this is STOP GATE 5a):** + +- **HTML parser dep:** `html-to-text` (purpose-built text extraction, sensible defaults) vs `cheerio` (DOM, more control, heavier). **Recommendation: `html-to-text`.** Pause and confirm before adding. + +## Grounding delta — what exists now (post-Phase-4) + +- `src/services/processor.ts`: `extractText(buffer, mimetype)`, `validateFileType(buffer, mimetype)`, `isSupportedType(mimetype)`, hardcoded `SUPPORTED_TYPES` = **pdf/docx/txt/md only** (csv/xlsx/json libs present but **not wired**). These three functions are the public surface callers use: `src/api/upload.ts` (direct path fileFilter), `src/api/upload-sessions.ts` (`isAllowedUploadType`), `src/services/upload-processor.ts`. 
+- `src/services/upload-processor.ts`: `processUpload(uploadId)` — single-file only; resolves mimetype from `declared_mimetype`, guards `isSupportedType`, then `validateFileType` → `extractText` → document. **The ZIP branch is added here.** +- `src/services/storage/types.ts`: `createReadStream(objectKey): Readable` exists (Phase 4) — GCS + memory impls. +- `src/db/uploads.ts`: has `recordUploadProcessingResult(...)` (writes `document_ids` + counts + checksum) and `getUploadStatus` (reads `upload_entries`). **No `upload_entries` writer** — added in T5.3. +- Schema: `upload_entries` table + `upload_entries_uniq (upload_id, entry_path)` unique index already in prod (`scripts/setup-db-uploads.sql`) — the dedupe key for idempotent retries. +- Config: **no `ZIP_MAX_*` keys** in `src/utils/config.ts` — added in T5.3 (defaults from parent §7). +- Deps: `unzipper@0.12`, `xlsx`, `mammoth`, `pdf-parse`, `file-type` present; **HTML parser absent**. +- Error classes (`src/utils/errors.ts`): `UnsupportedFileTypeError` (`UNSUPPORTED_TYPE`, 400) exists; ZIP-specific codes (`ZIP_*`, `UNSUPPORTED_ENTRY`) are **new**. + +## T5.1 — File-handler registry + Tier-1 single-file types + +Refactor extraction from the hardcoded map into a registry, and wire the already-present csv/xlsx/json parsers. Behaviour for the existing 4 types must not change. + +- New `src/services/processor/types.ts`: `FileHandler { key; extensions: string[]; mimeTypes: string[]; sniff?(buf): boolean; extract(buf: Buffer): Promise<string> }`. +- New `src/services/processor/registry.ts`: `register(h)`, `resolveByMime(mime)`, `resolveByExtension(name)`, `resolveForEntry(name, buf)` (extension + magic sniff, folding in the old `validateFileType` check), `isSupportedMime(mime)`, `supportedExtensions()`. +- New `src/services/processor/handlers/`: `text.ts` (txt/md → `buffer.toString('utf-8')`), `pdf.ts` (`pdf-parse`), `docx.ts` (`mammoth`), `csv.ts` + `xlsx.ts` (`xlsx` → sheet_to_csv/txt), `json.ts` (parse → pretty text). 
One `register()` call per handler in `registry.ts` (or an `index.ts` barrel). +- **Keep `src/services/processor.ts` as the stable façade:** `extractText`/`validateFileType`/`isSupportedType` delegate to the registry. Callers (`upload.ts`, `upload-sessions.ts`, `upload-processor.ts`) stay unchanged. +- **Test-first** `src/services/processor/__tests__/registry.test.ts`: resolves by extension + magic; the original 4 types still extract from fixtures; new csv/xlsx/json extract expected text; unknown type → unresolved. Add a tiny `src/services/processor/__tests__/fixtures/` dir (none today). +- `pnpm verify:fast` → **PAUSE for CodeRabbit** → commit `refactor(processor): file-handler registry + csv/xlsx/json handlers` → PR A. + +## T5.2 — HTML handler → STOP GATE 5a + +- **🛑 STOP GATE 5a:** confirm the HTML dep (`html-to-text` recommended) before adding it. +- Add the dep; new `src/services/processor/handlers/html.ts` (`.html`/`.htm`, `text/html`) → extracted text (strip scripts/styles/markup). +- Register it; it automatically becomes a valid ZIP entry type too. +- **Test-first**: fixture HTML → expected text (no tags/script content). +- `pnpm verify:fast` → **PAUSE for CodeRabbit** → commit `feat(processor): html handler` → PR B. + +## T5.3 — Safe streaming ZIP handler + per-entry results + +One ZIP → one document per supported entry, with strict safety. Archive-level failures → `failed` (no documents); per-entry failures → recorded, upload ends `partial`. + +- **Config** (`src/utils/config.ts`, parent §7 defaults): `ZIP_MAX_ENTRIES` (2000), `ZIP_MAX_COMPRESSED_BYTES` (= `MAX_UPLOAD_SIZE_MB`*MB), `ZIP_MAX_EXPANDED_BYTES` (2_000_000_000), `ZIP_MAX_ENTRY_BYTES` (50_000_000), `ZIP_MAX_COMPRESSION_RATIO` (100), `ZIP_MAX_FILENAME_LEN` (255). Document in `.env.example` + `docs/ENV.md`. 
+- **Errors** (`src/utils/errors.ts`): `UnsupportedEntryError` (`UNSUPPORTED_ENTRY`), `ZipPathTraversalError` (`ZIP_PATH_TRAVERSAL`), `ZipBombError` (`ZIP_BOMB`), `ZipTooManyEntriesError` (`ZIP_TOO_MANY_ENTRIES`), `ZipEntryTooLargeError` (`ZIP_ENTRY_TOO_LARGE`), `ZipNestedArchiveError` (`ZIP_NESTED_ARCHIVE`), `ZipNoSupportedEntriesError` (`ZIP_NO_SUPPORTED_ENTRIES`). Codes per parent §4. +- **DB** (`src/db/uploads.ts`): `recordUploadEntry({ uploadId, entryPath, normalizedType, sizeBytes, state, documentId?, errorCode?, errorMessage? })` — `INSERT … ON CONFLICT (upload_id, entry_path) DO UPDATE` so Cloud Task retries don't duplicate rows (mirrors the existing writers' shape). +- **New** `src/services/processor/handlers/archive-zip.ts`: `extractZip(buffer)` using `unzipper.Open.buffer(buffer)`: + - Read central directory. Enforce `ZIP_MAX_ENTRIES`, `ZIP_MAX_COMPRESSED_BYTES`, total `ZIP_MAX_EXPANDED_BYTES`, and `ZIP_MAX_COMPRESSION_RATIO` (expanded/compressed) **before decompressing** → archive-level throw. + - Per entry: skip OS junk (`__MACOSX/`, `.DS_Store`, `Thumbs.db`); reject path traversal (`../`, absolute, `C:\`, backslashes), symlinks/non-regular entries, nested archives (`.zip`/`.tar`/`.gz`/`.7z`), filename length > `ZIP_MAX_FILENAME_LEN`, uncompressed size > `ZIP_MAX_ENTRY_BYTES`. + - Yield `{ entryPath, buffer }` only for entries that resolve to a supported handler (`registry.resolveForEntry`); track skipped/unsupported. +- **Wire into** `src/services/upload-processor.ts`: after computing the ZIP's SHA-256 (existing `readAndHash`) + checksum verify, branch on ZIP mime (`application/zip`, `application/x-zip-compressed`): + - Archive-level validation throws → transition `failed` with the stable `ZIP_*` code, no documents. 
+ - Else loop entries: each supported entry → `createDocument` + chunk/embed + `createChunks` + `onDocumentIngested` + `recordUploadEntry(state:'completed', documentId)`; per-entry extract failure → `recordUploadEntry(state:'failed', code)`. Collect `documentIds`. + - Zero supported entries → `failed` `ZIP_NO_SUPPORTED_ENTRIES`. + - Else `recordUploadProcessingResult({ documentIds, checksumComputed, entriesTotal, entriesProcessed, entriesFailed })` → transition `completed` (all ok) or `partial` (≥1 failed). +- **Test-first** `src/services/processor/__tests__/archive-zip.test.ts` + `upload-processor` ZIP cases (build ZIPs in-memory with a zip lib or committed fixtures): one doc per supported entry; mixed (good + corrupt) → `partial` + per-entry rows; traversal / bomb (ratio) / too-many / oversized-entry / nested → rejected with the right code, **no documents**; zero-supported → `ZIP_NO_SUPPORTED_ENTRIES`; **idempotent retry** (rerun) → no duplicate `upload_entries` (unique index) and no duplicate documents. +- `pnpm verify:fast` → **PAUSE for CodeRabbit** → commit `feat(processor): safe streaming ZIP handler` → `pnpm verify` → PR C. + +## New surface area summary + +- **Deps:** an HTML parser (`html-to-text` recommended) — T5.2 only. +- **Config:** `ZIP_MAX_ENTRIES`, `ZIP_MAX_COMPRESSED_BYTES`, `ZIP_MAX_EXPANDED_BYTES`, `ZIP_MAX_ENTRY_BYTES`, `ZIP_MAX_COMPRESSION_RATIO`, `ZIP_MAX_FILENAME_LEN`. +- **New files:** `src/services/processor/{types,registry}.ts`, `handlers/{text,pdf,docx,csv,xlsx,json,html,archive-zip}.ts`, fixtures dir, 3 test files. +- **Edits:** `src/services/processor.ts` (delegate to registry), `src/services/upload-processor.ts` (ZIP branch), `src/db/uploads.ts` (`recordUploadEntry`), `src/utils/errors.ts` (ZIP codes), `config.ts`, `.env.example`, `docs/ENV.md`. +- **No MCP tool-list change** → `tool-sync-check.sh` stays green. 
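The T5.3 checks that run before any decompression can be sketched as two pure functions over central-directory metadata. This is only an illustration under stated assumptions: the `ZipEntryMeta` and `ZipLimits` shapes and the function names are hypothetical, the mapping of the total-expanded cap to `ZIP_BOMB` is a guess (parent §4 is authoritative), and the filename-length and compressed-bytes limits are left out because the plan does not name their codes here.

```typescript
// Hypothetical shapes for illustration; the plan does not define them.
interface ZipEntryMeta {
  path: string;
  compressedSize: number;
  uncompressedSize: number;
}

interface ZipLimits {
  maxEntries: number;
  maxExpandedBytes: number;
  maxEntryBytes: number;
  maxCompressionRatio: number;
}

// Default values copied from the T5.3 config bullet.
const DEFAULT_LIMITS: ZipLimits = {
  maxEntries: 2000,
  maxExpandedBytes: 2_000_000_000,
  maxEntryBytes: 50_000_000,
  maxCompressionRatio: 100,
};

// Archive-level check: returns a stable error code, or null when within limits.
function checkArchiveLimits(
  entries: ZipEntryMeta[],
  limits: ZipLimits = DEFAULT_LIMITS,
): string | null {
  if (entries.length > limits.maxEntries) return "ZIP_TOO_MANY_ENTRIES";
  let compressed = 0;
  let expanded = 0;
  for (const e of entries) {
    compressed += e.compressedSize;
    expanded += e.uncompressedSize;
  }
  // Assumption: both the total-expanded cap and the ratio cap report ZIP_BOMB.
  if (expanded > limits.maxExpandedBytes) return "ZIP_BOMB";
  const ratioExceeded =
    expanded > 0 &&
    (compressed === 0 || expanded / compressed > limits.maxCompressionRatio);
  return ratioExceeded ? "ZIP_BOMB" : null;
}

// Per-entry check: traversal, nested archives, and oversized entries.
function checkEntry(
  entry: ZipEntryMeta,
  limits: ZipLimits = DEFAULT_LIMITS,
): string | null {
  const p = entry.path;
  const traversal =
    p.includes("\\") ||              // backslashes (covers C:\ style paths)
    p.startsWith("/") ||             // absolute POSIX path
    /^[A-Za-z]:/.test(p) ||          // drive-letter prefix
    p.split("/").includes("..");     // any parent-directory segment
  if (traversal) return "ZIP_PATH_TRAVERSAL";
  if (/\.(zip|tar|gz|7z)$/i.test(p)) return "ZIP_NESTED_ARCHIVE";
  if (entry.uncompressedSize > limits.maxEntryBytes) return "ZIP_ENTRY_TOO_LARGE";
  return null;
}
```

Returning codes rather than throwing keeps the sketch easy to unit-test; the real handler would presumably wrap these in the `Zip*Error` classes listed under T5.3.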
+ +## PR shaping (each independently mergeable; behaviour preserved behind the registry façade) + +- **PR A** = T5.1 registry + Tier‑1 (single-file support widens to csv/xlsx/json) +- **PR B** = T5.2 HTML handler — **STOP GATE 5a** before adding the dep +- **PR C** = T5.3 safe ZIP — makes a queued ZIP produce one document per supported entry + +## Verification + +- Per slice: the named test files, then `pnpm verify:fast` before each commit, `pnpm verify` before each PR. Pause for CodeRabbit before each commit/PR. +- **End-to-end (after PR C merges + deploys):** drive the resumable API (init → GCS PUT → complete → poll) with a real ZIP of mixed supported/unsupported entries; expect `partial` with one document per supported entry and per-entry rows in the status response. (Same direct-API harness used to verify Phase 4; the dashboard can't drive it until Phase 6.) +- **Honesty:** this unblocks the *server* side of ZIP. The dashboard still can't send ZIP/large files until **Phase 6** (resumable client + accept-list + error mapping). Do not advertise ZIP in the UI before Phase 6. diff --git a/docs/plans/2026-06-10-phase6-dashboard-upload-ux-impl.md b/docs/plans/2026-06-10-phase6-dashboard-upload-ux-impl.md new file mode 100644 index 0000000..5754038 --- /dev/null +++ b/docs/plans/2026-06-10-phase6-dashboard-upload-ux-impl.md @@ -0,0 +1,70 @@ +# Phase 6 — Dashboard Large-Upload UX · Implementation Plan + +**Date:** 2026-06-10 · **Status:** Approved scope (parent §11), granular plan not started · **Parent:** [2026-05-31-large-upload-gcs-resumable-plan.md](2026-05-31-large-upload-gcs-resumable-plan.md) §8a/§10/§11 + +## For an agent with no prior context + +The server's large-upload flow is live (Phase 4: `/init` → browser PUT to GCS → `/complete` → async Cloud Tasks processing → `/status`). 
But the **dashboard never uses it** — `dashboard/app/upload/page.tsx` still POSTs the whole file to the legacy direct `/api/upload` with `FormData` and **fake progress** (sets 30, then 100). So large files and ZIPs fail in the browser, and errors render as `Upload failed: [object Object]`. Phase 6 makes the dashboard drive the resumable flow for large files, shows **real progress + processing status**, and replaces the over-promising accept list and broken error rendering with honest, readable ones. + +Read parent §10 (frontend behaviour) and §8a (support matrix) first. + +## Sequencing + +Phase 6 depends on **Phase 5 (server ZIP + widened Tier‑1)** for its accept-list to honestly include ZIP/csv/xlsx/json/html. If Phase 5 is not yet merged+deployed, ship Phase 6 with the accept-list limited to the **currently** processable types (pdf/docx/txt/md) and add the rest when Phase 5 lands. The resumable *plumbing* (T6.1) is independent of Phase 5. + +## Decisions + +**Resolved:** + +- **The page is self-contained.** `dashboard/components/upload-zone.tsx` is **not imported anywhere** (dead) — ignore the parent §11/§12 reference to it. All UI work is in `dashboard/app/upload/page.tsx`; all client API work in `dashboard/lib/api.ts`. +- **Root cause of `[object Object]`:** `page.tsx:119` does `body.error || body.message`, but the server returns `{ error: { message, code, statusCode } }` (nested). `body.error` is an object → stringifies to `[object Object]`. The fix reads `body.error.message` / `body.error.code`. + +**Open — confirm before T6.1:** + +- **Resumable PUT strategy:** **chunked** (e.g. 8 MiB chunks, 256 KiB-aligned, honor GCS `308` + `Range`, in-session resume on transient failure, byte progress per chunk) **vs single-shot PUT** with `XMLHttpRequest` `upload.onprogress`. **Recommendation: chunked** — the original failure was a ~60 MB upload dying mid-flight, which single-shot can't resume. 
⚠️ **Prerequisite:** bucket CORS must allow `PUT` + `Content-Range` from the dashboard origin **and expose the `Range` response header** (documented in Phase 3 T3.3 — verify it's actually applied to the bucket before building chunked resume). +- **Threshold source:** a client constant `NEXT_PUBLIC_UPLOAD_THRESHOLD_MB` (default 20, matching server `MAX_SINGLE_FILE_SIZE_MB`) decides direct-vs-resumable *before* any request. **Recommendation: this.** Alternative — always-resumable (simpler, one code path) but loses the snappy synchronous small-file response. + +## Grounding delta — what exists now + +- `dashboard/app/upload/page.tsx`: self-contained drop zone + per-file state (`pending|uploading|complete|error`) + fake progress; direct `fetch(${apiBase}/upload, FormData)` at line 109; over-promising copy "PDF, DOCX, TXT, MD, HTML, images, audio, MBOX, EML, ZIP" at line 188; file input has **no `accept`**; broken error read at line 119; a working "No server configured" guard at lines 76–95 (keep it). +- `dashboard/lib/api.ts`: `getApiBase()` (respects `localStorage['textrawl_server']` override + `NEXT_PUBLIC_API_URL`), `getServerBase()`, `getWsBase()`, `uploadFile()` (direct path), bearer token from `localStorage['textrawl_token']`. **No resumable client functions.** +- Server contract (from `src/api/upload-sessions.ts`, all bearer-authed except the GCS PUT): + - `POST /api/upload/init { filename, contentType, size, checksum?, checksumAlgo? }` → `{ uploadId, objectKey, bucket, resumableUri, expiresAt, state, useDirectUpload }` + - browser `PUT` bytes → `resumableUri` (GCS; `Content-Range`; **no** bearer header) + - `POST /api/upload/complete { uploadId, checksum? 
}` → `202 { uploadId, state:'queued', statusUrl }` + - `GET /api/upload/:uploadId/status` → `{ state, filename, size, progress:{entriesTotal,entriesProcessed,entriesFailed}, documentIds, entries:[{name,state,documentId,code}], error:{code,message}|null, … }` + - `DELETE /api/upload/:uploadId` → cancel (idempotent; rejects once processing/terminal) + +## T6.1 — Resumable client + threshold switch + real progress + +- **Client API** (`dashboard/lib/api.ts`): add `initUpload(file, opts)`, `putResumable(resumableUri, file, onProgress)` (chunked; resolves committed offset via `bytes */total` probe on resume), `completeUpload(uploadId, checksum?)`, `getUploadStatus(uploadId)`, `cancelUpload(uploadId)`. Reuse `getApiBase()`/token headers; the GCS PUT sends **no** bearer header. +- **Page flow** (`page.tsx`): per file, if `size ≤ NEXT_PUBLIC_UPLOAD_THRESHOLD_MB` → existing direct `uploadFile`; else **resumable**: `init` → `putResumable` (byte progress drives the bar) → `complete` → **poll `getUploadStatus`** until terminal. Extend the per-file state to `pending → uploading → processing → complete | partial | error`, with a **Retry** control on transient upload errors and a **Cancel** that calls `cancelUpload`. Replace the fake `30/100` with real bytes (upload) then `entriesProcessed/entriesTotal` or a processing spinner (processing). +- **Config:** add `NEXT_PUBLIC_UPLOAD_THRESHOLD_MB` to `dashboard/.env.example` (or document it) + a default constant. +- **Test-first** (dashboard test setup — confirm vitest/RTL config exists; add if missing): mock `lib/api.ts`; small file → direct path; large file → init/put/complete/poll sequence with progress transitions; poll resolves `completed`; a `partial` status surfaces per-entry failures; cancel calls `cancelUpload`. +- `pnpm verify:fast` → **PAUSE for CodeRabbit** → commit `feat(dashboard): resumable large-upload flow with real progress` → PR A. 
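The header arithmetic behind a chunked `putResumable` can be sketched as a few pure helpers. This assumes the chunked strategy is the one confirmed, and the helper names are hypothetical. It relies on the GCS resumable-upload convention that a `308` response carries `Range: bytes=0-N` (N+1 bytes committed) and that a missing `Range` header means nothing has been committed yet.

```typescript
// GCS requires every chunk except the last to be a multiple of 256 KiB.
const CHUNK_ALIGN = 256 * 1024;

// Round a requested chunk size down to the nearest 256 KiB multiple (minimum one unit).
function alignChunkSize(requested: number): number {
  return Math.max(CHUNK_ALIGN, Math.floor(requested / CHUNK_ALIGN) * CHUNK_ALIGN);
}

// Content-Range value for a chunk of chunkLen bytes starting at offset.
function contentRangeFor(offset: number, chunkLen: number, total: number): string {
  return `bytes ${offset}-${offset + chunkLen - 1}/${total}`;
}

// Content-Range value for the empty status probe used when resuming.
function statusProbeRange(total: number): string {
  return `bytes */${total}`;
}

// Next offset to send, derived from the Range header of a 308 response.
function committedOffset(rangeHeader: string | null): number {
  if (!rangeHeader) return 0;
  const m = /^bytes=0-(\d+)$/.exec(rangeHeader.trim());
  return m ? Number(m[1]) + 1 : 0;
}

// Byte window for the next chunk; `end` is exclusive, as Blob.slice expects.
function nextChunk(offset: number, total: number, chunkSize: number): { start: number; end: number } {
  return { start: offset, end: Math.min(offset + chunkSize, total) };
}
```

For a 60 MiB file with 8 MiB chunks, the first PUT would carry `Content-Range: bytes 0-8388607/62914560`, and a `308` with `Range: bytes=0-8388607` means the next chunk starts at offset 8388608. Reading that `Range` header from the browser is exactly why the bucket CORS config has to expose it.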
+ +## T6.2 — Honest support matrix + clear error mapping + +- **Accept list + copy** (`page.tsx`): set the file input `accept` and the help copy from §8a **"Supported now"** (with Phase 5 deployed: `.txt,.md,.pdf,.docx,.csv,.xlsx,.json,.html,.htm,.zip`; ZIP labeled "ZIP containing supported document/text files"). Remove images/audio/MBOX/EML from the copy. Source the list from a single constant so it can't drift. +- **Error mapping** (`lib/api.ts` + `page.tsx`): a `parseApiError(res)` helper that reads `{ error: { code, message } }` and maps stable codes → friendly text — `FILE_TOO_LARGE` ("File exceeds the maximum size"), `UNSUPPORTED_TYPE`/`UNSUPPORTED_ENTRY`, `SIZE_MISMATCH`/`OBJECT_NOT_FOUND` ("Upload didn't finish — retry"), `CHECKSUM_MISMATCH`, `UPLOAD_EXPIRED`, `ZIP_*` (traversal/bomb/too-many/oversized/nested → clear messages), plus a network fallback ("Couldn't reach the server"). **Eliminates `[object Object]` and bare `Load failed`.** +- **ZIP MIME normalization:** browsers report ZIP as `application/zip`, `application/x-zip-compressed`, or `''` — normalize client-side (by extension) before `init` so `contentType` is consistent. +- **Test-first:** `parseApiError` maps each code to the expected string; nested-error body no longer yields `[object Object]`; the accept constant matches the §8a "supported now" set. +- `pnpm verify:fast` → **PAUSE for CodeRabbit** → commit `feat(dashboard): honest support matrix + clear upload errors` → `pnpm verify` → PR B. + +## New surface area summary + +- **Config:** `NEXT_PUBLIC_UPLOAD_THRESHOLD_MB` (dashboard). +- **Edits:** `dashboard/lib/api.ts` (resumable client + `parseApiError`), `dashboard/app/upload/page.tsx` (threshold switch, real progress, status polling, retry/cancel, accept list, error text, ZIP MIME). Possibly delete the dead `dashboard/components/upload-zone.tsx`. +- **No server changes** — Phase 6 is dashboard-only; it consumes the existing Phase 3/4 contract. 
+- **Infra prerequisite to verify (not code):** bucket CORS allows browser `PUT` + `Content-Range` from the dashboard origin and exposes `Range` (Phase 3 T3.3). + +## PR shaping + +- **PR A** = T6.1 resumable client + threshold + real progress +- **PR B** = T6.2 honest accept-list + clear errors (depends on A) + +## Verification + +- Per slice: dashboard tests, then `pnpm verify:fast` / `pnpm verify`; pause for CodeRabbit. +- **Manual E2E (the real proof):** from the deployed dashboard, upload (a) a small `.txt` → direct path still works; (b) a **>threshold** `.pdf` → resumable bar advances by real bytes → processing → "complete"; (c) with Phase 5 live, a **ZIP** of mixed entries → ends "partial" with per-entry results; (d) an unsupported/oversized file → a **readable** error (never `[object Object]` or `Load failed`). Watch `gcloud run services logs read textrawl …` + the Cloud Tasks queue during (b)/(c). +- **CORS check first:** confirm a browser `PUT` with `Content-Range` to the resumable URI succeeds cross-origin from the dashboard origin before relying on chunked resume.
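
The readable-error requirement in check (d) comes down to reading the nested `{ error: { code, message } }` body correctly. A minimal sketch of that step follows. The name `describeErrorBody` and the friendly strings other than "File exceeds the maximum size" are placeholders rather than wording from the plan, and the function takes an already-parsed JSON body instead of a `Response` so that it stays a pure function.

```typescript
interface ApiErrorInfo {
  code: string | null;
  message: string;
}

// Placeholder copy for a few of the stable codes named in T6.2.
const FRIENDLY = new Map<string, string>([
  ["FILE_TOO_LARGE", "File exceeds the maximum size"],
  ["UNSUPPORTED_TYPE", "This file type is not supported"],
  ["SIZE_MISMATCH", "Upload did not finish. Please retry."],
  ["OBJECT_NOT_FOUND", "Upload did not finish. Please retry."],
]);

const FALLBACK = "Upload failed";

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null;
}

// Turn a parsed error body into display text, never an object.
function describeErrorBody(body: unknown): ApiErrorInfo {
  if (!isRecord(body)) return { code: null, message: FALLBACK };
  const err = body.error;
  if (isRecord(err)) {
    // Nested shape returned by the server: { error: { message, code, statusCode } }.
    const code = typeof err.code === "string" ? err.code : null;
    const serverMessage = typeof err.message === "string" ? err.message : null;
    const friendly = code !== null ? FRIENDLY.get(code) : undefined;
    return { code, message: friendly ?? serverMessage ?? FALLBACK };
  }
  // Tolerate flat shapes such as { error: "text" } or { message: "text" }.
  if (typeof err === "string") return { code: null, message: err };
  if (typeof body.message === "string") return { code: null, message: body.message };
  return { code: null, message: FALLBACK };
}
```

Unknown codes fall back to the server's own message, so a new server-side code still renders as text before the client map is updated.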