feat: two-server AI transport (remote multipart mode) #24

Open
LaNguAx wants to merge 3 commits into main from feat/two-server-ai-transport-refactor

LaNguAx (Owner) commented Jun 17, 2026

Problem

The backend handed work to the AI service by sending absolute filesystem paths (inputPath, outputPath) to POST /process. That only works when the backend and AI run on the same machine or share a volume. The real deployment is two machines with no shared storage: an app server (frontend, backend, upload/result storage) and a separate GPU server running the FastAPI/PyTorch service. The path-based protocol cannot bridge that gap.

Solution

Introduce a selectable transport via AI_TRANSFER_MODE:

  • path (default) — unchanged path-based /process for same-machine / local dev.
  • remote — the backend uploads the video to the AI over multipart HTTP (POST /process-upload), consumes the NDJSON progress stream, then downloads the enhanced result (GET /result/:jobId) and saves it locally. Built for the two-server topology.
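The mode switch above can be sketched as a small resolver plus a per-mode call plan. This is a minimal Python sketch, not the backend's code: the real backend is NestJS/TypeScript and validates AI_TRANSFER_MODE in env.validation. The function names are hypothetical, and treating an empty value as "path" is an assumption.

```python
from collections.abc import Mapping
from typing import Literal

TransferMode = Literal["path", "remote"]
VALID_MODES = ("path", "remote")


def resolve_transfer_mode(env: Mapping[str, str]) -> TransferMode:
    """Return the configured transport, defaulting to "path" when unset or empty."""
    raw = env.get("AI_TRANSFER_MODE", "").strip()
    if not raw:
        return "path"
    if raw not in VALID_MODES:
        # Fail fast at startup, as an env-validation schema would.
        raise ValueError(f"AI_TRANSFER_MODE must be one of {VALID_MODES}, got {raw!r}")
    return raw  # type: ignore[return-value]


def planned_ai_calls(mode: TransferMode, job_id: str) -> list[str]:
    """List the internal backend -> AI requests each mode makes for one job."""
    if mode == "path":
        # Same machine or shared volume: the AI reads and writes the given paths.
        return ["POST /process"]
    # Two servers, no shared storage: upload the video, then pull the result back.
    return ["POST /process-upload", f"GET /result/{job_id}"]
```

For example, `resolve_transfer_mode(os.environ)` returns `"path"` on a dev box where the variable is unset. Any unknown value is rejected rather than silently falling back, matching the "rejects invalid" env test listed under Tests / checks run.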

The browser → frontend → backend flow, storage, SSE, polling, and result streaming are all unchanged; only the internal backend↔AI hop changed.

flowchart LR
  Browser --> FE[frontend]
  FE -->|"POST /api/upload (XHR)"| BE[backend /api]
  BE -->|disk write| UP[(UPLOAD_DIR)]
  BE -->|"POST /process-upload multipart + Bearer"| AI[AI service]
  AI -->|NDJSON progress| BE
  AI -->|inference| GPU[RTX 4090]
  BE -->|"GET /result/:jobId + Bearer"| AI
  BE -->|disk write| RES[(RESULT_DIR)]
  FE -->|SSE / stream from backend| BE

Public API compatibility

No public endpoints changed. Still: POST /api/upload, GET /api/upload/status/:jobId, GET /api/upload/result/:jobId, POST /api/upload/cancel/:jobId, SSE /api/upload/events/:jobId, GET /api/upload/stream/:jobId. Shared @repo/consts / @repo/schemas / @repo/contracts are untouched. The frontend was not modified and never talks to the GPU server.

New AI internal endpoints

  • POST /process-upload (multipart jobId + video) → NDJSON stream; the completed line carries resultDownloadUrl.
  • GET /result/{jobId} → enhanced video via FileResponse from WORK_RESULT_DIR; 404 if missing; no caller-supplied path.
  • POST /cancel extended to cover both /process and /process-upload jobs.
  • GET /health and POST /process preserved.
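Consuming the /process-upload NDJSON stream comes down to buffering bytes until each newline, parsing every complete line as JSON, and picking resultDownloadUrl from the completion line. The sketch below is in Python for illustration only; the backend does this in TypeScript over the fetch response body. The helper names are hypothetical, and the only field name taken from this PR is resultDownloadUrl. Fields such as "progress" in the usage example are assumptions.

```python
import json
from collections.abc import Iterable, Iterator
from typing import Any, Optional


def iter_ndjson(chunks: Iterable[bytes]) -> Iterator[dict[str, Any]]:
    """Yield one JSON object per line, even when a line is split across chunks."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Emit every complete line; keep the trailing partial line buffered.
        *lines, buffer = buffer.split(b"\n")
        for line in lines:
            if line.strip():
                yield json.loads(line)
    if buffer.strip():
        # The stream ended without a final newline: parse the leftover line too.
        yield json.loads(buffer)


def find_result_download_url(events: Iterable[dict[str, Any]]) -> Optional[str]:
    """Return resultDownloadUrl from the first event that carries one."""
    for event in events:
        url = event.get("resultDownloadUrl")
        if isinstance(url, str):
            return url
    return None
```

Chunk boundaries from the network rarely line up with NDJSON lines, which is why the partial tail is carried over instead of parsing each chunk on its own. The returned URL still comes from the AI stream, so it should be validated before it is fetched (see Security / token behavior).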

New env vars

App      Var                Default                    Purpose
backend  AI_TRANSFER_MODE   path                       Transport: path | remote
backend  AI_INTERNAL_TOKEN  (empty)                    Bearer token for internal AI calls
ai       AI_INTERNAL_TOKEN  (empty)                    Must match the backend value
ai       WORK_UPLOAD_DIR    ../../storage/ai/uploads   Remote-mode uploads
ai       WORK_RESULT_DIR    ../../storage/ai/results   Remote-mode results

The .env.production.example files carry two-server example values (placeholders only; no real IPs are committed): AI_TRANSFER_MODE=remote, AI_INTERNAL_TOKEN=change-me, and MAX_FILE_SIZE_MB=20 (demo cap).

Security / token behavior

  • When AI_INTERNAL_TOKEN is set, the backend sends Authorization: Bearer <token> and the AI requires it on /process, /process-upload, /result, and /cancel (constant-time hmac.compare_digest). An empty token disables the check (local dev). /health is always open. The token is never logged or returned.
  • jobId is validated (^[A-Za-z0-9_-]{1,128}$) and upload extensions are allow-listed, so request input cannot escape the work dirs. /result also resolves the path and confirms the file lies inside WORK_RESULT_DIR.
  • The result download URL from the (untrusted) NDJSON stream is validated to be a same-origin /result/<jobId> path before fetching, preventing URL-authority injection / SSRF.
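The AI-side checks above can be sketched as a few pure helpers in the spirit of security.py. This is a minimal sketch, not the PR's module. The jobId pattern and the use of hmac.compare_digest come from the description. The function names, the extension allow-list, and the `<jobId>.mp4` result filename are hypothetical.

```python
import hmac
import re
from pathlib import Path
from typing import Optional

JOB_ID_RE = re.compile(r"^[A-Za-z0-9_-]{1,128}$")  # pattern quoted in this PR
ALLOWED_EXTENSIONS = {".mp4", ".mov", ".mkv", ".webm"}  # hypothetical allow-list


def is_valid_job_id(job_id: str) -> bool:
    # fullmatch, because "$" alone would also accept a trailing newline.
    return JOB_ID_RE.fullmatch(job_id) is not None


def has_allowed_extension(filename: str) -> bool:
    return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS


def token_ok(configured: str, authorization: Optional[str]) -> bool:
    """Check a Bearer header; an empty configured token disables the check."""
    if not configured:
        return True
    prefix = "Bearer "
    if authorization is None or not authorization.startswith(prefix):
        return False
    presented = authorization[len(prefix):]
    # Compare as bytes: compare_digest rejects non-ASCII str arguments.
    return hmac.compare_digest(presented.encode(), configured.encode())


def result_path(result_dir: Path, job_id: str, suffix: str = ".mp4") -> Path:
    """Resolve the result file for job_id and refuse anything outside result_dir."""
    if not is_valid_job_id(job_id):
        raise ValueError("invalid jobId")
    base = result_dir.resolve()
    candidate = (base / f"{job_id}{suffix}").resolve()
    if not candidate.is_relative_to(base):  # Python 3.9+
        raise ValueError("path escapes the result dir")
    return candidate
```

The jobId regex already rules out separators and "..", so the containment check in result_path is defense in depth. It still matters if the naming scheme or the validation ever changes.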

Hardened in response to a security review run on this branch (two medium findings: unauthenticated /process on an exposed GPU host, and resultDownloadUrl trust — both fixed).
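The resultDownloadUrl fix can be sketched as: resolve the reported URL against the AI base URL, require the same scheme and host, require the exact path /result/<jobId>, and rebuild the URL from trusted parts. This is a Python sketch for illustration; the backend's check is TypeScript. The function name is hypothetical. Two further assumptions: the URL must name the current job, and AI_SERVICE_URL has no path prefix.

```python
import re
from urllib.parse import urljoin, urlsplit

_RESULT_PATH_RE = re.compile(r"/result/([A-Za-z0-9_-]{1,128})")


def safe_result_url(ai_base_url: str, reported_url: str, job_id: str) -> str:
    """Return a trusted absolute result URL, or raise if the reported one is unsafe."""
    base = urlsplit(ai_base_url)
    # Relative URLs resolve against the base; absolute ones keep their own host.
    target = urlsplit(urljoin(ai_base_url, reported_url))
    if (target.scheme, target.netloc) != (base.scheme, base.netloc):
        raise ValueError("result URL is not same-origin with the AI service")
    match = _RESULT_PATH_RE.fullmatch(target.path)
    if match is None or match.group(1) != job_id:
        raise ValueError("result URL is not /result/<jobId> for this job")
    if target.query or target.fragment:
        raise ValueError("unexpected query or fragment in result URL")
    # Rebuild from trusted parts instead of echoing the reported string.
    return f"{base.scheme}://{base.netloc}/result/{job_id}"
```

Comparing the parsed netloc, not a string prefix, is what rejects tricks such as `//evil.example/...` or `http://localhost:8000@evil.example/...`. Returning a rebuilt URL means nothing from the stream beyond the validated jobId reaches fetch.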

Tests / checks run

  • pnpm --filter backend lint ✅, check-types ✅, build ✅, test ✅ (14 tests: env validation accepts remote / rejects invalid; AiClientService token header, NDJSON parse, result download+save, SSRF-URL rejection, cancel-with-token, 404 tolerance)
  • pnpm --filter frontend check-types ✅, build
  • pnpm check-types ✅, pnpm build
  • python -m py_compile apps/ai/server.py apps/ai/security.py ✅; python apps/ai/test_security.py ✅ (4 tests)
  • Known pre-existing (not caused by this PR): pnpm --filter frontend lint fails because eslint-plugin-react-hooks@7.0.1 can't resolve zod-validation-error/v4 under pnpm strict hoisting; pnpm format:check flags files repo-wide due to CRLF working-tree line endings on Windows (git normalizes to LF on commit). No frontend code, package.json, or lockfile was changed.

Research notes (Context7 + Tavily)

  • Node 24 multipart upload (Tavily — dev.to native-APIs migration, MDN): native FormData + await fs.openAsBlob(path) streams the file; do not set Content-Type manually (fetch sets the boundary). → streamRemoteProcess.
  • Node 24 download to disk (Tavily — MDN/blog): Readable.fromWeb(response.body) + pipeline(..., createWriteStream). → downloadResult.
  • FastAPI (Context7 /fastapi/fastapi/0.115.13): UploadFile = File() + Form() together (requires python-multipart); custom Authorization-header dependency; FileResponse(path, media_type, filename). → /process-upload, require_token, /result.

Manual verification (local two-server simulation)

Full local e2e was not run here because torch and the model checkpoint are not present on this machine. To verify on a box with the AI deps + checkpoint:

  1. AI: set AI_INTERNAL_TOKEN=dev-secret (+ WORK_UPLOAD_DIR/WORK_RESULT_DIR), run pnpm --filter ai dev (/health shows model_loaded: true).
  2. Backend: set AI_TRANSFER_MODE=remote, AI_INTERNAL_TOKEN=dev-secret, AI_SERVICE_URL=http://localhost:8000, run pnpm --filter backend dev.
  3. Frontend: pnpm --filter frontend dev; upload a small (<20MB) video.
  4. Confirm: backend POSTs /process-upload, AI streams NDJSON progress, backend downloads /result/:jobId and saves locally, the frontend streams the result, and Cancel works mid-job. Verify /process-upload returns 401 without the bearer header.
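The 401 check in step 4 can be scripted with the standard library. This is a minimal sketch: the helper names are hypothetical, and AI_SERVICE_URL is the step 2 value. It sends a deliberately empty POST. Depending on how the token guard is wired, FastAPI might report a body-validation 422 before the 401, so treat a non-401 as something to inspect rather than a verdict.

```python
import urllib.error
import urllib.request
from typing import Optional

AI_SERVICE_URL = "http://localhost:8000"  # value from step 2


def build_request(base_url: str, path: str, token: Optional[str]) -> urllib.request.Request:
    """Build an empty POST, adding the bearer header only when a token is given."""
    req = urllib.request.Request(base_url + path, data=b"", method="POST")
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    return req


def status_of(req: urllib.request.Request) -> int:
    """Send req and return the HTTP status, including error statuses such as 401."""
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
```

Usage, with the AI service from step 1 running: `status_of(build_request(AI_SERVICE_URL, "/process-upload", None))` should return 401 when AI_INTERNAL_TOKEN=dev-secret is set on the AI side.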

Remaining limitations

  • Jobs are still in-memory (lost on restart); no DB persistence.
  • No cleanup of old jobs/files (uploads, results, AI work dirs).
  • No shared storage between servers (the reason remote mode exists).
  • A 20MB demo upload cap is recommended.

Out of scope (later infra agents)

No Nginx/PM2/SSH/firewall/SSL/Linux-user/server-folder/production-env work — this PR is a code + docs refactor only. Concrete server addresses, the domain, and the GPU server runtime setup are left to infrastructure tooling.

LaNguAx and others added 3 commits June 17, 2026 19:24

Add an AI_TRANSFER_MODE (path | remote) so the NestJS backend can drive the
FastAPI AI service either by shared filesystem paths (path, local/dev default)
or over multipart HTTP for two-server deployments with no shared storage.

Backend:
- env.validation: add AI_TRANSFER_MODE and AI_INTERNAL_TOKEN
- extract all AI HTTP I/O into AiClientService (health, path /process, remote
  /process-upload multipart upload via fs.openAsBlob, /result download via
  Readable.fromWeb + pipeline, /cancel); typed NDJSON in ai-protocol.types
- slim ProcessingService to orchestration; download+save result in remote mode
- bearer token sent only when configured; never logged
- validate the result download URL (same-origin /result/<id>) to prevent SSRF

AI service:
- pure security helpers (security.py): job-id/extension/constant-time token
- new /process-upload (multipart) and /result/{jobId} (FileResponse, no path
  input); extend /cancel; token-guard mutating endpoints when token is set
- WORK_UPLOAD_DIR/WORK_RESULT_DIR; add python-multipart dependency

Tests: backend env + AiClientService specs (mocked fetch); AI security helper
tests. Public API and frontend UX unchanged.

Co-authored-by: Cursor <cursoragent@cursor.com>

Document the new path vs remote transport, the two-server (app + GPU server,
no shared storage) architecture, AI_TRANSFER_MODE / AI_INTERNAL_TOKEN and the
AI WORK_* dirs, the token-guarded internal endpoints (/process, /process-upload,
/result, /cancel), and the 20MB demo cap.

- README, root AGENTS.md + CLAUDE.md env table
- apps/backend/AGENTS.md, apps/ai/AGENTS.md
- backend + ai .env.development.example / .env.production.example (placeholders;
  no real IPs in source, token placeholder change-me)
- note the Context7/Tavily lookup expectation for future framework changes

Co-authored-by: Cursor <cursoragent@cursor.com>

Add docs/two-server-ai-transport.md summarizing the refactor (what/why),
env vars, endpoints, security, verification results, local manual test, and
the remaining infrastructure next steps for the app and GPU servers.

Co-authored-by: Cursor <cursoragent@cursor.com>