feat: two-server AI transport (remote multipart mode) by LaNguAx · Pull Request #24 · LaNguAx/upscale-ai

LaNguAx · 2026-06-17T16:41:26Z

Problem

The backend handed work to the AI service by sending absolute filesystem paths (inputPath, outputPath) to POST /process. That only works when the backend and AI run on the same machine or share a volume. The real deployment is two machines with no shared storage: an app server (frontend, backend, upload/result storage) and a separate GPU server running the FastAPI/PyTorch service. The path-based protocol cannot bridge that gap.

Solution

Introduce a selectable transport via AI_TRANSFER_MODE:

path (default) — unchanged path-based /process for same-machine / local dev.
remote — the backend uploads the video to the AI over multipart HTTP (POST /process-upload), consumes the NDJSON progress stream, then downloads the enhanced result (GET /result/:jobId) and saves it locally. Built for the two-server topology.

The browser → frontend → backend flow, storage, SSE, polling, and result streaming are all unchanged; only the internal backend↔AI hop changed.
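
The remote-mode sequence depends on reading the NDJSON progress stream one line at a time until the completed line arrives. The sketch below illustrates that parsing step in Python; the backend in this PR does it in TypeScript inside AiClientService, so this is an illustration and not the shipped code. The `status` key and its `"completed"` value are assumptions; only `resultDownloadUrl` is named in this PR.

```python
import json
from typing import Iterable, Iterator, Optional


def iter_ndjson(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between events
        yield json.loads(line)


def find_result_url(lines: Iterable[str]) -> Optional[str]:
    """Return resultDownloadUrl from the completed line, or None if none arrives.

    Assumption: each event has a "status" key and the last one is "completed".
    """
    for event in iter_ndjson(lines):
        if event.get("status") == "completed":
            return event.get("resultDownloadUrl")
    return None
```

Any iterable of text lines works as input, so the same function can be fed from a streamed HTTP body or from a list in a unit test.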

flowchart LR
  Browser --> FE[frontend]
  FE -->|"POST /api/upload (XHR)"| BE[backend /api]
  BE -->|disk write| UP[(UPLOAD_DIR)]
  BE -->|"POST /process-upload multipart + Bearer"| AI[AI service]
  AI -->|NDJSON progress| BE
  AI -->|inference| GPU[RTX 4090]
  BE -->|"GET /result/:jobId + Bearer"| AI
  BE -->|disk write| RES[(RESULT_DIR)]
  FE -->|SSE / stream from backend| BE

Public API compatibility

No public endpoints changed. Still: POST /api/upload, GET /api/upload/status/:jobId, GET /api/upload/result/:jobId, POST /api/upload/cancel/:jobId, SSE /api/upload/events/:jobId, GET /api/upload/stream/:jobId. Shared @repo/consts / @repo/schemas / @repo/contracts are untouched. The frontend was not modified and never talks to the GPU server.

New AI internal endpoints

POST /process-upload (multipart jobId + video) → NDJSON stream; the completed line carries resultDownloadUrl.
GET /result/{jobId} → enhanced video via FileResponse from WORK_RESULT_DIR; 404 if missing; no caller-supplied path.
POST /cancel extended to cover both /process and /process-upload jobs.
GET /health and POST /process preserved.
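
The /result endpoint takes only a job id and derives the file location itself. A minimal sketch of that lookup follows, assuming results are stored as `<jobId>.mp4` directly under the results directory; the file naming and the helper name `resolve_result_file` are hypothetical and not taken from apps/ai/server.py.

```python
from pathlib import Path
from typing import Optional


def resolve_result_file(result_dir: Path, job_id: str, suffix: str = ".mp4") -> Optional[Path]:
    """Map a job id to a file under result_dir, or None when the caller should answer 404."""
    base = result_dir.resolve()
    candidate = (base / f"{job_id}{suffix}").resolve()
    # Reject anything that resolves outside the results directory.
    if not candidate.is_relative_to(base):
        return None
    # A missing file is reported the same way as an invalid one.
    return candidate if candidate.is_file() else None
```

A route handler could pass the returned path to FileResponse and respond with 404 when the helper returns None. `Path.is_relative_to` needs Python 3.9 or later.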

New env vars

| App | Var | Default | Purpose |
| --- | --- | --- | --- |
| backend | `AI_TRANSFER_MODE` | `path` | `path` \| `remote` transport |
| backend | `AI_INTERNAL_TOKEN` | (empty) | Bearer for internal AI calls |
| ai | `AI_INTERNAL_TOKEN` | (empty) | Must match the backend |
| ai | `WORK_UPLOAD_DIR` | `../../storage/ai/uploads` | Remote-mode uploads |
| ai | `WORK_RESULT_DIR` | `../../storage/ai/results` | Remote-mode results |
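
As an illustration of how these variables and defaults could be read, here is a small Python sketch. The `AiSettings` class and both function names are hypothetical. The real backend validates its variables in TypeScript (env.validation), so `parse_transfer_mode` only mirrors the "accepts remote, rejects invalid" behavior described in the test notes.

```python
import os
from dataclasses import dataclass
from typing import Mapping

VALID_TRANSFER_MODES = ("path", "remote")


@dataclass(frozen=True)
class AiSettings:
    internal_token: str
    work_upload_dir: str
    work_result_dir: str


def load_ai_settings(env: Mapping[str, str] = os.environ) -> AiSettings:
    """Read the AI-side variables, falling back to the defaults from the table."""
    return AiSettings(
        internal_token=env.get("AI_INTERNAL_TOKEN", ""),
        work_upload_dir=env.get("WORK_UPLOAD_DIR", "../../storage/ai/uploads"),
        work_result_dir=env.get("WORK_RESULT_DIR", "../../storage/ai/results"),
    )


def parse_transfer_mode(env: Mapping[str, str] = os.environ) -> str:
    """Return the transport mode, defaulting to "path" when the variable is unset."""
    mode = env.get("AI_TRANSFER_MODE", "path")
    if mode not in VALID_TRANSFER_MODES:
        raise ValueError(f"AI_TRANSFER_MODE must be one of {VALID_TRANSFER_MODES}, got {mode!r}")
    return mode
```

Taking the environment as a mapping argument keeps both functions easy to test without mutating `os.environ`.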

Two-server example values (placeholders, no real IPs in source) live in the .env.production.example files with AI_TRANSFER_MODE=remote, AI_INTERNAL_TOKEN=change-me, and MAX_FILE_SIZE_MB=20 (demo cap).

Security / token behavior

When AI_INTERNAL_TOKEN is set, the backend sends Authorization: Bearer <token> and the AI requires it on /process, /process-upload, /result, and /cancel, comparing it in constant time with hmac.compare_digest. With an empty token the check is skipped (local dev). /health is always open. The token is never logged or returned.
jobId is validated (^[A-Za-z0-9_-]{1,128}$) and upload extensions are allow-listed, so request input cannot escape the work dirs. /result resolves and confirms the file is inside WORK_RESULT_DIR.
The result download URL from the (untrusted) NDJSON stream is validated to be a same-origin /result/<jobId> path before fetching, preventing URL-authority injection / SSRF.
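
The AI-side checks described above are small pure functions. The sketch below shows one way to write them; it is not a copy of apps/ai/security.py. The function names and the extension allow-list contents are assumptions, while the job-id pattern and the use of `hmac.compare_digest` come from this PR.

```python
import hmac
import re
from pathlib import PurePath
from typing import Optional

# Same character class and length bounds as the PR's pattern; fullmatch() replaces
# the ^...$ anchors so that a trailing newline is rejected as well.
JOB_ID_RE = re.compile(r"[A-Za-z0-9_-]{1,128}")

# Assumption: the real allow-list is not stated in the PR.
ALLOWED_EXTENSIONS = {".mp4", ".mov", ".mkv", ".webm"}


def is_valid_job_id(job_id: str) -> bool:
    return JOB_ID_RE.fullmatch(job_id) is not None


def has_allowed_extension(filename: str) -> bool:
    return PurePath(filename).suffix.lower() in ALLOWED_EXTENSIONS


def token_matches(expected: str, authorization: Optional[str]) -> bool:
    """Check an Authorization header value against the configured token."""
    if not expected:
        return True  # empty token: the guard is disabled (local dev)
    prefix = "Bearer "
    if not authorization or not authorization.startswith(prefix):
        return False
    presented = authorization[len(prefix):]
    # Compare as bytes so non-ASCII input cannot raise TypeError.
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))
```

Keeping these helpers free of framework imports is what lets them be tested with a plain script, as the PR does with test_security.py.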

Hardened in response to a security review run on this branch (two medium findings: unauthenticated /process on an exposed GPU host, and resultDownloadUrl trust — both fixed).
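
The resultDownloadUrl finding comes down to refusing any URL that could redirect the download to another host. The backend performs this check in TypeScript; the Python sketch below only illustrates the idea. It assumes the AI reports a relative `/result/<jobId>` path, which may be stricter than the shipped check, and the name `safe_result_path` is hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit


def safe_result_path(url: str, job_id: str) -> Optional[str]:
    """Return the path to fetch if url is exactly /result/<job_id>, else None."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return None  # malformed input such as an unterminated IPv6 literal
    # Any scheme or authority means the stream is pointing at another origin.
    if parts.scheme or parts.netloc or parts.query or parts.fragment:
        return None
    expected = f"/result/{job_id}"
    # Return the path rebuilt from trusted pieces, never the raw input.
    return expected if parts.path == expected else None
```

The caller would append the returned path to its configured AI_SERVICE_URL, so the host in the final request never comes from the untrusted stream.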

Tests / checks run

pnpm --filter backend lint ✅, check-types ✅, build ✅, test ✅ (14 tests: env validation accepts remote / rejects invalid; AiClientService token header, NDJSON parse, result download+save, SSRF-URL rejection, cancel-with-token, 404 tolerance)
pnpm --filter frontend check-types ✅, build ✅
pnpm check-types ✅, pnpm build ✅
python -m py_compile apps/ai/server.py apps/ai/security.py ✅; python apps/ai/test_security.py ✅ (4 tests)
Known pre-existing (not caused by this PR): pnpm --filter frontend lint fails because eslint-plugin-react-hooks@7.0.1 can't resolve zod-validation-error/v4 under pnpm strict hoisting; pnpm format:check flags files repo-wide due to CRLF working-tree line endings on Windows (git normalizes to LF on commit). No frontend code, package.json, or lockfile was changed.

Research notes (Context7 + Tavily)

Node 24 multipart upload (Tavily — dev.to native-APIs migration, MDN): native FormData + await fs.openAsBlob(path) streams the file; do not set Content-Type manually (fetch sets the boundary). → streamRemoteProcess.
Node 24 download to disk (Tavily — MDN/blog): Readable.fromWeb(response.body) + pipeline(..., createWriteStream). → downloadResult.
FastAPI (Context7 /fastapi/fastapi/0.115.13): UploadFile = File() + Form() together (requires python-multipart); custom Authorization-header dependency; FileResponse(path, media_type, filename). → /process-upload, require_token, /result.

Manual verification (local two-server simulation)

Full local e2e was not run here because torch and the model checkpoint are not present on this machine. To verify on a box with the AI deps + checkpoint:

AI: set AI_INTERNAL_TOKEN=dev-secret (+ WORK_UPLOAD_DIR/WORK_RESULT_DIR), run pnpm --filter ai dev (/health shows model_loaded: true).
Backend: set AI_TRANSFER_MODE=remote, AI_INTERNAL_TOKEN=dev-secret, AI_SERVICE_URL=http://localhost:8000, run pnpm --filter backend dev.
Frontend: pnpm --filter frontend dev; upload a small (<20MB) video.
Confirm: backend POSTs /process-upload, AI streams NDJSON progress, backend downloads /result/:jobId and saves locally, the frontend streams the result, and Cancel works mid-job. Verify /process-upload returns 401 without the bearer header.

Remaining limitations

Jobs are still in-memory (lost on restart); no DB persistence.
No cleanup of old jobs/files (uploads, results, AI work dirs).
No shared storage between servers (the reason remote mode exists).
Demo upload cap of 20MB recommended.

Out of scope (later infra agents)

No Nginx/PM2/SSH/firewall/SSL/Linux-user/server-folder/production-env work — this PR is a code + docs refactor only. Concrete server addresses, the domain, and the GPU server runtime setup are left to infrastructure tooling.

Add an AI_TRANSFER_MODE (path | remote) so the NestJS backend can drive the FastAPI AI service either by shared filesystem paths (path, local/dev default) or over multipart HTTP for two-server deployments with no shared storage.

Backend:
- env.validation: add AI_TRANSFER_MODE and AI_INTERNAL_TOKEN
- extract all AI HTTP I/O into AiClientService (health, path /process, remote /process-upload multipart upload via fs.openAsBlob, /result download via Readable.fromWeb + pipeline, /cancel); typed NDJSON in ai-protocol.types
- slim ProcessingService to orchestration; download+save result in remote mode
- bearer token sent only when configured; never logged
- validate the result download URL (same-origin /result/<id>) to prevent SSRF

AI service:
- pure security helpers (security.py): job-id/extension/constant-time token
- new /process-upload (multipart) and /result/{jobId} (FileResponse, no path input); extend /cancel; token-guard mutating endpoints when token is set
- WORK_UPLOAD_DIR/WORK_RESULT_DIR; add python-multipart dependency

Tests: backend env + AiClientService specs (mocked fetch); AI security helper tests.

Public API and frontend UX unchanged.

Co-authored-by: Cursor <cursoragent@cursor.com>

Document the new path vs remote transport, the two-server (app + GPU server, no shared storage) architecture, AI_TRANSFER_MODE / AI_INTERNAL_TOKEN and the AI WORK_* dirs, the token-guarded internal endpoints (/process, /process-upload, /result, /cancel), and the 20MB demo cap.

- README, root AGENTS.md + CLAUDE.md env table
- apps/backend/AGENTS.md, apps/ai/AGENTS.md
- backend + ai .env.development.example / .env.production.example (placeholders; no real IPs in source, token placeholder change-me)
- note the Context7/Tavily lookup expectation for future framework changes

Co-authored-by: Cursor <cursoragent@cursor.com>

Add docs/two-server-ai-transport.md summarizing the refactor (what/why), env vars, endpoints, security, verification results, local manual test, and the remaining infrastructure next steps for the app and GPU servers.

Co-authored-by: Cursor <cursoragent@cursor.com>

LaNguAx and others added 3 commits June 17, 2026 19:24

LaNguAx wants to merge 3 commits into main from feat/two-server-ai-transport-refactor
