Auto-deploy to Fly + single-user auth #23

Open
qtpi-bonding wants to merge 26 commits into raroque:main from qtpi-bonding:feat/auto-deploy-pr

@qtpi-bonding

Description

This PR adds a turnkey production deploy path (Fly.io) and the auth layer that makes the deploy safe to host on a public URL.

It's a single coupled PR by design — a deploy-only PR ships a knowingly-vulnerable URL, and an auth-only PR has no real consumer until deploy ships. The two halves answer the same question ("how do I host this without anyone hitting the admin endpoints?") and only make sense together. Happy to split further if you'd rather see them separately.

Posture: I'm submitting this as a draft for your eyes. I'm happy to engage on review feedback; let me know what you'd like changed and I'll iterate.

What's in here

Deploy infrastructure

  • Dockerfile — multi-stage node:22-slim build (Debian for SSH-friendliness, runs server with tsx to match npm start)
  • .dockerignore
  • fly.toml — single machine, always-on (min_machines_running = 1, auto_stop_machines = false) because the in-process loops require a single replica
  • .github/workflows/deploy.yml — test → convex deploy → bootstrap → fly deploy → smoke
  • scripts/deploy.ts — interactive npm run deploy mirroring your setup.ts patterns. Helpers (banner, runCapture, etc.) are duplicated from setup.ts for now; happy to follow up with a dedupe PR if you want.
  • docs/deploying.md + a one-line link from the README

Auth layer (Convex Auth, single-user password provider)

  • convex/auth.config.ts, convex/auth.ts — convexAuth({ providers: [Password] }) + a requireUser helper
  • convex/users.ts — idempotent bootstrap action (CI-triggered) and setPassword action (rotation), both using createAccount from @convex-dev/auth/server
  • All existing Convex functions classified as either internalQuery/internalMutation/internalAction (server-only) or query/mutation with await requireUser(ctx) (browser-callable)
  • Internal twins (agents.listInternal, messages.recentInternal, etc.) for the 10 functions called from BOTH server and browser — server uses the internal twin (deploy-key path, no JWT), browser uses the public version with auth check

Express edge

  • server/auth.ts — verifyHmac (timing-safe) and requireAdmin (JWT verification via Convex's JWKS endpoint using jose)
  • server/index.ts — global requireAdmin middleware with allowlist for /health and /sendblue/webhook; static debug UI served from debug/dist in production; WebSocket /ws upgrade auth via ?token=<jwt> query param
  • server/sendblue.ts — HMAC signature verification + from_number whitelist on the inbound webhook (preserves all existing dedup/broadcast/handler logic underneath)
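
The gate described above can be sketched as three small pure functions: an allowlist check, a bearer-token parser, and a parser for the WebSocket `?token=` upgrade path. This is a minimal illustration under stated assumptions, not the code in server/auth.ts or server/index.ts; the helper names `isPublicPath`, `bearerToken`, and `upgradeToken` are hypothetical, and JWT verification itself is left out.

```typescript
// Hypothetical helpers for illustration; names are not taken from the PR.
const PUBLIC_PATHS: ReadonlySet<string> = new Set(["/health", "/sendblue/webhook"]);

/** True when the path is on the allowlist and may skip JWT verification. */
function isPublicPath(path: string): boolean {
  return PUBLIC_PATHS.has(path);
}

/** Pull the JWT out of an `Authorization: Bearer <jwt>` header, if present. */
function bearerToken(header: string | undefined): string | null {
  if (!header) return null;
  const match = /^Bearer\s+(\S+)$/i.exec(header.trim());
  return match?.[1] ?? null;
}

/** Pull the JWT out of a WebSocket upgrade URL such as `/ws?token=<jwt>`. */
function upgradeToken(requestUrl: string): string | null {
  // Upgrade requests carry a relative URL, so a placeholder base is supplied.
  const url = new URL(requestUrl, "http://localhost");
  if (url.pathname !== "/ws") return null;
  return url.searchParams.get("token") || null;
}
```

A middleware built on these would let a request through when `isPublicPath` is true, and otherwise reject it unless the extracted token verifies against the JWKS.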

Debug UI

  • Wrapped in <ConvexAuthProvider>
  • debug/src/auth.tsx — single-password login form using useAuthActions().signIn("password", ...)
  • debug/src/api-client.ts — useApiClient() hook that attaches the JWT to outbound fetch calls
  • 7 existing fetch call sites in ConsolidationPanel + ComposioSection migrated to use the authed wrapper

Tests (15 new unit tests, no new test framework — uses Node 22's built-in node:test)

  • verifyHmac: 5 cases
  • requireAdmin: 6 cases
  • Sendblue webhook auth: 4 cases (real Express + raw HTTP)

CONTRIBUTING.md compliance

  • CHANGELOG.md — six [BREAKING] entries under Unreleased + a list of additions, all referencing the migration skill below
  • .claude/skills/setup-deploy-auth/SKILL.md — one-shot migration that /upgrade-boop will offer to run when a fork pulls this in: installs deps, pushes the schema additions, generates and stores the admin password, bootstraps the admin user, reconciles .env.local against .env.example. Idempotent.

What I verified

  • tsc --noEmit clean for new code (37 baseline errors all stem from convex/_generated not being on disk locally — codegen runs in CI/Docker)
  • npm test — 15/15 pass
  • npm run build:debug — clean Vite build
  • docker build — stages 1–2 succeed; stage 3 needs CONVEX_DEPLOY_KEY (CI provides it)

What I couldn't verify (no accounts)

  • ⚠️ Full Fly deploy end-to-end (no Fly account)
  • ⚠️ E2E iMessage round-trip with real signing secret (no Sendblue dashboard access)
  • ⚠️ The interactive npm run deploy script in full (only typechecked — needs Fly + Convex deploy key + gh CLI to exercise)
  • ⚠️ Convex Auth runtime behavior (login flow, JWT issuance, JWKS endpoint URL) — implementation follows @convex-dev/auth v0.0.x docs; verified at compile time but not at runtime
  • ⚠️ The migration skill end-to-end (would need a fresh fork without auth set up)

If you (or another contributor with full env) can run the runtime smoke test, I'd appreciate it. The most likely failure modes I'd watch for:

  • createAccount argument shape if the installed @convex-dev/auth version differs from what I targeted
  • The WS ?token= upgrade path
  • npx convex run users:bootstrap flag handling under the deploy-key path

Out of scope (explicit)

Not in this PR (separate concerns, can be follow-ups):

  • Multi-user support
  • General hardening (rate limits, body-size limits, input validation, buffer bounds)
  • Convex Auth on Express → Convex calls (deploy key remains the trust boundary inside the perimeter)
  • Composio webhook signature validation
  • Auto-rotation of CLAUDE_CODE_OAUTH_TOKEN (1-year manual rotation documented in docs/deploying.md)
  • Background loop sharding (single-replica documented as a hard constraint)
  • Deduping setup.ts and deploy.ts helpers into a shared scripts/lib/ (commit ready in refactor/cli-helpers branch on my fork; happy to open as PR 2 if you want)

Design doc

Full spec lives at docs/superpowers/specs/2026-04-27-auto-deploy-pr-design.md if you want the long-form rationale on the auth perimeters and trade-offs.

qtpi-bonding and others added 21 commits April 27, 2026 17:39
Design for an upstream PR that adds Docker + Fly deployment to boop-agent,
coupled with Convex Auth single-user gating to close the admin-route auth
gap that has blocked deployment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
17-task plan implementing the design at docs/superpowers/specs/2026-04-27-auto-deploy-pr-design.md.
Each task is self-contained for fresh-subagent execution.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Moves tsx and jose into dependencies (from devDependencies), adds test
and deploy scripts to package.json, and regenerates package-lock.json.
npm test exits 0 with 0 tests collected.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…webhook

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…vex codegen

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Server callers now use internalQuery/internalMutation twins (no requireUser)
instead of the public+auth versions, fixing runtime "unauthenticated" errors
when the Express server calls Convex with a deploy key rather than a JWT.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…bug UI

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…UDE_CODE_OAUTH_TOKEN

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…terns)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per CONTRIBUTING.md: every PR adds an Unreleased entry, breaking changes
get [BREAKING] markers, and migrations are written as a Claude skill that
/upgrade-boop can run.

This change introduces six [BREAKING] entries (admin endpoints require JWT,
Convex public functions require auth, webhook HMAC verification, WS upgrade
token, schema additions, new env vars) all addressed by /setup-deploy-auth.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@qtpi-bonding qtpi-bonding marked this pull request as draft April 28, 2026 04:10
@greptile-apps
Contributor

greptile-apps Bot commented Apr 28, 2026

Greptile Summary

This PR adds turnkey Fly.io deployment (Dockerfile, fly.toml, GitHub Actions pipeline) and a Convex Auth password layer that gates both the Express API and direct Convex queries, with HMAC verification on the Sendblue webhook. The design is coherent and the previously-flagged issues around JWKS URL, the JWKS-per-upgrade cost, and the debug-UI static-file path have all been addressed.

  • P0 (unresolved from prior review): server/convex-client.ts still creates a bare ConvexHttpClient with no admin credential. Every internal.* call made by the server — webhooks, agent execution, automations, heartbeat, consolidation — will fail at runtime with a permission error.
  • P1: The Fly deploy key is absent from flySecrets in scripts/deploy.ts; the Fly machine will not have this env var available even once convex-client.ts is fixed.
  • P2: verifyHmac in server/auth.ts compares string lengths before calling timingSafeEqual, but constructs Buffers from UTF-8; a 64-character Unicode signature would pass the string-length guard yet produce mismatched byte lengths, causing timingSafeEqual to throw on the public webhook endpoint.
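
The P2 item can be reproduced with Node's built-in crypto module. The sketch below is one possible hardening, assuming the signature is a hex-encoded HMAC-SHA256 of the raw body (the actual Sendblue format and the real verifyHmac signature are not shown on this page); `verifyHmacSafe` is a hypothetical name. It compares byte lengths rather than string lengths, so `timingSafeEqual` is never called with buffers of different sizes.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sketch only: hex-encoded HMAC-SHA256 is an assumption about the signature format.
function verifyHmacSafe(rawBody: string, signature: string, secret: string): boolean {
  const expected = Buffer.from(
    createHmac("sha256", secret).update(rawBody).digest("hex"),
    "utf8",
  );
  const received = Buffer.from(signature, "utf8");
  // Guard on BYTE length: timingSafeEqual throws when the buffers differ in size.
  if (received.length !== expected.length) return false;
  return timingSafeEqual(received, expected);
}

// 64 characters but 128 bytes in UTF-8, so a string-length guard would let it through.
const unicodeSignature = "\u00e9".repeat(64);
```

With this guard, `verifyHmacSafe(body, unicodeSignature, secret)` returns false instead of throwing, so a malformed signature produces an ordinary rejection on the public webhook path.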

Confidence Score: 2/5

Not safe to merge — the unresolved P0 in server/convex-client.ts means the deployed server cannot reach any internal Convex function, breaking webhooks and all background loops.

A P0 (no admin auth on ConvexHttpClient) from the prior review remains unfixed; the changeset does not touch convex-client.ts. All 51 files of auth/deploy infrastructure are correct in isolation, but they will not function together until that root fix lands alongside the missing Fly deploy-key secret.

server/convex-client.ts (P0, not in this diff — needs .setAdminAuth()); scripts/deploy.ts (P1 — Convex deploy key missing from flySecrets)

Important Files Changed

| Filename | Overview |
| --- | --- |
| server/auth.ts | Well-structured HMAC + JWT middleware; minor robustness issue with timingSafeEqual and Unicode signatures on the public webhook path. |
| server/index.ts | Auth gate correctly placed with public allowlist; shared JWKS verifier instance avoids per-request cache misses; WS upgrade path uses the same verifier. |
| server/sendblue.ts | HMAC verification and phone whitelist correctly gating the webhook; logic sound but verifyHmac's Unicode edge case (flagged inline) is reachable here. |
| convex/users.ts | bootstrap and setPassword actions cleanly handle the TOCTOU race via try/catch on the 'already exists' error from createAccount. |
| server/convex-client.ts | CRITICAL (P0 — unchanged): ConvexHttpClient has no .setAdminAuth() call; every internal.* Convex function called by the server will fail at runtime with a permission error. |
| scripts/deploy.ts | Interactive deploy script covers all necessary secrets except the Convex deploy key (P1); once convex-client.ts is fixed, the deploy key must be pushed to Fly secrets too. |
| .github/workflows/deploy.yml | Pipeline correctly sequences test → Convex deploy → bootstrap → Fly deploy → smoke; concurrency group prevents duplicate deploys. |
| Dockerfile | Clean multi-stage build; runtime stage drops build artifacts and runs as non-root node user; convex codegen inside the build stage resolves the generated-type dependency correctly. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    subgraph Internet
        User([User Browser])
        Sendblue([Sendblue])
    end

    subgraph Fly_Machine["Fly.io Machine (port 3456)"]
        Static["express.static /debug/dist\n(login page, no auth)"]
        Webhook["POST /sendblue/webhook\n(no JWT required)"]
        AdminGate["requireAdmin middleware\n(JWT via Authorization header)"]
        API["Protected API routes\n/chat /consolidate /agents /composio /ws"]
    end

    subgraph Convex["Convex Backend (.convex.site)"]
        JWKS["/.well-known/jwks.json"]
        Internal["internal.* functions\n(requires deploy key)"]
    end

    User -->|GET /| Static
    User -->|POST login| ConvexAuth["@convex-dev/auth Password Provider"]
    ConvexAuth -->|JWT| User
    User -->|Bearer JWT| AdminGate
    AdminGate -->|verify JWT| JWKS
    AdminGate -->|pass| API
    API -->|internal.*| Internal
    Sendblue -->|POST + HMAC sig| Webhook
    Webhook -->|verifyHmac| HMACCheck{HMAC valid?}
    HMACCheck -->|yes| Internal
    HMACCheck -->|no| Reject401["401 Unauthorized"]

Reviews (3): Last reviewed commit: "fix(server): correct relative path to de..."

Comment thread server/auth.ts Outdated
Comment thread server/index.ts
Comment thread scripts/deploy.ts
Comment thread server/index.ts Outdated
Comment on lines +37 to +50
// load before the SPA can render the login form, so they're served
// BEFORE requireAdmin gates the API surface.
if (process.env.NODE_ENV === "production") {
  const here = path.dirname(fileURLToPath(import.meta.url));
  const debugDist = path.resolve(here, "../../debug/dist");
  app.use(express.static(debugDist));
  app.get("/debug/*", (_req, res) => {
    res.sendFile(path.join(debugDist, "index.html"));
  });
}

// AUTH GATE: every route below requires a valid Convex Auth JWT, except
// the explicit allowlist inside requireAdmin() (/sendblue/webhook + /health).
app.use(requireAdmin());
Contributor

P2 SPA catch-all intercepts API 404s in production

app.get("/debug/*", ...) is registered before app.use(requireAdmin()) and before the API routes (/sendblue, /composio, etc.). In production any request for an undefined path will hit express.static, fail to find the file, then fall through to this handler and return index.html with a 200 — masking the real 404.

Move the SPA catch-all to the very end of the route list (after all API routes and the auth gate) so it only fires for genuine client-side navigation paths.

Author

The catch-all is scoped to /debug/*, not *. None of the API routes (/sendblue, /composio, /agents, /chat, /consolidate, /health) live under /debug/, so undefined API paths don't reach this handler: they fall through to requireAdmin (which 401s on a missing token) or to Express's default 404. The static mount is intentionally before requireAdmin because the SPA login form needs to load before the user has a token to gate other routes with. Could you point at a specific path that gets masked? I don't see one.

Comment thread convex/users.ts Outdated
qtpi-bonding and others added 4 commits April 27, 2026 21:25
@convex-dev/auth signs tokens with CONVEX_SITE_URL as issuer and serves
/.well-known/jwks.json on the .convex.site host. The middleware was
using CONVEX_URL (.convex.cloud) for both, so every authenticated
request 401'd in production.

Addresses Greptile P1 #1 on PR #23.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The WebSocket upgrade handler was calling createRemoteJWKSet inside the
event handler, building a fresh JWKS instance (with empty cache) for
every connection. A momentary JWKS endpoint blip would reject all
concurrent WS upgrades.

Construct one verifier at server startup via defaultVerifier() and share
it between requireAdmin() and the upgrade handler — single source of
truth for JWT verification, single JWKS cache.

Addresses Greptile P1 #2 on PR #23.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
server/auth.ts needs CONVEX_SITE_URL at startup to fetch the Convex
JWKS. The deploy script wasn't pushing it, so production servers would
throw on boot.

Read it from .env.local; fall back to deriving from CONVEX_URL by
swapping .convex.cloud → .convex.site if absent. Also document it in
.env.example.

Addresses Greptile P1 #3 on PR #23.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
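
The fallback described in this commit message could look roughly like the sketch below. The function name `resolveConvexSiteUrl` and the error messages are hypothetical, and the way scripts/deploy.ts actually reads .env.local is not shown on this page; only the `.convex.cloud` to `.convex.site` swap comes from the commit message.

```typescript
// Hypothetical helper; scripts/deploy.ts in the PR may structure this differently.
function resolveConvexSiteUrl(env: Record<string, string | undefined>): string {
  // Prefer an explicit value when one is present.
  const explicit = env["CONVEX_SITE_URL"]?.trim();
  if (explicit) return explicit;

  const cloudUrl = env["CONVEX_URL"]?.trim();
  if (!cloudUrl) throw new Error("Set CONVEX_SITE_URL or CONVEX_URL");

  const url = new URL(cloudUrl);
  if (!url.hostname.endsWith(".convex.cloud")) {
    throw new Error(`Cannot derive a site URL from ${cloudUrl}`);
  }
  // Swap only the host suffix; scheme and deployment name stay the same.
  url.hostname = url.hostname.replace(/\.convex\.cloud$/, ".convex.site");
  return url.origin;
}
```

Refusing to guess when the host does not end in `.convex.cloud` keeps a self-hosted or custom-domain URL from being silently rewritten into something wrong.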
bootstrap is an internalAction, so the countUsers query and createAccount
call don't share a transaction. A second concurrent invocation could see
zero users and race into createAccount, producing a thrown error.

Catch the "already exists" error from @convex-dev/auth's
createAccountFromCredentials and treat it as the idempotent path.

Addresses Greptile P2 #5 on PR #23.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
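
The check-then-create race and its fix can be illustrated without Convex at all. In the sketch below, `countUsers` and `createAccount` are injected stand-ins for the real Convex calls, and matching on the substring "already exists" is an assumption about the error text rather than a documented contract of @convex-dev/auth.

```typescript
// Hypothetical shapes: these stand in for the Convex query and account-creation call.
type BootstrapDeps = {
  countUsers: () => Promise<number>;
  createAccount: (identifier: string, password: string) => Promise<void>;
};

async function bootstrapAdmin(
  deps: BootstrapDeps,
  identifier: string,
  password: string,
): Promise<"created" | "exists"> {
  // Fast path: an earlier run already created the admin.
  if ((await deps.countUsers()) > 0) return "exists";
  try {
    await deps.createAccount(identifier, password);
    return "created";
  } catch (err) {
    // A concurrent run may have created the account between the check and the write.
    if (err instanceof Error && /already exists/i.test(err.message)) return "exists";
    throw err;
  }
}
```

Any other error is rethrown, so a genuine failure (for example a network error) still fails the CI bootstrap step instead of being reported as success.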
@qtpi-bonding qtpi-bonding marked this pull request as ready for review April 28, 2026 04:33
@qtpi-bonding qtpi-bonding marked this pull request as draft April 28, 2026 04:33
Comment thread server/index.ts
tsx runs server/index.ts directly from /app/server in the Docker image,
so import.meta.url resolves to /app/server/index.ts. The previous
"../../debug/dist" walked up two levels to / and pointed at /debug/dist
(which doesn't exist), making every /debug/* request 500 in prod.

One level up is correct: /app/server → /app/debug/dist.

Addresses Greptile P1 on PR #23.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
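
The off-by-one in the relative path is easy to confirm with Node's path module. This sketch hard-codes `/app/server` as the directory of server/index.ts, as stated in the commit message, and uses `posix.resolve` so the result does not depend on the host OS; the variable names are illustrative.

```typescript
import { posix } from "node:path";

// Directory of server/index.ts inside the Docker image, per the commit message.
const serverDir = "/app/server";

// Two levels up walks past /app to the filesystem root.
const twoLevelsUp = posix.resolve(serverDir, "../../debug/dist");

// One level up stays inside /app, where the built debug UI lives.
const oneLevelUp = posix.resolve(serverDir, "../debug/dist");

console.log(twoLevelsUp); // /debug/dist
console.log(oneLevelUp); // /app/debug/dist
```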
@qtpi-bonding qtpi-bonding marked this pull request as ready for review April 28, 2026 05:05