feat(deploy): enable prod stage via the lite deploy flow by law-chain-hot · Pull Request #836 · boxlite-ai/boxlite

law-chain-hot · 2026-06-22T14:24:41Z

What

Enables a second SST stage (prod) through the existing one-click Deploy + Runner Rollout buttons, without changing the dev flow. This branch is built on top of the dev-button work (#834) and supersedes it — it contains those commits plus the prod-enablement, so #834 does not need to be merged separately.

Changes

sst.config.ts — REGION reads AWS_REGION (default ap-southeast-1); runner EC2 Name tag is stage-scoped (boxlite-<stage>-runner-default) so dev/prod runners stay distinguishable.
deploy.yml / runner-rollout.yml — stage choice adds prod; new region input replaces the hardcoded region; deploy gates on a per-stage GitHub Environment (environment: ${{ inputs.stage }}); runner-rollout S3_BUCKET is stage-scoped.
Deploy role renamed boxlite-github-deployer-<stage> → boxlite-app-<stage>-deploy (assumed via OIDC).
runner-update-binary.sh — stage-scoped tag lookup with a dev-only legacy fallback + a multi-match guard.
scripts/seed-prod-from-dev.mjs (new) — copies a stage's SSM env to another, domain-rewriting STACK_DOMAIN, leaving Auth0 / SSH / runner-ip blank for manual seeding. Dry-run by default.
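The copy / domain-rewrite / blank split described for seed-prod-from-dev.mjs can be sketched as below. This is a minimal illustration and not the script's actual code: the key lists (`BLANK_PREFIXES`, `BLANK_KEYS`, `DOMAIN_REWRITE_KEYS`), the `planSeed` helper, and the example domains are all assumptions made for the example.

```javascript
// Hypothetical key groups: the real lists in seed-prod-from-dev.mjs are not shown in this PR page.
const BLANK_PREFIXES = ['AUTH0_', 'SSH_']
const BLANK_KEYS = new Set(['RUNNER_IP'])
const DOMAIN_REWRITE_KEYS = new Set(['STACK_DOMAIN'])

// Decide what happens to one source key.
function classify(key) {
  if (BLANK_KEYS.has(key) || BLANK_PREFIXES.some((p) => key.startsWith(p))) return 'blank'
  if (DOMAIN_REWRITE_KEYS.has(key)) return 'domain-rewrite'
  return 'copy'
}

// Build a plan without writing anything (the dry-run view).
// A null value means "leave for manual seeding".
function planSeed(source, devDomain, prodDomain) {
  const plan = []
  for (const [key, value] of source) {
    const action = classify(key)
    if (action === 'blank') plan.push({ key, action, value: null })
    else if (action === 'domain-rewrite') plan.push({ key, action, value: value.split(devDomain).join(prodDomain) })
    else plan.push({ key, action, value })
  }
  return plan
}

const examplePlan = planSeed(
  new Map([['STACK_DOMAIN', 'dev.example.com'], ['AUTH0_CLIENT_ID', 'abc'], ['LOG_LEVEL', 'info']]),
  'dev.example.com',
  'prod.example.com',
)
// Print only key names and actions, never values.
for (const { key, action } of examplePlan) console.log(`${action.padEnd(14)} ${key}`)
```

Keeping the plan separate from the write step is what makes a dry-run default cheap: the same plan can be printed or applied.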

Prerequisites before a prod deploy can run (account owner)

Create boxlite-app-dev-deploy and boxlite-app-prod-deploy (admin, unbounded; OIDC trust sub repo:boxlite-ai/boxlite:environment:<stage>). The existing dev role is renamed to boxlite-app-dev-deploy.
Extend boxlite-bounded-role-admin so PassRole + AddRoleToInstanceProfile cover role/boxlite-prod-* (today they list only boxlite-dev-* / boxlite-e2e-*).
Create a prod GitHub Environment.
Seed SSM /boxlite/prod/{cloudflare-*, env/*} (env via seed-prod-from-dev.mjs; Auth0 values filled once the prod Auth0 app exists).

Safety

dev flow is unchanged (region defaults to ap-southeast-1, the dev GitHub Environment already exists). Reviewed for dev-safety, region-correctness, and GitHub Actions semantics; verified locally (YAML parse, node --check, bash -n, seed dry-run).

Summary by CodeRabbit

New Features
- Added manually-triggered deployment and runner rollout workflows with stage/region selection, diff/preview vs deploy, and upgrade/rollback controls.
Improvements
- Centralized deploy-time configuration in AWS SSM: seed dev→prod, auto-load stage-specific environment variables during sst, and updated teardown guidance.
- Runner binary updates are now stage-aware, support installing from dev tarballs or release builds, and target the correct runner instances by stage.
- Deploy configuration now uses an explicit AWS region fallback and applies a default IAM permissions boundary where needed.
Bug Fixes
- Prevented non-critical runner version checks from failing deployments.

coderabbitai · 2026-06-22T14:24:58Z

📝 Walkthrough

Walkthrough

Adds two manual GitHub Actions workflows (deploy.yml for infra deploys, runner-rollout.yml for runner binary rollouts), three SSM-related scripts for seeding and loading environment variables, and updates sst.config.ts for dynamic region, IAM permission boundaries, and stage-scoped resource names. Also hardens runner-update-binary.sh for stage-aware instance lookup and dev/release tarball installs.

Changes

Deploy & Runner Rollout Pipeline

Layer / File(s)	Summary
SSM env seeding and loading scripts `apps/infra/scripts/seed-deploy-env-ssm.mjs`, `apps/infra/scripts/seed-prod-from-dev.mjs`, `apps/infra/scripts/sst-with-cloudflare.mjs`	`seed-deploy-env-ssm.mjs` writes `.env` key/value pairs to SSM as `SecureString` under `/boxlite/<stage>/env/` with dry-run support. `seed-prod-from-dev.mjs` reads dev-stage SSM parameters, classifies each key as copy/domain-rewrite/blank, and writes to the production SSM path when `--apply` is passed. `sst-with-cloudflare.mjs` gains a `loadEnvFromSsm(stage)` function that populates `process.env` from SSM before spawning `sst`, preserving local env precedence.
sst.config.ts: dynamic region, IAM boundary, stage-scoped runner names `apps/infra/sst.config.ts`	`REGION` reads from `process.env.AWS_REGION` with `ap-southeast-1` fallback. A global `$transform` on `aws.iam.Role` applies a `permissionsBoundary` ARN to every IAM role when not already set. Production checks now use `'prod'` instead of `'production'`. `RunnerRole`, `RunnerProfile`, and the default runner tag are renamed to explicit stage-scoped names.
runner-update-binary.sh: stage-aware instance and tarball handling `scripts/deploy/runner-update-binary.sh`	Instance selection now checks `BOXLITE_RUNNER_INSTANCE_ID` first, then the `boxlite-<stage>-runner-default` tag (plus legacy `boxlite-runner-default` for dev). Asset URL switches between `BOXLITE_RUNNER_TARBALL_URL` (dev builds) and the GitHub release URL. `SHA_URL` is derived from `BOXLITE_RUNNER_TARBALL_SHA256_URL` or appending `.sha256`; checksum fetch failure logs a warning and proceeds. The `--version` cosmetic call is guarded with `|| true`.
deploy.yml: manual infra deploy workflow `.github/workflows/deploy.yml`	New `workflow_dispatch` workflow accepts `stage`, `mode` (diff/deploy), and `region`. Configures OIDC AWS auth scoped to the chosen stage's IAM role, stage-scoped concurrency without cancellation, then runs either `sst diff` or `npm run deploy` for `apps/infra`.
runner-rollout.yml: manual runner rollout workflow `.github/workflows/runner-rollout.yml`	New `workflow_dispatch` workflow accepts `stage`, `region`, `source` (dev-build/release), `ref`, `mode` (upgrade/rollback), and optional `instance_id`. For `dev-build`: checks out the ref, builds Go daemon/computer-use/runner binaries with embedded versioning, packages a tarball with sha256, uploads to S3, presigns URLs (30-min TTL), and invokes `runner-update-binary.sh` with tarball/sha URLs. For `release`: invokes `runner-update-binary.sh` directly with the version ref. Rollback mode is rejected unless `source=release`; `prod` stage is rejected unless `source=release`.
Documentation update: stage naming consistency `apps/infra/README.md`	Updates the Cost section to reference `--stage prod` instead of `--stage production` for consistency with config conventions.
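The two rejection rules listed for runner-rollout.yml (rollback only with `source=release`, prod only with `source=release`) can be expressed as a small input check. This is a sketch in JavaScript for readability; the workflow itself presumably does this in a shell step, and the `validateRolloutInputs` name is hypothetical.

```javascript
// Returns a list of human-readable errors; an empty list means the inputs are accepted.
function validateRolloutInputs({ stage, source, mode }) {
  const errors = []
  // Rollback needs a published release to return to.
  if (mode === 'rollback' && source !== 'release') errors.push('mode=rollback requires source=release')
  // prod only ever receives release builds, never ad-hoc dev builds.
  if (stage === 'prod' && source !== 'release') errors.push('stage=prod requires source=release')
  return errors
}

console.log(validateRolloutInputs({ stage: 'prod', source: 'dev-build', mode: 'upgrade' }))
console.log(validateRolloutInputs({ stage: 'dev', source: 'dev-build', mode: 'upgrade' }))
```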

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

boxlite-ai/boxlite#793: Publishes *.sha256 files alongside runner builds; directly feeds into this PR's checksum verification logic in runner-update-binary.sh.
boxlite-ai/boxlite#819: Updates the runner EC2 default Name-tag convention and instance-selection logic in runner-update-binary.sh, which this PR further extends with stage-scoped naming.

Suggested reviewers

G4614

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 14.29% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main change: enabling a production stage through the deployment workflow, which is the primary objective of the PR.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 08b4a28041

chatgpt-codex-connector · 2026-06-22T14:29:33Z

+  rollout:
+    name: Roll out runner
+    needs: config
+    runs-on: ubuntu-latest


Add the rollout job environment before assuming AWS

The rollout job assumes boxlite-app-<stage>-deploy, whose documented OIDC trust is repo:boxlite-ai/boxlite:environment:<stage>, but unlike deploy.yml this job never declares a GitHub environment. GitHub's OIDC sub only includes environment:<name> when the job references that environment; otherwise it uses the ref or pull_request subject. Once the roles are created as described, the Assume AWS deploy role step will therefore be rejected for both dev and production. Relaxing the trust to compensate would also remove the intended environment approval gate.
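To make the mismatch concrete, here is a rough model of GitHub's default OIDC subject format. It is a simplification: it covers only the environment, pull_request and ref cases and ignores customized subject templates. The `oidcSubject` helper is illustrative and not part of any GitHub API.

```javascript
// Approximates the default `sub` claim GitHub issues for a workflow job.
function oidcSubject({ repo, environment, eventName, ref }) {
  if (environment) return `repo:${repo}:environment:${environment}`
  if (eventName === 'pull_request') return `repo:${repo}:pull_request`
  return `repo:${repo}:ref:${ref}`
}

const trusted = 'repo:boxlite-ai/boxlite:environment:prod'

// Job without `environment:` dispatched from main: subject is ref-based.
const withoutEnv = oidcSubject({ repo: 'boxlite-ai/boxlite', eventName: 'workflow_dispatch', ref: 'refs/heads/main' })
// Same job once it declares `environment: prod`.
const withEnv = oidcSubject({ repo: 'boxlite-ai/boxlite', environment: 'prod', eventName: 'workflow_dispatch', ref: 'refs/heads/main' })

console.log(withoutEnv, withoutEnv === trusted)
console.log(withEnv, withEnv === trusted)
```

Under these assumptions only the second job presents a subject that an `environment:<stage>` trust condition would accept.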

chatgpt-codex-connector · 2026-06-22T14:29:33Z

+    if (!key || process.env[key]) continue // already set (e.g. from .env) wins
+    process.env[key] = p.Value


Load local dotenv before filling from SSM

When this wrapper runs from a laptop with AWS credentials, these lines populate process.env from SSM before spawning sst; apps/infra/sst.config.ts loads .env later with dotenv's default non-overriding behavior. As a result, local .env values that are not already shell-exported, such as STACK_DOMAIN or OIDC_*, are ignored and replaced by the stage's SSM values, contrary to the comment that laptops keep using .env unchanged.
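The precedence this comment asks for (shell exports first, then local .env, then SSM as the fallback) can be sketched as a pure merge. The `resolveEnv` helper and the sample values are assumptions for illustration, not code from sst-with-cloudflare.mjs.

```javascript
// Later spreads win, so the order below encodes: SSM < .env < shell.
function resolveEnv(shellEnv, dotenvValues, ssmValues) {
  return { ...ssmValues, ...dotenvValues, ...shellEnv }
}

const resolved = resolveEnv(
  { AWS_REGION: 'ap-southeast-1' },                         // exported in the shell
  { STACK_DOMAIN: 'laptop.example.com' },                   // from a local .env (hypothetical value)
  { STACK_DOMAIN: 'dev.example.com', OIDC_ISSUER: 'ssm' },  // from SSM (hypothetical values)
)
console.log(resolved)
```

In the wrapper this would mean parsing .env before the SSM fill, so that a key present only in .env is already set when the "already set wins" check runs.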

chatgpt-codex-connector · 2026-06-22T14:29:33Z

+
+import { execFileSync } from 'node:child_process'
+
+const REGION = process.env.AWS_REGION || 'ap-southeast-1'


Separate source and target SSM regions

Using one REGION for both readSourceEnv and putParam breaks the advertised flow when production is deployed outside dev's region. With AWS_REGION=us-west-2, the script reads /boxlite/dev/env/* from us-west-2 even though dev lives in ap-southeast-1; with the default region, it writes /boxlite/production/env/* to ap-southeast-1 where a us-west-2 production deploy will not load it.
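One way to separate the two regions is to resolve them independently and fall back to the shared default only when neither is given. The variable names `SEED_SOURCE_REGION` and `SEED_TARGET_REGION` are hypothetical; the PR does not define them.

```javascript
// Resolve read and write regions separately; both fall back to AWS_REGION, then ap-southeast-1.
function resolveRegions(env) {
  const fallback = env.AWS_REGION || 'ap-southeast-1'
  return {
    source: env.SEED_SOURCE_REGION || fallback, // where /boxlite/<source>/env/* is read
    target: env.SEED_TARGET_REGION || fallback, // where /boxlite/<target>/env/* is written
  }
}

console.log(resolveRegions({ SEED_TARGET_REGION: 'us-west-2' }))
```

With this shape, a prod stage in us-west-2 can be seeded from dev in ap-southeast-1 without touching AWS_REGION.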

coderabbitai

Actionable comments posted: 5

🧹 Nitpick comments (1)

.github/workflows/deploy.yml (1)
58-59: 🧹 Nitpick | 🔵 Trivial | 💤 Low value

Consider adding a job timeout.

SST deployments can take significant time, but unbounded jobs risk hanging indefinitely on infrastructure issues (e.g., stuck CloudFormation stacks). A reasonable timeout provides a safety net.
   deploy:
     runs-on: ubuntu-latest
+    timeout-minutes: 45
     # GitHub Environment per stage — gates the deploy behind that environment's
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/deploy.yml around lines 58 - 59, The deploy job in the
GitHub Actions workflow currently has no timeout protection, which means it can
hang indefinitely if infrastructure issues occur. Add a timeout-minutes field to
the deploy job configuration to set a reasonable maximum execution time for the
job, preventing it from consuming resources unnecessarily during infrastructure
failures.


ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between 4a52285 and 08b4a28.

📒 Files selected for processing (7)

.github/workflows/deploy.yml
.github/workflows/runner-rollout.yml
apps/infra/scripts/seed-deploy-env-ssm.mjs
apps/infra/scripts/seed-prod-from-dev.mjs
apps/infra/scripts/sst-with-cloudflare.mjs
apps/infra/sst.config.ts
scripts/deploy/runner-update-binary.sh

coderabbitai · 2026-06-22T14:35:09Z

+      - name: Rewrite go.work for minimal modules
+        if: ${{ steps.in.outputs.source == 'dev-build' }}
+        run: |
+          printf 'go 1.25.4\n\nuse (\n\t./runner\n\t./daemon\n\t./common-go\n\t./api-client-go\n\t./libs/computer-use\n\t../sdks/go\n)\n' > apps/go.work


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check what go-version config.yml exports
cat .github/workflows/config.yml | grep -A5 -B5 'go-version'

Repository: boxlite-ai/boxlite

Length of output: 919

Hardcoded Go version conflicts with config.yml.

Line 165 hardcodes go 1.25.4 in the generated go.work, but config.yml defines go-version: '1.24' and the Go setup at line 131 uses needs.config.outputs.go-version. This version divergence could cause build failures or unexpected behavior.

Use the configured Go version instead:

Suggested fix

       - name: Rewrite go.work for minimal modules
         if: ${{ steps.in.outputs.source == 'dev-build' }}
+        env:
+          GO_VERSION: ${{ needs.config.outputs.go-version }}
         run: |
-          printf 'go 1.25.4\n\nuse (\n\t./runner\n\t./daemon\n\t./common-go\n\t./api-client-go\n\t./libs/computer-use\n\t../sdks/go\n)\n' > apps/go.work
+          printf 'go %s\n\nuse (\n\t./runner\n\t./daemon\n\t./common-go\n\t./api-client-go\n\t./libs/computer-use\n\t../sdks/go\n)\n' "$GO_VERSION" > apps/go.work
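For reference, the file that printf produces can be modelled with a small renderer that takes the Go version as a parameter. This is only a sketch to show the resulting go.work text; the `renderGoWork` helper is hypothetical and the workflow itself uses printf in a shell step.

```javascript
// Render a go.work file: a `go` directive followed by a `use` block, one module path per line.
function renderGoWork(goVersion, modules) {
  const uses = modules.map((m) => `\t${m}\n`).join('')
  return `go ${goVersion}\n\nuse (\n${uses})\n`
}

const modules = ['./runner', './daemon', './common-go', './api-client-go', './libs/computer-use', '../sdks/go']
// '1.24' is the value the review reports for config.yml's go-version.
process.stdout.write(renderGoWork('1.24', modules))
```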

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/runner-rollout.yml around lines 162 - 165, The "Rewrite go.work for minimal modules" step hardcodes the Go version as 1.25.4 in the printf command, which conflicts with the Go version defined in config.yml (1.24) and the version already retrieved via needs.config.outputs.go-version at the workflow setup. Replace the hardcoded go 1.25.4 string in the printf command with a reference to the needs.config.outputs.go-version output variable to ensure consistency across the workflow and align with the configured version.

coderabbitai · 2026-06-22T14:35:09Z

+let ok = 0
+for (const [key, val] of pairs) {
+  const name = `/boxlite/${stage}/env/${key}`
+  if (DRY) {
+    console.log(`  would put ${name}`)
+    continue
+  }
+  try {
+    execFileSync(
+      'aws',
+      ['ssm', 'put-parameter', '--region', REGION, '--name', name, '--type', 'SecureString', '--overwrite', '--value', val],
+      { stdio: ['ignore', 'ignore', 'pipe'] }, // never echo the value or the version output
+    )
+    console.log(`  ✓ ${key}`)
+    ok++
+  } catch (err) {
+    const msg = (err.stderr ? err.stderr.toString() : err.message).split('\n')[0]
+    console.error(`  ✗ ${key}: ${msg}`)
+  }
+}
+console.log(DRY ? 'seed-deploy-env-ssm: dry-run, nothing written' : `seed-deploy-env-ssm: wrote ${ok}/${pairs.length}`)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail the script when any SSM write fails.

Line 84 catches put-parameter errors, but Line 89 still completes successfully. That can leave /boxlite/<stage>/env/* partially seeded without signaling failure.

Suggested fix

 let ok = 0
+let failures = 0
 for (const [key, val] of pairs) {
   const name = `/boxlite/${stage}/env/${key}`
@@
   } catch (err) {
     const msg = (err.stderr ? err.stderr.toString() : err.message).split('\n')[0]
     console.error(`  ✗ ${key}: ${msg}`)
+    failures++
   }
 }
 console.log(DRY ? 'seed-deploy-env-ssm: dry-run, nothing written' : `seed-deploy-env-ssm: wrote ${ok}/${pairs.length}`)
+if (!DRY && failures > 0) process.exit(1)
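The same idea can be shown in a testable form by injecting the writer, so the failure count and exit code can be checked without calling AWS. `seedAll` and the fake `putParam` are assumptions for this sketch, not functions from the script.

```javascript
// Attempt every write, keep going on errors, and report a non-zero exit code if any failed.
function seedAll(pairs, putParam) {
  let ok = 0
  let failures = 0
  for (const [key, val] of pairs) {
    try {
      putParam(key, val)
      ok++
    } catch (err) {
      failures++
    }
  }
  return { ok, failures, exitCode: failures > 0 ? 1 : 0 }
}

// Fake writer that rejects one key, standing in for a failed `aws ssm put-parameter` call.
const fakePut = (key) => {
  if (key === 'BROKEN') throw new Error('AccessDeniedException')
}
console.log(seedAll([['A', '1'], ['BROKEN', '2'], ['C', '3']], fakePut))
```

The caller would pass `exitCode` to `process.exit`, so a partially seeded stage no longer looks like a success in CI.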

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/infra/scripts/seed-deploy-env-ssm.mjs` around lines 69 - 89, The script catches errors from the put-parameter AWS CLI call but continues executing and completes successfully regardless of whether any writes failed. To fix this, add a flag (such as an error tracking variable) that is set to true whenever an error is caught in the catch block at line 84. After the final console.log statement at line 89, check this flag and call process.exit with a non-zero code (e.g., 1) if any errors were encountered during the loop. This ensures the script signals failure to its caller when any SSM parameter write fails, preventing partially seeded environments from appearing successful.

coderabbitai · 2026-06-22T14:35:09Z

+const source = readSourceEnv(SOURCE_STAGE)
+console.log(
+  `seed-prod-from-dev: ${SOURCE_STAGE} -> ${TARGET_STAGE} | region=${REGION} | ` +
+    `${DEV_DOMAIN} -> ${PROD_DOMAIN} | ${APPLY ? 'APPLY (writing)' : 'DRY-RUN (nothing written)'}`,
+)
+console.log(`  source has ${source.size} key(s) under /boxlite/${SOURCE_STAGE}/env/\n`)
+


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast when the source SSM path is empty.

If Line 103 loads zero keys, the script currently continues and can report a misleading “done” result. For stage bootstrapping, empty source should be treated as an error condition.

Suggested fix

 const source = readSourceEnv(SOURCE_STAGE)
+if (source.size === 0) {
+  console.error(
+    `seed-prod-from-dev: no keys found under /boxlite/${SOURCE_STAGE}/env/ in ${REGION}; ` +
+      `check source stage/region/permissions before continuing`,
+  )
+  process.exit(1)
+}
 console.log(
   `seed-prod-from-dev: ${SOURCE_STAGE} -> ${TARGET_STAGE} | region=${REGION} | ` +

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/infra/scripts/seed-prod-from-dev.mjs` around lines 103 - 109, The script should fail fast when the source SSM path contains zero keys. After the readSourceEnv function call and the console.log statement that outputs the source size, add a validation check to verify that source.size is greater than 0. If source.size equals 0, throw an error with a descriptive message indicating that the source environment has no keys and the script cannot proceed, rather than allowing the script to continue with an empty configuration.

coderabbitai · 2026-06-22T14:35:09Z

+  } catch (err) {
+    if (err.code === 'ENOENT') console.warn('sst-with-cloudflare: `aws` CLI not found; skipping SSM env load')
+    return


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t swallow non-ENOENT SSM load failures silently.

Line 88-90 returns without warning for auth/region/path errors. That hides the actual failure mode and makes deploy-time config issues hard to diagnose.

Suggested fix

   } catch (err) {
-    if (err.code === 'ENOENT') console.warn('sst-with-cloudflare: `aws` CLI not found; skipping SSM env load')
+    if (err.code === 'ENOENT') {
+      console.warn('sst-with-cloudflare: `aws` CLI not found; skipping SSM env load')
+    } else {
+      const msg = (err.stderr ? err.stderr.toString() : err.message).split('\n')[0]
+      console.warn(`sst-with-cloudflare: failed loading SSM env from ${path}*: ${msg}`)
+    }
     return
   }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/infra/scripts/sst-with-cloudflare.mjs` around lines 88 - 90, The catch block in the SSM environment load section only logs a warning when the error code is ENOENT (aws CLI not found), but silently returns for all other errors including authentication, region, and path configuration issues. Modify the error handling to add an else clause after the ENOENT check that logs a warning with the actual error details for non-ENOENT failures, so that configuration problems are visible and debuggable instead of being silently swallowed.

coderabbitai · 2026-06-22T14:35:10Z

+  for (const p of params || []) {
+    const key = p.Name.slice(path.length) // segment after the prefix = env var name
+    if (!key || process.env[key]) continue // already set (e.g. from .env) wins
+    process.env[key] = p.Value
+    loaded++


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix env precedence check for empty-string values.

Line 101 uses a truthy check, so '' gets overwritten by SSM. That breaks the “already set wins” contract when empty-string is intentionally configured.

Suggested fix

-    if (!key || process.env[key]) continue // already set (e.g. from .env) wins
+    if (!key || process.env[key] !== undefined) continue // already set (including '') wins
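The difference between the two checks is easy to see on a plain object standing in for `process.env`. The `fillFromSsm` helper below is a reduced sketch of the loop, not the wrapper's actual function.

```javascript
// Fill only keys that are genuinely unset; an explicit empty string counts as set.
function fillFromSsm(env, params) {
  let loaded = 0
  for (const [key, value] of Object.entries(params)) {
    if (!key || env[key] !== undefined) continue // already set (including '') wins
    env[key] = value
    loaded++
  }
  return loaded
}

const env = { STACK_DOMAIN: '' } // intentionally blank locally
const loaded = fillFromSsm(env, { STACK_DOMAIN: 'dev.example.com', OIDC_ISSUER: 'https://issuer.example.com' })
console.log(loaded, env)
```

With the original truthy check, `STACK_DOMAIN` would have been overwritten because `''` is falsy.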

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/infra/scripts/sst-with-cloudflare.mjs` around lines 99 - 103, The truthy check on process.env[key] in the condition on line 101 causes empty-string environment variables to be overwritten by SSM values, breaking the intended precedence where already-set values should win. Replace the truthy check `process.env[key]` with an existence check using the `in` operator (e.g., `key in process.env`) to properly detect whether a key is already set in the environment, regardless of whether its value is an empty string or any other falsy value.

coderabbitai

🧹 Nitpick comments (1)

apps/infra/sst.config.ts (1)

122-124: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Hardcoded AWS account ID limits portability.

The permission boundary ARN contains a hardcoded account ID (064212132677). If this infrastructure is ever deployed to a different AWS account, this will fail. Consider deriving the account ID dynamically.

♻️ Suggested improvement

   $transform(aws.iam.Role, (args) => {
-    args.permissionsBoundary ??= 'arn:aws:iam::064212132677:policy/boxlite-role-boundary'
+    args.permissionsBoundary ??= $interpolate`arn:aws:iam::${aws.getCallerIdentityOutput().accountId}:policy/boxlite-role-boundary`
   })
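Independently of how the account ID is obtained, the ARN itself is a simple template. The sketch below shows the string construction only; `boundaryArn` is a hypothetical helper, it hardcodes the standard `aws` partition, and it does not call SST or the AWS SDK.

```javascript
// Build the permissions-boundary policy ARN for a given 12-digit AWS account ID.
function boundaryArn(accountId, policyName = 'boxlite-role-boundary') {
  if (!/^\d{12}$/.test(accountId)) throw new Error(`invalid AWS account id: ${accountId}`)
  return `arn:aws:iam::${accountId}:policy/${policyName}`
}

// Placeholder account ID for illustration.
console.log(boundaryArn('123456789012'))
```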

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/infra/sst.config.ts` around lines 122 - 124, The permissionsBoundary ARN
in the $transform function applied to aws.iam.Role contains a hardcoded AWS
account ID (064212132677), which breaks portability across different AWS
accounts. Replace the hardcoded account ID in the ARN string with a dynamic
reference to the current AWS account ID. You can obtain the current account ID
from the aws.getCallerIdentity() function or by using the aws.getAccountId()
helper function, and then construct the ARN string dynamically by interpolating
the retrieved account ID into the
arn:aws:iam::ACCOUNT_ID:policy/boxlite-role-boundary format.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: b1900312-d2c2-44eb-a383-a7bf468dd8db

📥 Commits

Reviewing files that changed from the base of the PR and between 11d51d0 and a83420b.

📒 Files selected for processing (6)

.github/workflows/deploy.yml
.github/workflows/runner-rollout.yml
apps/infra/README.md
apps/infra/scripts/seed-prod-from-dev.mjs
apps/infra/sst.config.ts
scripts/deploy/runner-update-binary.sh

✅ Files skipped from review due to trivial changes (1)

apps/infra/README.md

🚧 Files skipped from review as they are similar to previous changes (4)

apps/infra/scripts/seed-prod-from-dev.mjs
scripts/deploy/runner-update-binary.sh
.github/workflows/deploy.yml
.github/workflows/runner-rollout.yml

GitHub Actions deploy buttons for dev, OIDC-only (no static keys):

- deploy.yml: Cloud deploy (workflow_dispatch diff|deploy, environment: dev). Assumes boxlite-github-deployer-<stage>, runs sst diff/deploy with SSM-injected env.
- runner-rollout.yml: swap the boxlite-runner binary in place via SSM (release or dev-build from S3) with checksum verify + auto-rollback.
- runner-update-binary.sh: accept an S3 dev-channel source (separate sha256 URL).
- seed-deploy-env-ssm.mjs / sst-with-cloudflare.mjs: inject stage env + Cloudflare creds from SSM so CI matches a laptop .env.
- sst.config.ts: $transform stamps boxlite-role-boundary on every IAM role the app creates (satisfies the bounded deploy role's conditional create/manage grant); RunnerRole/Profile get explicit boxlite-dev-runner names so they fall under the deploy role's boxlite-* IAM scope.

Validated 2026-06-22: deploy --stage dev completes end-to-end; dev healthy (8 services running, 5 ALBs active, api/dashboard 200, box create verified).

…sist-credentials)

- runner-rollout.yml: pass workflow_dispatch inputs (ref/instance_id, free-text) via env vars instead of ${{ }} interpolated into run: scripts, so a crafted input is a shell value rather than executable script (CodeRabbit/zizmor template-injection).
- deploy.yml + runner-rollout.yml: persist-credentials: false on checkout. These workflows auth to AWS via OIDC and never push to the repo, so the GITHUB_TOKEN doesn't need to linger in .git/config.

Left for a repo-wide follow-up (not this PR, for consistency): SHA-pin actions across all privileged workflows + tighten the deployer OIDC trust from repo:*:* to :environment:dev.

…ew M1/M2)

- Pin all third-party actions in the privileged deploy workflows to commit SHAs (actions/checkout, setup-node, setup-go, aws-actions/configure-aws-credentials) with version comments. These workflows hold OIDC→AWS access, so an action tag hijack is a supply-chain path. (M1)
- runner-rollout.yml Resolve inputs: write the free-text ref/instance_id to GITHUB_OUTPUT via heredoc delimiters so a newline-laden value can't inject extra output keys (e.g. forge a stage=). (M2; M3 command-injection was fixed in fff6b76)

Follow-up (separate, not this PR): SHA-pin the remaining ~34 @v4/@v5 uses repo-wide, and tighten the deployer OIDC trust from repo:*:* to :environment:dev.
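The heredoc-delimiter point (M2) can be illustrated with a minimal sketch. The workflow itself does this in shell; the TypeScript below only models the text written to the GITHUB_OUTPUT file, using GitHub's documented multiline form (`name<<DELIMITER`, the value, then `DELIMITER`). The function names and the delimiter string are hypothetical and not taken from the PR.

```typescript
// Hypothetical helper: formats one output key in GitHub's multiline syntax.
// Everything between the two delimiter lines is treated as the value.
function formatOutput(name: string, value: string, delimiter: string): string {
  // A value containing the delimiter on its own line could end the block early.
  if (value.split('\n').indexOf(delimiter) !== -1) {
    throw new Error('value contains the delimiter line')
  }
  return `${name}<<${delimiter}\n${value}\n${delimiter}\n`
}

// Naive single-line form, for comparison: a newline in the value
// starts what looks like a second key=value line.
function formatOutputNaive(name: string, value: string): string {
  return `${name}=${value}\n`
}

const crafted = 'main\nstage=prod' // free-text input with an embedded newline
const safe = formatOutput('ref', crafted, 'ghadelimiter_4f2a')
const naive = formatOutputNaive('ref', crafted)
```

In the naive form the second line reads as `stage=prod`, which is the forged key the commit message describes; in the delimited form the same text stays inside the value of `ref`.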

Extends the existing one-click deploy + runner-rollout buttons to a second SST stage (prod) in the same account, without changing dev behavior:

- sst.config.ts: REGION reads AWS_REGION (default ap-southeast-1); runner EC2 Name tag is now stage-scoped (boxlite-<stage>-runner-default); isProd / removal key off stage === 'prod' so prod resources are named boxlite-prod-* (matching the deploy role's PassRole grant) and get RDS deletion-protection + final snapshot.
- deploy.yml / runner-rollout.yml: stage choice adds prod; a region input replaces the hardcoded ap-southeast-1; deploy gates on a per-stage GitHub Environment (environment: ${{ inputs.stage }}); runner-rollout S3 bucket is stage-scoped.
- Deploy role boxlite-app-<stage>-deploy assumed via OIDC (boxlite-app-dev-deploy / boxlite-app-prod-deploy). Created by the account owner (admin, unbounded), trust sub repo:boxlite-ai/boxlite:environment:<stage>; boxlite-bounded-role-admin must allow PassRole / AddRoleToInstanceProfile on role/boxlite-prod-* (today dev/e2e only).
- runner-update-binary.sh: tag lookup is stage-scoped with a dev-only legacy fallback and a multi-match guard.
- scripts/seed-prod-from-dev.mjs (new): copies a stage's SSM env to another (default dev -> prod), domain-rewriting STACK_DOMAIN and leaving Auth0/SSH/runner-ip blank for manual seeding. Dry-run by default.

dev flow is unchanged: region defaults to ap-southeast-1 and the dev GitHub Environment already exists. Reviewed for dev-safety, region-correctness, and GitHub Actions semantics.
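As a rough illustration of the stage-scoped naming described in this commit, the sketch below derives the names from a stage string. The name patterns are the ones quoted above; the `stageNames` function and its return shape are hypothetical and are not the code in sst.config.ts.

```typescript
// Hypothetical shape collecting the stage-derived values named in the commit message.
type StageNames = {
  isProd: boolean
  runnerNameTag: string
  deployRole: string
  oidcSubject: string
}

function stageNames(stage: string): StageNames {
  return {
    // prod-only behavior (e.g. RDS deletion protection) keys off this flag
    isProd: stage === 'prod',
    // EC2 Name tag, so dev and prod runners stay distinguishable
    runnerNameTag: `boxlite-${stage}-runner-default`,
    // role assumed by the workflow via OIDC
    deployRole: `boxlite-app-${stage}-deploy`,
    // trust-policy subject tied to the per-stage GitHub Environment
    oidcSubject: `repo:boxlite-ai/boxlite:environment:${stage}`,
  }
}

const devNames = stageNames('dev')
const prodNames = stageNames('prod')
```

Deriving every name from the single stage input is what lets the same workflow file serve both stages.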

Adds two backward-compatible knobs so a prod stage can use scheme B (api/ssh/proxy on boxlite.ai, dashboard host on app.boxlite.ai) and deploy to a different region than where its config lives. Both default to today's dev behavior, so dev is byte-for-byte unchanged.

- sst.config.ts: `servicesDomain = SERVICES_DOMAIN || stackDomain` now backs the serviceDomain() helper and the api/proxy/ssh hostnames + their env vars (BOXLITE_API_URL, PROXY_DOMAIN, PROXY_TEMPLATE_URL, SSH_GATEWAY_URL, DASHBOARD_BASE_API_URL, proxyDomain, the ssh Cloudflare CNAME). The dashboard host (Router domain, DASHBOARD_URL, OIDC end-session) stays on stackDomain. Adds a `dashboardPath = DASHBOARD_PATH || ''` passthrough (DASHBOARD_PATH env) for an upcoming path-based dashboard; '' keeps the dashboard at the root today.
- sst-with-cloudflare.mjs: SSM is read from a fixed region (BOXLITE_CONFIG_REGION || ap-southeast-1), decoupled from AWS_REGION, so a stage deployed to us-west-2 reads its /boxlite/<stage>/* config + cloudflare creds from ap-southeast-1 (the deploy role reads cross-region; a region-locked SSO can still seed it).

dev: SERVICES_DOMAIN/DASHBOARD_PATH/BOXLITE_CONFIG_REGION all unset -> identical to before. prod-B seeds SERVICES_DOMAIN=boxlite.ai (+ STACK_DOMAIN=app.boxlite.ai).
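The fallback behavior of the two domain knobs can be sketched as a pure function over an env map. This is an illustration under assumptions: the `<name>.<servicesDomain>` host pattern is a guess at what serviceDomain() produces, `dev.example.com` is a placeholder domain, and `resolveDomains` is a hypothetical name.

```typescript
type DomainEnv = Record<string, string | undefined>

function resolveDomains(env: DomainEnv) {
  const stackDomain = env['STACK_DOMAIN'] || ''
  // Unset SERVICES_DOMAIN falls back to stackDomain, which is today's dev behavior.
  const servicesDomain = env['SERVICES_DOMAIN'] || stackDomain
  // '' keeps the dashboard at the root.
  const dashboardPath = env['DASHBOARD_PATH'] || ''
  // Assumption: service hosts are "<name>.<servicesDomain>".
  const serviceDomain = (name: string): string => `${name}.${servicesDomain}`
  return {
    servicesDomain,
    dashboardPath,
    apiHost: serviceDomain('api'),
    dashboardHost: stackDomain, // the dashboard stays on stackDomain
  }
}

// dev: only STACK_DOMAIN is set (placeholder value)
const devDomains = resolveDomains({ STACK_DOMAIN: 'dev.example.com' })
// prod scheme B: services on boxlite.ai, dashboard on app.boxlite.ai
const prodDomains = resolveDomains({
  STACK_DOMAIN: 'app.boxlite.ai',
  SERVICES_DOMAIN: 'boxlite.ai',
})
```

With only STACK_DOMAIN set, every host resolves under the same domain, which is why the change is a no-op for dev.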

…eploy reader)

Adversarial review of the domain/SSM-decouple change found the seed scripts (seed-deploy-env-ssm.mjs, seed-prod-from-dev.mjs) still read/write SSM from AWS_REGION while sst-with-cloudflare.mjs now reads from BOXLITE_CONFIG_REGION. A stage seeded with AWS_REGION set to its deploy region would therefore write config to the wrong region, and the deploy (reading ap-southeast-1) would not find it.

All three now key off BOXLITE_CONFIG_REGION || ap-southeast-1: config is a region-fixed store (ap-southeast-1) independent of the resource deploy region (AWS_REGION, still used by sst.config.ts for resources). Also refreshes the stale sst.config.ts comment that claimed the SSM helpers follow AWS_REGION.
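The split between the config region and the deploy region can be sketched as follows. The env variable names and the ap-southeast-1 default come from the commit messages; `resolveRegions` is a hypothetical helper, not the scripts' actual code.

```typescript
type RegionEnv = Record<string, string | undefined>

const DEFAULT_REGION = 'ap-southeast-1'

function resolveRegions(env: RegionEnv) {
  return {
    // Where /boxlite/<stage>/* SSM config is read and written.
    configRegion: env['BOXLITE_CONFIG_REGION'] || DEFAULT_REGION,
    // Where sst.config.ts creates resources.
    deployRegion: env['AWS_REGION'] || DEFAULT_REGION,
  }
}

// A stage deployed to us-west-2 still reads its config from ap-southeast-1.
const prodRegions = resolveRegions({ AWS_REGION: 'us-west-2' })
// dev: nothing set, so both resolve to the default.
const devRegions = resolveRegions({})
```

Keying the seed scripts and the deploy reader off the same variable is what keeps writers and readers pointed at one store.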

…d-B)

Lets the dashboard SPA serve under a sub-path (e.g. app.boxlite.ai/dashboard for prod scheme B) while leaving dev byte-for-byte unchanged. All knobs default to '' / '/' so the dev root flow is identical to before. The path lives in ONE place per layer and flows end-to-end:

- vite.config.mts: `base = VITE_DASHBOARD_PATH ? <path>/ : '/'`. Vite rewrites every built asset URL (JS/CSS/imported images) to be path-prefixed.
- index.html: favicon `./favicon.ico` (relative) so it resolves correctly under either mount point. Vite does NOT rewrite raw <link href="/..."> entries.
- main.tsx: `<BrowserRouter basename={import.meta.env.BASE_URL}>` so client-side routes are relative to the mount.
- ConfigProvider.tsx: `redirect_uri = origin + BASE_URL.replace(/\/$/, '')` so Auth0 callback matches the SPA's mount URL exactly (e.g. https://app.boxlite.ai/dashboard).
- app.module.ts: ServeStaticModule.renderPath now reads DASHBOARD_PATH env (defaults to '/'); must match the Vite base the bundle was built with.
- Dockerfile: `ARG DASHBOARD_PATH=` feeds VITE_DASHBOARD_PATH into the dashboard build step, so the bundle ships with the correct asset prefix baked in.
- sst.config.ts: passes dashboardPath as Api image `args: { DASHBOARD_PATH }`, closing the loop from SSM (DASHBOARD_PATH) -> build -> serve.

Verified locally: built dashboard twice with VITE_DASHBOARD_PATH=/dashboard and unset, served via python http-server, curl 200 on /dashboard/, /dashboard/index.html, /dashboard/assets/index-*.js, /dashboard/favicon.ico (prefix injected) and confirmed the default build keeps /assets/* (root unchanged). prod-B will deploy this with SSM /boxlite/prod/env/DASHBOARD_PATH=/dashboard.
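The two path expressions quoted in this commit (the Vite base and the Auth0 redirect_uri) can be checked in isolation. The sketch below wraps them in hypothetical helper functions; the expressions are the ones from the commit message, while the function names are illustrative only.

```typescript
// Mirrors `base = VITE_DASHBOARD_PATH ? <path>/ : '/'` from vite.config.mts.
function viteBase(dashboardPath: string | undefined): string {
  return dashboardPath ? `${dashboardPath}/` : '/'
}

// Mirrors `redirect_uri = origin + BASE_URL.replace(/\/$/, '')` from ConfigProvider.tsx.
// Stripping the trailing slash makes the callback match the mount URL exactly.
function redirectUri(origin: string, baseUrl: string): string {
  return origin + baseUrl.replace(/\/$/, '')
}

const prodBase = viteBase('/dashboard')
const devBase = viteBase(undefined)
const prodRedirect = redirectUri('https://app.boxlite.ai', prodBase)
const devRedirect = redirectUri('https://app.boxlite.ai', devBase)
```

For the root mount the base is '/', so the trailing-slash strip leaves just the origin, which is why the dev callback URL is unaffected.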

$transform(aws.iam.InstanceProfile, ...) defaults args.name to ${$app.name}-${$app.stage}-${resourceName} for any InstanceProfile that didn't set an explicit name (matches the existing aws.iam.Role boundary transform's pattern).

The boxlite-app-prod-deploy inline policy AllowManageInstanceProfiles only allows iam:CreateInstanceProfile on instance-profile/boxlite-prod-*. SST's VPC NAT helper auto-names its profile 'VpcNatInstanceProfile-<hash>', which falls outside that scope → deploy fails. With this transform it becomes 'boxlite-prod-VpcNatInstanceProfile', which matches.

Explicit names (e.g. RunnerProfile with name: `${app.name}-${app.stage}-runner`) win because of `??=`. Dev unchanged: same naming pattern, dev's policy is broader.

The boxlite-<stage>-* InstanceProfile naming transform forced a rename of pre-existing auto-named profiles on stages that predate it (dev's VpcNatInstanceProfile-<hash>). The rename triggers RemoveRoleFromInstanceProfile on the OLD name, which a stage's boxlite-<stage>-*-scoped deploy role cannot perform -> iam:RemoveRoleFromInstanceProfile AccessDenied, failing the dev deploy. prod was a fresh deploy so its profiles are created under the scoped name from the start; gating the transform to prod leaves other stages on SST's default name, a no-op against their existing state, and keeps prod's scoped-role grant satisfied.
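The prod-gated naming default can be modeled as a small pure function. This is a sketch of the behavior described in the two commit messages above, not the actual $transform callback: `applyProfileNaming` and `ProfileArgs` are hypothetical, and the real transform mutates the args it is given rather than returning a copy.

```typescript
type ProfileArgs = { name?: string }

function applyProfileNaming(
  args: ProfileArgs,
  appName: string,
  stage: string,
  resourceName: string,
): ProfileArgs {
  // Other stages keep SST's default auto-name, so existing profiles are not renamed.
  if (stage !== 'prod') return args
  // Same effect as `args.name ??= ...`: an explicit name wins.
  return { ...args, name: args.name ?? `${appName}-${stage}-${resourceName}` }
}

const prodNat = applyProfileNaming({}, 'boxlite', 'prod', 'VpcNatInstanceProfile')
const devNat = applyProfileNaming({}, 'boxlite', 'dev', 'VpcNatInstanceProfile')
const prodRunner = applyProfileNaming({ name: 'boxlite-prod-runner' }, 'boxlite', 'prod', 'RunnerProfile')
```

Leaving `name` unset for dev is what avoids the rename, and with it the RemoveRoleFromInstanceProfile call the scoped deploy role cannot make.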

law-chain-hot requested a review from a team as a code owner June 22, 2026 14:24

chatgpt-codex-connector Bot reviewed Jun 22, 2026

coderabbitai Bot reviewed Jun 22, 2026

law-chain-hot force-pushed the feat/prod-enable branch 3 times, most recently from 11d51d0 to a83420b Compare June 22, 2026 15:44

coderabbitai Bot reviewed Jun 22, 2026

law-chain-hot changed the title ~~feat(deploy): enable production stage via the lite deploy flow~~ feat(deploy): enable prod stage via the lite deploy flow Jun 23, 2026

law-chain-hot marked this pull request as draft June 23, 2026 06:31

law-chain-hot had a problem deploying to prod June 23, 2026 06:50 — with GitHub Actions Failure

law-chain-hot temporarily deployed to prod June 23, 2026 06:58 — with GitHub Actions Inactive

law-chain-hot had a problem deploying to prod June 23, 2026 07:05 — with GitHub Actions Error

law-chain-hot had a problem deploying to prod June 23, 2026 07:10 — with GitHub Actions Failure

law-chain-hot had a problem deploying to prod June 23, 2026 07:37 — with GitHub Actions Failure

law-chain-hot had a problem deploying to prod June 23, 2026 07:40 — with GitHub Actions Failure

law-chain-hot had a problem deploying to prod June 23, 2026 07:59 — with GitHub Actions Failure

law-chain-hot had a problem deploying to prod June 23, 2026 08:34 — with GitHub Actions Failure

law-chain-hot temporarily deployed to prod June 23, 2026 08:43 — with GitHub Actions Inactive

law-chain-hot temporarily deployed to prod June 23, 2026 09:35 — with GitHub Actions Inactive

law-chain-hot deployed to prod June 23, 2026 13:00 — with GitHub Actions Active

law-chain-hot added 9 commits June 24, 2026 20:09

fix(deploy): standardize prod stage naming

29dc6ab

law-chain-hot force-pushed the feat/prod-enable branch from d77c3dd to 6f66825 Compare June 24, 2026 12:44

law-chain-hot had a problem deploying to dev June 24, 2026 12:45 — with GitHub Actions Failure

law-chain-hot deployed to dev June 24, 2026 12:53 — with GitHub Actions Active

		if (!key || process.env[key]) continue // already set (e.g. from .env) wins
		process.env[key] = p.Value


		import { execFileSync } from 'node:child_process'

		const REGION = process.env.AWS_REGION || 'ap-southeast-1'
