feat(deploy): enable prod stage via the lite deploy flow#836
law-chain-hot wants to merge 10 commits into
📝 Walkthrough
Adds two manual GitHub Actions workflows (
Changes
Deploy & Runner Rollout Pipeline
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 08b4a28041
rollout:
  name: Roll out runner
  needs: config
  runs-on: ubuntu-latest
Add the rollout job environment before assuming AWS
The rollout job assumes boxlite-app-<stage>-deploy, whose documented OIDC trust is repo:boxlite-ai/boxlite:environment:<stage>, but this job never declares a GitHub environment the way deploy.yml does. GitHub's OIDC sub only includes environment:<name> when the job references that environment; otherwise it uses the ref/pull_request subject, so the Assume AWS deploy role step will be rejected for both dev and production once the roles are created as described, and relaxing that trust would also remove the intended environment approval gate.
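The subject mismatch described in this comment can be sketched as follows. This is a minimal illustration in JavaScript, assuming GitHub's documented default subject formats; `expectedSub` and `trusted` are hypothetical helper names and are not part of this PR.

```javascript
// Hypothetical helper: approximates the OIDC `sub` claim GitHub issues for a job.
// When the job declares an environment, the subject is environment-scoped;
// otherwise it falls back to the pull_request or ref form.
function expectedSub({ repo, environment, ref, isPullRequest = false }) {
  if (environment) return `repo:${repo}:environment:${environment}`
  if (isPullRequest) return `repo:${repo}:pull_request`
  return `repo:${repo}:ref:${ref}`
}

// Trust pattern quoted in the review comment.
const trusted = (stage) => `repo:boxlite-ai/boxlite:environment:${stage}`

const withEnv = expectedSub({ repo: 'boxlite-ai/boxlite', environment: 'dev', ref: 'refs/heads/main' })
const withoutEnv = expectedSub({ repo: 'boxlite-ai/boxlite', ref: 'refs/heads/main' })

console.log(withEnv === trusted('dev'))    // true: the job declares the environment
console.log(withoutEnv === trusted('dev')) // false: the role assumption would be rejected
```

Under these assumptions, declaring the environment on the rollout job is what makes the subject match the role's trust condition.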
if (!key || process.env[key]) continue // already set (e.g. from .env) wins
process.env[key] = p.Value
Load local dotenv before filling from SSM
When this wrapper runs from a laptop with AWS credentials, these lines populate process.env from SSM before spawning sst; apps/infra/sst.config.ts loads .env later with dotenv's default non-overriding behavior. As a result, local .env values that are not already shell-exported, such as STACK_DOMAIN or OIDC_*, are ignored and replaced by the stage's SSM values, contrary to the comment that laptops keep using .env unchanged.
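One way to picture the precedence this comment asks for (shell env, then local .env, then SSM) is the sketch below. It assumes the .env file has already been parsed into a plain object; `buildEnv` and all sample values are hypothetical and do not come from the PR.

```javascript
// Hypothetical sketch: build the child environment with the precedence
// shell env > local .env > SSM, so SSM only fills keys nothing else set.
function buildEnv(shellEnv, dotenvValues, ssmValues) {
  const env = { ...shellEnv }
  for (const source of [dotenvValues, ssmValues]) {
    for (const [key, value] of Object.entries(source)) {
      if (!(key in env)) env[key] = value // earlier sources win
    }
  }
  return env
}

const merged = buildEnv(
  { AWS_REGION: 'ap-southeast-1' },                                    // exported in the shell
  { STACK_DOMAIN: 'local.example.test' },                              // made-up .env value
  { STACK_DOMAIN: 'stage.example.test', OIDC_CLIENT_ID: 'from-ssm' },  // made-up SSM values
)

console.log(merged.STACK_DOMAIN)   // local.example.test
console.log(merged.OIDC_CLIENT_ID) // from-ssm
```

The design point is only the ordering: if .env is merged before SSM, a laptop's local values survive, and SSM supplies whatever is still missing.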
import { execFileSync } from 'node:child_process'

const REGION = process.env.AWS_REGION || 'ap-southeast-1'
Separate source and target SSM regions
Using one REGION for both readSourceEnv and putParam breaks the advertised flow when production is deployed outside dev's region. With AWS_REGION=us-west-2, the script reads /boxlite/dev/env/* from us-west-2 even though dev lives in ap-southeast-1; with the default region, it writes /boxlite/production/env/* to ap-southeast-1 where a us-west-2 production deploy will not load it.
Actionable comments posted: 5
🧹 Nitpick comments (1)
.github/workflows/deploy.yml (1)
58-59: 🧹 Nitpick | 🔵 Trivial | 💤 Low value
Consider adding a job timeout.
SST deployments can take significant time, but unbounded jobs risk hanging indefinitely on infrastructure issues (e.g., stuck CloudFormation stacks). A reasonable timeout provides a safety net.
   deploy:
     runs-on: ubuntu-latest
+    timeout-minutes: 45
     # GitHub Environment per stage — gates the deploy behind that environment's
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/deploy.yml around lines 58 - 59, The deploy job in the GitHub Actions workflow currently has no timeout protection, which means it can hang indefinitely if infrastructure issues occur. Add a timeout-minutes field to the deploy job configuration to set a reasonable maximum execution time for the job, preventing it from consuming resources unnecessarily during infrastructure failures.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/runner-rollout.yml:
- Around line 162-165: The "Rewrite go.work for minimal modules" step hardcodes
the Go version as 1.25.4 in the printf command, which conflicts with the Go
version defined in config.yml (1.24) and the version already retrieved via
needs.config.outputs.go-version at the workflow setup. Replace the hardcoded go
1.25.4 string in the printf command with a reference to the
needs.config.outputs.go-version output variable to ensure consistency across the
workflow and align with the configured version.
In `@apps/infra/scripts/seed-deploy-env-ssm.mjs`:
- Around line 69-89: The script catches errors from the put-parameter AWS CLI
call but continues executing and completes successfully regardless of whether
any writes failed. To fix this, add a flag (such as an error tracking variable)
that is set to true whenever an error is caught in the catch block at line 84.
After the final console.log statement at line 89, check this flag and call
process.exit with a non-zero code (e.g., 1) if any errors were encountered
during the loop. This ensures the script signals failure to its caller when any
SSM parameter write fails, preventing partially seeded environments from
appearing successful.
In `@apps/infra/scripts/seed-prod-from-dev.mjs`:
- Around line 103-109: The script should fail fast when the source SSM path
contains zero keys. After the readSourceEnv function call and the console.log
statement that outputs the source size, add a validation check to verify that
source.size is greater than 0. If source.size equals 0, throw an error with a
descriptive message indicating that the source environment has no keys and the
script cannot proceed, rather than allowing the script to continue with an empty
configuration.
In `@apps/infra/scripts/sst-with-cloudflare.mjs`:
- Around line 99-103: The truthy check on process.env[key] in the condition on
line 101 causes empty-string environment variables to be overwritten by SSM
values, breaking the intended precedence where already-set values should win.
Replace the truthy check `process.env[key]` with an existence check using the
`in` operator (e.g., `key in process.env`) to properly detect whether a key is
already set in the environment, regardless of whether its value is an empty
string or any other falsy value.
- Around line 88-90: The catch block in the SSM environment load section only
logs a warning when the error code is ENOENT (aws CLI not found), but silently
returns for all other errors including authentication, region, and path
configuration issues. Modify the error handling to add an else clause after the
ENOENT check that logs a warning with the actual error details for non-ENOENT
failures, so that configuration problems are visible and debuggable instead of
being silently swallowed.
---
Nitpick comments:
In @.github/workflows/deploy.yml:
- Around line 58-59: The deploy job in the GitHub Actions workflow currently has
no timeout protection, which means it can hang indefinitely if infrastructure
issues occur. Add a timeout-minutes field to the deploy job configuration to set
a reasonable maximum execution time for the job, preventing it from consuming
resources unnecessarily during infrastructure failures.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: fc77cd02-1035-4d1f-96c9-abbcd8acce01
📒 Files selected for processing (7)
- .github/workflows/deploy.yml
- .github/workflows/runner-rollout.yml
- apps/infra/scripts/seed-deploy-env-ssm.mjs
- apps/infra/scripts/seed-prod-from-dev.mjs
- apps/infra/scripts/sst-with-cloudflare.mjs
- apps/infra/sst.config.ts
- scripts/deploy/runner-update-binary.sh
- name: Rewrite go.work for minimal modules
  if: ${{ steps.in.outputs.source == 'dev-build' }}
  run: |
    printf 'go 1.25.4\n\nuse (\n\t./runner\n\t./daemon\n\t./common-go\n\t./api-client-go\n\t./libs/computer-use\n\t../sdks/go\n)\n' > apps/go.work
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check what go-version config.yml exports
cat .github/workflows/config.yml | grep -A5 -B5 'go-version'
Repository: boxlite-ai/boxlite
Length of output: 919
Hardcoded Go version conflicts with config.yml.
Line 165 hardcodes go 1.25.4 in the generated go.work, but config.yml defines go-version: '1.24' and the Go setup at line 131 uses needs.config.outputs.go-version. This version divergence could cause build failures or unexpected behavior.
Use the configured Go version instead:
Suggested fix
 - name: Rewrite go.work for minimal modules
   if: ${{ steps.in.outputs.source == 'dev-build' }}
+  env:
+    GO_VERSION: ${{ needs.config.outputs.go-version }}
   run: |
-    printf 'go 1.25.4\n\nuse (\n\t./runner\n\t./daemon\n\t./common-go\n\t./api-client-go\n\t./libs/computer-use\n\t../sdks/go\n)\n' > apps/go.work
+    printf 'go %s\n\nuse (\n\t./runner\n\t./daemon\n\t./common-go\n\t./api-client-go\n\t./libs/computer-use\n\t../sdks/go\n)\n' "$GO_VERSION" > apps/go.work
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.github/workflows/runner-rollout.yml around lines 162 - 165, The "Rewrite
go.work for minimal modules" step hardcodes the Go version as 1.25.4 in the
printf command, which conflicts with the Go version defined in config.yml (1.24)
and the version already retrieved via needs.config.outputs.go-version at the
workflow setup. Replace the hardcoded go 1.25.4 string in the printf command
with a reference to the needs.config.outputs.go-version output variable to
ensure consistency across the workflow and align with the configured version.
let ok = 0
for (const [key, val] of pairs) {
  const name = `/boxlite/${stage}/env/${key}`
  if (DRY) {
    console.log(` would put ${name}`)
    continue
  }
  try {
    execFileSync(
      'aws',
      ['ssm', 'put-parameter', '--region', REGION, '--name', name, '--type', 'SecureString', '--overwrite', '--value', val],
      { stdio: ['ignore', 'ignore', 'pipe'] }, // never echo the value or the version output
    )
    console.log(` ✓ ${key}`)
    ok++
  } catch (err) {
    const msg = (err.stderr ? err.stderr.toString() : err.message).split('\n')[0]
    console.error(` ✗ ${key}: ${msg}`)
  }
}
console.log(DRY ? 'seed-deploy-env-ssm: dry-run, nothing written' : `seed-deploy-env-ssm: wrote ${ok}/${pairs.length}`)
Fail the script when any SSM write fails.
Line 84 catches put-parameter errors, but Line 89 still completes successfully. That can leave /boxlite/<stage>/env/* partially seeded without signaling failure.
Suggested fix
 let ok = 0
+let failures = 0
 for (const [key, val] of pairs) {
   const name = `/boxlite/${stage}/env/${key}`
@@
   } catch (err) {
     const msg = (err.stderr ? err.stderr.toString() : err.message).split('\n')[0]
     console.error(` ✗ ${key}: ${msg}`)
+    failures++
   }
 }
 console.log(DRY ? 'seed-deploy-env-ssm: dry-run, nothing written' : `seed-deploy-env-ssm: wrote ${ok}/${pairs.length}`)
+if (!DRY && failures > 0) process.exit(1)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@apps/infra/scripts/seed-deploy-env-ssm.mjs` around lines 69 - 89, The script
catches errors from the put-parameter AWS CLI call but continues executing and
completes successfully regardless of whether any writes failed. To fix this, add
a flag (such as an error tracking variable) that is set to true whenever an
error is caught in the catch block at line 84. After the final console.log
statement at line 89, check this flag and call process.exit with a non-zero code
(e.g., 1) if any errors were encountered during the loop. This ensures the
script signals failure to its caller when any SSM parameter write fails,
preventing partially seeded environments from appearing successful.
const source = readSourceEnv(SOURCE_STAGE)
console.log(
  `seed-prod-from-dev: ${SOURCE_STAGE} -> ${TARGET_STAGE} | region=${REGION} | ` +
    `${DEV_DOMAIN} -> ${PROD_DOMAIN} | ${APPLY ? 'APPLY (writing)' : 'DRY-RUN (nothing written)'}`,
)
console.log(` source has ${source.size} key(s) under /boxlite/${SOURCE_STAGE}/env/\n`)
Fail fast when the source SSM path is empty.
If Line 103 loads zero keys, the script currently continues and can report a misleading “done” result. For stage bootstrapping, empty source should be treated as an error condition.
Suggested fix
 const source = readSourceEnv(SOURCE_STAGE)
+if (source.size === 0) {
+  console.error(
+    `seed-prod-from-dev: no keys found under /boxlite/${SOURCE_STAGE}/env/ in ${REGION}; ` +
+      `check source stage/region/permissions before continuing`,
+  )
+  process.exit(1)
+}
 console.log(
   `seed-prod-from-dev: ${SOURCE_STAGE} -> ${TARGET_STAGE} | region=${REGION} | ` +
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@apps/infra/scripts/seed-prod-from-dev.mjs` around lines 103 - 109, The script
should fail fast when the source SSM path contains zero keys. After the
readSourceEnv function call and the console.log statement that outputs the
source size, add a validation check to verify that source.size is greater than
0. If source.size equals 0, throw an error with a descriptive message indicating
that the source environment has no keys and the script cannot proceed, rather
than allowing the script to continue with an empty configuration.
} catch (err) {
  if (err.code === 'ENOENT') console.warn('sst-with-cloudflare: `aws` CLI not found; skipping SSM env load')
  return
Don’t swallow non-ENOENT SSM load failures silently.
Line 88-90 returns without warning for auth/region/path errors. That hides the actual failure mode and makes deploy-time config issues hard to diagnose.
Suggested fix
 } catch (err) {
-  if (err.code === 'ENOENT') console.warn('sst-with-cloudflare: `aws` CLI not found; skipping SSM env load')
+  if (err.code === 'ENOENT') {
+    console.warn('sst-with-cloudflare: `aws` CLI not found; skipping SSM env load')
+  } else {
+    const msg = (err.stderr ? err.stderr.toString() : err.message).split('\n')[0]
+    console.warn(`sst-with-cloudflare: failed loading SSM env from ${path}*: ${msg}`)
+  }
   return
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@apps/infra/scripts/sst-with-cloudflare.mjs` around lines 88 - 90, The catch
block in the SSM environment load section only logs a warning when the error
code is ENOENT (aws CLI not found), but silently returns for all other errors
including authentication, region, and path configuration issues. Modify the
error handling to add an else clause after the ENOENT check that logs a warning
with the actual error details for non-ENOENT failures, so that configuration
problems are visible and debuggable instead of being silently swallowed.
for (const p of params || []) {
  const key = p.Name.slice(path.length) // segment after the prefix = env var name
  if (!key || process.env[key]) continue // already set (e.g. from .env) wins
  process.env[key] = p.Value
  loaded++
Fix env precedence check for empty-string values.
Line 101 uses a truthy check, so '' gets overwritten by SSM. That breaks the “already set wins” contract when empty-string is intentionally configured.
Suggested fix
-if (!key || process.env[key]) continue // already set (e.g. from .env) wins
+if (!key || process.env[key] !== undefined) continue // already set (including '') wins
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@apps/infra/scripts/sst-with-cloudflare.mjs` around lines 99 - 103, The truthy
check on process.env[key] in the condition on line 101 causes empty-string
environment variables to be overwritten by SSM values, breaking the intended
precedence where already-set values should win. Replace the truthy check
`process.env[key]` with an existence check using the `in` operator (e.g., `key
in process.env`) to properly detect whether a key is already set in the
environment, regardless of whether its value is an empty string or any other
falsy value.
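The difference between the truthy check and an existence check can be seen in a small sketch. Here a plain object stands in for `process.env`, and `fillFromSsm` plus the sample key and values are illustrative assumptions rather than code from the PR.

```javascript
// Sketch: compare the truthy check with an existence check.
// `env` stands in for process.env; `isAlreadySet` is the predicate under test.
function fillFromSsm(env, params, isAlreadySet) {
  const out = { ...env }
  for (const [key, value] of Object.entries(params)) {
    if (isAlreadySet(out, key)) continue // already-set values win
    out[key] = value
  }
  return out
}

const localEnv = { DASHBOARD_PATH: '' }         // intentionally empty
const ssmValues = { DASHBOARD_PATH: '/dashboard' }

const truthy = fillFromSsm(localEnv, ssmValues, (e, k) => Boolean(e[k]))
const exists = fillFromSsm(localEnv, ssmValues, (e, k) => e[k] !== undefined)

console.log(JSON.stringify(truthy.DASHBOARD_PATH)) // "/dashboard" (empty string was overwritten)
console.log(JSON.stringify(exists.DASHBOARD_PATH)) // "" (empty string preserved)
```

For a plain object like this one, `key in env` and `env[key] !== undefined` give the same answer as long as no key is explicitly set to `undefined`, which is why the suggested diff and the agent prompt describe the same fix in two ways.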
11d51d0 to a83420b
🧹 Nitpick comments (1)
apps/infra/sst.config.ts (1)
122-124: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win
Hardcoded AWS account ID limits portability.
The permission boundary ARN contains a hardcoded account ID (064212132677). If this infrastructure is ever deployed to a different AWS account, this will fail. Consider deriving the account ID dynamically.
♻️ Suggested improvement
 $transform(aws.iam.Role, (args) => {
-  args.permissionsBoundary ??= 'arn:aws:iam::064212132677:policy/boxlite-role-boundary'
+  args.permissionsBoundary ??= $interpolate`arn:aws:iam::${aws.getCallerIdentityOutput().accountId}:policy/boxlite-role-boundary`
 })
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/infra/sst.config.ts` around lines 122 - 124, The permissionsBoundary ARN in the $transform function applied to aws.iam.Role contains a hardcoded AWS account ID (064212132677), which breaks portability across different AWS accounts. Replace the hardcoded account ID in the ARN string with a dynamic reference to the current AWS account ID. You can obtain the current account ID from the aws.getCallerIdentity() function or by using the aws.getAccountId() helper function, and then construct the ARN string dynamically by interpolating the retrieved account ID into the arn:aws:iam::ACCOUNT_ID:policy/boxlite-role-boundary format.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@apps/infra/sst.config.ts`:
- Around line 122-124: The permissionsBoundary ARN in the $transform function
applied to aws.iam.Role contains a hardcoded AWS account ID (064212132677),
which breaks portability across different AWS accounts. Replace the hardcoded
account ID in the ARN string with a dynamic reference to the current AWS account
ID. You can obtain the current account ID from the aws.getCallerIdentity()
function or by using the aws.getAccountId() helper function, and then construct
the ARN string dynamically by interpolating the retrieved account ID into the
arn:aws:iam::ACCOUNT_ID:policy/boxlite-role-boundary format.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: b1900312-d2c2-44eb-a383-a7bf468dd8db
📒 Files selected for processing (6)
- .github/workflows/deploy.yml
- .github/workflows/runner-rollout.yml
- apps/infra/README.md
- apps/infra/scripts/seed-prod-from-dev.mjs
- apps/infra/sst.config.ts
- scripts/deploy/runner-update-binary.sh
✅ Files skipped from review due to trivial changes (1)
- apps/infra/README.md
🚧 Files skipped from review as they are similar to previous changes (4)
- apps/infra/scripts/seed-prod-from-dev.mjs
- scripts/deploy/runner-update-binary.sh
- .github/workflows/deploy.yml
- .github/workflows/runner-rollout.yml
GitHub Actions deploy buttons for dev, OIDC-only (no static keys):
- deploy.yml: Cloud deploy (workflow_dispatch diff|deploy, environment: dev); assumes boxlite-github-deployer-<stage>, runs sst diff/deploy with SSM-injected env.
- runner-rollout.yml: swap the boxlite-runner binary in place via SSM (release or dev-build from S3) with checksum verify + auto-rollback.
- runner-update-binary.sh: accept an S3 dev-channel source (separate sha256 URL).
- seed-deploy-env-ssm.mjs / sst-with-cloudflare.mjs: inject stage env + Cloudflare creds from SSM so CI matches a laptop .env.
- sst.config.ts: $transform stamps boxlite-role-boundary on every IAM role the app creates (satisfies the bounded deploy role's conditional create/manage grant); RunnerRole/Profile get explicit boxlite-dev-runner names so they fall under the deploy role's boxlite-* IAM scope.
Validated 2026-06-22: deploy --stage dev completes end-to-end; dev healthy (8 services running, 5 ALBs active, api/dashboard 200, box create verified).
…sist-credentials)
- runner-rollout.yml: pass workflow_dispatch inputs (ref/instance_id, free-text) via
env vars instead of ${{ }} interpolated into run: scripts, so a crafted input is a
shell value rather than executable script (CodeRabbit/zizmor template-injection).
- deploy.yml + runner-rollout.yml: persist-credentials: false on checkout — these
workflows auth to AWS via OIDC and never push to the repo, so the GITHUB_TOKEN
doesn't need to linger in .git/config.
Left for a repo-wide follow-up (not this PR, for consistency): SHA-pin actions across
all privileged workflows + tighten the deployer OIDC trust from repo:*:* to
:environment:dev.
…ew M1/M2)
- Pin all third-party actions in the privileged deploy workflows to commit SHAs (actions/checkout, setup-node, setup-go, aws-actions/configure-aws-credentials) with version comments; these workflows hold OIDC→AWS access, so an action tag hijack is a supply-chain path. (M1)
- runner-rollout.yml Resolve inputs: write the free-text ref/instance_id to GITHUB_OUTPUT via heredoc delimiters so a newline-laden value can't inject extra output keys (e.g. forge a stage=). (M2; M3 command-injection was fixed in fff6b76)
Follow-up (separate, not this PR): SHA-pin the remaining ~34 @v4/@v5 uses repo-wide, and tighten the deployer OIDC trust from repo:*:* to :environment:dev.
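The delimiter-based output format mentioned for M2 can be illustrated with a short sketch. The workflow itself does this in shell; the JavaScript below only models the file format, and `naiveOutput`, `delimitedOutput`, and the crafted input are hypothetical.

```javascript
// Hypothetical sketch of the multiline GITHUB_OUTPUT format:
//   name<<DELIMITER, then the value, then DELIMITER on its own line.
// A naive `name=value` line lets a newline in the value start a new key.
function naiveOutput(name, value) {
  return `${name}=${value}\n`
}

function delimitedOutput(name, value) {
  let delimiter
  do {
    // Pick a delimiter that does not occur anywhere in the value.
    delimiter = `EOF_${Math.random().toString(36).slice(2)}`
  } while (value.includes(delimiter))
  return `${name}<<${delimiter}\n${value}\n${delimiter}\n`
}

const crafted = 'main\nstage=prod' // free-text input carrying an extra line

const naiveLines = naiveOutput('ref', crafted).split('\n')
const fencedLines = delimitedOutput('ref', crafted).split('\n')

console.log(naiveLines.includes('stage=prod'))     // true: reads like a forged key
console.log(fencedLines[0].startsWith('ref<<EOF_')) // true: the value is fenced by the delimiter
```

With the delimited form, the extra line stays inside the value of `ref` instead of being parsed as a separate `stage` output.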
Extends the existing one-click deploy + runner-rollout buttons to a second
SST stage (prod) in the same account, without changing dev behavior:
- sst.config.ts: REGION reads AWS_REGION (default ap-southeast-1); runner EC2
Name tag is now stage-scoped (boxlite-<stage>-runner-default); isProd / removal
key off stage === 'prod' so prod resources are named boxlite-prod-* (matching the
deploy role's PassRole grant) and get RDS deletion-protection + final snapshot.
- deploy.yml / runner-rollout.yml: stage choice adds prod; a region input replaces
the hardcoded ap-southeast-1; deploy gates on a per-stage GitHub Environment
(environment: ${{ inputs.stage }}); runner-rollout S3 bucket is stage-scoped.
- Deploy role boxlite-app-<stage>-deploy assumed via OIDC (boxlite-app-dev-deploy /
boxlite-app-prod-deploy). Created by the account owner (admin, unbounded), trust
sub repo:boxlite-ai/boxlite:environment:<stage>; boxlite-bounded-role-admin must
allow PassRole / AddRoleToInstanceProfile on role/boxlite-prod-* (today dev/e2e only).
- runner-update-binary.sh: tag lookup is stage-scoped with a dev-only legacy
fallback and a multi-match guard.
- scripts/seed-prod-from-dev.mjs (new): copies a stage's SSM env to another (default
dev -> prod), domain-rewriting STACK_DOMAIN and leaving Auth0/SSH/runner-ip blank
for manual seeding. Dry-run by default.
dev flow is unchanged: region defaults to ap-southeast-1 and the dev GitHub
Environment already exists. Reviewed for dev-safety, region-correctness, and
GitHub Actions semantics.
Adds two backward-compatible knobs so a prod stage can use scheme B (api/ssh/proxy on boxlite.ai, dashboard host on app.boxlite.ai) and deploy to a different region than where its config lives. Both default to today's dev behavior, so dev is byte-for-byte unchanged.
- sst.config.ts: `servicesDomain = SERVICES_DOMAIN || stackDomain` now backs the serviceDomain() helper and the api/proxy/ssh hostnames + their env vars (BOXLITE_API_URL, PROXY_DOMAIN, PROXY_TEMPLATE_URL, SSH_GATEWAY_URL, DASHBOARD_BASE_API_URL, proxyDomain, the ssh Cloudflare CNAME). The dashboard host (Router domain, DASHBOARD_URL, OIDC end-session) stays on stackDomain. Adds a `dashboardPath = DASHBOARD_PATH || ''` passthrough (DASHBOARD_PATH env) for an upcoming path-based dashboard; '' keeps the dashboard at the root today.
- sst-with-cloudflare.mjs: SSM is read from a fixed region (BOXLITE_CONFIG_REGION || ap-southeast-1), decoupled from AWS_REGION, so a stage deployed to us-west-2 reads its /boxlite/<stage>/* config + cloudflare creds from ap-southeast-1 (the deploy role reads cross-region; a region-locked SSO can still seed it).
dev: SERVICES_DOMAIN/DASHBOARD_PATH/BOXLITE_CONFIG_REGION all unset -> identical to before. prod-B seeds SERVICES_DOMAIN=boxlite.ai (+ STACK_DOMAIN=app.boxlite.ai).
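The domain fallback in this commit can be sketched as below. This is an illustration only: `resolveDomains`, the `api.` prefix, and the dev-like domain are assumptions, and the real `serviceDomain()` helper in sst.config.ts may build hostnames differently.

```javascript
// Sketch of the fallback described in the commit message; not the actual
// sst.config.ts code. The `api.` prefix is an assumed hostname shape.
function resolveDomains(env) {
  const stackDomain = env.STACK_DOMAIN
  const servicesDomain = env.SERVICES_DOMAIN || stackDomain // unset -> same as stackDomain
  const dashboardPath = env.DASHBOARD_PATH || ''            // '' keeps the dashboard at the root
  return {
    dashboardHost: stackDomain,
    apiHost: `api.${servicesDomain}`,
    dashboardPath,
  }
}

// dev-like: no extra knobs set (the domain value is made up)
const devLike = resolveDomains({ STACK_DOMAIN: 'dev.example.test' })
// scheme B values taken from the commit message
const schemeB = resolveDomains({ STACK_DOMAIN: 'app.boxlite.ai', SERVICES_DOMAIN: 'boxlite.ai' })

console.log(devLike.apiHost)       // api.dev.example.test
console.log(schemeB.apiHost)       // api.boxlite.ai
console.log(schemeB.dashboardHost) // app.boxlite.ai
```

Because both knobs fall back with `||`, leaving them unset reproduces the single-domain layout, which is the backward-compatibility property the commit relies on.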
…eploy reader)
Adversarial review of the domain/SSM-decouple change found the seed scripts (seed-deploy-env-ssm.mjs, seed-prod-from-dev.mjs) still read/write SSM from AWS_REGION while sst-with-cloudflare.mjs now reads from BOXLITE_CONFIG_REGION, so a stage seeded with AWS_REGION set to its deploy region would write config to the wrong region and the deploy (reading ap-southeast-1) would not find it. All three now key off BOXLITE_CONFIG_REGION || ap-southeast-1: config is a region-fixed store (ap-southeast-1) independent of the resource deploy region (AWS_REGION, still used by sst.config.ts for resources). Also refreshes the stale sst.config.ts comment that claimed the SSM helpers follow AWS_REGION.
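The split between the config region and the deploy region can be sketched in a few lines. `regions` is a hypothetical helper name; only the two environment variable names and the default come from the commit message.

```javascript
// Sketch: the config store region is fixed, the deploy region follows AWS_REGION.
const DEFAULT_REGION = 'ap-southeast-1'

function regions(env) {
  return {
    configRegion: env.BOXLITE_CONFIG_REGION || DEFAULT_REGION, // where /boxlite/<stage>/* lives
    deployRegion: env.AWS_REGION || DEFAULT_REGION,            // where resources are created
  }
}

const prodLike = regions({ AWS_REGION: 'us-west-2' })
// configRegion stays at the default while deployRegion follows AWS_REGION
console.log(prodLike.configRegion, prodLike.deployRegion)
```

Reading both the seed scripts and the deploy wrapper through the same `configRegion` value is what keeps a seed and a later deploy pointed at the same parameter store.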
…d-B)
Lets the dashboard SPA serve under a sub-path (e.g. app.boxlite.ai/dashboard for
prod scheme B) while leaving dev byte-for-byte unchanged. All knobs default to
'' / '/' so the dev root flow is identical to before.
The path lives in ONE place per layer and flows end-to-end:
- vite.config.mts: `base = VITE_DASHBOARD_PATH ? <path>/ : '/'`. Vite rewrites every
built asset URL (JS/CSS/imported images) to be path-prefixed.
- index.html: favicon `./favicon.ico` (relative) so it resolves correctly under
either mount point — Vite does NOT rewrite raw <link href="/..."> entries.
- main.tsx: `<BrowserRouter basename={import.meta.env.BASE_URL}>` so client-side
routes are relative to the mount.
- ConfigProvider.tsx: `redirect_uri = origin + BASE_URL.replace(/\/$/, '')` so
Auth0 callback matches the SPA's mount URL exactly (e.g. https://app.boxlite.ai/dashboard).
- app.module.ts: ServeStaticModule.renderPath now reads DASHBOARD_PATH env (defaults
to '/'); must match the Vite base the bundle was built with.
- Dockerfile: `ARG DASHBOARD_PATH=` feeds VITE_DASHBOARD_PATH into the dashboard
build step, so the bundle ships with the correct asset prefix baked in.
- sst.config.ts: passes dashboardPath as Api image `args: { DASHBOARD_PATH }`,
closing the loop from SSM (DASHBOARD_PATH) -> build -> serve.
Verified locally: built dashboard twice with VITE_DASHBOARD_PATH=/dashboard and
unset, served via python http-server, curl 200 on /dashboard/, /dashboard/index.html,
/dashboard/assets/index-*.js, /dashboard/favicon.ico (prefix injected) and confirmed
the default build keeps /assets/* (root unchanged).
prod-B will deploy this with SSM /boxlite/prod/env/DASHBOARD_PATH=/dashboard.
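The two path computations quoted in this commit (the Vite base and the Auth0 redirect URI) can be checked with a small sketch. `viteBase` and `redirectUri` are hypothetical function names wrapping the expressions from the commit message.

```javascript
// Sketch of the two path expressions from the commit message.
function viteBase(dashboardPath) {
  return dashboardPath ? `${dashboardPath}/` : '/'
}

function redirectUri(origin, baseUrl) {
  return origin + baseUrl.replace(/\/$/, '') // drop the trailing slash
}

console.log(viteBase('/dashboard'))                               // /dashboard/
console.log(viteBase(''))                                         // /
console.log(redirectUri('https://app.boxlite.ai', '/dashboard/')) // https://app.boxlite.ai/dashboard
console.log(redirectUri('https://app.boxlite.ai', '/'))           // https://app.boxlite.ai
```

The sketch shows why the same value has to be used at build time and serve time: the redirect URI and asset URLs are both derived from the base the bundle was built with.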
$transform(aws.iam.InstanceProfile, ...) defaults args.name to
${$app.name}-${$app.stage}-${resourceName} for any InstanceProfile that didn't
set an explicit name (matches the existing aws.iam.Role boundary transform's pattern).
The boxlite-app-prod-deploy inline policy AllowManageInstanceProfiles only allows
iam:CreateInstanceProfile on instance-profile/boxlite-prod-*. SST's VPC NAT helper
auto-names its profile 'VpcNatInstanceProfile-<hash>' which falls outside that scope
→ deploy fails. With this transform it becomes 'boxlite-prod-VpcNatInstanceProfile',
which matches.
Explicit names (e.g. RunnerProfile with name: \`${app.name}-${app.stage}-runner\`)
win because of `??=`. Dev unchanged: same naming pattern, dev's policy is broader.
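The `??=` defaulting described here can be sketched with plain objects standing in for SST's transform arguments. That substitution is an assumption for illustration; `defaultProfileName` is a hypothetical name, and the real transform receives SST resource args rather than bare objects.

```javascript
// Sketch of the `??=` naming default, using plain objects in place of
// SST transform arguments (an assumption for illustration).
function defaultProfileName(args, app, resourceName) {
  args.name ??= `${app.name}-${app.stage}-${resourceName}` // only fills when unset
  return args
}

const app = { name: 'boxlite', stage: 'prod' }

const autoNamed = defaultProfileName({}, app, 'VpcNatInstanceProfile')
const explicit = defaultProfileName({ name: 'boxlite-prod-runner' }, app, 'RunnerProfile')

console.log(autoNamed.name) // boxlite-prod-VpcNatInstanceProfile
console.log(explicit.name)  // boxlite-prod-runner (explicit name wins)
```

Both resulting names start with `boxlite-prod-`, which is the prefix the scoped deploy policy is described as allowing.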
d77c3dd to 6f66825
The boxlite-<stage>-* InstanceProfile naming transform forced a rename of pre-existing auto-named profiles on stages that predate it (dev's VpcNatInstanceProfile-<hash>). The rename triggers RemoveRoleFromInstanceProfile on the OLD name, which a stage's boxlite-<stage>-*-scoped deploy role cannot perform -> iam:RemoveRoleFromInstanceProfile AccessDenied, failing the dev deploy. prod was a fresh deploy so its profiles are created under the scoped name from the start; gating the transform to prod leaves other stages on SST's default name, a no-op against their existing state, and keeps prod's scoped-role grant satisfied.
What
Enables a second SST stage (prod) through the existing one-click Deploy + Runner Rollout buttons, without changing the dev flow. This branch is built on top of the dev-button work (#834) and supersedes it: it contains those commits plus the prod-enablement, so #834 does not need to be merged separately.
Changes
- sst.config.ts: REGION reads AWS_REGION (default ap-southeast-1); runner EC2 Name tag is stage-scoped (boxlite-<stage>-runner-default) so dev/prod runners stay distinguishable.
- deploy.yml / runner-rollout.yml: stage choice adds prod; new region input replaces the hardcoded region; deploy gates on a per-stage GitHub Environment (environment: ${{ inputs.stage }}); runner-rollout S3_BUCKET is stage-scoped.
- boxlite-github-deployer-<stage> → boxlite-app-<stage>-deploy (assumed via OIDC).
- runner-update-binary.sh: stage-scoped tag lookup with a dev-only legacy fallback + a multi-match guard.
- scripts/seed-prod-from-dev.mjs (new): copies a stage's SSM env to another, domain-rewriting STACK_DOMAIN, leaving Auth0 / SSH / runner-ip blank for manual seeding. Dry-run by default.
Prerequisites before a prod deploy can run (account owner)
- boxlite-app-dev-deploy and boxlite-app-prod-deploy (admin, unbounded; OIDC trust sub repo:boxlite-ai/boxlite:environment:<stage>). The existing dev role is renamed to boxlite-app-dev-deploy.
- boxlite-bounded-role-admin so PassRole + AddRoleToInstanceProfile cover role/boxlite-prod-* (today they list only boxlite-dev-* / boxlite-e2e-*).
- prod GitHub Environment.
- /boxlite/prod/{cloudflare-*, env/*} (env via seed-prod-from-dev.mjs; Auth0 values filled once the prod Auth0 app exists).
Safety
dev flow is unchanged (region defaults to ap-southeast-1, the dev GitHub Environment already exists). Reviewed for dev-safety, region-correctness, and GitHub Actions semantics; verified locally (YAML parse, node --check, bash -n, seed dry-run).
Summary by CodeRabbit
sst, and updated teardown guidance.