feat(deploy): one-click Cloud + Runner deploy buttons (lite MVP) #834
law-chain-hot wants to merge 25 commits into
📝 Walkthrough
Adds GitHub Actions workflows for SST infra deployment (
Changes
Deployment Automation Infrastructure
Sequence Diagram(s)
sequenceDiagram
participant Developer
participant GHA as GitHub Actions
participant S3 as S3 Bucket
participant OIDC as AWS IAM/OIDC
participant SSM as AWS SSM
participant EC2 as Runner EC2
rect rgba(100, 149, 237, 0.5)
Note over Developer,EC2: dev-build path
Developer->>GHA: Trigger Runner Rollout (source=dev-build)
GHA->>GHA: make dist:runner
GHA->>GHA: Build daemon, computer-use, boxlite-runner binaries
GHA->>GHA: Package tarball + sha256
GHA->>OIDC: Assume boxlite-github-deployer-dev via OIDC
OIDC-->>GHA: Temporary credentials
GHA->>S3: Upload tarball + sha256
GHA->>S3: Generate presigned URLs (30 min TTL)
S3-->>GHA: BOXLITE_RUNNER_TARBALL_URL, BOXLITE_RUNNER_SHA256_URL
GHA->>GHA: npm run runner:rollout (with presigned URLs)
GHA->>SSM: aws ssm send-command: AWS-RunShellScript
end
rect rgba(144, 238, 144, 0.5)
Note over Developer,SSM: release path
Developer->>GHA: Trigger Runner Rollout (source=release, ref=v1.2.3)
GHA->>OIDC: Assume boxlite-github-deployer-<stage> via OIDC
OIDC-->>GHA: Temporary credentials
GHA->>GHA: npm run runner:rollout --ref v1.2.3
GHA->>SSM: aws ssm send-command: AWS-RunShellScript
end
SSM->>EC2: Execute remote bash: download + verify + install + rollback on fail
EC2->>EC2: Download tarball (presigned or GitHub release)
EC2->>EC2: Verify checksum if available
EC2->>EC2: Extract binary, backup existing, systemd restart
EC2-->>SSM: boxlite-runner --version (stdout) or rollback + logs (stderr)
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bebd8ecbfe
rollout:
  name: Roll out runner
  needs: config
  runs-on: ubuntu-latest
Put runner rollouts behind the dev environment
For the dev rollout path, this job assumes the same boxlite-github-deployer-dev role and restarts the live runner, but unlike .github/workflows/deploy.yml:48 it never enters the dev environment. In installations where that environment enforces reviewers or the AWS OIDC trust is scoped to environment:dev, this rollout either bypasses the approval gate or fails to assume the role; add environment: dev here (or a stage-mapped environment when more stages exist) before configuring AWS credentials.
  }
}

loadEnvFromSsm(stage)
Load local .env before SSM env injection
When npm run deploy is run locally with AWS access and a .env override that differs from /boxlite/<stage>/env, this call fills process.env from SSM before sst.config.ts later calls dotenv.config(), and dotenv does not override existing variables by default. That makes SSM values win over the laptop .env despite the comments saying local .env is unchanged; load dotenv in this wrapper first or only SSM-fill keys absent from both the shell and .env.
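A minimal sketch of the precedence this comment asks for, assuming three plain objects stand in for the shell environment, the parsed `.env` file, and the values fetched from SSM. The function name `mergeEnv` and the sample keys and values are hypothetical and do not come from the PR:

```javascript
// Hypothetical helper: merge env sources so that shell > .env > SSM.
// SSM only fills keys that are absent from both the shell and .env.
function mergeEnv(shellEnv, dotEnv, ssmEnv) {
  // Later spreads win, so list sources from lowest to highest priority.
  return { ...ssmEnv, ...dotEnv, ...shellEnv }
}

// Illustrative values only.
const merged = mergeEnv(
  { AWS_REGION: 'us-east-1' },                                     // shell
  { API_URL: 'http://localhost:3000' },                            // laptop .env
  { API_URL: 'https://api.example.invalid', DB_NAME: 'boxlite' },  // SSM
)
console.log(merged)
```

With this ordering a laptop `.env` override survives, and SSM still supplies every key the developer did not set locally.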
env:
  STAGE: ${{ steps.in.outputs.stage }}
  BOXLITE_RUNNER_INSTANCE_ID: ${{ steps.in.outputs.instance_id }}
run: bash scripts/deploy/runner-update-binary.sh "${{ steps.in.outputs.ref }}"
Avoid passing a blank release version
When source=release is selected and the optional ref input is left blank, this quoted empty string is still passed as one positional argument, so runner-update-binary.sh sets VERSION='' instead of using its Cargo.toml fallback. The rollout then tries to fetch .../releases/download/v/boxlite-runner-v-linux-amd64.tar.gz and fails before updating; either require ref for release or omit the argument when it is blank.
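The "omit the argument when it is blank" option can be sketched as follows. This is an illustration in JavaScript rather than the workflow's actual shell step, and `versionArgs` is a hypothetical name:

```javascript
// Hypothetical helper: build the positional arguments for the update script.
// A blank or whitespace-only ref yields no argument at all, so the script's
// own fallback (Cargo.toml, per the comment above) can take effect.
function versionArgs(ref) {
  const trimmed = (ref ?? '').trim()
  return trimmed === '' ? [] : [trimmed]
}

console.log(versionArgs(''))       // no positional argument
console.log(versionArgs('v1.2.3')) // one positional argument
```

The alternative named in the comment, making `ref` required when `source=release`, avoids the fallback entirely and may be easier to reason about.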
Actionable comments posted: 3
🧹 Nitpick comments (6)
.github/workflows/deploy.yml (1)
46-48: 🧹 Nitpick | 🔵 Trivial | 💤 Low value
Environment is hardcoded to dev regardless of stage input.
Line 48 hardcodes environment: dev, but the stage input allows selecting different stages. If more stages are added later (e.g., prod), the environment protection rules won't match.
🔧 Proposed fix to use dynamic environment
 deploy:
   runs-on: ubuntu-latest
-  environment: dev
+  environment: ${{ inputs.stage }}
   steps:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/deploy.yml around lines 46 - 48, The deploy job has the environment field hardcoded to dev instead of using the dynamic stage input. Replace the hardcoded dev value in the environment field with a reference to the inputs.stage variable to ensure the correct environment is selected based on the stage input parameter passed when triggering the workflow.
apps/infra/scripts/sst-with-cloudflare.mjs (1)
93-97: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win
Silent JSON parse failure may mask SSM response issues.
If the AWS CLI returns unexpected output (e.g., an error message instead of JSON), the function silently returns without any indication. Adding a warning would help diagnose CI failures.
🔧 Proposed fix to log parse errors
 let params
 try {
   params = JSON.parse(json)
-} catch {
+} catch (e) {
+  console.warn(`sst-with-cloudflare: failed to parse SSM response for ${path}*: ${e.message}`)
   return
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/infra/scripts/sst-with-cloudflare.mjs` around lines 93 - 97, The catch block following the JSON.parse(json) statement silently returns without logging, which masks potential AWS CLI output issues. Add a warning log statement in the catch block that captures and displays both the parsing error and the problematic JSON input. This will help diagnose CI failures by making it clear when AWS returns unexpected output instead of valid JSON. Include both the error message and the actual response data that failed to parse.
apps/infra/scripts/seed-deploy-env-ssm.mjs (1)
69-89: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win
Consider non-zero exit on partial write failures.
The script logs failures but exits successfully even when some keys fail to write. In CI, this could mask issues where critical secrets weren't seeded properly.
🔧 Proposed fix to fail on partial errors
 let ok = 0
+let failed = 0
 for (const [key, val] of pairs) {
   const name = `/boxlite/${stage}/env/${key}`
   if (DRY) {
     console.log(` would put ${name}`)
     continue
   }
   try {
     execFileSync(
       'aws',
       ['ssm', 'put-parameter', '--region', REGION, '--name', name, '--type', 'SecureString', '--overwrite', '--value', val],
       { stdio: ['ignore', 'ignore', 'pipe'] }, // never echo the value or the version output
     )
     console.log(` ✓ ${key}`)
     ok++
   } catch (err) {
     const msg = (err.stderr ? err.stderr.toString() : err.message).split('\n')[0]
     console.error(` ✗ ${key}: ${msg}`)
+    failed++
   }
 }
 console.log(DRY ? 'seed-deploy-env-ssm: dry-run, nothing written' : `seed-deploy-env-ssm: wrote ${ok}/${pairs.length}`)
+if (failed > 0) process.exit(1)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/infra/scripts/seed-deploy-env-ssm.mjs` around lines 69 - 89, The script logs failures when writing parameters to SSM but exits with status 0 regardless of whether all parameters were successfully written. After the loop over pairs completes, check if the ok counter equals pairs.length, and if they do not match (indicating one or more write failures), call process.exit(1) to signal failure to CI. This check should occur after the final console.log statement to ensure the summary message is printed before exiting with a non-zero status code.
.github/workflows/runner-rollout.yml (3)
137-140: 🧹 Nitpick | 🔵 Trivial | 💤 Low value
Hardcoded Go version may drift from config.outputs.go-version.
Line 106 uses ${{ needs.config.outputs.go-version }} for setup-go, but line 140 hardcodes go 1.25.4 in the go.work rewrite. If the project's Go version changes, this hardcoded value could become stale.
Consider deriving from the config output:
♻️ Proposed fix
 - name: Rewrite go.work for minimal modules
   if: ${{ steps.in.outputs.source == 'dev-build' }}
+  env:
+    GO_VERSION: ${{ needs.config.outputs.go-version }}
   run: |
-    printf 'go 1.25.4\n\nuse (\n\t./runner\n\t./daemon\n\t./common-go\n\t./api-client-go\n\t./libs/computer-use\n\t../sdks/go\n)\n' > apps/go.work
+    printf 'go %s\n\nuse (\n\t./runner\n\t./daemon\n\t./common-go\n\t./api-client-go\n\t./libs/computer-use\n\t../sdks/go\n)\n' "$GO_VERSION" > apps/go.work
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/runner-rollout.yml around lines 137 - 140, The "Rewrite go.work for minimal modules" step hardcodes the Go version as "go 1.25.4" in the go.work file, but the workflow uses a dynamic Go version from needs.config.outputs.go-version in the setup-go step. Replace the hardcoded version string "1.25.4" with a reference to the same config output variable used in the setup-go action to ensure the go.work file stays synchronized with the configured Go version whenever it changes in the future.
91-91: 🧹 Nitpick | 🔵 Trivial | ⚖️ Poor tradeoff
Consider pinning actions to full SHA for supply chain security.
Actions at lines 91, 104, and 189 use version tags (@v4, @v5) which could be force-pushed. Pinning to commit SHA provides immutability:
- uses: actions/checkout@<full-sha> # v4
This is a defense-in-depth measure; version tags are generally acceptable if the organization trusts the action maintainers.
Also applies to: 104-104, 189-189
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/runner-rollout.yml at line 91, The actions/checkout action at line 91 uses a version tag (`@v4`) instead of a full commit SHA, which could be vulnerable to force-push attacks. Replace the version tag references in the actions/checkout usage at line 91, the action at line 104 (`@v5`), and the action at line 189 with their corresponding full commit SHAs to ensure immutability and improve supply chain security. You can find the full SHA for each action version on the action's GitHub release page.
Source: Linters/SAST tools
91-94: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win
Set persist-credentials: false to avoid leaving Git credentials in the runner.
While this workflow doesn't upload artifacts via actions/upload-artifact, setting persist-credentials: false is a defense-in-depth practice that prevents the GITHUB_TOKEN from persisting in the .git/config.
🔒 Proposed fix
 - uses: actions/checkout@v4
   with:
     ref: ${{ (steps.in.outputs.source == 'dev-build' && steps.in.outputs.ref != '') && steps.in.outputs.ref || github.ref }}
+    persist-credentials: false
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/runner-rollout.yml around lines 91 - 94, The actions/checkout@v4 step is missing the persist-credentials security configuration. Add persist-credentials: false to the with section of the actions/checkout@v4 step, placing it alongside the existing ref parameter to prevent the GITHUB_TOKEN from persisting in the .git/config file as a defense-in-depth security practice.
Source: Linters/SAST tools
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/deploy.yml:
- Around line 50-60: Pin the GitHub Actions to specific commit SHAs for supply
chain security. Update the aws-actions/configure-aws-credentials action to use
v4.3.1 with its corresponding SHA instead of the generic v4 reference.
Additionally, add persist-credentials: false to the actions/checkout action to
prevent credential leakage through artifacts when using OIDC authentication.
In @.github/workflows/runner-rollout.yml:
- Around line 76-84: The "Resolve inputs" step has a template injection
vulnerability where the freeform input values for ref and instance_id are
directly expanded into GITHUB_OUTPUT without sanitization, allowing newlines or
special characters to inject arbitrary output variables. Replace the direct echo
statements for these two inputs (the lines writing ref and instance_id to
GITHUB_OUTPUT) with environment variable assignments using heredoc delimiters to
safely handle potentially malicious or multi-line input values.
- Around line 219-224: The steps.in.outputs.ref value is being passed directly
to the bash script as a command-line argument in the run command, which creates
a command injection vulnerability if the ref contains shell metacharacters. Move
steps.in.outputs.ref into an environment variable by adding it to the env block
(similar to how STAGE and BOXLITE_RUNNER_INSTANCE_ID are defined), then modify
the run command to pass the value as an environment variable reference instead
of a direct shell argument, and ensure the runner-update-binary.sh script reads
it from the environment variable to prevent shell interpretation of any special
characters.
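To make the heredoc-delimiter point concrete, here is a small sketch of the multiline `name<<DELIMITER` entry format that GitHub Actions accepts in `GITHUB_OUTPUT`. It is written in JavaScript for illustration only (the workflow step itself is shell), and `formatOutput` plus the sample delimiter are hypothetical:

```javascript
// Hypothetical helper: format one GITHUB_OUTPUT entry using the multiline
// "name<<DELIMITER ... DELIMITER" syntax, so that newlines inside the value
// stay part of the value instead of starting a new key=value line.
function formatOutput(name, value, delimiter) {
  // Refuse values that contain the delimiter on a line of its own;
  // otherwise the value could close the block early.
  if (value.split('\n').includes(delimiter)) {
    throw new Error(`value for ${name} contains the delimiter`)
  }
  return `${name}<<${delimiter}\n${value}\n${delimiter}\n`
}

// A crafted ref that tries to smuggle in a second output key.
const entry = formatOutput('ref', 'v1.2.3\nstage=prod', 'EOF_7f3a')
console.log(entry)
```

In this form the `stage=prod` line is read as part of `ref`, not as a separate output. In practice the delimiter would be generated randomly per run so an input cannot guess it.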
📒 Files selected for processing (6)
- .github/workflows/deploy.yml
- .github/workflows/runner-rollout.yml
- apps/infra/scripts/seed-deploy-env-ssm.mjs
- apps/infra/scripts/sst-with-cloudflare.mjs
- apps/infra/sst.config.ts
- scripts/deploy/runner-update-binary.sh
Addressed the SAST review findings:
Deferred to a separate follow-up (to keep this PR focused and avoid a half-pinned repo): SHA-pin the remaining ~34
@@ -13,6 +13,8 @@
# Usage:
remove and rewrite as apps/infra scripts
- name: sst deploy
  if: ${{ inputs.mode == 'deploy' }}
  working-directory: apps/infra
  run: npm run deploy -- --stage ${{ inputs.stage }}
How are the images built in GitHub Actions?
With Docker; the image builds are defined in SST.
# ("undefined reference") — see 2026-06-20 validation. A true dev-build of
# unreleased source must build libboxlite.a from the Rust core here instead of
# downloading it. TODO: add the Rust libboxlite build for the dev-build path.
- name: Download prebuilt libboxlite.a
Not supported yet
go -C apps/api-client-go mod download
go -C apps/libs/computer-use mod download

- name: Build daemon
-o runner/pkg/daemon/static/daemon-amd64 \
./daemon/cmd/daemon/

- name: Build computer-use
-o runner/pkg/daemon/static/boxlite-computer-use \
./libs/computer-use/

- name: Build runner
 * node scripts/seed-deploy-env-ssm.mjs --dry-run # list keys, write nothing
 */

import { execFileSync } from 'node:child_process'
How do we modify .env and review the change before triggering the deployment action?
Edit the values directly in AWS SSM Parameter Store; there is no review step for now.
> /boxlite/dev/env/KEY = VALUE
#
# ⚠️ Rolling the binary RESTARTS the runner's systemd unit — boxes running on that
# EC2 are interrupted. Box state under /var/lib/boxlite is preserved.
name: Runner Rollout
Does upgrading the runner affect the online service?
The runner has no hot reload, so upgrading it restarts the service (synced with @G4614).
@@ -0,0 +1,78 @@
# One-click deploy button (MVP "lite+" — manual, reviewable, no static AWS keys).
Does this cover both first-time deployment and subsequent updates?
Both are supported.
Actionable comments posted: 1
🧹 Nitpick comments (1)
apps/infra/scripts/rollout-runner-binary.mjs (1)
163-167: 🧹 Nitpick | 🔵 Trivial | 💤 Low value
SSM wait has no explicit timeout.
The aws ssm wait command-executed relies on AWS CLI defaults (typically ~600s with retries). If the remote script hangs, this will block for an extended period. For CI/CD visibility, consider adding an explicit --cli-read-timeout or documenting the expected max duration.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/infra/scripts/rollout-runner-binary.mjs` around lines 163 - 167, The execFileSync call executing the aws ssm wait command-executed does not have an explicit timeout configured, which can cause the process to block indefinitely if the remote script hangs. Add the --cli-read-timeout parameter to the arguments array passed to the aws command within the execFileSync call to set an explicit timeout value that provides better visibility and prevents extended blocking in CI/CD environments.
📒 Files selected for processing (11)
- .github/workflows/deploy.yml
- .github/workflows/runner-rollout.yml
- apps/infra/README.md
- apps/infra/package.json
- apps/infra/scripts/rollout-runner-binary.mjs
- apps/infra/scripts/seed-deploy-env-ssm.mjs
- apps/infra/scripts/sst-with-cloudflare.mjs
- apps/infra/sst.config.ts
- make/dist.mk
- make/help.mk
- scripts/deploy/runner-update-binary.sh
💤 Files with no reviewable changes (1)
- scripts/deploy/runner-update-binary.sh
✅ Files skipped from review due to trivial changes (2)
- make/help.mk
- apps/infra/README.md
🚧 Files skipped from review as they are similar to previous changes (4)
- apps/infra/sst.config.ts
- apps/infra/scripts/seed-deploy-env-ssm.mjs
- apps/infra/scripts/sst-with-cloudflare.mjs
- .github/workflows/deploy.yml
function resolveInstanceId() {
  if (INSTANCE_ID) return INSTANCE_ID
  return runAws([
    'ec2',
    'describe-instances',
    '--region',
    AWS_REGION,
    '--filters',
    'Name=tag:Name,Values=boxlite-runner-default',
    'Name=instance-state-name,Values=running',
    '--query',
    'Reservations[].Instances[].InstanceId',
    '--output',
    'text',
  ])
}
Multiple matching instances could cause unexpected behavior.
If multiple running EC2 instances share the boxlite-runner-default tag, --output text returns space-separated IDs. The script then passes this combined string to SSM, which may fail or behave unexpectedly.
Consider validating that exactly one instance matches or handling the multi-instance case explicitly.
🛡️ Suggested validation
 function resolveInstanceId() {
   if (INSTANCE_ID) return INSTANCE_ID
-  return runAws([
+  const result = runAws([
     'ec2',
     'describe-instances',
     '--region',
     AWS_REGION,
     '--filters',
     'Name=tag:Name,Values=boxlite-runner-default',
     'Name=instance-state-name,Values=running',
     '--query',
     'Reservations[].Instances[].InstanceId',
     '--output',
     'text',
   ])
+  const ids = result.split(/\s+/).filter(Boolean)
+  if (ids.length > 1) {
+    throw new Error(`multiple running boxlite-runner-default instances found: ${ids.join(', ')}; use BOXLITE_RUNNER_INSTANCE_ID to target one`)
+  }
+  return ids[0] || ''
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@apps/infra/scripts/rollout-runner-binary.mjs` around lines 52 - 67, The
resolveInstanceId() function does not validate that exactly one instance matches
the EC2 query filter. When multiple running instances share the
boxlite-runner-default tag, the --output text flag returns space-separated
instance IDs, which will cause issues when passed to SSM. After the runAws()
call returns the result, add validation to ensure exactly one instance ID is
returned; if zero or multiple instances are found, throw an appropriate error
with a clear message indicating the problem and requiring manual intervention or
tag corrections.
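The prompt above asks for an error on zero matches as well as on several, which is slightly stricter than the suggested diff (that version returns an empty string when nothing matches). A minimal sketch of the stricter check, assuming the `--output text` result arrives as a whitespace-separated string; `pickSingleInstanceId` is a hypothetical name, not a function in the PR:

```javascript
// Hypothetical helper: accept the text output of the describe-instances
// query and return the instance ID only when exactly one instance matched.
function pickSingleInstanceId(text) {
  const ids = text.split(/\s+/).filter(Boolean)
  if (ids.length === 0) {
    throw new Error('no running boxlite-runner-default instance found')
  }
  if (ids.length > 1) {
    throw new Error(
      `multiple running boxlite-runner-default instances found: ${ids.join(', ')}; ` +
        'set BOXLITE_RUNNER_INSTANCE_ID to target one',
    )
  }
  return ids[0]
}

// Illustrative instance ID only.
console.log(pickSingleInstanceId('i-0abc123\n'))
```

Failing early on zero matches keeps an empty instance ID from reaching the SSM call, where the resulting error would be harder to read.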
GitHub Actions deploy buttons for dev, OIDC-only (no static keys):
- deploy.yml: Cloud deploy (workflow_dispatch diff|deploy, environment: dev); assumes boxlite-github-deployer-<stage>, runs sst diff/deploy with SSM-injected env.
- runner-rollout.yml: swap the boxlite-runner binary in place via SSM (release or dev-build from S3) with checksum verify + auto-rollback.
- runner-update-binary.sh: accept an S3 dev-channel source (separate sha256 URL).
- seed-deploy-env-ssm.mjs / sst-with-cloudflare.mjs: inject stage env + Cloudflare creds from SSM so CI matches a laptop .env.
- sst.config.ts: $transform stamps boxlite-role-boundary on every IAM role the app creates (satisfies the bounded deploy role's conditional create/manage grant); RunnerRole/Profile get explicit boxlite-dev-runner names so they fall under the deploy role's boxlite-* IAM scope.
Validated 2026-06-22: deploy --stage dev completes end-to-end; dev healthy (8 services running, 5 ALBs active, api/dashboard 200, box create verified).
…sist-credentials)
- runner-rollout.yml: pass workflow_dispatch inputs (ref/instance_id, free-text) via
env vars instead of ${{ }} interpolated into run: scripts, so a crafted input is a
shell value rather than executable script (CodeRabbit/zizmor template-injection).
- deploy.yml + runner-rollout.yml: persist-credentials: false on checkout — these
workflows auth to AWS via OIDC and never push to the repo, so the GITHUB_TOKEN
doesn't need to linger in .git/config.
Left for a repo-wide follow-up (not this PR, for consistency): SHA-pin actions across
all privileged workflows + tighten the deployer OIDC trust from repo:*:* to
:environment:dev.
…ew M1/M2)
- Pin all third-party actions in the privileged deploy workflows to commit SHAs (actions/checkout, setup-node, setup-go, aws-actions/configure-aws-credentials) with version comments. These workflows hold OIDC→AWS access, so an action tag hijack is a supply-chain path. (M1)
- runner-rollout.yml Resolve inputs: write the free-text ref/instance_id to GITHUB_OUTPUT via heredoc delimiters so a newline-laden value can't inject extra output keys (e.g. forge a stage=). (M2; M3 command-injection was fixed in fff6b76)
Follow-up (separate, not this PR): SHA-pin the remaining ~34 @v4/@v5 uses repo-wide, and tighten the deployer OIDC trust from repo:*:* to :environment:dev.
5ff8105 to 3644ed5
mode:
  description: "upgrade, or rollback (install an older release — requires source=release)"
  type: choice
  options: [upgrade, rollback]
upgrade and rollback behave the same; the only difference is that rollback can be used only when source is release. @DorianZheng
What
One-click GitHub Actions deploy buttons for the dev stage: OIDC-only, no static AWS keys. The "lite MVP" of the release flow: change a thing → click its button → watch the green check.
- deploy.yml: workflow_dispatch (stage + diff|deploy, environment: dev) → assume boxlite-github-deployer-<stage> → sst diff/deploy with SSM-injected env
- runner-rollout.yml: swap the boxlite-runner binary in place via SSM (release or dev-build from S3) → checksum verify → restart → auto-rollback
Supporting changes:
- runner-update-binary.sh: accept an S3 dev-channel source (separate sha256 presigned URL).
- seed-deploy-env-ssm.mjs / sst-with-cloudflare.mjs: inject the stage's env + Cloudflare creds from SSM so CI matches a laptop .env.
- sst.config.ts:
  - $transform stamps boxlite-role-boundary on every IAM role the app creates, satisfying the bounded deploy role's conditional create/manage grant.
  - RunnerRole/RunnerProfile get explicit boxlite-dev-runner names so they fall under the deploy role's boxlite-* IAM scope.
IAM (one-time, account admin)
boxlite-github-deployer-<stage> carries boxlite-bounded-role-admin (CreateRole / PutRolePermissionsBoundary / PassRole on boxlite-*, conditioned on iam:PermissionsBoundary == boxlite-role-boundary) + IAMReadOnlyAccess. The $transform above is what satisfies that condition.
Validated (2026-06-22)
deploy --stage dev ran end-to-end via the GitHub Action and completed:
- api.dev/proxy.dev/ssh.dev/dev.boxlite.ai all resolve; api/health + dashboard return 200
- boxlite create → control-plane Running + runner krun VM verified via SSM
- sst diff is clean (no destructive churn) → repeatable
Runner Rollout was separately validated end-to-end (build → S3 → SSM → active → box).
Not in this PR
N4 (DB migrations off-boot), Release-All orchestration, prod stage: tracked separately.