
Fix bench-api deploy on single-node Swarm (stop-first rollout)#84

Merged
Navneethd8 merged 2 commits into main from fix/bench-api-stop-first-deploy
Jun 2, 2026

@Navneethd8

Summary

  • Use stop-first rolling updates for bench-api and bench-worker on single-node Swarm (EC2 prod) instead of start-first. Start-first keeps the old task running while the new one starts, so it needs 2× the memory reservation and fails with "insufficient resources" on a single node
  • Reconcile extra running tasks before deploy (cleans up failed start-first rollouts that left multiple bench-api containers)
  • Tighten the rollout completion check: require the running task count to equal the replica count exactly, rather than accepting >=
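
The memory argument in the first bullet can be checked with simple arithmetic. The following is a minimal sketch with made-up numbers: `fits_on_node` and all MiB values are hypothetical and are not taken from this repository or the real EC2 host. It only illustrates that a start-first rollout has to fit two reservations at once, while stop-first has to fit one.

```shell
#!/usr/bin/env bash
# Hypothetical helper: does a rollout fit on a single node?
# Args: node allocatable memory (MiB), memory reserved by other services (MiB),
#       per-task memory reservation of the service being updated (MiB),
#       update order ("start-first" or "stop-first").
fits_on_node() {
  local node_mib="$1" other_mib="$2" task_mib="$3" order="$4"
  local concurrent=1
  # start-first keeps the old task running while the new one is scheduled,
  # so two reservations must fit at the same time.
  if [ "$order" = "start-first" ]; then
    concurrent=2
  fi
  [ $(( other_mib + concurrent * task_mib )) -le "$node_mib" ]
}

# Illustrative numbers only (assumptions, not the real host):
# 4096 MiB node, 1024 MiB reserved elsewhere, 2048 MiB per task.
for order in start-first stop-first; do
  if fits_on_node 4096 1024 2048 "$order"; then
    echo "$order: fits"
  else
    echo "$order: insufficient resources"
  fi
done
```

With these assumed values, start-first would need 1024 + 2 × 2048 = 5120 MiB and does not fit, while stop-first needs 3072 MiB and does.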

Test plan

  • On prod EC2: ./s/ops/deploy.sh bench-api succeeds after merge
  • docker service ps bench-api shows exactly 1 running task after deploy
  • docker ps lists only one bench-api.* container
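
The checks in this test plan could be scripted roughly as follows. This is a sketch, not the logic in ./s/ops/deploy.sh: the helper names are hypothetical, and it assumes the service runs in replicated mode. The comparison is an exact match, so a leftover duplicate task counts as a failure.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not taken from ./s/ops/deploy.sh.

# Succeeds only when running == desired (an exact match, not >=).
replicas_match() {
  local running="$1" desired="$2"
  [ "$running" -eq "$desired" ]
}

# Number of tasks Swarm wants running whose current state is already "Running".
count_running_tasks() {
  local service="$1"
  docker service ps "$service" \
    --filter desired-state=running \
    --format '{{.CurrentState}}' | grep -c '^Running' || true
}

# Desired replica count from the service spec (replicated mode assumed).
desired_replicas() {
  docker service inspect --format '{{.Spec.Mode.Replicated.Replicas}}' "$1"
}

# Compare the two counts and report the result.
verify_service() {
  local service="$1"
  if replicas_match "$(count_running_tasks "$service")" "$(desired_replicas "$service")"; then
    echo "$service: replica count matches"
  else
    echo "$service: replica count mismatch" >&2
    return 1
  fi
}
```

On the prod host this would be called as `verify_service bench-api` after the deploy finishes.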

Immediate prod recovery (before merge)

docker service update --force --update-order stop-first bench-api
docker service ps bench-api
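
Before forcing the update, it may help to record which update order the service currently has. The wrapper below is a sketch around the two recovery commands above; `current_update_order` and `recover_service` are hypothetical names, and the inspect template may print an empty or placeholder value if no update config has been set on the service.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the recovery commands above.

# Print the update order currently stored in the service spec.
current_update_order() {
  docker service inspect --format '{{.Spec.UpdateConfig.Order}}' "$1"
}

recover_service() {
  local service="$1"
  echo "current update order: $(current_update_order "$service")"
  # Same commands as above: redeploy with stop-first, then list tasks.
  docker service update --force --update-order stop-first "$service"
  docker service ps "$service"
}
```

Usage on the prod host would be `recover_service bench-api`.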

Made with Cursor

Navneethd8 and others added 2 commits June 1, 2026 16:12
Start-first updates need 2× memory reservations and fail on the EC2 host; reconcile stray duplicate tasks before deploy and tighten rollout completion checks.

Co-authored-by: Cursor <cursoragent@cursor.com>
Split bench-api from server in get_resource_limits and align local compose.

Co-authored-by: Cursor <cursoragent@cursor.com>
@Navneethd8 Navneethd8 merged commit daef609 into main Jun 2, 2026
17 checks passed