@mfw78 (Contributor) commented Jan 7, 2026

Summary

  • Adds a wait_for_caddy_ready() function that polls the Caddy admin API every 500ms
  • Waits up to 60s for Caddy to become available before attempting route restoration
  • Fixes a race condition where the worker starts before Caddy is ready, causing all route restoration to fail
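
For readers without the diff at hand, the polling pattern described above could look roughly like the sketch below. This is not the code merged in this PR. The helper names `wait_until_ready` and `caddy_admin_reachable` are hypothetical, and the plain TCP connect used as the readiness probe is an assumption (the actual change may well issue an HTTP request against the admin API instead). Only the 500ms interval and 60s timeout come from the summary.

```rust
use std::net::{SocketAddr, TcpStream};
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Calls `probe` every `interval` until it returns true or `timeout` elapses.
/// Returns true if the probe succeeded before the deadline, false otherwise.
/// (Hypothetical helper, not taken from the PR.)
fn wait_until_ready<F: FnMut() -> bool>(
    mut probe: F,
    interval: Duration,
    timeout: Duration,
) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if probe() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        sleep(interval);
    }
}

/// Assumed readiness probe: succeeds once something accepts TCP connections
/// on the admin address. The real check may be an HTTP call instead.
fn caddy_admin_reachable(addr: SocketAddr) -> bool {
    TcpStream::connect_timeout(&addr, Duration::from_millis(250)).is_ok()
}

/// Startup gate using the values from the summary: poll every 500ms, give up after 60s.
fn wait_for_caddy_ready(addr: SocketAddr) -> bool {
    wait_until_ready(
        || caddy_admin_reachable(addr),
        Duration::from_millis(500),
        Duration::from_secs(60),
    )
}
```

In this sketch the worker would call `wait_for_caddy_ready` once at startup and begin route restoration only if it returns true. Taking the probe as a closure keeps the retry loop testable without a running Caddy instance.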

Problem

When both services start simultaneously (e.g., at system boot), the worker tried to restore routes immediately, before Caddy's admin API was ready. Each restoration attempt then failed with an error like:

ERROR catapult::worker::deploy::sites: Failed to restore Caddy route site_id=nxm-rs-website-pr-12 error=Failed to get Caddy routes

Test plan

  • Deploy to worker node and reboot
  • Verify logs show "Waiting for Caddy to be ready" followed by "Caddy admin API is ready"
  • Verify all routes are restored successfully

mfw78 added 2 commits January 7, 2026 13:25
The worker was attempting to restore Caddy routes immediately on startup,
before Caddy's admin API was available. This caused all route restoration
to fail when both services started simultaneously.

Added wait_for_caddy_ready() that polls the Caddy admin API every 500ms
with a 60s timeout before attempting route restoration.
@mfw78 mfw78 merged commit d221d84 into main Jan 7, 2026
6 checks passed
@mfw78 mfw78 deleted the fix/caddy-startup-race branch January 7, 2026 13:39