The Phase 3d per-host health check in scripts/fast-deploy.sh runs:
ssh $userhost "curl -sf http://localhost:$port/health"
This works for validators that bind 0.0.0.0, but fails for any node that binds to a specific interface (e.g. 10.20.0.6:8557 for fullnodes on the operator's current topology). The script exits with code 6 ("HEALTH CHECK FAILED") on every mainnet deploy even though the chain is healthy on the new binary.
Why this is a bug, not config
fast-deploy is supposed to be portable across topologies. Hardcoding localhost assumes a binding it can't enforce.
Suggested fix
Either:
- Parse the bind-address from
fleet.env (add <ROLE>_RPC_HOST field with default localhost)
- Or fall back from
localhost to the role's wg/public IP on connect-refused
Acceptance
- Phase 3d passes on every host without false-positive
- Existing topology still works without fleet.env changes (sensible defaults)
The Phase 3d per-host health check in
scripts/fast-deploy.shruns:This works for validators that bind
0.0.0.0, but fails for any node that binds to a specific interface (e.g.10.20.0.6:8557for fullnodes on the operator's current topology). The script exits with code 6 ("HEALTH CHECK FAILED") on every mainnet deploy even though the chain is healthy on the new binary.Why this is a bug, not config
fast-deploy is supposed to be portable across topologies. Hardcoding
localhostassumes a binding it can't enforce.Suggested fix
Either:
fleet.env(add<ROLE>_RPC_HOSTfield with defaultlocalhost)localhostto the role's wg/public IP on connect-refusedAcceptance