tooling: fast-deploy.sh health check should use the actual bind address, not localhost #619

@satyakwok

Description

The Phase 3d per-host health check in scripts/fast-deploy.sh runs:

ssh $userhost "curl -sf http://localhost:$port/health"

This works for validators that bind to 0.0.0.0, but fails for any node that binds to a specific interface (e.g. fullnodes bound to 10.20.0.6:8557 in the operator's current topology), because nothing is listening on the loopback address there. As a result, the script exits with code 6 ("HEALTH CHECK FAILED") on every mainnet deploy even though the chain is healthy on the new binary.

Why this is a bug, not config

fast-deploy is supposed to be portable across topologies. Hardcoding localhost assumes a binding it can't enforce.

Suggested fix

Either:

  1. Read the bind address from fleet.env (add a <ROLE>_RPC_HOST field that defaults to localhost)
  2. Or fall back from localhost to the role's wg/public IP when the connection is refused
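
Both options can be sketched roughly as follows. This is a minimal sketch, not the script's actual code: it assumes fleet.env is sourced as plain shell variables, and the function names (resolve_rpc_host, health_check, health_check_with_fallback) and the fallback_ip argument are hypothetical. The fallback relies on curl exiting with code 7 when it cannot connect, and on ssh passing the remote command's exit status through.

```shell
#!/usr/bin/env bash
# Sketch only. FULLNODE_RPC_HOST etc. follow the proposed <ROLE>_RPC_HOST
# naming and are not existing fleet.env fields; function names are hypothetical.

# Option 1: resolve the health-check host for a role, defaulting to localhost.
resolve_rpc_host() {
  local role="$1" var
  # Build the variable name, e.g. fullnode -> FULLNODE_RPC_HOST
  var="$(printf '%s' "$role" | tr '[:lower:]' '[:upper:]')_RPC_HOST"
  # Indirect expansion; use localhost when the variable is unset or empty
  printf '%s\n' "${!var:-localhost}"
}

# Run the health check against the resolved host instead of a fixed localhost.
health_check() {
  local userhost="$1" role="$2" port="$3" host
  host="$(resolve_rpc_host "$role")"
  ssh "$userhost" "curl -sf http://${host}:${port}/health"
}

# Option 2: try localhost first, retry on the given IP only if curl could not
# connect (exit code 7). Any other failure is returned unchanged.
health_check_with_fallback() {
  local userhost="$1" port="$2" fallback_ip="$3" rc=0
  ssh "$userhost" "curl -sf http://localhost:${port}/health" || rc=$?
  if [ "$rc" -eq 7 ]; then
    ssh "$userhost" "curl -sf http://${fallback_ip}:${port}/health"
    return
  fi
  return "$rc"
}
```

With no <ROLE>_RPC_HOST set, health_check behaves like the current localhost check, so an existing fleet.env would keep working unchanged. Option 2 needs no new field but costs a second ssh round trip on hosts that do not listen on loopback.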

Acceptance

  • Phase 3d passes on every host without false positives
  • Existing topology still works without fleet.env changes (sensible defaults)
