fix: HealthChecker.Register immediately populates cache when already running #21979

Open
Fletch153 wants to merge 2 commits into develop from fix-healthchecker-stale-cache

@Fletch153
Collaborator

Summary

  • Root cause (CORE-2386): spawner.go starts a job service and then registers it with HealthChecker. Because Ready() reads state live but IsHealthy() returns a cached snapshot refreshed every 15 s, the new service appears absent from health reports for up to 15 seconds after it is already ready.
  • Fix: HealthChecker.Register() in chainlink-common now immediately populates c.ready / c.healthy for the newly registered service when the checker is already running, so the stale-cache window is eliminated.
  • The servicesMu write lock is scoped to an inner closure and released before calling IfStarted, avoiding a potential deadlock with Start() (which holds the StateMachine lock while calling update(), which acquires servicesMu.RLock).
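
The lock ordering described in the last bullet can be sketched as follows. This is a simplified model under stated assumptions, not the chainlink-common code: the `checker` type, its `lifecycleMu` (standing in for the StateMachine lock), `cacheMu`, `ifStarted`, `store`, `Cached`, and the `fixed` test double are all hypothetical names introduced for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errNotReady = errors.New("not ready")

// service is a hypothetical, trimmed-down health-reporting interface.
type service interface{ Ready() error }

// fixed is a test double whose readiness never changes.
type fixed struct{ err error }

func (f fixed) Ready() error { return f.err }

// checker is a simplified model, not the chainlink-common HealthChecker.
type checker struct {
	lifecycleMu sync.Mutex // stands in for the StateMachine lock
	started     bool

	servicesMu sync.RWMutex
	services   map[string]service

	cacheMu sync.RWMutex
	ready   map[string]error // cached snapshot served to readers
}

func newChecker() *checker {
	return &checker{services: map[string]service{}, ready: map[string]error{}}
}

// Start holds lifecycleMu while update acquires servicesMu.RLock.
func (c *checker) Start() {
	c.lifecycleMu.Lock()
	defer c.lifecycleMu.Unlock()
	c.started = true
	c.update()
}

// update refreshes every cache entry; a ticker would normally call it too.
func (c *checker) update() {
	c.servicesMu.RLock()
	snapshot := make(map[string]service, len(c.services))
	for name, s := range c.services {
		snapshot[name] = s
	}
	c.servicesMu.RUnlock()
	for name, s := range snapshot {
		c.store(name, s.Ready())
	}
}

func (c *checker) store(name string, err error) {
	c.cacheMu.Lock()
	defer c.cacheMu.Unlock()
	c.ready[name] = err
}

// ifStarted runs fn under lifecycleMu, and only if the checker is running.
func (c *checker) ifStarted(fn func()) {
	c.lifecycleMu.Lock()
	defer c.lifecycleMu.Unlock()
	if c.started {
		fn()
	}
}

// Register releases servicesMu before touching lifecycleMu, so it never
// holds the two locks in the opposite order from Start.
func (c *checker) Register(name string, s service) {
	func() {
		c.servicesMu.Lock()
		defer c.servicesMu.Unlock()
		c.services[name] = s
	}()
	c.ifStarted(func() { c.store(name, s.Ready()) })
}

// Cached reports whether name has a cache entry and, if so, its value.
func (c *checker) Cached(name string) (present bool, err error) {
	c.cacheMu.RLock()
	defer c.cacheMu.RUnlock()
	err, present = c.ready[name]
	return present, err
}

func main() {
	c := newChecker()
	c.Start()
	c.Register("job-1", fixed{})
	c.Register("job-2", fixed{err: errNotReady})
	for _, name := range []string{"job-1", "job-2"} {
		present, err := c.Cached(name)
		fmt.Println(name, present, err)
	}
}
```

If `Register` instead kept `servicesMu` locked while calling `ifStarted`, it would wait on `lifecycleMu` while `Start` (holding `lifecycleMu`) waits on `servicesMu.RLock` inside `update`, which is the lock-order inversion the inner closure avoids.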

Changes

chainlink-common (companion branch: fix/health-checker-register-immediate-update)

  • pkg/services/health.go — modified Register() to trigger an immediate single-service health check via IfStarted
  • pkg/services/health_test.go — two new tests: healthy and unhealthy service registered on a running checker are immediately visible without waiting for a tick

chainlink (this PR)

  • go.mod / go.sum — bump chainlink-common to v0.11.2-0.20260410211832-c51ec3e59945

Test plan

  • go test github.com/smartcontractkit/chainlink-common/pkg/services/... — two new tests pass: TestHealthChecker_Register_ImmediateUpdate, TestHealthChecker_Register_ImmediateUpdate_Unhealthy
  • go build ./core/services/... passes
  • Merge chainlink-common companion PR before merging this one

🤖 Generated with Claude Code

@Fletch153 Fletch153 requested review from a team as code owners April 10, 2026 21:23
@github-actions
Contributor

github-actions bot commented Apr 10, 2026

✅ No conflicts with other open PRs targeting develop

@Fletch153 Fletch153 force-pushed the fix-healthchecker-stale-cache branch from fddfedc to 79ad808 Compare April 10, 2026 21:44
@trunk-io

trunk-io bot commented Apr 10, 2026

Failed Test: Test_CCIPProgrammableTokenTransfer_EVM2Sui_BurnMintTokenPool
Failure Summary: The test failed without a specific error message, likely due to an issue during execution or setup.

@Fletch153 Fletch153 force-pushed the fix-healthchecker-stale-cache branch from 79ad808 to 2de53c9 Compare April 10, 2026 21:53
HealthChecker.Register() now signals the polling goroutine to run
update() immediately when the checker is already running, eliminating
the up-to-15-second window where a dynamically registered service
(e.g. a spawned job) was absent from IsHealthy() / IsReady() even
though Ready() already returned nil.

Fixes CORE-2386.

Companion chainlink-common PR: smartcontractkit/chainlink-common#1976
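
The commit message above does not say how the polling goroutine is signalled. One common way to do it in Go is a buffered channel of capacity 1 with a non-blocking send, sketched below as an assumption rather than the actual implementation. The `poller` type, the `kick` channel, `pollInterval`, and `waitCached` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

const pollInterval = 15 * time.Second // matches the 15 s refresh in the PR text

// poller is an illustrative model, not the chainlink-common HealthChecker.
type poller struct {
	mu       sync.RWMutex
	services map[string]func() error // name -> readiness probe (hypothetical shape)
	ready    map[string]error        // cached snapshot

	kick chan struct{} // capacity 1: coalesces bursts of Register calls
	stop chan struct{}
	done chan struct{}
}

func newPoller() *poller {
	return &poller{
		services: map[string]func() error{},
		ready:    map[string]error{},
		kick:     make(chan struct{}, 1),
		stop:     make(chan struct{}),
		done:     make(chan struct{}),
	}
}

// Start launches the polling goroutine; it refreshes on a tick or a kick.
func (p *poller) Start() {
	go func() {
		defer close(p.done)
		t := time.NewTicker(pollInterval)
		defer t.Stop()
		for {
			select {
			case <-t.C:
			case <-p.kick:
			case <-p.stop:
				return
			}
			p.update()
		}
	}()
}

// Close stops the goroutine and waits for it to exit.
func (p *poller) Close() {
	close(p.stop)
	<-p.done
}

// update probes a snapshot of the services without holding the lock.
func (p *poller) update() {
	p.mu.RLock()
	probes := make(map[string]func() error, len(p.services))
	for name, probe := range p.services {
		probes[name] = probe
	}
	p.mu.RUnlock()

	fresh := make(map[string]error, len(probes))
	for name, probe := range probes {
		fresh[name] = probe()
	}

	p.mu.Lock()
	p.ready = fresh
	p.mu.Unlock()
}

// Register stores the probe, then asks for a refresh without blocking.
func (p *poller) Register(name string, probe func() error) {
	p.mu.Lock()
	p.services[name] = probe
	p.mu.Unlock()
	select {
	case p.kick <- struct{}{}:
	default: // a refresh is already pending and will see this service
	}
}

// waitCached polls for up to two seconds until name appears in the cache.
func (p *poller) waitCached(name string) bool {
	deadline := time.Now().Add(2 * time.Second)
	for time.Now().Before(deadline) {
		p.mu.RLock()
		_, ok := p.ready[name]
		p.mu.RUnlock()
		if ok {
			return true
		}
		time.Sleep(time.Millisecond)
	}
	return false
}

func main() {
	p := newPoller()
	p.Start()
	defer p.Close()
	p.Register("job-1", func() error { return nil })
	fmt.Println("visible before first tick:", p.waitCached("job-1"))
}
```

In this sketch the refresh is asynchronous: `Register` returns before the cache is updated, so a caller that reads the cache immediately afterwards may still miss the new entry for a brief moment.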
@Fletch153 Fletch153 force-pushed the fix-healthchecker-stale-cache branch from 2de53c9 to fc680d0 Compare April 10, 2026 22:27
@Fletch153 Fletch153 requested a review from a team as a code owner April 10, 2026 22:38
@github-actions
Contributor

I see you updated files related to core. Please run make gocs in the root directory to add a changeset, and include at least one of the following tags in the changeset text:

  • #added For any new functionality added.
  • #breaking_change For any functionality that requires manual action for the node to boot.
  • #bugfix For bug fixes.
  • #changed For any change to the existing functionality.
  • #db_update For any feature that introduces updates to database schema.
  • #deprecation_notice For any upcoming deprecation functionality.
  • #internal For changesets that need to be excluded from the final changelog.
  • #nops For any feature that is NOP facing and needs to be in the official Release Notes for the release.
  • #removed For any functionality/config that is removed.
  • #updated For any functionality that is updated.
  • #wip For any change that is not ready yet and external communication about it should be held off till it is feature complete.

@cl-sonarqube-production

Quality Gate passed

Issues
0 New issues
0 Fixed issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarQube

Updates chainlink-common to v0.11.2-0.20260410222629-07a72b48fc48 in
core/scripts, deployment, integration-tests, integration-tests/load,
system-tests/lib, and system-tests/tests sub-modules.
@Fletch153 Fletch153 force-pushed the fix-healthchecker-stale-cache branch from 59cac3d to 7baf276 Compare April 13, 2026 13:12