[DNM] (cluster-bot) Fix Healthy condition not updating from Ironic health data #471
jacob-anders wants to merge 1 commit into openshift:main
Conversation
/hold
Walkthrough

Conditions are snapshotted, recomputed, and deep-compared; host status is saved when the state machine is dirty or conditions changed. HealthyCondition is left unchanged when a provisioner returns empty health. Ironic suppresses logging for registration-needed node errors. Microversion selection prefers 1.109 when supported.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jacob-anders

The full list of commands accepted by this bot can be found here.

Details: Needs approval from an approver in each of these files. Approvers can indicate their approval by writing
Actionable comments posted: 1
🧹 Nitpick comments (1)
internal/controller/metal3.io/baremetalhost_controller_test.go (1)
3230-3238: Seed an existing HealthyCondition in this regression test.

This currently proves only that empty health does not create a condition on a fresh host. The bug in the PR was overwriting an existing Healthy=True, so the test should start with that condition and assert it survives computeConditions().

Targeted test adjustment:
```diff
 t.Run("empty health leaves condition unchanged", func(t *testing.T) {
-	host := bmhWithStatus(metal3api.OperationalStatusOK, metal3api.StateAvailable)
+	host := bmhWithStatus(metal3api.OperationalStatusOK, metal3api.StateProvisioned)
+	conditions.Set(host, metav1.Condition{
+		Type:   metal3api.HealthyCondition,
+		Status: metav1.ConditionTrue,
+		Reason: metal3api.HealthyReason,
+	})
 	fix := &fixture.Fixture{Health: ""}
 	prov, err := fix.NewProvisioner(t.Context(), provisioner.BuildHostData(*host, bmc.Credentials{}), nil)
 	require.NoError(t, err)
 	computeConditions(t.Context(), host, prov)

 	cond := conditions.Get(host, metal3api.HealthyCondition)
-	assert.Nil(t, cond, "empty health should not set the Healthy condition")
+	require.NotNil(t, cond)
+	assert.Equal(t, metav1.ConditionTrue, cond.Status)
+	assert.Equal(t, metal3api.HealthyReason, cond.Reason)
 })
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/metal3.io/baremetalhost_controller_test.go` around lines 3230 - 3238, The test "empty health leaves condition unchanged" currently verifies empty health doesn't create a condition; modify it to seed an existing Healthy condition on the host before calling computeConditions() so we verify it isn't overwritten: set metal3api.HealthyCondition = True on the host object (use conditions.Set or the same helper used elsewhere in tests) prior to calling computeConditions(host, prov), then after computeConditions() use conditions.Get(host, metal3api.HealthyCondition) and assert the condition still exists and remains True (and retains its status/message if applicable).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/controller/metal3.io/baremetalhost_controller.go`:
- Around line 243-250: computeConditions is being called every reconcile and for
the ironic provisioner results in two extra Ironic node reads because
HasPowerFailure() and GetHealth() each call getNode(ctx); fix by obtaining the
node-backed status once per reconcile and passing that snapshot into
computeConditions (or by adding a cached accessor on the ironic provisioner that
does a single getNode(ctx) and reuses the result). Concretely, call getNode(ctx)
once before computeConditions, create a node/status snapshot, and change
computeConditions(host, prov) to accept that snapshot (or add
prov.GetNodeSnapshot(ctx) and have HasPowerFailure/GetHealth use it), ensuring
computeConditions uses the single cached snapshot instead of triggering multiple
getNode calls.
---
Nitpick comments:
In `@internal/controller/metal3.io/baremetalhost_controller_test.go`:
- Around line 3230-3238: The test "empty health leaves condition unchanged"
currently verifies empty health doesn't create a condition; modify it to seed an
existing Healthy condition on the host before calling computeConditions() so we
verify it isn't overwritten: set metal3api.HealthyCondition = True on the host
object (use conditions.Set or the same helper used elsewhere in tests) prior to
calling computeConditions(host, prov), then after computeConditions() use
conditions.Get(host, metal3api.HealthyCondition) and assert the condition still
exists and remains True (and retains its status/message if applicable).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 269eaed4-34cc-4e2f-a2fc-392cc86d30ed
📒 Files selected for processing (3)
- internal/controller/metal3.io/baremetalhost_controller.go
- internal/controller/metal3.io/baremetalhost_controller_test.go
- pkg/provisioner/ironic/ironic.go
```go
// Always compute conditions since some (e.g. Healthy) can change
// based on external state from Ironic regardless of state machine
// transitions.
existing := host.GetConditions()
conditionsBefore := make([]metav1.Condition, len(existing))
copy(conditionsBefore, existing)
computeConditions(ctx, host, prov)
conditionsChanged := !reflect.DeepEqual(conditionsBefore, host.GetConditions())
```
This turns condition recomputation into two extra Ironic reads per loop.
computeConditions() now runs on every reconcile, and for the ironic provisioner it calls both HasPowerFailure() and GetHealth(). In pkg/provisioner/ironic/ironic.go, both methods do their own getNode(ctx) lookup, so steady-state hosts pick up two additional node reads every cycle. Please cache the node-backed condition data per reconcile or plumb a single status snapshot into computeConditions. As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/controller/metal3.io/baremetalhost_controller.go` around lines 243 -
250, computeConditions is being called every reconcile and for the ironic
provisioner results in two extra Ironic node reads because HasPowerFailure() and
GetHealth() each call getNode(ctx); fix by obtaining the node-backed status once
per reconcile and passing that snapshot into computeConditions (or by adding a
cached accessor on the ironic provisioner that does a single getNode(ctx) and
reuses the result). Concretely, call getNode(ctx) once before computeConditions,
create a node/status snapshot, and change computeConditions(host, prov) to
accept that snapshot (or add prov.GetNodeSnapshot(ctx) and have
HasPowerFailure/GetHealth use it), ensuring computeConditions uses the single
cached snapshot instead of triggering multiple getNode calls.
9488e3f to 79ce597
♻️ Duplicate comments (1)
internal/controller/metal3.io/baremetalhost_controller.go (1)
243-250: ⚠️ Potential issue | 🟠 Major

This now polls Ironic on every clean reconcile.

Moving computeConditions() out of the dirty-only path makes Line 249 part of every idle loop. Line 2314 then calls prov.GetHealth(ctx) whenever a provisioner exists, and the Ironic implementation fetches that through getNode(ctx), so steady-state hosts pick up an extra Ironic read every reconcile. Please cache the node-backed condition data for the reconcile, or expose a single provisioner call that serves the condition checks from one fetch. As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/metal3.io/baremetalhost_controller.go` around lines 243 - 250, computeConditions() is being called on every reconcile causing prov.GetHealth(ctx) -> getNode(ctx) to hit Ironic on steady-state hosts; fix by ensuring node-backed condition data is fetched once per reconcile and reused: either obtain and cache the provisioner node once (call prov.getNode or add a new prov.GetNodeOnce/GetHealthCached) at the start of Reconcile and pass that node into computeConditions(host, provNode) (or add an overload computeConditions(ctx, host, provNode)), or move computeConditions back into the dirty-only path and have prov.GetHealth invoked only when the host is dirty; update callers (computeConditions, prov.GetHealth usage and any getNode callers) to use the cached node/provider method so a single Ironic read services all condition checks during one reconcile.
🧹 Nitpick comments (1)
internal/controller/metal3.io/baremetalhost_controller_test.go (1)
3230-3239: This misses the actual empty-health regression.

The bug here was overwriting an existing HealthyCondition; starting from a host with no condition only proves we do not create a new one. Seed HealthyCondition=True first and assert it survives unchanged.

Coverage tweak:

```diff
 t.Run("empty health leaves condition unchanged", func(t *testing.T) {
 	host := bmhWithStatus(metal3api.OperationalStatusOK, metal3api.StateAvailable)
+	conditions.Set(host, metav1.Condition{
+		Type:   metal3api.HealthyCondition,
+		Status: metav1.ConditionTrue,
+		Reason: metal3api.HealthyReason,
+	})
 	fix := &fixture.Fixture{Health: ""}
 	prov, err := fix.NewProvisioner(t.Context(), provisioner.BuildHostData(*host, bmc.Credentials{}), nil)
 	require.NoError(t, err)
 	computeConditions(t.Context(), host, prov)

 	cond := conditions.Get(host, metal3api.HealthyCondition)
-	assert.Nil(t, cond, "empty health should not set the Healthy condition")
+	require.NotNil(t, cond)
+	assert.Equal(t, metav1.ConditionTrue, cond.Status)
+	assert.Equal(t, metal3api.HealthyReason, cond.Reason)
 })
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/metal3.io/baremetalhost_controller_test.go` around lines 3230 - 3239, The test "empty health leaves condition unchanged" currently verifies that an absent HealthyCondition isn't created but misses the regression where an existing HealthyCondition could be overwritten; modify the test to seed the host with HealthyCondition=True before invoking computeConditions (use conditions.Set or the existing helper that sets conditions on the host) on the host returned by bmhWithStatus, then call provisioner.NewProvisioner and computeConditions as before, and finally assert that conditions.Get(host, metal3api.HealthyCondition) still indicates True (unchanged) rather than being cleared or modified.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@internal/controller/metal3.io/baremetalhost_controller.go`:
- Around line 243-250: computeConditions() is being called on every reconcile
causing prov.GetHealth(ctx) -> getNode(ctx) to hit Ironic on steady-state hosts;
fix by ensuring node-backed condition data is fetched once per reconcile and
reused: either obtain and cache the provisioner node once (call prov.getNode or
add a new prov.GetNodeOnce/GetHealthCached) at the start of Reconcile and pass
that node into computeConditions(host, provNode) (or add an overload
computeConditions(ctx, host, provNode)), or move computeConditions back into the
dirty-only path and have prov.GetHealth invoked only when the host is dirty;
update callers (computeConditions, prov.GetHealth usage and any getNode callers)
to use the cached node/provider method so a single Ironic read services all
condition checks during one reconcile.
---
Nitpick comments:
In `@internal/controller/metal3.io/baremetalhost_controller_test.go`:
- Around line 3230-3239: The test "empty health leaves condition unchanged"
currently verifies that an absent HealthyCondition isn't created but misses the
regression where an existing HealthyCondition could be overwritten; modify the
test to seed the host with HealthyCondition=True before invoking
computeConditions (use conditions.Set or the existing helper that sets
conditions on the host) on the host returned by bmhWithStatus, then call
provisioner.NewProvisioner and computeConditions as before, and finally assert
that conditions.Get(host, metal3api.HealthyCondition) still indicates True
(unchanged) rather than being cleared or modified.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 10316ad6-9c82-42fd-962d-3c737657c46b
📒 Files selected for processing (5)
- internal/controller/metal3.io/baremetalhost_controller.go
- internal/controller/metal3.io/baremetalhost_controller_test.go
- pkg/provisioner/ironic/clients/features.go
- pkg/provisioner/ironic/clients/features_test.go
- pkg/provisioner/ironic/ironic.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/provisioner/ironic/ironic.go
79ce597 to 91f8548
🧹 Nitpick comments (1)
internal/controller/metal3.io/baremetalhost_controller_test.go (1)
3230-3239: Add the regression case that preserves an existing Healthy condition.

This subtest only proves absent→absent. The production bug was a previously set Healthy condition being overwritten during steady-state reconciles, so it's worth seeding HealthyCondition first and asserting it survives empty health.

Suggested regression case:

```diff
+t.Run("empty health preserves existing Healthy condition", func(t *testing.T) {
+	host := bmhWithStatus(metal3api.OperationalStatusOK, metal3api.StateProvisioned)
+	conditions.Set(host, metav1.Condition{
+		Type:   metal3api.HealthyCondition,
+		Status: metav1.ConditionTrue,
+		Reason: metal3api.HealthyReason,
+	})
+	fix := &fixture.Fixture{Health: ""}
+	prov, err := fix.NewProvisioner(t.Context(), provisioner.BuildHostData(*host, bmc.Credentials{}), nil)
+	require.NoError(t, err)
+	computeConditions(t.Context(), host, prov)
+
+	cond := conditions.Get(host, metal3api.HealthyCondition)
+	require.NotNil(t, cond)
+	assert.Equal(t, metav1.ConditionTrue, cond.Status)
+	assert.Equal(t, metal3api.HealthyReason, cond.Reason)
+})
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/metal3.io/baremetalhost_controller_test.go` around lines 3230 - 3239, Add a regression subtest that seeds an existing Healthy condition on the host before calling computeConditions so empty health does not overwrite it: in the "empty health leaves condition unchanged" subtest (using bmhWithStatus and fix.NewProvisioner / computeConditions) set the host's metal3api.HealthyCondition to a known value (e.g., True with a reason/message) prior to computeConditions and then assert that the condition still exists and its status/reason/message are unchanged after computeConditions returns.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@internal/controller/metal3.io/baremetalhost_controller_test.go`:
- Around line 3230-3239: Add a regression subtest that seeds an existing Healthy
condition on the host before calling computeConditions so empty health does not
overwrite it: in the "empty health leaves condition unchanged" subtest (using
bmhWithStatus and fix.NewProvisioner / computeConditions) set the host's
metal3api.HealthyCondition to a known value (e.g., True with a reason/message)
prior to computeConditions and then assert that the condition still exists and
its status/reason/message are unchanged after computeConditions returns.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8d688a2f-e4bd-41a8-a026-eed8c4a8d8f3
📒 Files selected for processing (5)
- internal/controller/metal3.io/baremetalhost_controller.go
- internal/controller/metal3.io/baremetalhost_controller_test.go
- pkg/provisioner/ironic/clients/features.go
- pkg/provisioner/ironic/clients/features_test.go
- pkg/provisioner/ironic/ironic.go
✅ Files skipped from review due to trivial changes (1)
- pkg/provisioner/ironic/clients/features_test.go
Four bugs prevented the Healthy condition from working correctly on provisioned hosts:

1. computeConditions was only called inside the actResult.Dirty() block. For steady-state provisioned hosts where nothing changes during reconciliation, conditions were never recomputed. This meant the Healthy condition (populated from Ironic's health monitoring API) stayed at its initial Unknown value forever, since health status changes externally in Ironic and doesn't trigger state machine transitions.

2. ChooseMicroversion() capped at 1.95, but the Ironic health field on the Node resource requires microversion 1.109. With a lower microversion, Ironic does not include the health field in responses, so GetHealth always returned an empty string and the Healthy condition was never set. Added HasHealthAPI() feature detection and bumped ChooseMicroversion() to return 1.109 when the Ironic endpoint supports it.

3. Empty health from GetHealth (e.g. when the node is not yet registered) was treated as "unknown" via the default switch case, overwriting a previously valid Healthy=True on every idle reconcile. Now empty health means "no data, leave the existing condition unchanged" — only non-empty values from Ironic update the condition.

4. conditionsBefore used a slice reference sharing the same backing array as host.Status.Conditions. In-place modifications by conditions.Set were visible through both references, making reflect.DeepEqual always return true. Deep copy the slice before comparing so condition-only changes trigger a status save.

Also suppress ErrNeedsRegistration log noise in GetHealth, which fires on every reconcile before the node is registered in Ironic.

Assisted-By: Claude Opus 4.6
Signed-off-by: Jacob Anders <jacob-anders-dev@proton.me>
91f8548 to 2954a71
Actionable comments posted: 1
🧹 Nitpick comments (1)
internal/controller/metal3.io/baremetalhost_controller_test.go (1)
3230-3238: Strengthen this regression test to hit the real failure mode.

This starts from a host with no HealthyCondition, so it only proves empty health doesn't create one. The regression described in the PR was preserving an already-set condition on a steady-state provisioned host. Seed HealthyCondition first, and preferably drive it through Reconcile() once so the condition-only save path is covered too.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/metal3.io/baremetalhost_controller_test.go` around lines 3230 - 3238, The test currently verifies empty health doesn't create a Healthy condition but misses the regression where an existing Healthy condition should be preserved; modify the test "empty health leaves condition unchanged" to seed a Healthy condition on the host before running the provisioner (use conditions.Set/conditions.MarkTrue on metal3api.HealthyCondition or directly set host.Status.Conditions) and then drive the host through the controller Reconcile path once (call the test helper that invokes Reconcile or controller.Reconcile) before invoking computeConditions/ provisioner to ensure the condition-only save path is exercised and that the pre-seeded Healthy condition remains unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/provisioner/ironic/clients/features.go`:
- Around line 65-67: ChooseMicroversion() currently returns "1.109" when
af.HasHealthAPI() is true, which causes p.client.Microversion (set from
availableFeatures.ChooseMicroversion()) to be bumped globally and affect all
node operations; change the behavior so ChooseMicroversion() returns the
conservative baseline "1.89" for the shared client and only use "1.109" for
health reads by creating a dedicated health client or temporarily cloning the
client and setting its Microversion in the health-read path (e.g., around the
health read call/site), ensuring other methods like Create, Get, Update,
ChangeProvisionState, Delete, SetRAIDConfig, ListBIOSSettings continue using the
baseline microversion.
---
Nitpick comments:
In `@internal/controller/metal3.io/baremetalhost_controller_test.go`:
- Around line 3230-3238: The test currently verifies empty health doesn't create
a Healthy condition but misses the regression where an existing Healthy
condition should be preserved; modify the test "empty health leaves condition
unchanged" to seed a Healthy condition on the host before running the
provisioner (use conditions.Set/conditions.MarkTrue on
metal3api.HealthyCondition or directly set host.Status.Conditions) and then
drive the host through the controller Reconcile path once (call the test helper
that invokes Reconcile or controller.Reconcile) before invoking
computeConditions/ provisioner to ensure the condition-only save path is
exercised and that the pre-seeded Healthy condition remains unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 95d4c08b-35c5-4f91-897f-ce9d7ddd5178
📒 Files selected for processing (5)
- internal/controller/metal3.io/baremetalhost_controller.go
- internal/controller/metal3.io/baremetalhost_controller_test.go
- pkg/provisioner/ironic/clients/features.go
- pkg/provisioner/ironic/clients/features_test.go
- pkg/provisioner/ironic/ironic.go
✅ Files skipped from review due to trivial changes (1)
- pkg/provisioner/ironic/ironic.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/provisioner/ironic/clients/features_test.go
```go
if af.HasHealthAPI() {
	return "1.109"
}
```
🧩 Analysis chain
🏁 Script executed:
```shell
#!/bin/bash
set -euo pipefail
# 1) Confirm global microversion assignment from ChooseMicroversion()
rg -n -C3 'p\.client\.Microversion\s*=\s*p\.availableFeatures\.ChooseMicroversion\(\)' pkg/provisioner/ironic
# 2) List Ironic node operations using the same shared client
rg -n -C2 'nodes\.[A-Za-z_]+\(' pkg/provisioner/ironic
# 3) Check for per-request microversion/header overrides (if any)
rg -n -C2 'X-OpenStack-Ironic-API-Version|OpenStack-API-Version|Microversion' pkg/provisioner/ironic
```

Repository: openshift/baremetal-operator
Length of output: 16585
🏁 Script executed:
```shell
# Find HasHealthAPI() definition and usage
rg -n -B2 -A5 'func.*HasHealthAPI' pkg/provisioner/ironic
# Find health-related operations or API calls
rg -n -i 'health' pkg/provisioner/ironic/clients/features.go
# Check what microversions are involved and their implications
rg -n '1\.10[0-9]' pkg/provisioner/ironic
```

Repository: openshift/baremetal-operator
Length of output: 1123
🏁 Script executed:
```shell
# Search for actual health endpoint/operation calls
rg -n -i 'health' pkg/provisioner/ironic/*.go
# Check if there's any health check method or similar
rg -n 'func.*health|checkHealth|getHealth' pkg/provisioner/ironic -i
# Look for what operations actually require 1.109
rg -n 'GetHealthStatus|health.*status' pkg/provisioner/ironic -i
```

Repository: openshift/baremetal-operator
Length of output: 2237
🏁 Script executed:
```shell
# Get the full implementation of GetHealth()
sed -n '2001,2020p' pkg/provisioner/ironic/ironic.go
# Check what getNode() does (referenced in GetHealth)
rg -n -A10 'func.*getNode\(' pkg/provisioner/ironic/ironic.go | head -30
# Verify if getNode uses the shared client
rg -n 'getNode.*p\.client' pkg/provisioner/ironic/ironic.go
```

Repository: openshift/baremetal-operator
Length of output: 945
🏁 Script executed:
```shell
# Confirm p.client is shared across all methods in the provisioner
rg -n 'p\.client\.' pkg/provisioner/ironic/ironic.go | wc -l
# Check if nodes.Get or any node operation supports per-request microversion override
rg -n 'type.*Opts' pkg/provisioner/ironic --include='*.go' | grep -i 'microversion\|version'
# Look at the client definition to understand if Microversion affects all requests
rg -n 'type.*ServiceClient' pkg/provisioner/ironic -A5
```

Repository: openshift/baremetal-operator
Length of output: 532
🏁 Script executed:
```shell
# Check the p.client field definition
rg -n 'p\.client' pkg/provisioner/ironic/ironic.go -B2 -A2 | head -20
# Look at the provisioner struct definition
rg -n 'type ironicProvisioner struct' pkg/provisioner/ironic/ironic.go -A10
# Check if gophercloud ServiceClient uses global Microversion
rg -n 'ServiceClient' pkg/provisioner/ironic/clients/client.go -B2 -A5
```

Repository: openshift/baremetal-operator
Length of output: 1640
🏁 Script executed:
```shell
# Check if there are comments or documentation about microversion 1.109 compatibility
rg -n '1\.109\|microversion.*compat' pkg/provisioner/ironic -B2 -A2
# Look for any test coverage showing issues with 1.109 on non-health operations
rg -n '1\.109' pkg/provisioner/ironic -l
# Check the baseline/conservative microversion to understand the risk delta
rg -n 'baselineVersionString\|conservative' pkg/provisioner/ironic/clients/client.go -B2 -A2
```

Repository: openshift/baremetal-operator
Length of output: 162
🏁 Script executed:
```shell
# Find baseline/conservative version definition
rg -n 'baselineVersionString\|1\.6\|1\.5' pkg/provisioner/ironic/clients/client.go -B2 -A2
# Get context around client initialization with versions
sed -n '85,105p' pkg/provisioner/ironic/clients/client.go
```

Repository: openshift/baremetal-operator
Length of output: 304
🏁 Script executed:
```shell
# Find baselineVersionString definition
rg -n 'baselineVersionString\s*=' pkg/provisioner/ironic/clients/client.go -B2 -A2
# Also check the constants file
rg -n 'const.*Version' pkg/provisioner/ironic/clients/ -B1 -A1
```

Repository: openshift/baremetal-operator
Length of output: 54
🏁 Script executed:
```shell
# Read the beginning of client.go to find baselineVersionString
head -30 pkg/provisioner/ironic/clients/client.go
# Also check imports and constants
rg -n 'baselineVersionString' pkg/provisioner/ironic/clients/client.go
```

Repository: openshift/baremetal-operator
Length of output: 1005
🏁 Script executed:
```shell
# Search all files in the package for baselineVersionString definition
rg -n 'baselineVersionString' pkg/provisioner/ironic/ -A1 -B1
# Check if it's defined in features.go or elsewhere
fd '\.go$' pkg/provisioner/ironic/clients | xargs grep -l 'baselineVersionString'
```

Repository: openshift/baremetal-operator
Length of output: 1227
🏁 Script executed:
```shell
# Find the baseline constant definition
rg -n 'const.*baseline\|baseline\s*=' pkg/provisioner/ironic/clients/features.go -B2 -A2
# Get broader context around baseline and baselineVersionString
sed -n '1,30p' pkg/provisioner/ironic/clients/features.go
```

Repository: openshift/baremetal-operator
Length of output: 1082
Global switch to 1.109 impacts all Ironic node operations on the shared client.
When HasHealthAPI() is true (MaxVersion >= 109), ChooseMicroversion() returns "1.109" instead of the conservative baseline "1.89". This global microversion is assigned to the shared client at pkg/provisioner/ironic/dependencies.go:21 via p.client.Microversion = p.availableFeatures.ChooseMicroversion(), affecting all subsequent node operations (Create, Get, Update, ChangeProvisionState, Delete, SetRAIDConfig, ListBIOSSettings, etc.), not just health reads. A 20-microversion jump introduces risk of behavior drift in non-health flows. Consider keeping baseline 1.89 globally and using 1.109 only on the health-read path (or a dedicated client).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/provisioner/ironic/clients/features.go` around lines 65 - 67,
ChooseMicroversion() currently returns "1.109" when af.HasHealthAPI() is true,
which causes p.client.Microversion (set from
availableFeatures.ChooseMicroversion()) to be bumped globally and affect all
node operations; change the behavior so ChooseMicroversion() returns the
conservative baseline "1.89" for the shared client and only use "1.109" for
health reads by creating a dedicated health client or temporarily cloning the
client and setting its Microversion in the health-read path (e.g., around the
health read call/site), ensuring other methods like Create, Get, Update,
ChangeProvisionState, Delete, SetRAIDConfig, ListBIOSSettings continue using the
baseline microversion.
Three bugs prevented the Healthy condition from working correctly on provisioned hosts:
computeConditions was only called inside the actResult.Dirty() block. For steady-state provisioned hosts where nothing changes during reconciliation, conditions were never recomputed. This meant the Healthy condition (populated from Ironic's health monitoring API) stayed at its initial Unknown value forever, since health status changes externally in Ironic and doesn't trigger state machine transitions.
Empty health from GetHealth (e.g. when the node is not yet registered) was treated as "unknown" via the default switch case, overwriting a previously valid Healthy=True on every idle reconcile. Now empty health means "no data, leave the existing condition unchanged" — only non-empty values from Ironic update the condition.
conditionsBefore used a slice reference sharing the same backing array as host.Status.Conditions. In-place modifications by conditions.Set were visible through both references, making reflect.DeepEqual always return true. Deep copy the slice before comparing so condition-only changes trigger a status save.
Also suppress ErrNeedsRegistration log noise in GetHealth, which fires on every reconcile before the node is registered in Ironic.
Assisted-By: Claude Opus 4.6
(cherry picked from commit 0f32c4f)