
[DNM] (cluster-bot) Fix Healthy condition not updating from Ironic health data#471

Open
jacob-anders wants to merge 1 commit into openshift:main from jacob-anders:fix-healthy-condition-downstream

Conversation

@jacob-anders

Three bugs prevented the Healthy condition from working correctly on provisioned hosts:

  1. computeConditions was only called inside the actResult.Dirty() block. For steady-state provisioned hosts where nothing changes during reconciliation, conditions were never recomputed. This meant the Healthy condition (populated from Ironic's health monitoring API) stayed at its initial Unknown value forever, since health status changes externally in Ironic and doesn't trigger state machine transitions.

  2. Empty health from GetHealth (e.g. when the node is not yet registered) was treated as "unknown" via the default switch case, overwriting a previously valid Healthy=True on every idle reconcile. Now empty health means "no data, leave the existing condition unchanged" — only non-empty values from Ironic update the condition.

  3. conditionsBefore used a slice reference sharing the same backing array as host.Status.Conditions. In-place modifications by conditions.Set were visible through both references, making reflect.DeepEqual always return true. Deep copy the slice before comparing so condition-only changes trigger a status save.
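Bug 3 is a classic Go slice-aliasing pitfall. The following minimal sketch (using a reduced stand-in for metav1.Condition rather than the project's actual types) shows why a snapshot that shares the backing array defeats reflect.DeepEqual, and why copying first fixes it:

```go
package main

import (
	"fmt"
	"reflect"
)

// condition is a reduced stand-in for metav1.Condition; only the
// fields needed to illustrate the aliasing bug are included.
type condition struct {
	Type   string
	Status string
}

// setCondition updates an entry in place, the way conditions.Set
// mutates an existing condition in the status slice.
func setCondition(conds []condition, c condition) {
	for i := range conds {
		if conds[i].Type == c.Type {
			conds[i] = c
		}
	}
}

// compareSnapshots reports whether an aliased snapshot and a
// deep-copied snapshot still DeepEqual the status slice after an
// in-place update.
func compareSnapshots() (aliasedEqual, copiedEqual bool) {
	status := []condition{{Type: "Healthy", Status: "Unknown"}}

	aliased := status // shares the backing array with status

	copied := make([]condition, len(status)) // independent backing array
	copy(copied, status)

	setCondition(status, condition{Type: "Healthy", Status: "True"})

	return reflect.DeepEqual(aliased, status), reflect.DeepEqual(copied, status)
}

func main() {
	aliasedEqual, copiedEqual := compareSnapshots()
	fmt.Println(aliasedEqual) // true: the change is invisible through the alias
	fmt.Println(copiedEqual)  // false: the deep copy detects the change
}
```

The aliased snapshot sees every in-place mutation, so the "before vs after" comparison can never observe a difference; the deep copy is the fix the PR applies.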

Also suppress ErrNeedsRegistration log noise in GetHealth, which fires on every reconcile before the node is registered in Ironic.

Assisted-By: Claude Opus 4.6

(cherry picked from commit 0f32c4f)

@jacob-anders

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 19, 2026
@coderabbitai

coderabbitai bot commented Mar 19, 2026

Walkthrough

Conditions are snapshotted, recomputed, and deep-compared; host status is saved when the state-machine is dirty or conditions changed. HealthyCondition is left unchanged when a provisioner returns empty health. Ironic suppresses logging for registration-needed node errors. Microversion selection prefers 1.109 when supported.

Changes

Cohort / File(s) Summary
Controller condition logic
internal/controller/metal3.io/baremetalhost_controller.go
Snapshots host.GetConditions(), calls computeConditions, sets conditionsChanged via reflect.DeepEqual, and calls r.saveHostStatus when actResult.Dirty() OR conditionsChanged. computeConditions updates HealthyCondition only when the provisioner returns a non-empty health string.
Controller tests
internal/controller/metal3.io/baremetalhost_controller_test.go
Removed test expecting empty health → ConditionUnknown; added subtest asserting empty health leaves HealthyCondition unchanged (nil). Updated TestComputeConditions to expect HealthyCondition absent when a provisioner provides no health data.
Ironic provisioner health and logging
pkg/provisioner/ironic/ironic.go
GetHealth suppresses error logging when p.getNode(ctx) returns provisioner.ErrNeedsRegistration while still returning ""; other errors continue to log and return "".
Ironic client feature flags & microversion selection
pkg/provisioner/ironic/clients/features.go
Added AvailableFeatures.HasHealthAPI() (true when MaxVersion >= 109). AvailableFeatures.Log now records healthAPI. ChooseMicroversion() prefers microversion "1.109" when HasHealthAPI() is true, ahead of existing checks.
Ironic client tests
pkg/provisioner/ironic/clients/features_test.go
Updated TestAvailableFeatures_ChooseMicroversion to use explicit expected microversions and added boundary cases for MaxVersion == 109 and MaxVersion > 109 expecting "1.109". Removed local microVersion variable.
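The feature-detection change summarized above can be sketched as follows. The struct shape and the 1.89 fallback are assumptions for illustration, not the actual contents of features.go (the real fallback chain has more steps):

```go
package main

import "fmt"

// availableFeatures is a simplified stand-in for the clients package
// type; MaxVersion holds the minor part of the highest microversion
// the Ironic endpoint reports.
type availableFeatures struct {
	MaxVersion int
}

// hasHealthAPI reports whether the endpoint exposes the node health
// field, which requires microversion 1.109.
func (af availableFeatures) hasHealthAPI() bool {
	return af.MaxVersion >= 109
}

// chooseMicroversion prefers 1.109 when the health API is available,
// otherwise falls back to an assumed earlier baseline.
func (af availableFeatures) chooseMicroversion() string {
	if af.hasHealthAPI() {
		return "1.109"
	}
	return "1.89" // assumed baseline for older Ironic endpoints
}

func main() {
	// Boundary cases mirroring the updated test table.
	for _, maxVer := range []int{108, 109, 110} {
		af := availableFeatures{MaxVersion: maxVer}
		fmt.Println(maxVer, af.chooseMicroversion())
	}
}
```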

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes


@openshift-ci openshift-ci bot requested review from elfosardo and zaneb March 19, 2026 11:26
@openshift-ci

openshift-ci bot commented Mar 19, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jacob-anders
Once this PR has been reviewed and has the lgtm label, please assign elfosardo for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
internal/controller/metal3.io/baremetalhost_controller_test.go (1)

3230-3238: Seed an existing HealthyCondition in this regression test.

This currently proves only that empty health does not create a condition on a fresh host. The bug in the PR was overwriting an existing Healthy=True, so the test should start with that condition and assert it survives computeConditions().

Targeted test adjustment
 	t.Run("empty health leaves condition unchanged", func(t *testing.T) {
-		host := bmhWithStatus(metal3api.OperationalStatusOK, metal3api.StateAvailable)
+		host := bmhWithStatus(metal3api.OperationalStatusOK, metal3api.StateProvisioned)
+		conditions.Set(host, metav1.Condition{
+			Type:   metal3api.HealthyCondition,
+			Status: metav1.ConditionTrue,
+			Reason: metal3api.HealthyReason,
+		})
 		fix := &fixture.Fixture{Health: ""}
 		prov, err := fix.NewProvisioner(t.Context(), provisioner.BuildHostData(*host, bmc.Credentials{}), nil)
 		require.NoError(t, err)
 		computeConditions(t.Context(), host, prov)
 
 		cond := conditions.Get(host, metal3api.HealthyCondition)
-		assert.Nil(t, cond, "empty health should not set the Healthy condition")
+		require.NotNil(t, cond)
+		assert.Equal(t, metav1.ConditionTrue, cond.Status)
+		assert.Equal(t, metal3api.HealthyReason, cond.Reason)
 	})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/metal3.io/baremetalhost_controller_test.go` around lines
3230 - 3238, The test "empty health leaves condition unchanged" currently
verifies empty health doesn't create a condition; modify it to seed an existing
Healthy condition on the host before calling computeConditions() so we verify it
isn't overwritten: set metal3api.HealthyCondition = True on the host object (use
conditions.Set or the same helper used elsewhere in tests) prior to calling
computeConditions(host, prov), then after computeConditions() use
conditions.Get(host, metal3api.HealthyCondition) and assert the condition still
exists and remains True (and retains its status/message if applicable).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 269eaed4-34cc-4e2f-a2fc-392cc86d30ed

📥 Commits

Reviewing files that changed from the base of the PR and between fb6a6d5 and 9488e3f.

📒 Files selected for processing (3)
  • internal/controller/metal3.io/baremetalhost_controller.go
  • internal/controller/metal3.io/baremetalhost_controller_test.go
  • pkg/provisioner/ironic/ironic.go

Comment on lines +243 to +250
// Always compute conditions since some (e.g. Healthy) can change
// based on external state from Ironic regardless of state machine
// transitions.
existing := host.GetConditions()
conditionsBefore := make([]metav1.Condition, len(existing))
copy(conditionsBefore, existing)
computeConditions(ctx, host, prov)
conditionsChanged := !reflect.DeepEqual(conditionsBefore, host.GetConditions())


⚠️ Potential issue | 🟠 Major

This turns condition recomputation into two extra Ironic reads per loop.

computeConditions() now runs on every reconcile, and for the ironic provisioner it calls both HasPowerFailure() and GetHealth(). In pkg/provisioner/ironic/ironic.go, both methods do their own getNode(ctx) lookup, so steady-state hosts pick up two additional node reads every cycle. Please cache the node-backed condition data per reconcile or plumb a single status snapshot into computeConditions. As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/metal3.io/baremetalhost_controller.go` around lines 243 -
250, computeConditions is being called every reconcile and for the ironic
provisioner results in two extra Ironic node reads because HasPowerFailure() and
GetHealth() each call getNode(ctx); fix by obtaining the node-backed status once
per reconcile and passing that snapshot into computeConditions (or by adding a
cached accessor on the ironic provisioner that does a single getNode(ctx) and
reuses the result). Concretely, call getNode(ctx) once before computeConditions,
create a node/status snapshot, and change computeConditions(host, prov) to
accept that snapshot (or add prov.GetNodeSnapshot(ctx) and have
HasPowerFailure/GetHealth use it), ensuring computeConditions uses the single
cached snapshot instead of triggering multiple getNode calls.
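One way to address the duplicated reads the review describes is a per-reconcile memoized fetch. This is a hedged sketch with stand-in types (nodeSnapshot, nodeCache), not the project's provisioner interface; it only illustrates the caching shape the reviewer suggests.

```go
package main

import "fmt"

// nodeSnapshot stands in for the Ironic node fields the condition
// checks need (power state, health, and so on).
type nodeSnapshot struct {
	PowerFailure bool
	Health       string
}

// nodeCache memoizes a single fetch so HasPowerFailure-style and
// GetHealth-style checks can share one Ironic read per reconcile.
type nodeCache struct {
	fetch func() (*nodeSnapshot, error)
	node  *nodeSnapshot
	err   error
	done  bool
}

// get performs the fetch at most once and serves every later call
// from the cached result (including a cached error).
func (c *nodeCache) get() (*nodeSnapshot, error) {
	if !c.done {
		c.node, c.err = c.fetch()
		c.done = true
	}
	return c.node, c.err
}

func main() {
	calls := 0
	cache := &nodeCache{fetch: func() (*nodeSnapshot, error) {
		calls++ // each call here would be a round trip to Ironic
		return &nodeSnapshot{Health: "ok"}, nil
	}}

	// Two condition checks, one underlying fetch.
	_, _ = cache.get()
	node, _ := cache.get()
	fmt.Println(calls, node.Health) // 1 ok
}
```

The cache is scoped to a single reconcile, so staleness is bounded by the reconcile interval, which matches how the rest of the loop already treats Ironic state.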

@jacob-anders jacob-anders force-pushed the fix-healthy-condition-downstream branch from 9488e3f to 79ce597 Compare March 20, 2026 04:28

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (1)
internal/controller/metal3.io/baremetalhost_controller.go (1)

243-250: ⚠️ Potential issue | 🟠 Major

This now polls Ironic on every clean reconcile.

Moving computeConditions() out of the dirty-only path makes Line 249 part of every idle loop. Line 2314 then calls prov.GetHealth(ctx) whenever a provisioner exists, and the Ironic implementation fetches that through getNode(ctx), so steady-state hosts pick up an extra Ironic read every reconcile. Please cache the node-backed condition data for the reconcile, or expose a single provisioner call that serves the condition checks from one fetch. As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/metal3.io/baremetalhost_controller.go` around lines 243 -
250, computeConditions() is being called on every reconcile causing
prov.GetHealth(ctx) -> getNode(ctx) to hit Ironic on steady-state hosts; fix by
ensuring node-backed condition data is fetched once per reconcile and reused:
either obtain and cache the provisioner node once (call prov.getNode or add a
new prov.GetNodeOnce/GetHealthCached) at the start of Reconcile and pass that
node into computeConditions(host, provNode) (or add an overload
computeConditions(ctx, host, provNode)), or move computeConditions back into the
dirty-only path and have prov.GetHealth invoked only when the host is dirty;
update callers (computeConditions, prov.GetHealth usage and any getNode callers)
to use the cached node/provider method so a single Ironic read services all
condition checks during one reconcile.
🧹 Nitpick comments (1)
internal/controller/metal3.io/baremetalhost_controller_test.go (1)

3230-3239: This misses the actual empty-health regression.

The bug here was overwriting an existing HealthyCondition; starting from a host with no condition only proves we do not create a new one. Seed HealthyCondition=True first and assert it survives unchanged.

Coverage tweak
 t.Run("empty health leaves condition unchanged", func(t *testing.T) {
 	host := bmhWithStatus(metal3api.OperationalStatusOK, metal3api.StateAvailable)
+	conditions.Set(host, metav1.Condition{
+		Type:   metal3api.HealthyCondition,
+		Status: metav1.ConditionTrue,
+		Reason: metal3api.HealthyReason,
+	})
 	fix := &fixture.Fixture{Health: ""}
 	prov, err := fix.NewProvisioner(t.Context(), provisioner.BuildHostData(*host, bmc.Credentials{}), nil)
 	require.NoError(t, err)
 	computeConditions(t.Context(), host, prov)
 
 	cond := conditions.Get(host, metal3api.HealthyCondition)
-	assert.Nil(t, cond, "empty health should not set the Healthy condition")
+	require.NotNil(t, cond)
+	assert.Equal(t, metav1.ConditionTrue, cond.Status)
+	assert.Equal(t, metal3api.HealthyReason, cond.Reason)
 })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/metal3.io/baremetalhost_controller_test.go` around lines
3230 - 3239, The test "empty health leaves condition unchanged" currently
verifies that an absent HealthyCondition isn't created but misses the regression
where an existing HealthyCondition could be overwritten; modify the test to seed
the host with HealthyCondition=True before invoking computeConditions (use
conditions.Set or the existing helper that sets conditions on the host) on the
host returned by bmhWithStatus, then call provisioner.NewProvisioner and
computeConditions as before, and finally assert that conditions.Get(host,
metal3api.HealthyCondition) still indicates True (unchanged) rather than being
cleared or modified.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 10316ad6-9c82-42fd-962d-3c737657c46b

📥 Commits

Reviewing files that changed from the base of the PR and between 9488e3f and 79ce597.

📒 Files selected for processing (5)
  • internal/controller/metal3.io/baremetalhost_controller.go
  • internal/controller/metal3.io/baremetalhost_controller_test.go
  • pkg/provisioner/ironic/clients/features.go
  • pkg/provisioner/ironic/clients/features_test.go
  • pkg/provisioner/ironic/ironic.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/provisioner/ironic/ironic.go

@jacob-anders jacob-anders force-pushed the fix-healthy-condition-downstream branch from 79ce597 to 91f8548 Compare March 20, 2026 06:10

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
internal/controller/metal3.io/baremetalhost_controller_test.go (1)

3230-3239: Add the regression case that preserves an existing Healthy condition.

This subtest only proves absent→absent. The production bug was a previously set Healthy condition being overwritten during steady-state reconciles, so it’s worth seeding HealthyCondition first and asserting it survives empty health.

Suggested regression case
+	t.Run("empty health preserves existing Healthy condition", func(t *testing.T) {
+		host := bmhWithStatus(metal3api.OperationalStatusOK, metal3api.StateProvisioned)
+		conditions.Set(host, metav1.Condition{
+			Type:   metal3api.HealthyCondition,
+			Status: metav1.ConditionTrue,
+			Reason: metal3api.HealthyReason,
+		})
+		fix := &fixture.Fixture{Health: ""}
+		prov, err := fix.NewProvisioner(t.Context(), provisioner.BuildHostData(*host, bmc.Credentials{}), nil)
+		require.NoError(t, err)
+		computeConditions(t.Context(), host, prov)
+
+		cond := conditions.Get(host, metal3api.HealthyCondition)
+		require.NotNil(t, cond)
+		assert.Equal(t, metav1.ConditionTrue, cond.Status)
+		assert.Equal(t, metal3api.HealthyReason, cond.Reason)
+	})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/metal3.io/baremetalhost_controller_test.go` around lines
3230 - 3239, Add a regression subtest that seeds an existing Healthy condition
on the host before calling computeConditions so empty health does not overwrite
it: in the "empty health leaves condition unchanged" subtest (using
bmhWithStatus and fix.NewProvisioner / computeConditions) set the host's
metal3api.HealthyCondition to a known value (e.g., True with a reason/message)
prior to computeConditions and then assert that the condition still exists and
its status/reason/message are unchanged after computeConditions returns.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8d688a2f-e4bd-41a8-a026-eed8c4a8d8f3

📥 Commits

Reviewing files that changed from the base of the PR and between 79ce597 and 91f8548.

📒 Files selected for processing (5)
  • internal/controller/metal3.io/baremetalhost_controller.go
  • internal/controller/metal3.io/baremetalhost_controller_test.go
  • pkg/provisioner/ironic/clients/features.go
  • pkg/provisioner/ironic/clients/features_test.go
  • pkg/provisioner/ironic/ironic.go
✅ Files skipped from review due to trivial changes (1)
  • pkg/provisioner/ironic/clients/features_test.go

Four bugs prevented the Healthy condition from working correctly
on provisioned hosts:

1. computeConditions was only called inside the actResult.Dirty()
   block. For steady-state provisioned hosts where nothing changes
   during reconciliation, conditions were never recomputed. This
   meant the Healthy condition (populated from Ironic's health
   monitoring API) stayed at its initial Unknown value forever,
   since health status changes externally in Ironic and doesn't
   trigger state machine transitions.

2. ChooseMicroversion() capped at 1.95, but the Ironic health
   field on the Node resource requires microversion 1.109. With a
   lower microversion, Ironic does not include the health field in
   responses, so GetHealth always returned an empty string and the
   Healthy condition was never set. Added HasHealthAPI() feature
   detection and bumped ChooseMicroversion() to return 1.109 when
   the Ironic endpoint supports it.

3. Empty health from GetHealth (e.g. when the node is not yet
   registered) was treated as "unknown" via the default switch
   case, overwriting a previously valid Healthy=True on every idle
   reconcile. Now empty health means "no data, leave the existing
   condition unchanged" — only non-empty values from Ironic update
   the condition.

4. conditionsBefore used a slice reference sharing the same backing
   array as host.Status.Conditions. In-place modifications by
   conditions.Set were visible through both references, making
   reflect.DeepEqual always return true. Deep copy the slice before
   comparing so condition-only changes trigger a status save.

Also suppress ErrNeedsRegistration log noise in GetHealth, which
fires on every reconcile before the node is registered in Ironic.

Assisted-By: Claude Opus 4.6
Signed-off-by: Jacob Anders <jacob-anders-dev@proton.me>
@jacob-anders jacob-anders force-pushed the fix-healthy-condition-downstream branch from 91f8548 to 2954a71 Compare March 23, 2026 05:35

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
internal/controller/metal3.io/baremetalhost_controller_test.go (1)

3230-3238: Strengthen this regression test to hit the real failure mode.

This starts from a host with no HealthyCondition, so it only proves empty health doesn't create one. The regression described in the PR was preserving an already-set condition on a steady-state provisioned host. Seed HealthyCondition first, and preferably drive it through Reconcile() once so the condition-only save path is covered too.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/metal3.io/baremetalhost_controller_test.go` around lines
3230 - 3238, The test currently verifies empty health doesn't create a Healthy
condition but misses the regression where an existing Healthy condition should
be preserved; modify the test "empty health leaves condition unchanged" to seed
a Healthy condition on the host before running the provisioner (use
conditions.Set/conditions.MarkTrue on metal3api.HealthyCondition or directly set
host.Status.Conditions) and then drive the host through the controller Reconcile
path once (call the test helper that invokes Reconcile or controller.Reconcile)
before invoking computeConditions/ provisioner to ensure the condition-only save
path is exercised and that the pre-seeded Healthy condition remains unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 95d4c08b-35c5-4f91-897f-ce9d7ddd5178

📥 Commits

Reviewing files that changed from the base of the PR and between 91f8548 and 2954a71.

📒 Files selected for processing (5)
  • internal/controller/metal3.io/baremetalhost_controller.go
  • internal/controller/metal3.io/baremetalhost_controller_test.go
  • pkg/provisioner/ironic/clients/features.go
  • pkg/provisioner/ironic/clients/features_test.go
  • pkg/provisioner/ironic/ironic.go
✅ Files skipped from review due to trivial changes (1)
  • pkg/provisioner/ironic/ironic.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/provisioner/ironic/clients/features_test.go

Comment on lines +65 to +67
if af.HasHealthAPI() {
return "1.109"
}


⚠️ Potential issue | 🟠 Major


Global switch to 1.109 impacts all Ironic node operations on the shared client.

When HasHealthAPI() is true (MaxVersion >= 109), ChooseMicroversion() returns "1.109" instead of the conservative baseline "1.89". This global microversion is assigned to the shared client at pkg/provisioner/ironic/dependencies.go:21 via p.client.Microversion = p.availableFeatures.ChooseMicroversion(), affecting all subsequent node operations (Create, Get, Update, ChangeProvisionState, Delete, SetRAIDConfig, ListBIOSSettings, etc.), not just health reads. A 20-microversion jump introduces risk of behavior drift in non-health flows. Consider keeping baseline 1.89 globally and using 1.109 only on the health-read path (or a dedicated client).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/provisioner/ironic/clients/features.go` around lines 65 - 67,
ChooseMicroversion() currently returns "1.109" when af.HasHealthAPI() is true,
which causes p.client.Microversion (set from
availableFeatures.ChooseMicroversion()) to be bumped globally and affect all
node operations; change the behavior so ChooseMicroversion() returns the
conservative baseline "1.89" for the shared client and only use "1.109" for
health reads by creating a dedicated health client or temporarily cloning the
client and setting its Microversion in the health-read path (e.g., around the
health read call/site), ensuring other methods like Create, Get, Update,
ChangeProvisionState, Delete, SetRAIDConfig, ListBIOSSettings continue using the
baseline microversion.
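The reviewer's suggested fix, keeping the shared client at the baseline and pinning only health reads to 1.109, can be sketched by copying the client and overriding its microversion. serviceClient here is a minimal stand-in for gophercloud's ServiceClient (which exposes a Microversion string field); the actual wiring in the provisioner would differ.

```go
package main

import "fmt"

// serviceClient is a minimal stand-in for gophercloud's ServiceClient;
// only the Microversion field matters for this sketch.
type serviceClient struct {
	Endpoint     string
	Microversion string
}

// healthClient returns a shallow copy of the shared client pinned to
// microversion 1.109, so only health reads see the newer API while
// every other node operation keeps the baseline.
func healthClient(shared *serviceClient) *serviceClient {
	hc := *shared
	hc.Microversion = "1.109"
	return &hc
}

func main() {
	shared := &serviceClient{
		Endpoint:     "http://ironic.example:6385/v1", // hypothetical endpoint
		Microversion: "1.89",
	}
	hc := healthClient(shared)
	fmt.Println(shared.Microversion, hc.Microversion) // 1.89 1.109
}
```

Because the copy is taken per health read (or held as a second long-lived client), the 20-microversion jump never reaches provisioning, RAID, or BIOS code paths.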

