Conversation
Delete the old raid script script to avoid further confusion.
We now have full raid controller details available after redfish inspection so there is no need to probe the BMC for them. We also update the business logic to group disks by size, to avoid building RAID arrays out of mismatched disks.
Without this, we hit a problem with the iDRAC/PERC job:
RealTimeNoRebootConfiguration stuck at 1%. Ironic never gets to its
normal async RAID polling path because Sushy is still inside
volume.delete() when it times out.
The node would go to clean failed, and Ironic gave a somewhat misleading
error message: Node ea2cdf3f-c868-42a4-b47f-5b782717b349 failed step
{'interface': 'raid', 'step': 'delete_configuration', 'abortable':
False, 'priority': 0}: Unable to connect to
/redfish/v1/TaskService/Tasks/JID_773174628107. Error: Timeout waiting
for task monitor /redfish/v1/TaskService/Tasks/JID_773174628107 (timeout
= 500)
I suspect the jobs is stuck because it powered on the server and
attempted to boot from the volume being deleted, but that is just a
guess. Either way, getting rid of the disable_ramdisk makes it work.
| PhysicalDisk( | ||
| id=disk["id"], | ||
| controller=controller_id, | ||
| size_gb=disk["size"] // 10**9, |
There was a problem hiding this comment.
is there any realistic scenario where the disk["size"] or disk["id"] is None? if yes, consider skipping that iteration and logging the warning
There was a problem hiding this comment.
Honestly this data is brand new, and I have not seen the code that generates it, so anything could happen.
If the data is not in the expected / advertised format though, is it not better for enrol to fail and expose the problem, rather than build raid arrays with missing devices?
There was a problem hiding this comment.
Fair point - it's better to fail early, I just would prefer to have a log message explaining why rather than TypeError: unsupported operand type(s) for //: 'NoneType' and 'int'
There was a problem hiding this comment.
I'll improve this as a follow-on
We now have full raid controller details available after redfish
inspection so there is no need to probe the BMC for them.
We also update the business logic to group disks by size, to avoid
building RAID arrays out of mismatched disks. This behaviour is
ported from #1978
I deleted the old raid script script to avoid further confusion.
This also fixes a RAID delete problem.