fix node removal #810
        # no exception should occur
        flow.run_tasks(res, tasks)

    def test_remove_control_plane_if_offline(self):
I am not sure what this test does. I reverted the changes above and the test still passes, so it does not exercise the new changes. If this test is not relevant, I suggest removing it so that we have fewer tests to maintain.
I think you are right. After a deeper investigation, I confirmed that make_group_from_roles and get_unchanged_nodes().having_roles() always produce the same group at the LIGHT stage, because calculate_nodegroups excludes the removed node from nodes['all'] before this check runs.
I tested this issue on a MiniHA cluster earlier. I will investigate it further on a FullHA cluster.
alexarefev left a comment
From my perspective, it is quite a strange case when a control-plane node is unreachable and needs to be removed from the cluster.
Description
The remove_node procedure fails immediately with KME0006 when the control-plane node being removed is offline, powered off, or otherwise not reachable. check_nodes_accessibility must only validate nodes that are staying in the cluster, not the ones being removed.
Solution
Replaced make_group_from_roles(['control-plane', 'balancer']) with get_unchanged_nodes().having_roles(['control-plane', 'balancer']) in check_nodes_accessibility(). get_unchanged_nodes() returns all nodes minus those being added or removed.
How to apply
NA
Test Cases
TestCase 1
Test Configuration:
Steps:
procedure.yaml under nodes.
kubemarine remove_node procedure.yaml.
Results:
KME0006: Nodes ['x.x.x.x'] are not reachable
Checklist
Unit tests
test_remove_control_plane_if_offline: new test that marks a control-plane node as offline, sets it as the node to remove, and asserts that no KME0006 is raised during run_tasks.
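For reviewers, here is a minimal sketch of the idea behind the change and the test described above. It is a toy model, not Kubemarine code: the Node class, the removing flag, the NodesNotReachable exception, and the helper functions are all hypothetical stand-ins, and only node removal is modelled (not addition). It only illustrates why filtering to unchanged nodes before selecting by role keeps an offline node that is being removed out of the accessibility check.

```python
from dataclasses import dataclass


class NodesNotReachable(Exception):
    """Toy stand-in for the KME0006 error (hypothetical class)."""


@dataclass
class Node:
    address: str
    roles: set
    online: bool = True
    removing: bool = False  # hypothetical flag: node is listed for removal


def having_roles(nodes, roles):
    # Keep nodes that have at least one of the requested roles.
    wanted = set(roles)
    return [n for n in nodes if n.roles & wanted]


def unchanged(nodes):
    # Toy analogue of "all nodes minus those being removed".
    return [n for n in nodes if not n.removing]


def check_accessibility(nodes, skip_removed):
    # skip_removed=False mimics selecting by role from every node;
    # skip_removed=True mimics selecting by role from unchanged nodes only.
    candidates = unchanged(nodes) if skip_removed else nodes
    group = having_roles(candidates, ['control-plane', 'balancer'])
    offline = [n.address for n in group if not n.online]
    if offline:
        raise NodesNotReachable(f"KME0006: Nodes {offline} are not reachable")
    return [n.address for n in group]


cluster = [
    Node("10.0.0.1", {"control-plane"}),
    Node("10.0.0.2", {"control-plane"}, online=False, removing=True),
    Node("10.0.0.3", {"balancer"}),
    Node("10.0.0.4", {"worker"}),
]

# The offline node is being removed, so it is not validated.
checked = check_accessibility(cluster, skip_removed=True)
```

In this toy model, calling check_accessibility(cluster, skip_removed=False) raises NodesNotReachable for 10.0.0.2, which mirrors the failure in the description. Whether the real code path behaves the same way depends on when the removed node is excluded from the inventory, as discussed in the review thread.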