Description
When an eks-rolling-update job fails, the previous cluster state is not automatically recovered; manual intervention is required instead:
2023-10-09 10:25:13,138 INFO InstanceId i-026bce300ffa7d8d0 is node ip-10-208-33-228.eu-central-1.compute.internal in kubernetes land
2023-10-09 10:25:13,138 INFO Draining worker node with kubectl drain ip-10-208-33-228.eu-central-1.compute.internal --ignore-daemonsets --delete-emptydir-data --timeout=300s...
node/ip-10-208-33-228.eu-central-1.compute.internal already cordoned
error: unable to drain node "ip-10-208-33-228.eu-central-1.compute.internal" due to error:cannot delete Pods declare no controller (use --force to override): gitlab-runner/runner-8e3ydbhn-project-1998-concurrent-8-ek1wq7qg, continuing command...
There are pending nodes to be drained:
ip-10-208-33-228.eu-central-1.compute.internal
cannot delete Pods declare no controller (use --force to override): gitlab-runner/runner-8e3ydbhn-project-1998-concurrent-8-ek1wq7qg
2023-10-09 10:25:13,990 INFO Node not drained properly. Exiting
2023-10-09 10:25:13,990 ERROR ('Rolling update on ASG failed', 'ci-runner-kas-20230710121010942300000012')
2023-10-09 10:25:13,990 ERROR *** Rolling update of ASG has failed. Exiting ***
2023-10-09 10:25:13,990 ERROR AWS Auto Scaling Group processes will need resuming manually
2023-10-09 10:25:13,990 ERROR Kubernetes Cluster Autoscaler will need resuming manually
Most notably, the AWS Auto Scaling Group processes remain suspended and the Cluster Autoscaler stays scaled down to 0 replicas. This is an issue because our workloads (especially CI) heavily depend on functioning auto-scaling.
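For reference, the manual recovery currently boils down to resuming the suspended ASG scaling processes, scaling the Cluster Autoscaler deployment back up, and uncordoning the node left behind by the failed drain. A minimal sketch of that clean-up, assuming boto3 and the official kubernetes Python client are available and that the autoscaler runs as a `cluster-autoscaler` deployment in `kube-system` (names below are assumptions, adjust for your cluster):

```python
# Sketch of the manual clean-up after a failed eks-rolling-update run.
# Assumes: AWS credentials and a kubeconfig are available; the Cluster Autoscaler
# runs as deployment "cluster-autoscaler" in "kube-system" (assumption, not from the log).
import boto3
from kubernetes import client, config

ASG_NAME = "ci-runner-kas-20230710121010942300000012"  # ASG from the log above
AUTOSCALER_DEPLOYMENT = "cluster-autoscaler"            # assumption
AUTOSCALER_NAMESPACE = "kube-system"                    # assumption

# 1. Resume the suspended AWS Auto Scaling Group processes.
autoscaling = boto3.client("autoscaling")
autoscaling.resume_processes(AutoScalingGroupName=ASG_NAME)

# 2. Scale the Kubernetes Cluster Autoscaler back up to 1 replica.
config.load_kube_config()
apps = client.AppsV1Api()
apps.patch_namespaced_deployment_scale(
    name=AUTOSCALER_DEPLOYMENT,
    namespace=AUTOSCALER_NAMESPACE,
    body={"spec": {"replicas": 1}},
)

# 3. Uncordon the node that stayed cordoned after the failed drain.
core = client.CoreV1Api()
core.patch_node(
    "ip-10-208-33-228.eu-central-1.compute.internal",
    {"spec": {"unschedulable": False}},
)
```

Ideally the job would perform this clean-up itself whenever it exits with an error, instead of leaving it to the operator.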