Description
Hello 👋
I have a bug report/question, as per the title.
I recently created a new cluster on DO with the following Terraform configuration:
resource "digitalocean_kubernetes_cluster" "main" {
  # ...
  node_pool {
    # ...
    labels = {}
    tags   = []

    taint {
      key    = "x-resource-kind"
      value  = "apps"
      effect = "NoSchedule"
    }
  }
}
resource "digitalocean_kubernetes_node_pool" "pool-main-storages" {
  # ...
  labels = {}
  tags   = []

  taint {
    key    = "x-resource-kind"
    value  = "storages"
    effect = "NoSchedule"
  }
}
Basically, I want newly spawned nodes to be given a taint automatically, since I want to control which workloads land on my current/future nodes for internal usage. The cluster and node pools are created fine, and so is the taint:
captain@glados:~$ kubectl describe nodes pool-main-fv5zb
# ...
Taints: x-resource-kind=apps:NoSchedule
# ...
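As I understand it, a pod can only be scheduled onto these nodes if its spec carries a matching toleration. A minimal, illustrative pod-spec fragment (not taken from the cluster) would look like this:

```yaml
# Illustrative fragment: a pod would need a toleration like this to be
# scheduled onto a node tainted with x-resource-kind=apps:NoSchedule.
spec:
  tolerations:
    - key: "x-resource-kind"
      operator: "Equal"
      value: "apps"
      effect: "NoSchedule"
```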
But I noticed that one of the deployments (coredns) is not running:
captain@glados:~$ kubectl get deployment -n kube-system
NAME              READY   UP-TO-DATE   AVAILABLE   AGE
cilium-operator   1/1     1            1           10h
coredns           0/2     2            0           10h
captain@glados:~$ kubectl get pods -n kube-system
NAME                              READY   STATUS    RESTARTS   AGE
cilium-operator-98d97cdf6-phw2j   1/1     Running   0          10h
cilium-plbv2                      1/1     Running   0          10h
coredns-575d7877bb-9sxdl          0/1     Pending   0          10h
coredns-575d7877bb-pwjtl          0/1     Pending   0          10h
cpc-bridge-proxy-hl55s            1/1     Running   0          10h
konnectivity-agent-dcgsg          1/1     Running   0          10h
kube-proxy-zfn9p                  1/1     Running   0          10h
captain@glados:~$ kubectl describe pod/coredns-575d7877bb-9sxdl -n kube-system
# ...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 31m (x118 over 10h) default-scheduler 0/1 nodes are available: 1 node(s) had untolerated taint {x-resource-kind: apps}. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..
Normal NotTriggerScaleUp 2m10s (x431 over 7h16m) cluster-autoscaler pod didn't trigger scale-up: 1 node(s) had untolerated taint {x-resource-kind: apps}
Is this expected? From the logs I understand why it didn't trigger the scale-up; I just don't know whether this is the proper behaviour or not.
It's also the case that other kube-system pods/deployments are running fine, I think because their tolerations are set up to "always tolerate everything":
captain@glados:~$ kubectl describe pod/cilium-plbv2 -n kube-system
# ...
Tolerations: op=Exists
node.kubernetes.io/disk-pressure:NoSchedule op=Exists
node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/network-unavailable:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists
node.kubernetes.io/pid-pressure:NoSchedule op=Exists
node.kubernetes.io/unreachable:NoExecute op=Exists
node.kubernetes.io/unschedulable:NoSchedule op=Exists
# ...
versus
captain@glados:~$ kubectl describe pod/coredns-575d7877bb-9sxdl -n kube-system
# ...
Tolerations: CriticalAddonsOnly op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
# ...
As per the reference
If this is expected, you can close this issue. If not, then maybe the default deployment needs to be adjusted? Though I don't know whether that would affect other users.
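For what it's worth, a possible workaround sketch (untested; the managed control plane may revert changes to kube-system addons, and patching may replace the deployment's existing toleration list rather than append to it) would be to patch a matching toleration into the coredns deployment:

```yaml
# patch-coredns-tolerations.yaml: illustrative strategic-merge patch adding
# a toleration for the custom taint. Applied with something like:
#   kubectl -n kube-system patch deployment coredns \
#     --patch-file patch-coredns-tolerations.yaml
# Note: this may replace the existing toleration list, so the defaults shown
# above (CriticalAddonsOnly, not-ready, unreachable) would likely need to be
# restated alongside it.
spec:
  template:
    spec:
      tolerations:
        - key: "x-resource-kind"
          operator: "Exists"
          effect: "NoSchedule"
```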