
Question about node taints with regard to doks-managed 'coredns' deployment #31

@taufik-rama

Hello 👋

I have a bug report/question, as per the title:

I recently created a new cluster on DO with the following Terraform configuration:

resource "digitalocean_kubernetes_cluster" "main" {
  # ...
  node_pool {
    # ...
    labels     = {}
    tags       = []
    taint {
      key    = "x-resource-kind"
      value  = "apps"
      effect = "NoSchedule"
    }
  }
}

resource "digitalocean_kubernetes_node_pool" "pool-main-storages" {
  # ...
  labels     = {}
  tags       = []
  taint {
    key    = "x-resource-kind"
    value  = "storages"
    effect = "NoSchedule"
  }
}

Basically, I want newly spawned nodes to be given a taint automatically, since I want to control scheduling of my current/future pods for internal purposes. The clusters & node pools are created fine, and so is the taint:

captain@glados:~$ kubectl describe nodes pool-main-fv5zb
# ...
Taints:             x-resource-kind=apps:NoSchedule
# ...
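
For context, the idea is that my own workload pods then carry a matching toleration. A sketch of the pod-spec fragment (the key/value match the taint above):

# Pod-spec fragment (sketch): tolerate the "apps" taint so the pod
# can be scheduled onto the tainted node pool.
tolerations:
  - key: "x-resource-kind"
    operator: "Equal"
    value: "apps"
    effect: "NoSchedule"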

But I noticed that one of the deployments is not running (coredns):

captain@glados:~$ kubectl get deployment -n kube-system
NAME              READY   UP-TO-DATE   AVAILABLE   AGE
cilium-operator   1/1     1            1           10h
coredns           0/2     2            0           10h

captain@glados:~$ kubectl get pods -n kube-system
NAME                              READY   STATUS    RESTARTS   AGE
cilium-operator-98d97cdf6-phw2j   1/1     Running   0          10h
cilium-plbv2                      1/1     Running   0          10h
coredns-575d7877bb-9sxdl          0/1     Pending   0          10h
coredns-575d7877bb-pwjtl          0/1     Pending   0          10h
cpc-bridge-proxy-hl55s            1/1     Running   0          10h
konnectivity-agent-dcgsg          1/1     Running   0          10h
kube-proxy-zfn9p                  1/1     Running   0          10h

captain@glados:~$ kubectl describe pod/coredns-575d7877bb-9sxdl -n kube-system
# ...
Events:
  Type     Reason             Age                      From                Message
  ----     ------             ----                     ----                -------
  Warning  FailedScheduling   31m (x118 over 10h)      default-scheduler   0/1 nodes are available: 1 node(s) had untolerated taint {x-resource-kind: apps}. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..
  Normal   NotTriggerScaleUp  2m10s (x431 over 7h16m)  cluster-autoscaler  pod didn't trigger scale-up: 1 node(s) had untolerated taint {x-resource-kind: apps}

Is this expected? From the logs I understand why it didn't trigger the scale-up; I just don't know whether this is the proper behaviour or not.

Other kube-system pods/deployments are running fine, though; I think that's because their tolerations are set up to "always tolerate everything":

captain@glados:~$ kubectl describe pod/cilium-plbv2 -n kube-system
# ...
Tolerations:                 op=Exists
                             node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
# ...

versus

captain@glados:~$ kubectl describe pod/coredns-575d7877bb-9sxdl -n kube-system
# ...
Tolerations:                 CriticalAddonsOnly op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
# ...

As per the reference
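
So a blanket "tolerate everything" entry, like the first line in the cilium list, would correspond to this pod-spec fragment (my sketch):

# Pod-spec fragment (sketch): a toleration with no key and operator
# Exists matches every taint, so the DaemonSet pods schedule anyway.
tolerations:
  - operator: "Exists"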

If this is expected, you can close this issue. If not, then maybe the default deployment needs to be adjusted? Though I don't know whether that would affect others.
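
For what it's worth, a workaround I can imagine (untested, and since coredns is DOKS-managed I'd expect manual edits to be reconciled away) would be giving the deployment a toleration for the custom taint key:

# Deployment-spec fragment (sketch, untested): tolerate any value of
# the custom taint key. DOKS may revert edits to managed components.
spec:
  template:
    spec:
      tolerations:
        - key: "x-resource-kind"
          operator: "Exists"
          effect: "NoSchedule"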
