
Question about node taints with regard to doks-managed 'coredns' deployment #31

@taufik-rama

Hello 👋

I have a bug report/question, as per the title:

I recently created a new cluster on DO with the following Terraform configuration:

resource "digitalocean_kubernetes_cluster" "main" {
  # ...
  node_pool {
    # ...
    labels     = {}
    tags       = []
    taint {
      key    = "x-resource-kind"
      value  = "apps"
      effect = "NoSchedule"
    }
  }
}

resource "digitalocean_kubernetes_node_pool" "pool-main-storages" {
  # ...
  labels     = {}
  tags       = []
  taint {
    key    = "x-resource-kind"
    value  = "storages"
    effect = "NoSchedule"
  }
}

Basically, I want newly spawned nodes to be given a taint automatically, since I want to control scheduling of my current/future pods for internal purposes. The clusters & node pools are created fine, and so is the taint:

captain@glados:~$ kubectl describe nodes pool-main-fv5zb
# ...
Taints:             x-resource-kind=apps:NoSchedule
# ...
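
For context, the idea is that my own workload pods then carry a matching toleration. A sketch of the pod-spec fragment (the key/value match the taint above):

# Pod-spec fragment (sketch): tolerate the "apps" taint so the pod
# can be scheduled onto the tainted node pool.
tolerations:
  - key: "x-resource-kind"
    operator: "Equal"
    value: "apps"
    effect: "NoSchedule"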

But I noticed that one of the deployments is not running (coredns):

captain@glados:~$ kubectl get deployment -n kube-system
NAME              READY   UP-TO-DATE   AVAILABLE   AGE
cilium-operator   1/1     1            1           10h
coredns           0/2     2            0           10h

captain@glados:~$ kubectl get pods -n kube-system
NAME                              READY   STATUS    RESTARTS   AGE
cilium-operator-98d97cdf6-phw2j   1/1     Running   0          10h
cilium-plbv2                      1/1     Running   0          10h
coredns-575d7877bb-9sxdl          0/1     Pending   0          10h
coredns-575d7877bb-pwjtl          0/1     Pending   0          10h
cpc-bridge-proxy-hl55s            1/1     Running   0          10h
konnectivity-agent-dcgsg          1/1     Running   0          10h
kube-proxy-zfn9p                  1/1     Running   0          10h

captain@glados:~$ kubectl describe pod/coredns-575d7877bb-9sxdl -n kube-system
# ...
Events:
  Type     Reason             Age                      From                Message
  ----     ------             ----                     ----                -------
  Warning  FailedScheduling   31m (x118 over 10h)      default-scheduler   0/1 nodes are available: 1 node(s) had untolerated taint {x-resource-kind: apps}. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..
  Normal   NotTriggerScaleUp  2m10s (x431 over 7h16m)  cluster-autoscaler  pod didn't trigger scale-up: 1 node(s) had untolerated taint {x-resource-kind: apps}

Is this expected? From the logs I understand why it didn't trigger the scale-up; I just don't know whether this is the proper behaviour or not.

Other kube-system pods/deployments are running fine, though; I think that's because their tolerations are set up to "always tolerate everything":

captain@glados:~$ kubectl describe pod/cilium-plbv2 -n kube-system
# ...
Tolerations:                 op=Exists
                             node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
# ...

versus

captain@glados:~$ kubectl describe pod/coredns-575d7877bb-9sxdl -n kube-system
# ...
Tolerations:                 CriticalAddonsOnly op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
# ...

As per the reference
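
So a blanket "tolerate everything" entry, like the first line in the cilium list, would correspond to this pod-spec fragment (my sketch):

# Pod-spec fragment (sketch): a toleration with no key and operator
# Exists matches every taint, so the DaemonSet pods schedule anyway.
tolerations:
  - operator: "Exists"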

If this is expected, you can close this issue. If not, then maybe the default deployment needs to be adjusted? Though I don't know whether that would affect others.
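
For what it's worth, a workaround I can imagine (untested, and since coredns is DOKS-managed I'd expect manual edits to be reconciled away) would be giving the deployment a toleration for the custom taint key:

# Deployment-spec fragment (sketch, untested): tolerate any value of
# the custom taint key. DOKS may revert edits to managed components.
spec:
  template:
    spec:
      tolerations:
        - key: "x-resource-kind"
          operator: "Exists"
          effect: "NoSchedule"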
