Rolling restart of NodeLocal DNS daemonset causes large spikes in DNS latency, sometimes resulting in a timeout #732

Hi,

I have been testing NodeLocal DNS cache and can reliably cause large spikes in DNS resolution latency each time I perform a rolling restart of the daemonset. Nearly all (perhaps all) of the latency is on lookups for external domains, but this may be a red herring: our applications communicate with each other over gRPC channels with long-lived connections, so they presumably resolve internal names far less often.

I was able to 'fix' the issue by reducing the DNS timeout via pod `dnsConfig`. Before I adopt this as the official fix, I want to understand whether this behavior is expected. Is it possible for an application pod to send a DNS query over UDP to the nodelocal pod, for the nodelocal pod to terminate before it receives the packet (or to shut down ungracefully, though I doubt that), and for the UDP response never to arrive? That's the only explanation I can think of.
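
For context, the override has roughly the shape below. This is only a sketch: the pod name, the image, and the specific `timeout` and `attempts` values are placeholders, not our exact settings. The two options map to the resolver's `timeout:n` and `attempts:n` settings in `/etc/resolv.conf`, which glibc defaults to 5 seconds and 2 attempts.

```yaml
# Illustrative values only; not the exact settings from our cluster.
apiVersion: v1
kind: Pod
metadata:
  name: example-app               # hypothetical name
spec:
  containers:
    - name: app
      image: example/app:latest   # hypothetical image
  dnsConfig:
    options:
      - name: timeout             # seconds to wait for a reply before retrying
        value: "1"
      - name: attempts            # number of tries before the lookup fails
        value: "3"
```

With a shorter per-attempt timeout, a query whose reply is lost during the restart is retried sooner, which would explain why the spikes shrink without the underlying packet loss going away.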

Throughput is around 1 DNS query per millisecond (roughly 1,000 queries per second) per nodelocal pod.

TL;DR: Rolling restarts cause DNS latency spikes. Is this expected behavior for a high-throughput system, with adjusting DNS timeouts as the only way to combat it, or is there an implementation or configuration issue that should be looked into?

Thank you!

----
Versions:

kubernetes: 1.30
kube-proxy: 1.29
coredns: 1.11.4
nodelocaldns: 1.26.4

