Rolling restart of NodeLocal DNS daemonset causes large spikes in DNS latency, sometimes resulting in a timeout #732

@ian-axelrod-sl

Hi,

I have been testing NodeLocal DNS cache and can reliably cause large spikes in DNS resolution latency each time I perform a rolling restart of the daemonset. Nearly all (perhaps all) of the added latency is on lookups for external domains, but this may be a red herring: our applications communicate with each other over gRPC channels with long-lived connections, so lookups for internal names may simply be rarer.

I was able to 'fix' the issue by reducing the DNS timeout via pod dnsConfig. But before I apply this as the official fix, I want to understand whether this behavior is expected. Is it possible for an application pod to send a DNS query over UDP to the nodelocal pod, for the nodelocal pod to terminate before it receives the packet (or to shut down ungracefully, though I doubt that), and for the DNS response never to arrive? That's the only explanation I can think of.
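
For reference, a minimal sketch of the kind of dnsConfig override described above. The pod name, image, and option values are illustrative assumptions, not the exact settings in use; `timeout` and `attempts` are standard resolv.conf options that Kubernetes writes into the pod's /etc/resolv.conf.

```yaml
# Pod spec fragment; names and values are illustrative assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: example-app                              # hypothetical name
spec:
  containers:
    - name: app
      image: registry.example.com/app:latest     # hypothetical image
  dnsConfig:
    options:
      - name: timeout      # seconds to wait per query before retrying (glibc default: 5)
        value: "1"
      - name: attempts     # number of tries before giving up (glibc default: 2)
        value: "2"
```

With a shorter timeout, a query whose response is lost during the restart is retried sooner, which bounds the latency spike but does not explain why the response was lost in the first place.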

Throughput is around 1 DNS query per millisecond (roughly 1,000 queries per second) per nodelocal pod.

TL;DR: Rolling restarts cause DNS latency spikes. Is this expected behavior for a high-throughput system, with adjusting DNS timeouts as the only way to combat it, or is there an implementation or configuration issue that should be looked into?

Thank you!


Versions:

kubernetes: 1.30
kube-proxy: 1.29
coredns: 1.11.4
nodelocaldns: 1.26.4
