Kubernetes DNS: CoreDNS Configuration and Tuning
CoreDNS is the default cluster DNS server in Kubernetes. It resolves Service names to ClusterIPs, forwards external lookups upstream, and is responsible for the latency spike that surprises engineers when they first load-test a microservices application. The default ndots:5 setting causes every external DNS lookup to generate several failed queries before the one that reaches the real resolver succeeds. This article covers the CoreDNS Corefile configuration, custom forward zones for on-premises DNS, autopath to fix ndots latency, negative caching, and tuning for high-traffic clusters.

Every pod in a Kubernetes cluster sends DNS queries to CoreDNS. When a pod tries to reach api.stripe.com, it doesn't send one DNS query — with the default ndots:5 setting in resolv.conf, it sends one query per search domain plus one for the bare name. On EKS there are 4 search domains, so the total is 5 queries:
api.stripe.com.payments.svc.cluster.local
api.stripe.com.svc.cluster.local
api.stripe.com.cluster.local
api.stripe.com.us-east-1.compute.internal (EC2 search domain on EKS)
api.stripe.com.
The first four return NXDOMAIN. Only the fifth resolves. Every external API call from every pod multiplies DNS query volume by five, and every NXDOMAIN requires a round-trip to CoreDNS, which then forwards to the upstream resolver (Route53 Resolver on EKS). This is why DNS latency spikes during load tests and why CoreDNS becomes a bottleneck before most teams expect it.
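You can watch the expansion happen from inside the cluster. A quick sketch using a throwaway debug pod (nicolaka/netshoot is just an example of an image that ships dig; any image with dig works):

# Run dig with the pod's search list and show each intermediate lookup
kubectl run dns-test --rm -it --restart=Never --image=nicolaka/netshoot -- \
  dig +search +showsearch api.stripe.com

+showsearch prints every intermediate query, so you can see the NXDOMAIN answers for each search suffix before the final absolute lookup for api.stripe.com. succeeds.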
CoreDNS Architecture in Kubernetes
CoreDNS runs as a Deployment (typically 2 replicas) behind a ClusterIP Service at the .10 address of the service CIDR — usually 10.96.0.10 or 10.100.0.10. Every pod's /etc/resolv.conf points there:
nameserver 10.96.0.10
search payments.svc.cluster.local svc.cluster.local cluster.local ec2.internal
options ndots:5
CoreDNS configuration is in a ConfigMap (coredns in kube-system), which contains a Corefile — a zone-based configuration in CoreDNS's own DSL.
The Default Corefile
kubectl get configmap coredns -n kube-system -o yaml

The default Corefile on EKS:
.:53 {
    errors
    health {
        lameduck 5s
    }
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
        ttl 30
    }
    prometheus :9153
    forward . /etc/resolv.conf {
        max_concurrent 1000
    }
    cache 30
    loop
    reload
    loadbalance
}

Key plugins:
kubernetes — serves cluster.local queries from the Kubernetes API (Service and Pod records). pods insecure answers pod IP-based queries without verifying that a matching pod actually exists in that namespace.
forward — forwards non-cluster queries upstream. /etc/resolv.conf means CoreDNS uses the node's resolver (Route53 Resolver on EKS, typically 169.254.169.253).
cache — caches responses for 30 seconds. This is the positive TTL; negative caching (NXDOMAIN) defaults to the same value.
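To see the kubernetes plugin answering both of its zones, query a Service name and a ClusterIP from any pod that has nslookup available (the Service name, namespace, and IP below are placeholders):

# A record for a Service, served by the kubernetes plugin
kubectl exec -n payments deploy/payments-api -- nslookup payments-api.payments.svc.cluster.local
# PTR record for a ClusterIP, served via the in-addr.arpa zone
kubectl exec -n payments deploy/payments-api -- nslookup 10.100.23.45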
The ndots Problem and autopath
ndots:5 means: if the queried name has fewer than 5 dots, treat it as a relative name and try all search suffixes first. api.stripe.com has 2 dots — less than 5 — so it triggers the 5-query chain described in the intro.
Fix 1: autopath Plugin
The autopath plugin makes CoreDNS do the search suffix expansion server-side, returning an authoritative response immediately instead of forcing the client to try each suffix:
.:53 {
    errors
    health {
        lameduck 5s
    }
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods verified              # autopath requires verified pod records
        fallthrough in-addr.arpa ip6.arpa
        ttl 30
    }
    autopath @kubernetes           # Server-side search suffix expansion
    prometheus :9153
    forward . /etc/resolv.conf {
        max_concurrent 1000
    }
    cache 30
    loop
    reload
    loadbalance
}

@kubernetes tells autopath to derive the search path from the querying pod's namespace, which is why the kubernetes plugin is switched to pods verified mode here (CoreDNS looks up the querying pod by its source IP to learn its namespace). The payoff is for external names like api.stripe.com: when the client sends the first search-path expansion (api.stripe.com.payments.svc.cluster.local), CoreDNS detects the search-path walk, resolves api.stripe.com server-side, and returns a CNAME plus the final answer in a single round-trip instead of letting the client grind through one NXDOMAIN after another. Internal names such as api from the payments namespace already resolve on the first expansion, so they were never the problem.
Tradeoff: autopath increases CoreDNS CPU usage because it does the search expansion work, and pods verified mode increases memory usage because CoreDNS has to watch all pods to map source IPs to namespaces. It's the right fix for high-query-rate workloads where NXDOMAIN latency dominates.
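To confirm autopath is actually kicking in, issue the first search-path expansion yourself and look for a server-side CNAME in the answer (namespace and workload names are placeholders, and the image must ship dig):

kubectl exec -n payments deploy/payments-api -- dig api.stripe.com.payments.svc.cluster.local
# With autopath: the answer section contains a CNAME pointing to api.stripe.com plus its A records.
# Without autopath: this query returns NXDOMAIN and the client walks the remaining suffixes itself.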
Fix 2: Lower ndots in Pod dnsConfig
For new services you control, lower ndots at the pod level. This reduces unnecessary suffix searches:
spec:
  dnsConfig:
    options:
      - name: ndots
        value: "2"                  # Only apply suffixes if the name has < 2 dots
      - name: single-request-reopen
        value: ""                   # Avoid A/AAAA race condition on some resolvers

ndots:2 means external names like api.stripe.com (2 dots — not less than 2) are sent directly without suffix expansion. Internal cluster names like payments-api (0 dots) still get expanded. This is the lowest-risk fix: reduce ndots on workloads that make many external DNS lookups.
Negative Caching for NXDOMAIN
The default cache 30 caches both successful responses and NXDOMAIN responses for up to 30 seconds. For external names that don't exist in the cluster domain, the four NXDOMAIN responses from the search-suffix queries are cached, but only for 30 seconds. Increasing the negative TTL reduces load from repeated lookups:
cache {
    success 9984 3600    # capacity 9984 entries; positive (A/AAAA) responses cached for up to 1 hour
    denial 9984 300      # capacity 9984 entries; NXDOMAIN cached for up to 5 minutes (default is 30s)
    prefetch 10          # refresh popular entries before they expire
}

Note the argument order: success and denial take a capacity (number of entries) first and the TTL second; 9984 matches the plugin's default capacity.

prefetch 10 is worth enabling: the cache plugin's prefetch option refreshes an entry before it expires once two conditions are met: the entry has been requested at least 10 times (with no gap longer than the default one-minute duration between requests) and it has 10% or less of its TTL remaining (the default percentage). For example, with a 3600-second TTL, CoreDNS proactively refreshes an entry once it drops below 360 seconds remaining, but only if it has been queried at least 10 times recently. This prevents a cache stampede when a popular entry expires and many pods query it at the same time.
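To confirm the cache block is behaving as expected, read the cache plugin's metrics straight off a CoreDNS pod (the port-forward target assumes the stock coredns Deployment in kube-system):

kubectl -n kube-system port-forward deploy/coredns 9153:9153 &
curl -s http://localhost:9153/metrics | grep '^coredns_cache_'
# coredns_cache_entries{type="denial",...}     current number of cached NXDOMAIN entries
# coredns_cache_hits_total{type="success",...} positive cache hits since the process started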
Custom Forward Zones: On-Premises DNS
For clusters that need to resolve on-premises or split-horizon DNS names, configure stub zones that forward specific domains to internal resolvers:
.:53 {
    errors
    health { lameduck 5s }
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
        ttl 30
    }
    prometheus :9153
    forward . /etc/resolv.conf {
        max_concurrent 1000
    }
    cache 30
    loop
    reload
    loadbalance
}

# Forward internal.codingprotocols.com to on-premises DNS servers
internal.codingprotocols.com:53 {
    errors
    forward . 10.0.1.53 10.0.2.53 {
        policy round_robin
    }
    cache 300
}

The second zone block internal.codingprotocols.com:53 catches all queries ending in that domain and forwards them to the specified nameservers. Queries for everything else go to the default .:53 block. Apply the updated ConfigMap and CoreDNS reloads automatically (the reload plugin periodically checks the mounted Corefile for changes):
kubectl apply -f coredns-configmap.yaml
# CoreDNS detects the change and reloads; the reload plugin checks roughly every 30 seconds,
# on top of the time the ConfigMap update takes to propagate to the mounted volume

Scaling CoreDNS for High-Traffic Clusters
The default 2 CoreDNS replicas handle moderate traffic, but high-pod-count clusters or clusters with many external DNS lookups can saturate them.
Horizontal Scaling
kubectl scale deployment coredns --replicas=4 -n kube-system

Spread replicas across nodes with anti-affinity to avoid single-node failures:
# Patch coredns Deployment
spec:
  template:
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    k8s-app: kube-dns
                topologyKey: kubernetes.io/hostname

NodeLocal DNSCache
NodeLocal DNSCache runs a DNS caching agent as a DaemonSet on every node, intercepting DNS traffic before it leaves the node. This eliminates the network round-trip to CoreDNS for cached entries and cuts CoreDNS load sharply, since the node-local cache typically ends up serving the large majority (often quoted as ~90%) of queries.
The official NodeLocal DNSCache manifest (nodelocaldns.yaml) ships with template variables (__PILLAR__DNS__SERVER__, __PILLAR__LOCAL__DNS__, __PILLAR__DNS__DOMAIN__) that must be substituted before applying. Download the manifest from the Kubernetes addons directory pinned to your cluster version, substitute the variables for your cluster's DNS IP, local cache IP (169.254.20.10), and domain (cluster.local), then apply. See the upstream documentation for the exact substitution steps — the raw template cannot be applied directly.
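As a sketch of that substitution for the common case (kube-proxy in iptables mode, default cluster domain, the conventional 169.254.20.10 link-local address; the variable names come from the upstream template, but double-check the commands for your Kubernetes version):

kubedns=$(kubectl get svc kube-dns -n kube-system -o jsonpath='{.spec.clusterIP}')
domain=cluster.local
localdns=169.254.20.10
sed -i "s/__PILLAR__LOCAL__DNS__/$localdns/g; s/__PILLAR__DNS__DOMAIN__/$domain/g; s/__PILLAR__DNS__SERVER__/$kubedns/g" nodelocaldns.yaml
kubectl apply -f nodelocaldns.yaml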
With NodeLocal DNSCache deployed, resolv.conf in each pod points to 169.254.20.10 instead of the CoreDNS ClusterIP. Cache misses forward to CoreDNS. This is the right architecture for clusters with more than ~50 nodes.
Monitoring CoreDNS
CoreDNS exposes Prometheus metrics on :9153/metrics. Key metrics for production monitoring:
# DNS request rate (broken out by zone, protocol, and query type)
rate(coredns_dns_requests_total[5m])

# Error rate — SERVFAIL responses (upstream unavailable, config errors)
rate(coredns_dns_responses_total{rcode="SERVFAIL"}[5m])

# Cache hits (compare against total request rate for a hit ratio; the dedicated cache-miss counter is deprecated in recent CoreDNS releases)
rate(coredns_cache_hits_total{type="success"}[5m])

# Upstream forward latency (time CoreDNS waits for the upstream resolver)
histogram_quantile(0.99, rate(coredns_forward_request_duration_seconds_bucket[5m]))

Alert on SERVFAIL spikes (upstream DNS resolver unreachable) and on forward latency p99 exceeding 500ms — both indicate that DNS is becoming a bottleneck for your service-to-service calls.
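Those two conditions translate into Prometheus alerting rules roughly like this (a sketch using the Prometheus Operator's PrometheusRule CRD; the rule name and thresholds are starting points, not universal values):

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: coredns-alerts
  namespace: kube-system
spec:
  groups:
    - name: coredns
      rules:
        - alert: CoreDNSServfailSpike
          expr: 'sum(rate(coredns_dns_responses_total{rcode="SERVFAIL"}[5m])) > 1'
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: CoreDNS is returning SERVFAIL responses; the upstream resolver may be unreachable
        - alert: CoreDNSForwardLatencyHigh
          expr: 'histogram_quantile(0.99, sum(rate(coredns_forward_request_duration_seconds_bucket[5m])) by (le)) > 0.5'
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: CoreDNS upstream forward latency p99 is above 500ms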
Frequently Asked Questions
Why does my service resolve in some pods but not others?
The most common cause is a timing issue during pod startup: the pod starts, makes a DNS query, but CoreDNS hasn't yet received the EndpointSlice update for the new Service. Wait and retry logic in your application handles this. If the problem persists, check whether the pod is using the correct resolv.conf (should have the CoreDNS ClusterIP as nameserver) and whether the Service exists in the correct namespace (kubectl get svc -n <namespace>).
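A quick triage sequence for that case (pod, namespace, and Service names are placeholders):

# 1. Is the pod pointing at the cluster DNS Service?
kubectl exec -n payments <failing-pod> -- cat /etc/resolv.conf
# 2. Does the fully-qualified name resolve from inside the pod?
kubectl exec -n payments <failing-pod> -- nslookup payments-api.payments.svc.cluster.local
# 3. Does the Service exist in that namespace and have endpoints?
kubectl get svc,endpointslices -n payments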
Should I use ndots:2 everywhere?
Not unconditionally. ndots:2 changes which names get search-path expansion. A pod that queries payments-api (no dots) still gets expanded via search domains, and so does payments-api.payments (one dot, less than 2). But a partially qualified name with two or more dots, such as payments-api.payments.svc, is no longer expanded first: it is tried as an absolute query, which at best adds a wasted NXDOMAIN round-trip before the search path is applied, and with some resolver libraries fails outright. payments-api.payments.svc.cluster.local (four dots) is sent directly, which is what you want. Stick to fully-qualified service names in your application config (payments-api.payments.svc.cluster.local) if you want to avoid ndots gotchas entirely, and set ndots:2 on pods that make many external API calls.
Does CoreDNS support DNS-over-TLS (DoT) or DNSSEC?
CoreDNS supports both, though its DNSSEC support means signing rather than validation. Enable DoT upstream forwarding to avoid plaintext DNS to the upstream resolver:
forward . tls://8.8.8.8 tls://8.8.4.4 {
    tls_servername dns.google
    health_check 5s
}

The dnssec plugin provides on-the-fly DNSSEC signing of the zones CoreDNS serves; it does not validate upstream responses. For internal cluster DNS, DNSSEC is rarely worth the overhead — the threat model for cluster-internal DNS is different from public internet DNS.
For CoreDNS production operations — scaling with HPA and cluster-proportional autoscaler, NodeLocal DNSCache deployment, debugging DNS failures, and resource sizing — see CoreDNS in Production: Scaling, Tuning, and Debugging Kubernetes DNS. For the network policy layer that controls which pods can reach CoreDNS (a common source of DNS failures after applying zero-trust policies), see Kubernetes Network Policies: Zero-Trust Networking. For Cilium's DNS-based egress policy that intercepts CoreDNS responses to enforce FQDN-based network policy, see Cilium: eBPF-Powered Networking and Security for Kubernetes.
Tuning CoreDNS for a high-traffic EKS cluster or debugging DNS resolution failures across namespaces? Talk to us at Coding Protocols — we help platform teams resolve the DNS bottlenecks that appear at production scale but are invisible in development.


