Kubernetes Taints and Tolerations: Controlling Pod Placement
Taints repel pods from nodes. Tolerations allow specific pods to bypass those repulsions. Together they give you precise control over which workloads run on which nodes — GPU nodes, spot nodes, dedicated nodes, and nodes reserved for system components.
Before you begin
- kubectl configured against a running cluster
- Basic understanding of Kubernetes Pods, Nodes, and Deployments
- Cluster-admin access to taint nodes
By default, Kubernetes will schedule any pod on any node that has enough CPU and memory. Taints let you mark a node so that only pods with a specific toleration can land on it. This is how you enforce node isolation — GPU nodes for ML workloads only, spot nodes only for batch jobs, system nodes reserved for monitoring infrastructure.
The Core Model
A taint is attached to a node. It has three parts:
key=value:effect
- key and value are arbitrary labels (value is optional)
- effect controls what happens to pods that don't tolerate the taint:
  - NoSchedule: new pods without the toleration won't be scheduled here
  - PreferNoSchedule: the scheduler tries to avoid this node but may still use it
  - NoExecute: new pods won't be scheduled AND existing pods without the toleration are evicted
A toleration on a pod says: "I can tolerate the taint with this key/value/effect."
Adding a Taint to a Node
    # Taint a node
    kubectl taint nodes node1 dedicated=gpu:NoSchedule

    # Verify
    kubectl describe node node1 | grep Taint
    # Taints: dedicated=gpu:NoSchedule

    # Remove a taint (note the trailing dash)
    kubectl taint nodes node1 dedicated=gpu:NoSchedule-

To taint all nodes in a node group (useful when adding a new node group):
    # List nodes by label (e.g., node group label on EKS)
    kubectl get nodes -l eks.amazonaws.com/nodegroup=gpu-nodes

    # Taint each node that the label selector returns
    for node in $(kubectl get nodes -l eks.amazonaws.com/nodegroup=gpu-nodes -o name); do
      kubectl taint "$node" dedicated=gpu:NoSchedule
    done

On managed node groups (EKS, GKE, AKS), prefer setting taints in the node group configuration so new nodes in the group are tainted automatically. With manual tainting, new nodes join untainted and accept any pod until you notice and taint them.
Adding a Toleration to a Pod
    apiVersion: v1
    kind: Pod
    metadata:
      name: gpu-job
    spec:
      tolerations:
        - key: "dedicated"
          operator: "Equal"
          value: "gpu"
          effect: "NoSchedule"
      containers:
        - name: trainer
          image: pytorch/pytorch:2.2.0-cuda12.1-cudnn8-runtime
          resources:
            limits:
              nvidia.com/gpu: 1

The toleration says: "I can tolerate the dedicated=gpu:NoSchedule taint." The pod can now be scheduled on the tainted GPU node; without the toleration, the scheduler would skip that node.
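Most workloads are created by a controller rather than as a bare Pod. The following is a minimal sketch of the same toleration in a Deployment, assuming a hypothetical Deployment named gpu-trainer with an app: gpu-trainer label (neither name comes from the example above). The toleration goes in the pod template, under spec.template.spec:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-trainer            # hypothetical name, for illustration only
spec:
  replicas: 2
  selector:
    matchLabels:
      app: gpu-trainer         # hypothetical label
  template:
    metadata:
      labels:
        app: gpu-trainer
    spec:
      # Tolerations belong to the pod spec inside the template,
      # not to the top-level Deployment spec.
      tolerations:
        - key: "dedicated"
          operator: "Equal"
          value: "gpu"
          effect: "NoSchedule"
      containers:
        - name: trainer
          image: pytorch/pytorch:2.2.0-cuda12.1-cudnn8-runtime
          resources:
            limits:
              nvidia.com/gpu: 1
```

Every replica inherits the toleration because the Deployment creates each pod from this template.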
Toleration Operators
| Operator | Behaviour |
|---|---|
| Equal | Key and value must match exactly |
| Exists | Key must exist, value is ignored |
    # Exists: tolerate any taint with key "dedicated", any value, NoSchedule effect
    tolerations:
      - key: "dedicated"
        operator: "Exists"
        effect: "NoSchedule"

    # Wildcard: tolerate everything (effect omitted means all effects)
    tolerations:
      - operator: "Exists"

The wildcard toleration is what system pods like kube-proxy and the CNI DaemonSet use, because they need to run on every node regardless of taints.
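For a per-node agent of your own that has to run everywhere, the wildcard form might look like the sketch below. The DaemonSet name, namespace, labels, and image are placeholders chosen for illustration, not taken from any real project:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-agent                  # placeholder name
  namespace: kube-system            # assumption: agent lives alongside system pods
spec:
  selector:
    matchLabels:
      app: node-agent               # placeholder label
  template:
    metadata:
      labels:
        app: node-agent
    spec:
      tolerations:
        # No key and no effect: matches every taint on every node.
        - operator: "Exists"
      containers:
        - name: agent
          image: registry.example.com/node-agent:1.0   # placeholder image
```

Because an omitted effect also matches NoExecute, pods with this toleration are never evicted by a taint. Reserve it for genuine per-node agents and use a keyed toleration everywhere else.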
The Three Effects in Practice
NoSchedule — Node Isolation
Use when you want hard isolation: only pods that explicitly tolerate the taint can land here.
    kubectl taint nodes spot-node-1 spot=true:NoSchedule

Pods without a spot=true:NoSchedule toleration will never be scheduled on spot-node-1. Existing pods already running there are not affected.
PreferNoSchedule — Soft Preference
Use when you'd prefer pods avoid a node but it's not a hard requirement. The scheduler will try to find other nodes first.
    kubectl taint nodes node1 experimental=true:PreferNoSchedule

Useful for gradually migrating workloads off a node, or for nodes with degraded hardware that still function.
NoExecute — Eviction
Use when you want to evict existing pods in addition to blocking new ones. This is also what Kubernetes uses internally for node conditions.
    kubectl taint nodes node1 maintenance=true:NoExecute

All pods without a matching toleration are evicted immediately. Pods with a matching toleration stay indefinitely, unless the toleration sets tolerationSeconds, which limits how long they remain after the taint is added:
    tolerations:
      - key: "maintenance"
        operator: "Equal"
        value: "true"
        effect: "NoExecute"
        tolerationSeconds: 300  # pod is evicted after 5 minutes

Built-in NoExecute Taints
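Kubernetes automatically taints nodes that report problem conditions. Only not-ready and unreachable carry the NoExecute effect that evicts running pods; the pressure and network taints use NoSchedule, which blocks new pods but leaves running ones in place:
| Taint | Effect | Condition |
|---|---|---|
| node.kubernetes.io/not-ready | NoExecute (and NoSchedule) | Node not ready |
| node.kubernetes.io/unreachable | NoExecute (and NoSchedule) | Node unreachable |
| node.kubernetes.io/memory-pressure | NoSchedule | Node under memory pressure |
| node.kubernetes.io/disk-pressure | NoSchedule | Node under disk pressure |
| node.kubernetes.io/network-unavailable | NoSchedule | Node network unavailable |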
By default, Kubernetes adds tolerations for not-ready:NoExecute and unreachable:NoExecute with tolerationSeconds: 300 to every pod. This gives a 5-minute window before a pod is evicted from a failed node. You can override this:
    tolerations:
      - key: "node.kubernetes.io/not-ready"
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: 60  # evict faster for latency-sensitive services

Common Patterns
Dedicated GPU Nodes
    # Taint the GPU node group
    kubectl taint nodes -l node.kubernetes.io/instance-type=p3.2xlarge gpu=true:NoSchedule

Only GPU workloads tolerate the taint:

    spec:
      tolerations:
        - key: "gpu"
          operator: "Equal"
          value: "true"
          effect: "NoSchedule"
      containers:
        - name: trainer
          image: pytorch/pytorch:2.2.0-cuda12.1-cudnn8-runtime
          resources:
            limits:
              nvidia.com/gpu: 1

Non-GPU pods that land on a GPU node waste expensive GPU capacity. This taint prevents that.
Spot Instance Isolation
Run batch jobs and non-critical workloads on spot/preemptible nodes while keeping critical services on on-demand nodes.
    kubectl taint nodes -l node-type=spot spot=true:NoSchedule

A batch job tolerates spot:

    spec:
      tolerations:
        - key: "spot"
          operator: "Equal"
          value: "true"
          effect: "NoSchedule"

Critical services have no spot toleration, so they'll never be scheduled on spot nodes.
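As a fuller illustration of the batch side, here is a sketch of a complete Job manifest that carries the spot toleration. The Job name, image, and command are hypothetical; the node-type: spot label is assumed to be the same one the taint command above selects on:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: nightly-report              # hypothetical name
spec:
  backoffLimit: 4                   # retry if a spot node is reclaimed mid-run
  template:
    spec:
      restartPolicy: Never
      tolerations:
        - key: "spot"
          operator: "Equal"
          value: "true"
          effect: "NoSchedule"
      # Optional: pin the job to spot nodes. Assumes nodes carry the
      # node-type=spot label used by the taint command.
      nodeSelector:
        node-type: spot
      containers:
        - name: report
          image: busybox:1.36       # placeholder image
          command: ["sh", "-c", "echo processing batch && sleep 30"]
```

Drop the nodeSelector if the job may also run on on-demand nodes when no spot capacity is available; the toleration alone permits spot nodes without requiring them.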
System/Infrastructure Nodes
Reserve specific nodes for monitoring, ingress, or other platform infrastructure:
    kubectl taint nodes infra-node-1 role=infra:NoSchedule

Prometheus, Grafana, and ingress controllers add the toleration plus a nodeSelector:

    spec:
      tolerations:
        - key: "role"
          operator: "Equal"
          value: "infra"
          effect: "NoSchedule"
      nodeSelector:
        role: infra

A toleration alone isn't enough: it allows a pod to schedule on the tainted node but doesn't force it there. Pair it with nodeSelector or nodeAffinity to direct the pod specifically to the infra node. Both match node labels, not taints, so the node must also carry the label (for example, kubectl label nodes infra-node-1 role=infra).
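The nodeAffinity alternative mentioned above can be sketched as follows. It assumes the infra node carries a role=infra label in addition to the taint, since affinity rules are evaluated against node labels:

```yaml
spec:
  tolerations:
    - key: "role"
      operator: "Equal"
      value: "infra"
      effect: "NoSchedule"
  affinity:
    nodeAffinity:
      # Hard requirement: only nodes labelled role=infra are eligible.
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: role
                operator: In
                values:
                  - infra
```

This behaves like the nodeSelector version for a single label. The affinity form becomes worthwhile when you need several allowed values, a NotIn rule, or a soft preference via preferredDuringSchedulingIgnoredDuringExecution.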
Taints vs Node Affinity
| Feature | Taints + Tolerations | Node Affinity |
|---|---|---|
| Direction | Node repels pods | Pod is attracted to nodes |
| Hard/soft | Hard (NoSchedule), soft (PreferNoSchedule), or eviction (NoExecute) | required (hard) or preferred (soft) |
| Scope | Any pod without toleration is blocked | Only affects pods with affinity rules |
| Best for | Dedicated nodes, hardware isolation | Topology constraints, zone spread |
Use taints when you want to protect a node from unwanted tenants. Use node affinity when you want to express where a pod should go. In practice, you often use both together — taint the dedicated node AND add node affinity to direct the right pods there.
Verifying Pod Placement
    # Check which node a pod landed on
    kubectl get pod gpu-job -o wide

    # Check all pods on a specific node
    kubectl get pods --all-namespaces --field-selector spec.nodeName=node1

    # Check why a pod is pending (often taint-related)
    kubectl describe pod gpu-job
    # Look for the "Events" section. Taint mismatches show up as:
    # 0/3 nodes are available: 3 node(s) had untolerated taint {dedicated: gpu}

The describe output is the fastest way to diagnose scheduling failures. The message "N node(s) had untolerated taint" tells you exactly which taint is blocking the pod.