Kubernetes Taints and Tolerations: Controlling Pod Placement

Intermediate | 25 min to complete | 11 min read

Taints repel pods from nodes. Tolerations let specific pods schedule onto tainted nodes anyway. Together they give you precise control over which workloads run on which nodes: GPU nodes, spot nodes, dedicated nodes, and nodes reserved for system components.

Before you begin

  • kubectl configured against a running cluster
  • Basic understanding of Kubernetes Pods, Nodes, and Deployments
  • Cluster-admin access to taint nodes

By default, the scheduler can place a pod on any node that has enough free CPU and memory for the pod's resource requests. Taints let you mark a node so that only pods with a matching toleration can land on it. This is how you enforce node isolation: GPU nodes for ML workloads only, spot nodes for batch jobs only, system nodes reserved for monitoring infrastructure.

The Core Model

A taint is attached to a node. It has three parts:

key=value:effect
  • key and value are arbitrary strings that follow label syntax rules (value is optional)
  • effect controls what happens to pods that don't tolerate the taint:
    • NoSchedule — new pods without the toleration won't be scheduled here
    • PreferNoSchedule — the scheduler tries to avoid this node but may still use it
    • NoExecute — new pods won't be scheduled AND existing pods without the toleration are evicted

A toleration on a pod says: "I can tolerate the taint with this key/value/effect."
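To make the key=value:effect format concrete, here is a minimal sketch of a shell function that splits a taint spec into its three parts. The name parse_taint is hypothetical, and the function assumes a well-formed spec that contains at least one colon.

```bash
# parse_taint TAINT (hypothetical helper, not part of kubectl)
# Splits a taint spec such as "dedicated=gpu:NoSchedule" or "dedicated:NoSchedule"
# and prints the key, value, and effect on separate lines.
parse_taint() (
  taint=$1
  effect=${taint##*:}      # text after the last colon
  key_value=${taint%:*}    # text before the last colon
  key=${key_value%%=*}     # text before the first "="
  case $key_value in
    *=*) value=${key_value#*=} ;;  # text after the first "="
    *)   value="" ;;               # the value is optional
  esac
  printf 'key=%s\nvalue=%s\neffect=%s\n' "$key" "$value" "$effect"
)
```

For example, `parse_taint dedicated=gpu:NoSchedule` prints the key dedicated, the value gpu, and the effect NoSchedule.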

Adding a Taint to a Node

```bash
# Taint a node
kubectl taint nodes node1 dedicated=gpu:NoSchedule

# Verify
kubectl describe node node1 | grep Taint
# Taints: dedicated=gpu:NoSchedule

# Remove a taint (note the trailing dash)
kubectl taint nodes node1 dedicated=gpu:NoSchedule-
```
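Checking one node at a time gets tedious on larger clusters. As a sketch, the function below lists every node with the keys and effects of its taints in one table. The name list_node_taints is hypothetical; the command relies only on kubectl's custom-columns output format.

```bash
# list_node_taints (hypothetical helper)
# Prints one row per node with the keys and effects of its taints.
list_node_taints() {
  kubectl get nodes \
    -o custom-columns='NAME:.metadata.name,KEYS:.spec.taints[*].key,EFFECTS:.spec.taints[*].effect'
}
```

Run `list_node_taints` after tainting to confirm that the change reached the nodes you expected.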

To taint all nodes in a node group (useful when adding a new node group):

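```bash
# List nodes by label (e.g., node group label on EKS)
kubectl get nodes -l eks.amazonaws.com/nodegroup=gpu-nodes

# Taint each node in the group with a loop.
# The jsonpath output prints bare node names, which fit the "kubectl taint nodes <name>" form.
for node in $(kubectl get nodes -l eks.amazonaws.com/nodegroup=gpu-nodes \
    -o jsonpath='{.items[*].metadata.name}'); do
  kubectl taint nodes "$node" dedicated=gpu:NoSchedule
done

# Or taint by label selector in a single command
kubectl taint nodes -l eks.amazonaws.com/nodegroup=gpu-nodes dedicated=gpu:NoSchedule
```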

On managed node groups (EKS, GKE, AKS), prefer setting taints in the node group configuration so new nodes in the group are tainted automatically. Managing taints manually means new nodes start untainted until you notice.
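If you do manage taints by hand, it helps to check for nodes that slipped through. The sketch below prints the nodes matching a label selector that carry no taint with the given key. The name untainted_nodes is hypothetical, and the function assumes awk is available.

```bash
# untainted_nodes SELECTOR TAINT_KEY (hypothetical helper)
# Prints the nodes matching SELECTOR that have no taint with key TAINT_KEY.
untainted_nodes() {
  # Emit one "name<TAB>space-separated taint keys" line per node.
  kubectl get nodes -l "$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints[*].key}{"\n"}{end}' |
    awk -F'\t' -v key="$2" '
      {
        found = 0
        n = split($2, keys, " ")
        for (i = 1; i <= n; i++) if (keys[i] == key) found = 1
        if (!found) print $1   # no taint with the wanted key on this node
      }'
}
```

For example, `untainted_nodes eks.amazonaws.com/nodegroup=gpu-nodes dedicated` lists the nodes in the gpu-nodes group that still need the taint.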

Adding a Toleration to a Pod

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-job
spec:
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "gpu"
      effect: "NoSchedule"
  containers:
    - name: trainer
      image: pytorch/pytorch:2.2.0-cuda12.1-cudnn8-runtime
      resources:
        limits:
          nvidia.com/gpu: 1
```

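The toleration says: "I can tolerate the dedicated=gpu:NoSchedule taint." This pod can now be scheduled on the GPU node. The toleration only permits that placement; here the nvidia.com/gpu limit is what restricts the pod to nodes that actually have a GPU.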
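Most workloads run as Deployments rather than bare Pods, in which case the toleration belongs in the pod template. A minimal sketch for an existing Deployment follows. The name add_gpu_toleration is hypothetical, and the merge patch replaces the template's whole tolerations list, so it assumes the Deployment has no other tolerations you need to keep.

```bash
# add_gpu_toleration DEPLOYMENT [NAMESPACE] (hypothetical helper)
# Sets the pod template's tolerations to the single dedicated=gpu:NoSchedule entry.
# Caution: a merge patch overwrites any tolerations already in the template.
add_gpu_toleration() {
  kubectl patch deployment "$1" --namespace "${2:-default}" --type merge --patch '
{"spec": {"template": {"spec": {"tolerations": [
  {"key": "dedicated", "operator": "Equal", "value": "gpu", "effect": "NoSchedule"}
]}}}}'
}
```

For example, `add_gpu_toleration trainer ml` patches a Deployment named trainer in the ml namespace (both names are placeholders). Changing the pod template triggers a rollout.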

Toleration Operators

| Operator | Behaviour |
| --- | --- |
| Equal | Key and value must match exactly |
| Exists | Key must exist, value is ignored |

```yaml
# Exists: tolerate any taint with key "dedicated", any value, NoSchedule effect
tolerations:
  - key: "dedicated"
    operator: "Exists"
    effect: "NoSchedule"
---
# Wildcard: tolerate everything (empty key matches all keys, omitted effect matches all effects)
tolerations:
  - operator: "Exists"
```

The wildcard toleration is what system pods such as kube-proxy and many CNI DaemonSets typically use, because they need to run on every node regardless of taints.
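The matching rules for the two operators can be sketched as a small shell function. This is an illustrative model of the rules described above, not the scheduler's actual code. The name tolerates and its argument order are hypothetical.

```bash
# tolerates T_KEY T_VALUE T_EFFECT TOL_KEY TOL_OPERATOR TOL_VALUE TOL_EFFECT
# (hypothetical helper) Returns 0 when the toleration matches the taint, 1 otherwise.
tolerates() (
  t_key=$1 t_value=$2 t_effect=$3
  tol_key=$4 tol_op=${5:-Equal} tol_value=$6 tol_effect=$7   # operator defaults to Equal

  # An empty toleration effect matches every taint effect.
  [ -z "$tol_effect" ] || [ "$tol_effect" = "$t_effect" ] || exit 1

  # An empty toleration key is only valid with Exists and matches every key.
  if [ -z "$tol_key" ]; then
    [ "$tol_op" = "Exists" ]; exit
  fi
  [ "$tol_key" = "$t_key" ] || exit 1

  case $tol_op in
    Exists) exit 0 ;;                          # value is ignored
    Equal)  [ "$tol_value" = "$t_value" ] ;;   # value must match exactly
    *)      exit 1 ;;
  esac
)
```

For example, `tolerates dedicated gpu NoSchedule dedicated Exists "" NoSchedule` succeeds, while the same call with NoExecute as the taint effect fails because the effects differ.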

The Three Effects in Practice

NoSchedule — Node Isolation

Use when you want hard isolation: only pods that explicitly tolerate the taint can land here.

```bash
kubectl taint nodes spot-node-1 spot=true:NoSchedule
```

Pods without a spot=true:NoSchedule toleration will never be scheduled on spot-node-1. Existing pods already running there are not affected.

PreferNoSchedule — Soft Preference

Use when you'd prefer pods avoid a node but it's not a hard requirement. The scheduler will try to find other nodes first.

```bash
kubectl taint nodes node1 experimental=true:PreferNoSchedule
```

Useful for gradually migrating workloads off a node, or for nodes with degraded hardware that still function.

NoExecute — Eviction

Use when you want to evict existing pods in addition to blocking new ones. This is also what Kubernetes uses internally for node conditions.

```bash
kubectl taint nodes node1 maintenance=true:NoExecute
```

Pods without a matching toleration are evicted immediately. Pods with a matching toleration stay indefinitely unless the toleration sets tolerationSeconds, which caps how long the pod remains after the taint is added:

```yaml
tolerations:
  - key: "maintenance"
    operator: "Equal"
    value: "true"
    effect: "NoExecute"
    tolerationSeconds: 300   # pod is evicted after 5 minutes
```
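A maintenance window built on this taint might be sketched as follows. The function names are hypothetical; the commands are the same kubectl taint and field-selector calls used elsewhere in this article.

```bash
# start_maintenance NODE (hypothetical helper)
# Adds the NoExecute taint, which evicts pods that do not tolerate it.
start_maintenance() {
  kubectl taint nodes "$1" maintenance=true:NoExecute
}

# pods_on_node NODE (hypothetical helper)
# Lists the pods still running on the node, for example those that tolerate the taint.
pods_on_node() {
  kubectl get pods --all-namespaces --field-selector "spec.nodeName=$1" -o wide
}

# end_maintenance NODE (hypothetical helper)
# Removes the taint (trailing dash) so pods can be scheduled on the node again.
end_maintenance() {
  kubectl taint nodes "$1" maintenance=true:NoExecute-
}
```

Removing the taint does not bring evicted pods back to the node; their controllers have already recreated them elsewhere.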

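Built-in Node Condition Taints

Kubernetes automatically taints nodes in bad states. Only not-ready and unreachable use the NoExecute effect. The pressure and network taints use NoSchedule, so they block new pods without evicting running ones:

| Taint | Effect | Condition |
| --- | --- | --- |
| node.kubernetes.io/not-ready | NoExecute | Node not ready |
| node.kubernetes.io/unreachable | NoExecute | Node unreachable |
| node.kubernetes.io/memory-pressure | NoSchedule | Node under memory pressure |
| node.kubernetes.io/disk-pressure | NoSchedule | Node under disk pressure |
| node.kubernetes.io/network-unavailable | NoSchedule | Node network unavailable |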

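By default, Kubernetes adds tolerations for node.kubernetes.io/not-ready and node.kubernetes.io/unreachable, both with effect NoExecute and tolerationSeconds: 300, to every pod that does not already define them. This gives a 5-minute window before a pod is evicted from a failed node. You can override the window by declaring the toleration yourself; each of the two keys needs its own entry if you want to shorten both: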

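```yaml
tolerations:
  - key: "node.kubernetes.io/not-ready"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 60   # evict faster for latency-sensitive services
  - key: "node.kubernetes.io/unreachable"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 60   # shorten the unreachable window as well
```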
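To confirm which tolerations a running pod actually ended up with, including the defaults added for you, you can print them from the live object. A small sketch, with the hypothetical name show_tolerations:

```bash
# show_tolerations POD [NAMESPACE] (hypothetical helper)
# Prints key, operator, effect, and tolerationSeconds for each toleration on the pod.
show_tolerations() {
  kubectl get pod "$1" --namespace "${2:-default}" \
    -o jsonpath='{range .spec.tolerations[*]}{.key}{"\t"}{.operator}{"\t"}{.effect}{"\t"}{.tolerationSeconds}{"\n"}{end}'
}
```

For example, `show_tolerations gpu-job` prints one tab-separated line per toleration. Fields that a toleration does not set appear empty.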

Common Patterns

Dedicated GPU Nodes

```bash
# Taint the GPU node group
kubectl taint nodes -l node.kubernetes.io/instance-type=p3.2xlarge gpu=true:NoSchedule
```

```yaml
# Only GPU workloads tolerate the taint
spec:
  tolerations:
    - key: "gpu"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
  containers:
    - name: trainer
      image: pytorch/pytorch:2.2.0-cuda12.1-cudnn8-runtime
      resources:          # resources are set per container, not directly under spec
        limits:
          nvidia.com/gpu: 1
```

Non-GPU pods that land on a GPU node waste expensive GPU capacity. This taint prevents that.

Spot Instance Isolation

Run batch jobs and non-critical workloads on spot/preemptible nodes while keeping critical services on on-demand nodes.

```bash
kubectl taint nodes -l node-type=spot spot=true:NoSchedule
```

```yaml
# Batch job: tolerates spot
spec:
  tolerations:
    - key: "spot"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
```

Critical services have no spot toleration — they'll never be scheduled on spot nodes.

System/Infrastructure Nodes

Reserve specific nodes for monitoring, ingress, or other platform infrastructure:

```bash
kubectl taint nodes infra-node-1 role=infra:NoSchedule
```

```yaml
# Prometheus, Grafana, ingress controllers: add the toleration + nodeSelector
spec:
  tolerations:
    - key: "role"
      operator: "Equal"
      value: "infra"
      effect: "NoSchedule"
  nodeSelector:
    role: infra
```

A toleration alone isn't enough: it allows a pod to schedule on the tainted node but doesn't force it there. Pair it with nodeSelector or nodeAffinity to direct the pod to the infra node. The nodeSelector matches a node label, not the taint, so the node also needs a role=infra label.
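Because the taint and the label are separate pieces of node metadata, one way to keep them in step is to apply both together. A minimal sketch, with the hypothetical name reserve_node:

```bash
# reserve_node NODE KEY VALUE (hypothetical helper)
# Labels the node (for nodeSelector or nodeAffinity) and taints it (to repel other pods).
reserve_node() {
  # --overwrite makes both commands safe to re-run on an already configured node.
  kubectl label nodes "$1" "$2=$3" --overwrite &&
    kubectl taint nodes "$1" "$2=$3:NoSchedule" --overwrite
}
```

For example, `reserve_node infra-node-1 role infra` adds the role=infra label and the role=infra:NoSchedule taint to the same node.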

Taints vs Node Affinity

| Feature | Taints + Tolerations | Node Affinity |
| --- | --- | --- |
| Direction | Node repels pods | Pod attracts to nodes |
| Hard/soft | Hard (NoSchedule), soft (PreferNoSchedule), or eviction (NoExecute) | required (hard) or preferred (soft) |
| Scope | Any pod without toleration is blocked | Only affects pods with affinity rules |
| Best for | Dedicated nodes, hardware isolation | Topology constraints, zone spread |

Use taints when you want to protect a node from unwanted tenants. Use node affinity when you want to express where a pod should go. In practice, you often use both together — taint the dedicated node AND add node affinity to direct the right pods there.
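As a sketch of the combined approach, the function below prints a Pod manifest that carries both a toleration and a required node affinity rule for the role=infra example. The name infra_pod_manifest, the container name app, and the pod name and image arguments are all hypothetical. It assumes the target node has both the role=infra taint and a role=infra label.

```bash
# infra_pod_manifest NAME IMAGE (hypothetical helper)
# Prints a Pod manifest that tolerates the infra taint and requires an infra-labelled node.
infra_pod_manifest() {
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: $1
spec:
  tolerations:            # lets the pod onto the tainted node
    - key: "role"
      operator: "Equal"
      value: "infra"
      effect: "NoSchedule"
  affinity:               # keeps the pod off every other node
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: "role"
                operator: "In"
                values: ["infra"]
  containers:
    - name: app
      image: $2
EOF
}
```

Pipe the output to kubectl to create the pod, for example `infra_pod_manifest my-pod my-image:tag | kubectl apply -f -`.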

Verifying Pod Placement

```bash
# Check which node a pod landed on
kubectl get pod gpu-job -o wide

# Check all pods on a specific node
kubectl get pods --all-namespaces --field-selector spec.nodeName=node1

# Check why a pod is pending (often taint-related)
kubectl describe pod gpu-job
# Look for "Events" section; taint mismatches show up as:
# 0/3 nodes are available: 3 node(s) had untolerated taint {dedicated: gpu}
```

The describe output is the fastest way to diagnose scheduling failures. The message "N node(s) had untolerated taint" tells you exactly which taint is blocking the pod.
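When you do not yet know which pod is affected, a cluster-wide view can narrow things down. The sketch below lists pending pods and then the scheduler's FailedScheduling events. The name scheduling_failures is hypothetical.

```bash
# scheduling_failures (hypothetical helper)
# Shows pods stuck in Pending, then the FailedScheduling events, oldest first.
scheduling_failures() {
  kubectl get pods --all-namespaces --field-selector status.phase=Pending
  kubectl get events --all-namespaces --field-selector reason=FailedScheduling \
    --sort-by=.metadata.creationTimestamp
}
```

The event messages carry the same "untolerated taint" text as the describe output, so you can match them against your node taints.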
