Kubernetes
14 min read · May 9, 2026

Karpenter v1: Node Provisioning, Consolidation, and Drift

Karpenter v1 (stable API, cluster-autoscaler replacement) provisions nodes in response to pending pods, then continuously consolidates by replacing underutilized nodes with smaller ones. The key operational concepts are NodePool (what nodes to provision), EC2NodeClass (how to configure them), and disruption budgets (how aggressively to consolidate without affecting workloads).

Coding Protocols Team
Platform Engineering

Cluster autoscaler scales node groups up when pods are pending and down when nodes are underutilized — but it works at the node group level, meaning you pre-define instance types and the autoscaler picks from that pool. Karpenter works differently: it looks at each pending pod's requirements (CPU, memory, zone, GPU, spot/on-demand) and selects the optimal instance type at provisioning time. The result is better bin-packing (fewer wasted vCPUs) and faster scale-up (direct EC2 API calls, no ASG).
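
For example, a pending pod like the sketch below (name and image are illustrative) carries everything Karpenter needs: the resource requests set the minimum node size, the nodeSelector pins the capacity type, and topology or affinity rules would further constrain the zone.

yaml
# Hypothetical pending pod: Karpenter reads the requests, nodeSelector, and any
# topology constraints to pick an instance type, capacity type, and zone
apiVersion: v1
kind: Pod
metadata:
  name: report-worker                         # illustrative name
spec:
  nodeSelector:
    karpenter.sh/capacity-type: spot          # well-known label Karpenter satisfies
  containers:
    - name: worker
      image: example.com/report-worker:1.0    # placeholder image
      resources:
        requests:
          cpu: "2"        # needs at least 2 vCPUs free on the new node
          memory: 4Gi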

Karpenter v1 (karpenter.sh/v1) stabilized the API. The legacy v1alpha5 Provisioner and the pre-v1 v1beta1 resources that replaced it have since been removed; this post covers the v1 API exclusively.


Installation

bash
# Karpenter requires a service account with EC2 and SQS permissions (Pod Identity)
# See Karpenter docs for the full IAM policy

helm install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version 1.3.3 \
  --namespace kube-system \
  --create-namespace \
  --values karpenter-values.yaml
yaml
# karpenter-values.yaml
settings:
  clusterName: my-cluster
  clusterEndpoint: "https://XXXXXXXXXXXXXXXX.gr7.us-east-1.eks.amazonaws.com"
  interruptionQueue: my-cluster-karpenter    # SQS queue name for spot interruption events

serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: ""    # Pod Identity handles this; leave empty
  # With EKS Pod Identity, no annotation needed; Karpenter uses the Pod Identity binding

replicas: 2    # HA for production

controller:
  resources:
    requests:
      cpu: 1
      memory: 1Gi

tolerations:
  - key: CriticalAddonsOnly
    operator: Exists

EC2NodeClass

EC2NodeClass defines how AWS nodes are configured — AMI, subnets, security groups, instance profile:

yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  # AMI family: AL2023 is recommended (replaces AL2)
  amiFamily: AL2023

  # AMI selection by alias (latest AL2023 for the cluster's K8s version)
  amiSelectorTerms:
    - alias: al2023@latest

  # Subnet selection by tag (must match your VPC subnet tags)
  subnetSelectorTerms:
    - tags:
        kubernetes.io/cluster/my-cluster: owned
        karpenter.sh/discovery: my-cluster

  # Security group selection
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster

  # Instance profile (must have the worker node IAM role)
  instanceProfile: KarpenterNodeInstanceProfile-my-cluster

  # User data additions (AL2023 uses nodeadm format)
  userData: |
    apiVersion: node.eks.aws/v1alpha1
    kind: NodeConfig
    spec:
      kubelet:
        config:
          maxPods: 110
          systemReserved:
            cpu: 100m
            memory: 100Mi

  # Block device configuration
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 50Gi
        volumeType: gp3
        iops: 3000
        throughput: 125
        encrypted: true
        deleteOnTermination: true

  # Tags applied to EC2 instances
  tags:
    team: platform
    managed-by: karpenter

NodePool

NodePool defines scheduling constraints and disruption behavior — what workloads this pool can run and how aggressively to consolidate:

yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    metadata:
      labels:
        node-type: general
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default

      requirements:
        # Instance categories (m, c, r families for general workloads)
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["m", "c", "r"]

        # Exclude small instances (not cost-effective for our workloads)
        - key: karpenter.k8s.aws/instance-size
          operator: NotIn
          values: ["nano", "micro", "small"]

        # Prefer Spot, fall back to On-Demand
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]

        # Multi-AZ for availability
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["us-east-1a", "us-east-1b", "us-east-1c"]

        # AMD64 only (unless you explicitly want ARM)
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]

        # Exclude older instance generations
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["2"]    # Only generation 3 and newer

  # Limits: maximum resources this NodePool can provision
  limits:
    cpu: 1000       # 1000 vCPUs max across all nodes in this pool
    memory: 4000Gi

  # Disruption: how Karpenter consolidates nodes
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m    # in v1 this also applies to WhenEmptyOrUnderutilized: wait 1m after pod churn before consolidating

    # Disruption budgets: limits on simultaneous disruptions
    budgets:
      - nodes: "10%"    # Max 10% of nodes disrupted simultaneously (general)
      # During business hours, be more conservative
      - nodes: "5%"
        schedule: "0 9 * * MON-FRI"    # 9 AM Mon-Fri UTC
        duration: 8h

Spot Fallback with Multiple Instance Families

Karpenter handles Spot interruptions by provisioning from a flexible instance family list. The wider the list, the more Spot capacity is available:

yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-general
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default

      requirements:
        # Wide instance family selection for maximum Spot availability
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["m", "c", "r", "t"]

        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["4"]

        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]    # Spot only in this pool

        - key: karpenter.k8s.aws/instance-cpu
          operator: In
          values: ["4", "8", "16"]    # Allow specific sizes for predictable bin-packing

  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m    # supported with WhenEmptyOrUnderutilized in v1

  limits:
    cpu: 500

When a Spot node receives a 2-minute interruption notice, Karpenter sees the SQS message (from the EventBridge/SQS queue configured during install), cordons and drains the node before termination, and provisions a replacement on another Spot instance type automatically.
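
To place a workload on this pool, a Deployment can select the node label Karpenter stamps on every node it provisions (karpenter.sh/nodepool). The manifest below is a minimal sketch with illustrative names and image.

yaml
# Hypothetical deployment pinned to the spot-general pool via the
# karpenter.sh/nodepool label Karpenter adds to its nodes
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-worker            # illustrative name
spec:
  replicas: 10
  selector:
    matchLabels:
      app: batch-worker
  template:
    metadata:
      labels:
        app: batch-worker
    spec:
      nodeSelector:
        karpenter.sh/nodepool: spot-general
      containers:
        - name: worker
          image: example.com/batch-worker:1.0   # placeholder image
          resources:
            requests:
              cpu: "2"
              memory: 4Gi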


GPU NodePool

yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: gpu
spec:
  amiFamily: AL2023
  amiSelectorTerms:
    - alias: al2023@latest
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  instanceProfile: KarpenterNodeInstanceProfile-my-cluster
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 200Gi    # Larger disk for container images
        volumeType: gp3
        encrypted: true
        deleteOnTermination: true

---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu
spec:
  template:
    metadata:
      labels:
        node-type: gpu
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: gpu

      taints:
        - key: nvidia.com/gpu
          effect: NoSchedule

      requirements:
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["p", "g"]    # GPU instance families

        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]    # GPU Spot availability is limited

      expireAfter: 720h    # Replace GPU nodes every 30 days (driver refresh)

  limits:
    cpu: 400
    nvidia.com/gpu: 32    # Cap GPU count to limit costs

  disruption:
    consolidationPolicy: WhenEmpty    # Only consolidate empty GPU nodes
    consolidateAfter: 1m

Drift Detection

Karpenter automatically detects when nodes have "drifted" from the NodePool or EC2NodeClass spec and replaces them. Common drift causes:

  • New AMI released for the configured amiSelectorTerms alias (e.g., al2023@latest)
  • EC2NodeClass blockDeviceMappings changed
  • NodePool requirements updated (new instance families added/removed)
  • Kubernetes version upgrade (cluster version changed)

When drift is detected, nodes are replaced following the same disruption budget constraints as consolidation:

bash
# Force node replacement by deleting NodeClaims; Karpenter provisions replacements
# (note: manual deletion is not rate-limited by disruption budgets)
kubectl get nodeclaims -o name | xargs kubectl delete

# Or delete the Node objects: Karpenter's termination finalizer cordons, drains,
# and terminates them gracefully, then provisions capacity for the evicted pods
kubectl get nodes -l karpenter.sh/nodepool -o name | xargs kubectl delete

# Check drift status on NodeClaims (drifted nodes get replaced within the disruption budget)
kubectl get nodeclaims
kubectl get nodeclaims -o json | jq '.items[] | {name: .metadata.name, drifted: ([.status.conditions[]? | select(.type == "Drifted")] | length > 0)}'

Pinning AMI to Prevent Unintended Drift

If you need to control exactly when AMI updates happen (not let Karpenter update on its own schedule):

yaml
spec:
  amiSelectorTerms:
    # Pin to specific AMI by ID — no automatic drift on new AMI
    - id: ami-0abcdef1234567890

Or pin the amiSelectorTerms alias to a specific AMI version and bump it manually:

yaml
spec:
  amiSelectorTerms:
    - alias: al2023@v20240101    # Specific version, not @latest

Protecting Workloads from Disruption

yaml
# Prevent Karpenter from disrupting a specific pod (e.g., during a migration)
metadata:
  annotations:
    karpenter.sh/do-not-disrupt: "true"
yaml
# Prevent Karpenter from disrupting a node (e.g., during incident investigation)
metadata:
  annotations:
    karpenter.sh/do-not-disrupt: "true"

Combined with PodDisruptionBudgets, this gives you layered protection: PDBs prevent voluntary disruptions at the Kubernetes layer; do-not-disrupt prevents Karpenter's node-level actions.
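
A minimal PDB sketch (the app name and label are illustrative) that keeps at least 80% of replicas available during any voluntary eviction, including Karpenter-initiated drains:

yaml
# Hypothetical PDB: Karpenter's drain respects this during consolidation, drift, and expiration
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: checkout-pdb          # illustrative name
spec:
  minAvailable: 80%
  selector:
    matchLabels:
      app: checkout           # illustrative label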


Karpenter Metrics

bash
# Karpenter exposes Prometheus metrics on port 8080
# Key metrics:

# Pending pods waiting for nodes (should resolve quickly)
karpenter_pods_state{phase="Pending"}

# Nodes by lifecycle (launched, registered, initialized)
karpenter_nodes_total{lifecycle="launched"}

# Disruptions (consolidations, expirations, drift)
karpenter_disruption_disruptions_total{method="consolidation"}
karpenter_disruption_disruptions_total{method="drift"}
karpenter_disruption_disruptions_total{method="expiration"}

# Time from pod pending to node ready (provisioning latency)
karpenter_nodes_time_to_first_schedule_seconds
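
If you scrape these metrics with the Prometheus Operator, a simple alert on stuck pending pods might look like the sketch below; the rule name, namespace, and threshold are illustrative, and it assumes the metric names listed above.

yaml
# Hypothetical PrometheusRule (requires prometheus-operator CRDs) built on the metrics above
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: karpenter-alerts        # illustrative name
  namespace: kube-system
spec:
  groups:
    - name: karpenter
      rules:
        - alert: KarpenterPodsStuckPending
          expr: sum(karpenter_pods_state{phase="Pending"}) > 0
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "Pods have been pending for 15m; Karpenter may be unable to provision capacity"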

Frequently Asked Questions

How does Karpenter handle mixed Spot/On-Demand workloads?

You can do either. Run two NodePools, one Spot-only (with broad instance family requirements to maximize capacity) and one On-Demand for workloads that can't be interrupted, and have workloads opt into On-Demand with a nodeSelector or toleration. Or keep a single NodePool that allows both with values: ["spot", "on-demand"]; when both are allowed, Karpenter favors Spot because it is cheaper. The snippet below shows the opt-in.
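
The opt-in itself is just a scheduling constraint on the workload's pod template, using the well-known capacity-type label:

yaml
# Pod template fragment: force On-Demand capacity for an interruption-sensitive workload
spec:
  template:
    spec:
      nodeSelector:
        karpenter.sh/capacity-type: on-demand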

Can I use Karpenter alongside cluster-autoscaler?

Running both simultaneously is not recommended — they compete to provision/deprovision nodes and create race conditions. Migrate fully to Karpenter. For existing managed node groups, keep them around for system components (CoreDNS, Karpenter itself), but let Karpenter handle all application workloads.
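
One way to keep the controller itself on the managed node group is a node affinity in the Helm values; this is a sketch that assumes the chart's affinity value and an EKS managed node group named "system" (illustrative).

yaml
# karpenter-values.yaml (sketch): keep the controller on the managed node group,
# never on nodes Karpenter itself provisions
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: karpenter.sh/nodepool
              operator: DoesNotExist
            - key: eks.amazonaws.com/nodegroup      # label EKS puts on managed node group nodes
              operator: In
              values: ["system"]                    # illustrative node group name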

What happens to Karpenter nodes during an EKS upgrade?

Karpenter nodes don't upgrade in-place. After you upgrade the EKS control plane and addons, the nodes need to be replaced. As long as amiSelectorTerms uses an alias like al2023@latest, the alias resolves to an AMI built for the new Kubernetes version, so Karpenter detects drift on the existing nodes and replaces them with new ones running the updated kubelet. This happens automatically within the disruption budget constraints; no manual cordon/drain is needed.
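
A minimal sketch of the sequence (cluster name and target version are illustrative):

bash
# Upgrade the control plane, then watch Karpenter roll the drifted nodes
aws eks update-cluster-version --name my-cluster --kubernetes-version 1.32
aws eks wait cluster-active --name my-cluster

# NodeClaims marked as drifted are replaced within the disruption budget
kubectl get nodeclaims -w
kubectl get nodes -L karpenter.sh/nodepool -o wide    # new nodes report the upgraded kubelet version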


For FinOps patterns that use Karpenter's cost data alongside Kubecost, see Kubernetes Cost Optimization and FinOps. For EKS upgrade procedures where Karpenter node replacement is the node upgrade strategy, see EKS Cluster Upgrades: Zero-Downtime Strategy.

Replacing cluster-autoscaler with Karpenter on an existing EKS cluster? Talk to us at Coding Protocols — we help platform teams migrate to Karpenter incrementally and tune NodePool constraints for their workload mix.

Related Topics

Karpenter
EKS
Kubernetes
Autoscaling
Cost Optimization
Spot Instances
Platform Engineering
AWS
