Karpenter v1: Node Provisioning, Consolidation, and Drift
Karpenter v1 (stable API, cluster-autoscaler replacement) provisions nodes in response to pending pods, then continuously consolidates, removing empty nodes and replacing underutilized ones with cheaper alternatives. The key operational concepts are NodePool (what nodes to provision), EC2NodeClass (how to configure them), and disruption budgets (how aggressively to consolidate without affecting workloads).

Cluster autoscaler scales node groups up when pods are pending and down when nodes are underutilized — but it works at the node group level, meaning you pre-define instance types and the autoscaler picks from that pool. Karpenter works differently: it looks at each pending pod's requirements (CPU, memory, zone, GPU, spot/on-demand) and selects the optimal instance type at provisioning time. The result is better bin-packing (fewer wasted vCPUs) and faster scale-up (direct EC2 API calls, no ASG).
Karpenter v1 (karpenter.sh/v1) stabilized the API. The legacy alpha-era resources (Provisioner, AWSNodeTemplate) are gone, and the v1beta1 NodePool/EC2NodeClass resources were promoted to v1 — this post covers the v1 API exclusively.
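If you're coming from an earlier install, you can confirm which API versions your cluster actually serves once the CRDs are in place:
kubectl get crd nodepools.karpenter.sh -o jsonpath='{.spec.versions[*].name}'
kubectl get crd ec2nodeclasses.karpenter.k8s.aws -o jsonpath='{.spec.versions[*].name}'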
Installation
# Karpenter requires a service account with EC2 and SQS permissions (Pod Identity)
# See Karpenter docs for the full IAM policy

helm install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version 1.3.3 \
  --namespace kube-system \
  --create-namespace \
  --values karpenter-values.yaml

# karpenter-values.yaml
settings:
  clusterName: my-cluster
  clusterEndpoint: "https://XXXXXXXXXXXXXXXX.gr7.us-east-1.eks.amazonaws.com"
  interruptionQueue: my-cluster-karpenter  # SQS queue name for spot interruption events

serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: ""  # Pod Identity handles this — leave empty
  # With EKS Pod Identity, no annotation needed; Karpenter uses the Pod Identity binding

replicas: 2  # HA for production

controller:
  resources:
    requests:
      cpu: 1
      memory: 1Gi

tolerations:
  - key: CriticalAddonsOnly
    operator: Exists
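The values above assume the controller's IAM role is already bound to the karpenter service account via EKS Pod Identity. A sketch of that association, assuming a controller role named KarpenterControllerRole-my-cluster and the Pod Identity agent addon installed:
# Bind the karpenter service account in kube-system to its controller IAM role
aws eks create-pod-identity-association \
  --cluster-name my-cluster \
  --namespace kube-system \
  --service-account karpenter \
  --role-arn arn:aws:iam::111122223333:role/KarpenterControllerRole-my-cluster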
EC2NodeClass
EC2NodeClass defines how AWS nodes are configured — AMI, subnets, security groups, instance profile:
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  # AMI family: AL2023 is recommended (replaces AL2)
  amiFamily: AL2023

  # AMI selection by alias (latest AL2023 for the cluster's K8s version)
  amiSelectorTerms:
    - alias: al2023@latest

  # Subnet selection by tag (must match your VPC subnet tags)
  subnetSelectorTerms:
    - tags:
        kubernetes.io/cluster/my-cluster: owned
        karpenter.sh/discovery: my-cluster

  # Security group selection
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster

  # Instance profile (must have the worker node IAM role)
  instanceProfile: KarpenterNodeInstanceProfile-my-cluster

  # User data additions (AL2023 uses nodeadm format)
  userData: |
    apiVersion: node.eks.aws/v1alpha1
    kind: NodeConfig
    spec:
      kubelet:
        config:
          maxPods: 110
          systemReserved:
            cpu: 100m
            memory: 100Mi

  # Block device configuration
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 50Gi
        volumeType: gp3
        iops: 3000
        throughput: 125
        encrypted: true
        deleteOnTermination: true

  # Tags applied to EC2 instances
  tags:
    team: platform
    managed-by: karpenter
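Before pointing a NodePool at this class, it's worth confirming the selector terms resolved to real AMIs, subnets, and security groups (Karpenter reports what it resolved in the EC2NodeClass status):
# Check what Karpenter resolved from the selector terms
kubectl describe ec2nodeclass default

# Or inspect the raw status block
kubectl get ec2nodeclass default -o yaml | sed -n '/^status:/,$p'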
NodePool
NodePool defines scheduling constraints and disruption behavior — what workloads this pool can run and how aggressively to consolidate:
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    metadata:
      labels:
        node-type: general
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default

      requirements:
        # Instance categories (m, c, r families for general workloads)
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["m", "c", "r"]

        # Exclude small instances (not cost-effective for our workloads)
        - key: karpenter.k8s.aws/instance-size
          operator: NotIn
          values: ["nano", "micro", "small"]

        # Prefer Spot, fall back to On-Demand
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]

        # Multi-AZ for availability
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["us-east-1a", "us-east-1b", "us-east-1c"]

        # AMD64 only (unless you explicitly want ARM)
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]

        # Prefer current-generation instances
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["2"]  # generation 3 or newer

  # Limits: maximum resources this NodePool can provision
  limits:
    cpu: 1000       # 1000 vCPUs max across all nodes in this pool
    memory: 4000Gi

  # Disruption: how Karpenter consolidates nodes
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m  # v1 supports consolidateAfter with WhenEmptyOrUnderutilized: wait 1m of stability before consolidating

    # Disruption budgets: limits on simultaneous disruptions
    budgets:
      - nodes: "10%"  # Max 10% of nodes disrupted simultaneously (general)
      # During business hours, be more conservative
      - nodes: "5%"
        schedule: "0 9 * * MON-FRI"  # 9 AM Mon-Fri UTC
        duration: 8h
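A quick way to see the pool working is to force pending pods and watch Karpenter react. A minimal smoke test, assuming the NodePool above is applied (the inflate deployment is just a throwaway example):
# Create a deployment with real resource requests, then scale it up to create pending pods
kubectl create deployment inflate --image=registry.k8s.io/pause:3.9 --replicas=0
kubectl set resources deployment inflate --requests=cpu=1,memory=1Gi
kubectl scale deployment inflate --replicas=20

# Watch NodeClaims get created and nodes join the cluster
kubectl get nodeclaims -w
kubectl get nodes -l karpenter.sh/nodepool=default

# Scale back down and watch consolidation remove the now-empty nodes
kubectl scale deployment inflate --replicas=0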
Spot Fallback with Multiple Instance Families
Karpenter handles Spot interruptions by provisioning from a flexible instance family list. The wider the list, the more Spot capacity is available:
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-general
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default

      requirements:
        # Wide instance family selection for maximum Spot availability
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["m", "c", "r", "t"]

        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["4"]

        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]  # Spot only in this pool

        - key: karpenter.k8s.aws/instance-cpu
          operator: In
          values: ["4", "8", "16"]  # Allow specific sizes for predictable bin-packing

  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m  # supported with WhenEmptyOrUnderutilized in v1

  limits:
    cpu: 500

When a Spot node receives a 2-minute interruption notice, Karpenter sees the SQS message (from the EventBridge/SQS queue configured during install), cordons and drains the node before termination, and provisions a replacement on another Spot instance type automatically.
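Workloads that tolerate interruption can pin themselves to this pool using the labels Karpenter applies to its nodes. A pod-template snippet (the pool name matches the NodePool above):
# Pod template: run interruption-tolerant work on the Spot pool
spec:
  nodeSelector:
    karpenter.sh/nodepool: spot-general
    # or, to accept Spot from any pool that allows it:
    # karpenter.sh/capacity-type: spot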
GPU NodePool
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: gpu
spec:
  amiFamily: AL2023
  amiSelectorTerms:
    - alias: al2023@latest
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  instanceProfile: KarpenterNodeInstanceProfile-my-cluster
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 200Gi  # Larger disk for container images
        volumeType: gp3
        encrypted: true
        deleteOnTermination: true

---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu
spec:
  template:
    metadata:
      labels:
        node-type: gpu
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: gpu

      taints:
        - key: nvidia.com/gpu
          effect: NoSchedule

      requirements:
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["p", "g"]  # GPU instance families

        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]  # GPU Spot availability is limited

      expireAfter: 720h  # Replace GPU nodes every 30 days (driver refresh)

  limits:
    cpu: 400
    nvidia.com/gpu: 32  # Cap GPU count to limit costs

  disruption:
    consolidationPolicy: WhenEmpty  # Only consolidate empty GPU nodes
    consolidateAfter: 1m
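Pods only land on these nodes if they tolerate the taint and request GPUs, which assumes the NVIDIA device plugin is running to expose nvidia.com/gpu. A matching pod-spec sketch (image name is illustrative):
# Pod spec for a GPU workload targeting the gpu NodePool
spec:
  nodeSelector:
    node-type: gpu
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule
  containers:
    - name: trainer
      image: registry.example.com/trainer:latest  # illustrative image
      resources:
        limits:
          nvidia.com/gpu: 1  # extended resources are set as limits; requests default to match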
Drift Detection
Karpenter automatically detects when nodes have "drifted" from the NodePool or EC2NodeClass spec and replaces them. Common drift causes:
- New AMI released for the configured amiFamily
- EC2NodeClass blockDeviceMappings changed
- NodePool requirements updated (new instance families added/removed)
- Kubernetes version upgrade (cluster version changed)
When drift is detected, nodes are replaced following the same disruption budget constraints as consolidation:
# Force node replacement by deleting NodeClaims — Karpenter provisions replacements
kubectl get nodeclaims -o name | xargs kubectl delete

# Or cordon existing nodes to trigger replacement via consolidation
kubectl get nodes -l karpenter.sh/nodepool -o name | xargs kubectl cordon

# Check drift status (NodeClaims marked Drifted get replaced within the disruption budget)
kubectl get nodeclaims
kubectl get nodeclaims -o json | jq '.items[] | {name: .metadata.name, drifted: ([.status.conditions[]? | select(.type == "Drifted")][0].status)}'

Pinning AMI to Prevent Unintended Drift
If you need to control exactly when AMI updates happen (not let Karpenter update on its own schedule):
spec:
  amiSelectorTerms:
    # Pin to a specific AMI by ID — no automatic drift on new AMI releases
    - id: ami-0abcdef1234567890

Or pin the alias to a specific AMI version (instead of @latest) and bump it yourself when you're ready to roll nodes:
spec:
  amiSelectorTerms:
    - alias: al2023@v20240101  # Specific version, not @latest

Protecting Workloads from Disruption
# Prevent Karpenter from disrupting a specific pod (e.g., during a migration)
metadata:
  annotations:
    karpenter.sh/do-not-disrupt: "true"

# Prevent Karpenter from disrupting a node (e.g., during incident investigation)
metadata:
  annotations:
    karpenter.sh/do-not-disrupt: "true"

Combined with PodDisruptionBudgets, this gives you layered protection: PDBs prevent voluntary disruptions at the Kubernetes layer; do-not-disrupt prevents Karpenter's node-level actions.
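For the Kubernetes-layer half of that protection, a minimal PodDisruptionBudget sketch (names and counts are illustrative):
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api
spec:
  minAvailable: 2  # Karpenter's drain respects this and will not evict below 2 ready replicas
  selector:
    matchLabels:
      app: api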
Karpenter Metrics
# Karpenter exposes Prometheus metrics on port 8080
# Key metrics:

# Pending pods waiting for nodes (should resolve quickly)
karpenter_pods_state{phase="Pending"}

# Nodes by lifecycle (launched, registered, initialized)
karpenter_nodes_total{lifecycle="launched"}

# Disruptions (consolidations, expirations, drift)
karpenter_disruption_disruptions_total{method="consolidation"}
karpenter_disruption_disruptions_total{method="drift"}
karpenter_disruption_disruptions_total{method="expiration"}

# Time from pod pending to node ready (provisioning latency)
karpenter_nodes_time_to_first_schedule_seconds
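These metrics translate directly into alerts. A sketch of a PrometheusRule built on the pending-pods metric above, assuming the Prometheus Operator is installed (threshold and names are illustrative):
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: karpenter-alerts
  namespace: kube-system
spec:
  groups:
    - name: karpenter
      rules:
        - alert: KarpenterPodsStuckPending
          expr: sum(karpenter_pods_state{phase="Pending"}) > 0
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Pods pending for 10+ minutes; Karpenter may be unable to provision capacity"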
Frequently Asked Questions
How does Karpenter handle mixed Spot/On-Demand workloads?
Use two NodePools: one for Spot (with broader instance family requirements for capacity), one for On-Demand (for workloads that can't be interrupted). Workloads opt into On-Demand with a nodeSelector or toleration. The default NodePool uses values: ["spot", "on-demand"] and Karpenter picks Spot when available.
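The On-Demand opt-in is a one-line nodeSelector on the pod template, using the capacity-type label Karpenter puts on every node it launches:
spec:
  nodeSelector:
    karpenter.sh/capacity-type: on-demand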
Can I use Karpenter alongside cluster-autoscaler?
Running both simultaneously is not recommended — they compete to provision/deprovision nodes and create race conditions. Migrate fully to Karpenter. For existing managed node groups, keep them around for system components (CoreDNS, Karpenter itself), but let Karpenter handle all application workloads.
What happens to Karpenter nodes during an EKS upgrade?
Karpenter nodes don't upgrade in place. After you upgrade the EKS control plane and addons, Karpenter resolves a new AMI for the updated Kubernetes version (when amiSelectorTerms uses the @latest alias), detects drift on the existing nodes, and replaces them with new ones running the updated kubelet. This happens automatically within the disruption budget constraints — no manual cordon/drain needed.
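During that rollout you can watch the replacement happen and confirm kubelet versions converge:
# Watch Karpenter replace drifted nodes after the control plane upgrade
kubectl get nodeclaims -w

# Confirm new nodes register with the upgraded kubelet
kubectl get nodes -o custom-columns=NAME:.metadata.name,KUBELET:.status.nodeInfo.kubeletVersion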
For FinOps patterns that use Karpenter's cost data alongside Kubecost, see Kubernetes Cost Optimization and FinOps. For EKS upgrade procedures where Karpenter node replacement is the node upgrade strategy, see EKS Cluster Upgrades: Zero-Downtime Strategy.
Replacing cluster-autoscaler with Karpenter on an existing EKS cluster? Talk to us at Coding Protocols — we help platform teams migrate to Karpenter incrementally and tune NodePool constraints for their workload mix.


