Kubernetes Node Autoscaling: Cluster Autoscaler vs Karpenter
Cluster Autoscaler scales node groups based on pending pods. Karpenter provisions individual nodes based on pod requirements and cost, with no pre-defined node groups. The difference matters for heterogeneous workloads: Karpenter can provision a single right-sized node for a batch job in 30-60 seconds, while Cluster Autoscaler is bound to pre-defined Auto Scaling Group configurations. This guide covers Karpenter setup on EKS, NodePool and EC2NodeClass configuration, consolidation, and the migration path from Cluster Autoscaler.

Cluster Autoscaler has been the default Kubernetes node autoscaler since 2016. It works by watching for pending pods (pods that can't be scheduled because no node has capacity), finding a node group that could fit them, and scaling that group up. The limitation: it can only provision nodes of the types defined in existing Auto Scaling Groups. You define the shapes in advance; the autoscaler picks from them.
Karpenter inverts this. It reads the pod's resource requirements and scheduling constraints (node selectors, affinities, tolerations), evaluates the AWS EC2 instance fleet, and provisions the optimal instance type directly via the EC2 Fleet API. No pre-defined node groups, no ASG configuration. Pod scheduling requirements drive instance selection.
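To make this concrete, here is a sketch of a hypothetical batch pod whose shape alone drives Karpenter's instance selection (image name and resource numbers are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: batch-job
spec:
  containers:
    - name: worker
      image: my-batch-image:latest   # hypothetical image
      resources:
        requests:
          cpu: "3500m"               # 3.5 vCPUs
          memory: 14Gi
  nodeSelector:
    kubernetes.io/arch: amd64        # scheduling constraint Karpenter must satisfy
```

From the 3.5 vCPU / 14 GiB request, Karpenter can pick a right-sized instance (something in the 4 vCPU / 16 GiB class) rather than scaling up whatever fixed shape an ASG happens to define.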
Karpenter Installation on EKS
Prerequisites: IAM and Service Account
# Set cluster variables
export CLUSTER_NAME=production
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
export AWS_REGION=us-east-1

# Create the Karpenter node IAM role (EC2 instances need this to join the cluster)
aws cloudformation deploy \
  --stack-name KarpenterNodeRole \
  --template-file karpenter-node-role.yaml \
  --capabilities CAPABILITY_NAMED_IAM

# Create the Karpenter controller IAM role with Pod Identity association
aws eks create-pod-identity-association \
  --cluster-name $CLUSTER_NAME \
  --namespace kube-system \
  --service-account karpenter \
  --role-arn arn:aws:iam::${AWS_ACCOUNT_ID}:role/KarpenterController
Helm Installation
# Karpenter distributes via OCI — no helm repo add needed
# Check https://github.com/aws/karpenter-provider-aws/releases for latest version
export KARPENTER_VERSION=1.1.0

# No IRSA role-arn annotation needed here: the controller role is attached
# via the Pod Identity association created above
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --namespace kube-system \
  --create-namespace \
  --version ${KARPENTER_VERSION} \
  --set settings.clusterName=${CLUSTER_NAME} \
  --set settings.interruptionQueue=KarpenterInterruption \
  --set controller.resources.requests.cpu=1 \
  --set controller.resources.requests.memory=1Gi \
  --set controller.resources.limits.cpu=1 \
  --set controller.resources.limits.memory=1Gi
The interruptionQueue is an SQS queue that receives Spot interruption notices and EC2 health events. Karpenter drains nodes gracefully when interruption events arrive — before AWS terminates the instance.
NodePool: Defining Provisioning Constraints
NodePool replaces the old Provisioner CRD (pre-Karpenter 1.0). It defines what nodes Karpenter can provision and the constraints:
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    metadata:
      labels:
        managed-by: karpenter
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default

      # In the v1 API, expireAfter lives under template.spec (not disruption)
      expireAfter: 720h # Recycle nodes every 30 days (forces image/patch updates)

      # Taints to prevent unintended workloads from landing on Karpenter nodes
      # (Remove if you want all pods to be eligible)
      # taints:
      #   - key: karpenter.sh/nodepool
      #     effect: NoSchedule

      requirements:
        # Accept both On-Demand and Spot
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]

        # Instance families: general purpose + compute optimized
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
            - m6i.large
            - m6i.xlarge
            - m6i.2xlarge
            - m6a.large
            - m6a.xlarge
            - c6i.large
            - c6i.xlarge
            - c6i.2xlarge

        # Restrict to specific AZs (match your EBS volumes)
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["us-east-1a", "us-east-1b", "us-east-1c"]

        # Only amd64 — remove if ARM (Graviton) is acceptable
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]

  # Disruption: node consolidation behavior
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized # Remove underutilized nodes
    consolidateAfter: 1m # Since 1.0, valid with both WhenEmpty and WhenEmptyOrUnderutilized

  # Resource limits: cap on total CPU/memory Karpenter will provision
  limits:
    cpu: "1000" # 1000 vCPUs maximum
    memory: 4000Gi
expireAfter is a critical operational setting — it forces node replacement on a schedule, ensuring nodes get fresh AMIs and security patches without requiring manual recycling.
EC2NodeClass: AWS-Specific Configuration
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  # AMI selection: alias selects the AMI family and version — do not use amiFamily alongside alias
  amiSelectorTerms:
    - alias: al2023@latest # AL2023 EKS-optimized AMI, latest patch for the cluster version

  # Subnet selection: Karpenter provisions nodes in matching subnets
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: production # Tag your private subnets with this

  # Security group selection: attach matching security groups to new nodes
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: production

  # Node IAM role: EC2 instances assume this role
  role: KarpenterNodeRole-production

  # Block device: encrypt root volume
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 50Gi
        volumeType: gp3
        iops: 3000
        throughput: 125
        encrypted: true
Tag your private subnets and the cluster security group with karpenter.sh/discovery: production so Karpenter can discover them:
# Tag subnets
aws ec2 create-tags \
  --resources subnet-0a1b2c3d subnet-0e5f6a7b subnet-0c8d9e0f \
  --tags Key=karpenter.sh/discovery,Value=production

# Tag the cluster security group
aws ec2 create-tags \
  --resources sg-0a1b2c3d4e5f67890 \
  --tags Key=karpenter.sh/discovery,Value=production
Spot Instance Handling
Karpenter handles Spot interruptions via SQS:
# Create the SQS queue for interruption events
# (the queue also needs an access policy allowing events.amazonaws.com to SendMessage)
aws sqs create-queue --queue-name KarpenterInterruption

# Route EC2 Spot interruption and health events to the queue via EventBridge
aws events put-rule \
  --name KarpenterSpotInterruption \
  --event-pattern '{"source":["aws.ec2"],"detail-type":["EC2 Spot Instance Interruption Warning","EC2 Instance Rebalance Recommendation","EC2 Instance State-change Notification"]}'

aws events put-targets \
  --rule KarpenterSpotInterruption \
  --targets "Id=1,Arn=arn:aws:sqs:${AWS_REGION}:${AWS_ACCOUNT_ID}:KarpenterInterruption"
When Karpenter receives a 2-minute Spot interruption warning, it immediately begins draining the node (cordoning it and evicting pods while respecting PodDisruptionBudgets) and provisions a replacement node in parallel. In practice, workloads that run multiple replicas spread across nodes and protect them with a PodDisruptionBudget ride through Spot interruptions without service disruption.
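Graceful draining only protects workloads that declare how much disruption they tolerate. A minimal PodDisruptionBudget sketch for a hypothetical web Deployment (name and label are illustrative):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2          # keep at least 2 replicas running during drains
  selector:
    matchLabels:
      app: web             # hypothetical label; match your Deployment's pods
```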
For stateful workloads that shouldn't run on Spot:
# Force on-demand for database nodes
# (in the v1 API, requirements sit under spec.template.spec)
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"] # Only on-demand for this NodePool
Migrating from Cluster Autoscaler
Running both autoscalers simultaneously during migration is safe — Cluster Autoscaler keeps managing its ASG-backed nodes while Karpenter provisions its own, so they don't conflict:
# 1. Install Karpenter alongside Cluster Autoscaler
# 2. Create NodePool/EC2NodeClass
# 3. Label new pods to use Karpenter nodes (optional — or let it fill naturally)

# 4. When ready, scale down Cluster Autoscaler
kubectl scale deployment cluster-autoscaler \
  --replicas=0 \
  -n kube-system

# 5. Drain and scale down the old ASG node groups so workloads
#    reschedule onto Karpenter-provisioned nodes
Migration gotchas:
- Karpenter does not manage nodes in existing Auto Scaling Groups — it creates standalone EC2 instances. You still need to scale ASG node groups down manually after migration.
- The pod annotation cluster-autoscaler.kubernetes.io/safe-to-evict: "false" has a Karpenter counterpart: karpenter.sh/do-not-disrupt: "true". Karpenter does not honor the Cluster Autoscaler annotation, so translate it during migration for pods that must not be moved.
- Karpenter respects PodDisruptionBudgets during consolidation and node recycling.
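Karpenter's own opt-out from voluntary disruption is the karpenter.sh/do-not-disrupt pod annotation — a sketch for a hypothetical long-running job that must not be consolidated mid-run:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: long-running-job
  annotations:
    karpenter.sh/do-not-disrupt: "true" # Karpenter will not voluntarily drain this pod's node
spec:
  containers:
    - name: job
      image: my-job-image:latest # hypothetical image
```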
Frequently Asked Questions
How does Karpenter pick instance types?
Karpenter evaluates the pending pod's resource requests, scheduling requirements (node selector, affinity, tolerations), and the allowed instance types in the NodePool. It prices each viable option (Spot price + On-Demand fallback) and selects the least expensive instance that fits. Providing a broad list of instance families (not just one type) gives Karpenter the flexibility to find available Spot capacity — specifying only m5.xlarge means it can't fall back to m5a.xlarge or m6i.xlarge when m5.xlarge Spot is unavailable.
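Instead of enumerating explicit instance types, you can express the same flexibility with Karpenter's well-known karpenter.k8s.aws labels. A NodePool requirements fragment sketch (values are illustrative):

```yaml
requirements:
  - key: karpenter.k8s.aws/instance-category
    operator: In
    values: ["m", "c", "r"]   # general purpose, compute optimized, memory optimized
  - key: karpenter.k8s.aws/instance-generation
    operator: Gt
    values: ["4"]             # generation 5 and newer only
  - key: karpenter.k8s.aws/instance-cpu
    operator: In
    values: ["4", "8", "16"]  # bound the node sizes
```

This gives Karpenter a much larger Spot pool to draw from than a hand-picked type list, at the cost of less predictable instance choices.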
Should I run one NodePool or multiple?
One default NodePool covers most workloads. Add additional NodePools for: GPU workloads (separate instance family requirements, different taints), Spot vs On-Demand segregation (database nodes must be On-Demand), ARM/Graviton nodes (separate arch requirements). Keep the number of NodePools small — each one adds management overhead and can fragment bin-packing.
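As an example of a second NodePool, a GPU pool sketch that isolates GPU workloads behind a taint (pool name, instance families, and limits are illustrative):

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      taints:
        - key: nvidia.com/gpu      # only pods tolerating this taint land here
          effect: NoSchedule
      requirements:
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["g5", "g6"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]    # GPU Spot capacity is often scarce
  limits:
    cpu: "200"                     # cap total GPU-node vCPUs
```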
What's the consolidation difference between Karpenter 1.0 and earlier versions?
Karpenter's v1beta1 releases (starting with 0.32 in late 2023) replaced the Provisioner CRD with NodePool and EC2NodeClass, and Karpenter 1.0 (released August 2024) promoted those APIs to stable v1. The consolidationPolicy: WhenEmptyOrUnderutilized in NodePool replaces the old consolidation.enabled: true on the Provisioner. The behavior is similar, but 1.0 adds disruption.budgets to cap how many nodes can be disrupted simultaneously — important for large clusters where aggressive consolidation can cause scheduling churn.
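A disruption budgets fragment illustrating the 1.0 feature — capping voluntary disruption to 10% of nodes, and pausing it entirely during business hours (the schedule values are illustrative):

```yaml
disruption:
  consolidationPolicy: WhenEmptyOrUnderutilized
  budgets:
    - nodes: "10%"                  # at most 10% of nodes disrupted at once
    - schedule: "0 9 * * mon-fri"   # starting 9am on weekdays...
      duration: 8h                  # ...block all voluntary disruption for 8 hours
      nodes: "0"
```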
For HPA and VPA that scale pods (which then trigger Karpenter to scale nodes), see Kubernetes HPA and VPA: Horizontal and Vertical Pod Autoscaling. For cost optimization patterns that use Karpenter's Spot support, see Kubernetes Cost Optimization and FinOps.
Migrating from Cluster Autoscaler to Karpenter or tuning Karpenter for a production EKS cluster? Talk to us at Coding Protocols — we help platform teams configure Karpenter NodePools that balance cost, availability, and scheduling efficiency.


