Platform Engineering
May 8, 2026

Kubernetes Node Autoscaling: Cluster Autoscaler vs Karpenter

Cluster Autoscaler scales node groups based on pending pods. Karpenter provisions individual nodes based on pod requirements and cost, without pre-defined node groups. The difference matters for heterogeneous workloads: Karpenter can provision a single right-sized node for a batch job in 30-60 seconds, while Cluster Autoscaler is bound to pre-defined Auto Scaling Group configurations. This guide covers Karpenter setup on EKS, NodePool and EC2NodeClass configuration, consolidation, and the migration path from Cluster Autoscaler.

Coding Protocols Team

Cluster Autoscaler has been the default Kubernetes node autoscaler since 2016. It works by watching for pending pods (pods that can't be scheduled because no node has capacity), finding a node group that could fit them, and scaling that group up. The limitation: it can only provision nodes of the types defined in existing Auto Scaling Groups. You define the shapes in advance; the autoscaler picks from them.

Karpenter inverts this. It reads the pod's resource requirements and scheduling constraints (node selectors, affinities, tolerations), evaluates the AWS EC2 instance fleet, and provisions the optimal instance type directly via the EC2 Fleet API. No pre-defined node groups, no ASG configuration. Pod scheduling requirements drive instance selection.
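For example, a pending pod like the following hypothetical sketch is all the input Karpenter needs: the requests and nodeSelector below would steer it toward a roughly 2-vCPU Spot instance (names and image are illustrative):

yaml
apiVersion: v1
kind: Pod
metadata:
  name: batch-worker          # hypothetical example pod
spec:
  nodeSelector:
    karpenter.sh/capacity-type: spot    # well-known Karpenter label
  containers:
    - name: worker
      image: batch-worker:latest        # placeholder image
      resources:
        requests:
          cpu: "1500m"                  # drives instance sizing
          memory: 3Gi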


Karpenter Installation on EKS

Prerequisites: IAM and Service Account

bash
# Set cluster variables
export CLUSTER_NAME=production
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
export AWS_REGION=us-east-1

# Create the Karpenter node IAM role (EC2 instances need this to join the cluster)
aws cloudformation deploy \
  --stack-name KarpenterNodeRole \
  --template-file karpenter-node-role.yaml \
  --capabilities CAPABILITY_NAMED_IAM

# Create the Karpenter controller IAM role with Pod Identity association
aws eks create-pod-identity-association \
  --cluster-name $CLUSTER_NAME \
  --namespace kube-system \
  --service-account karpenter \
  --role-arn arn:aws:iam::${AWS_ACCOUNT_ID}:role/KarpenterController

Helm Installation

bash
# Karpenter distributes via OCI — no helm repo add needed
# Check https://github.com/aws/karpenter-provider-aws/releases for latest version
export KARPENTER_VERSION=1.1.0

# Note: with the Pod Identity association created above, no IRSA
# service-account annotation is needed. (If you use IRSA instead, add
# --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=<controller-role-arn>)
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --namespace kube-system \
  --create-namespace \
  --version ${KARPENTER_VERSION} \
  --set settings.clusterName=${CLUSTER_NAME} \
  --set settings.interruptionQueue=KarpenterInterruption \
  --set controller.resources.requests.cpu=1 \
  --set controller.resources.requests.memory=1Gi \
  --set controller.resources.limits.cpu=1 \
  --set controller.resources.limits.memory=1Gi

The interruptionQueue is an SQS queue that receives Spot interruption notices and EC2 health events. Karpenter drains nodes gracefully when interruption events arrive — before AWS terminates the instance.


NodePool: Defining Provisioning Constraints

NodePool replaces the old Provisioner CRD (pre-Karpenter 1.0). It defines what nodes Karpenter can provision and the constraints:

yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    metadata:
      labels:
        managed-by: karpenter
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default

      # In the v1 API, expireAfter lives here on the template (not under disruption).
      # Recycle nodes every 30 days to force fresh AMIs and security patches.
      expireAfter: 720h

      # Taints to prevent unintended workloads from landing on Karpenter nodes
      # (Remove if you want all pods to be eligible)
      # taints:
      #   - key: karpenter.sh/nodepool
      #     effect: NoSchedule

      requirements:
        # Accept both On-Demand and Spot
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]

        # Instance families: general purpose + compute optimized
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
            - m6i.large
            - m6i.xlarge
            - m6i.2xlarge
            - m6a.large
            - m6a.xlarge
            - c6i.large
            - c6i.xlarge
            - c6i.2xlarge

        # Restrict to specific AZs (match your EBS volumes)
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["us-east-1a", "us-east-1b", "us-east-1c"]

        # Only amd64 — remove if ARM (Graviton) is acceptable
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]

  # Disruption: node consolidation behavior
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized    # Remove empty and underutilized nodes
    consolidateAfter: 1m    # In v1, valid with both policies (pre-1.0 it only applied to WhenEmpty)

  # Resource limits: cap on total CPU/memory Karpenter will provision
  limits:
    cpu: "1000"         # 1000 vCPUs maximum
    memory: 4000Gi

expireAfter is a critical operational setting — it forces node replacement on a schedule, ensuring nodes get fresh AMIs and security patches without requiring manual recycling.


EC2NodeClass: AWS-Specific Configuration

yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  # AMI selection: alias selects the AMI family and version — do not use amiFamily alongside alias
  amiSelectorTerms:
    - alias: al2023@latest    # AL2023 EKS-optimized AMI, latest patch for the cluster version

  # Subnet selection: Karpenter provisions nodes in matching subnets
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: production    # Tag your private subnets with this

  # Security group selection: attach matching security groups to new nodes
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: production

  # Node IAM role: EC2 instances assume this role
  role: KarpenterNodeRole-production

  # Block device: encrypt root volume
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 50Gi
        volumeType: gp3
        iops: 3000
        throughput: 125
        encrypted: true

Tag your private subnets and the cluster security group with karpenter.sh/discovery: production so Karpenter can discover them:

bash
# Tag subnets
aws ec2 create-tags \
  --resources subnet-0a1b2c3d subnet-0e5f6a7b subnet-0c8d9e0f \
  --tags Key=karpenter.sh/discovery,Value=production

# Tag the cluster security group
aws ec2 create-tags \
  --resources sg-0a1b2c3d4e5f67890 \
  --tags Key=karpenter.sh/discovery,Value=production

Spot Instance Handling

Karpenter handles Spot interruptions via SQS:

bash
# Create the SQS queue for interruption events
# (The queue also needs a resource policy allowing events.amazonaws.com
# to SendMessage; omitted here for brevity)
aws sqs create-queue --queue-name KarpenterInterruption

# Route EC2 interruption-related events to the queue via EventBridge
aws events put-rule \
  --name KarpenterSpotInterruption \
  --event-pattern '{"source":["aws.ec2"],"detail-type":["EC2 Spot Instance Interruption Warning","EC2 Instance Rebalance Recommendation","EC2 Instance State-change Notification"]}'

aws events put-targets \
  --rule KarpenterSpotInterruption \
  --targets "Id=1,Arn=arn:aws:sqs:${AWS_REGION}:${AWS_ACCOUNT_ID}:KarpenterInterruption"

When Karpenter receives the 2-minute Spot interruption warning, it immediately begins draining the node (cordoning it and evicting pods while respecting PodDisruptionBudgets) and provisions replacement capacity in parallel. For replicated workloads with a sensible PDB (minAvailable ≥ 2 and enough replicas to tolerate a drain), Spot interruptions typically cause no visible service disruption.
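A minimal PodDisruptionBudget along those lines might look like this (a sketch; names and labels are illustrative):

yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb                # hypothetical name
spec:
  minAvailable: 2              # keep at least 2 replicas running during node drains
  selector:
    matchLabels:
      app: api                 # must match your Deployment's pod labels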

For stateful workloads that shouldn't run on Spot:

yaml
# Force on-demand for database nodes
spec:
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["on-demand"]    # Only on-demand for this NodePool

Migrating from Cluster Autoscaler

Running both simultaneously during migration is safe: Cluster Autoscaler only manages its ASG-backed node groups, and Karpenter only manages the nodes it provisioned, so they never fight over the same instances. (Both may react to the same pending pods, but the pods simply schedule onto whichever capacity arrives first.)

bash
# 1. Install Karpenter alongside Cluster Autoscaler
# 2. Create NodePool/EC2NodeClass
# 3. Label new pods to use Karpenter nodes (optional; or let it fill naturally)

# 4. When ready, scale down Cluster Autoscaler
kubectl scale deployment cluster-autoscaler \
  --replicas=0 \
  -n kube-system

# 5. Drain and delete the old ASG node groups once workloads have
#    rescheduled onto Karpenter-provisioned nodes

Migration gotchas:

  • Karpenter does not manage nodes in existing Auto Scaling Groups — it creates standalone EC2 instances. You still need to scale ASG node groups down manually after migration.
  • Karpenter does not honor the cluster-autoscaler.kubernetes.io/safe-to-evict: "false" pod annotation; its equivalent is karpenter.sh/do-not-disrupt. Migrate the annotation on pods that must not be evicted.
  • Karpenter respects PodDisruptionBudgets during consolidation and node recycling.
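Karpenter's opt-out mechanism is the karpenter.sh/do-not-disrupt pod annotation: Karpenter will not voluntarily disrupt a node running a pod annotated this way. A sketch (pod name and image are illustrative):

yaml
apiVersion: v1
kind: Pod
metadata:
  name: critical-batch-job      # hypothetical example
  annotations:
    karpenter.sh/do-not-disrupt: "true"   # blocks voluntary consolidation of its node
spec:
  containers:
    - name: main
      image: busybox:1.36
      command: ["sleep", "3600"]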

Frequently Asked Questions

How does Karpenter pick instance types?

Karpenter evaluates the pending pod's resource requests, scheduling requirements (node selector, affinity, tolerations), and the allowed instance types in the NodePool. It prices each viable option (Spot price + On-Demand fallback) and selects the least expensive instance that fits. Providing a broad list of instance families (not just one type) gives Karpenter the flexibility to find available Spot capacity — specifying only m5.xlarge means it can't fall back to m5a.xlarge or m6i.xlarge when m5.xlarge Spot is unavailable.
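Instead of enumerating instance types, that flexibility can also be expressed with Karpenter's well-known instance labels; a sketch (adjust categories and sizes to your workloads):

yaml
requirements:
  - key: karpenter.k8s.aws/instance-category
    operator: In
    values: ["c", "m", "r"]          # compute, general purpose, memory optimized
  - key: karpenter.k8s.aws/instance-generation
    operator: Gt
    values: ["4"]                    # generation 5 or newer only
  - key: karpenter.k8s.aws/instance-cpu
    operator: In
    values: ["2", "4", "8", "16"]    # bound node sizes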

Should I run one NodePool or multiple?

One default NodePool covers most workloads. Add additional NodePools for: GPU workloads (separate instance family requirements, different taints), Spot vs On-Demand segregation (database nodes must be On-Demand), ARM/Graviton nodes (separate arch requirements). Keep the number of NodePools small — each one adds management overhead and can fragment bin-packing.
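A dedicated GPU NodePool might look like this sketch (the taint keeps non-GPU pods off expensive nodes; assumes a separate `gpu` EC2NodeClass exists):

yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: gpu                    # assumed to exist alongside the default one
      taints:
        - key: nvidia.com/gpu
          value: "true"
          effect: NoSchedule         # only pods tolerating GPUs land here
      requirements:
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["g5", "g6"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]      # avoid Spot for long-running training jobs
  limits:
    nvidia.com/gpu: "8"              # cap total GPUs this pool can provision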

What's the consolidation difference between Karpenter 1.0 and earlier versions?

Karpenter 1.0 (GA in August 2024) stabilized the NodePool and EC2NodeClass APIs that replaced the Provisioner CRD (the rename itself arrived with the v1beta1 APIs in v0.32). The consolidationPolicy: WhenEmptyOrUnderutilized in NodePool replaces the old consolidation.enabled: true on the Provisioner. The behavior is similar, but 1.0 adds disruption.budgets to cap how many nodes can be disrupted simultaneously — important for large clusters where aggressive consolidation can cause scheduling churn.
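A budgets sketch: allow at most 10% of nodes to be disrupted at once, and freeze voluntary disruption entirely during weekday business hours (the schedule values are illustrative):

yaml
disruption:
  consolidationPolicy: WhenEmptyOrUnderutilized
  consolidateAfter: 1m
  budgets:
    - nodes: "10%"                   # default cap on concurrent disruptions
    - nodes: "0"                     # no voluntary disruption...
      schedule: "0 9 * * mon-fri"    # ...starting 09:00 on weekdays
      duration: 8h                   # ...for 8 hours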


For HPA and VPA that scale pods (which then trigger Karpenter to scale nodes), see Kubernetes HPA and VPA: Horizontal and Vertical Pod Autoscaling. For cost optimization patterns that use Karpenter's Spot support, see Kubernetes Cost Optimization and FinOps.

Migrating from Cluster Autoscaler to Karpenter or tuning Karpenter for a production EKS cluster? Talk to us at Coding Protocols — we help platform teams configure Karpenter NodePools that balance cost, availability, and scheduling efficiency.

Related Topics

Karpenter
Cluster Autoscaler
Kubernetes
EKS
Node Autoscaling
Platform Engineering
Cost Optimization
AWS
