Install Karpenter on EKS for Automatic Node Provisioning

Advanced · 45 min to complete · 18 min read

Set up the required IAM roles, install Karpenter on EKS, configure an EC2NodeClass and NodePool, and watch Karpenter provision a new node within 60–90 seconds of an unschedulable pod appearing.

Before you begin

  • An existing EKS cluster
  • kubectl configured with cluster-admin access
  • Helm 3 installed
  • AWS CLI configured with admin credentials
  • eksctl installed (optional but useful)
  • Your cluster's OIDC provider ID

The Cluster Autoscaler works, but it has a fundamental limitation: it scales node groups, not individual nodes, so you're constrained to the instance types you pre-configured. Karpenter replaces this with a smarter model: it reads the actual resource requirements of pending pods, provisions exactly the right instance type for them, chooses Spot or On-Demand based on your policy, and consolidates underutilised nodes automatically.

The result: faster scale-out (60–90 seconds vs 4–5 minutes with Cluster Autoscaler), better bin-packing, and lower bills.

Step 1: Set environment variables

Export these values — they're used throughout the tutorial:

bash
export CLUSTER_NAME=my-cluster
export AWS_DEFAULT_REGION=us-east-1
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
export KARPENTER_VERSION=1.12.1
export KARPENTER_NAMESPACE=kube-system

# Get the cluster's OIDC provider ID
export OIDC_ID=$(aws eks describe-cluster \
  --name $CLUSTER_NAME \
  --query "cluster.identity.oidc.issuer" \
  --output text | sed 's|.*/||')

echo "Account: $AWS_ACCOUNT_ID, OIDC: $OIDC_ID"

Step 2: Create the KarpenterNode IAM role

Karpenter-provisioned nodes need an IAM role with permissions to join the cluster and pull from ECR:

bash
cat <<EOF > node-trust.json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"Service": "ec2.amazonaws.com"},
    "Action": "sts:AssumeRole"
  }]
}
EOF

aws iam create-role \
  --role-name "KarpenterNodeRole-${CLUSTER_NAME}" \
  --assume-role-policy-document file://node-trust.json

for policy in AmazonEKSWorkerNodePolicy AmazonEKS_CNI_Policy \
              AmazonEC2ContainerRegistryReadOnly AmazonSSMManagedInstanceCore; do
  aws iam attach-role-policy \
    --role-name "KarpenterNodeRole-${CLUSTER_NAME}" \
    --policy-arn "arn:aws:iam::aws:policy/${policy}"
done
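
To confirm the role is ready, list its attached policies; you should see all four:

bash
aws iam list-attached-role-policies \
  --role-name "KarpenterNodeRole-${CLUSTER_NAME}" \
  --query "AttachedPolicies[].PolicyName"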

Step 3: Create the KarpenterController IAM role

The Karpenter controller pod needs permissions to manage EC2 instances, query pricing, and receive Spot interruption events:

bash
cat <<EOF > controller-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Karpenter",
      "Effect": "Allow",
      "Action": [
        "ec2:CreateFleet", "ec2:CreateLaunchTemplate", "ec2:CreateTags",
        "ec2:DeleteLaunchTemplate",
        "ec2:DescribeImages", "ec2:DescribeInstances", "ec2:DescribeInstanceStatus",
        "ec2:DescribeInstanceTypeOfferings", "ec2:DescribeInstanceTypes",
        "ec2:DescribeLaunchTemplates", "ec2:DescribeSecurityGroups",
        "ec2:DescribeSpotPriceHistory", "ec2:DescribeSubnets",
        "ec2:DescribeCapacityReservations", "ec2:DescribePlacementGroups",
        "ec2:RunInstances", "ec2:TerminateInstances",
        "iam:PassRole", "iam:GetInstanceProfile",
        "iam:CreateInstanceProfile", "iam:TagInstanceProfile",
        "iam:AddRoleToInstanceProfile", "iam:RemoveRoleFromInstanceProfile",
        "iam:DeleteInstanceProfile", "iam:ListInstanceProfiles",
        "pricing:GetProducts",
        "sqs:DeleteMessage", "sqs:GetQueueUrl", "sqs:ReceiveMessage",
        "ssm:GetParameter",
        "eks:DescribeCluster"
      ],
      "Resource": "*"
    }
  ]
}
EOF

aws iam create-policy \
  --policy-name "KarpenterControllerPolicy-${CLUSTER_NAME}" \
  --policy-document file://controller-policy.json

cat <<EOF > controller-trust.json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/oidc.eks.${AWS_DEFAULT_REGION}.amazonaws.com/id/${OIDC_ID}"
    },
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {
      "StringEquals": {
        "oidc.eks.${AWS_DEFAULT_REGION}.amazonaws.com/id/${OIDC_ID}:sub": "system:serviceaccount:${KARPENTER_NAMESPACE}:karpenter"
      }
    }
  }]
}
EOF

aws iam create-role \
  --role-name "KarpenterControllerRole-${CLUSTER_NAME}" \
  --assume-role-policy-document file://controller-trust.json

aws iam attach-role-policy \
  --role-name "KarpenterControllerRole-${CLUSTER_NAME}" \
  --policy-arn "arn:aws:iam::${AWS_ACCOUNT_ID}:policy/KarpenterControllerPolicy-${CLUSTER_NAME}"

Step 4: Tag subnets and security groups for discovery

Karpenter discovers which subnets and security groups to use via tags. Tag your private subnets and the cluster node security group:

bash
# List your cluster subnets and security group
aws eks describe-cluster --name $CLUSTER_NAME \
  --query "cluster.resourcesVpcConfig"

# Tag each private subnet (repeat for each subnet ID)
aws ec2 create-tags \
  --resources subnet-XXXXXXXXXXXXXXXXX \
  --tags Key=karpenter.sh/discovery,Value=$CLUSTER_NAME

# Tag the cluster security group
aws ec2 create-tags \
  --resources sg-XXXXXXXXXXXXXXXXX \
  --tags Key=karpenter.sh/discovery,Value=$CLUSTER_NAME
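
If you'd rather not paste subnet IDs one by one, a loop like the sketch below tags every subnet attached to the cluster. Note that this includes public subnets, so skip it if you only want Karpenter launching into private ones:

bash
# Tag all subnets the cluster knows about (public and private)
for subnet in $(aws eks describe-cluster --name $CLUSTER_NAME \
    --query "cluster.resourcesVpcConfig.subnetIds[]" --output text); do
  aws ec2 create-tags \
    --resources "$subnet" \
    --tags Key=karpenter.sh/discovery,Value=$CLUSTER_NAME
done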

Step 5: Grant the Karpenter node role access to the cluster

Add the Karpenter node role to aws-auth so that nodes provisioned by Karpenter can join the cluster. Use eksctl to append the mapping safely — kubectl patch --type merge replaces the entire mapRoles key and will silently delete all existing node group mappings:

bash
eksctl create iamidentitymapping \
  --cluster "${CLUSTER_NAME}" \
  --region "${AWS_DEFAULT_REGION}" \
  --arn "arn:aws:iam::${AWS_ACCOUNT_ID}:role/KarpenterNodeRole-${CLUSTER_NAME}" \
  --username "system:node:{{EC2PrivateDNSName}}" \
  --group "system:bootstrappers" \
  --group "system:nodes"

If you don't have eksctl, edit the ConfigMap manually with kubectl edit configmap aws-auth -n kube-system and append the entry under mapRoles.
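
For reference, the entry to append under mapRoles looks like this (the same mapping the eksctl command creates; substitute your account ID and cluster name):

yaml
- rolearn: arn:aws:iam::<AWS_ACCOUNT_ID>:role/KarpenterNodeRole-<CLUSTER_NAME>
  username: system:node:{{EC2PrivateDNSName}}
  groups:
    - system:bootstrappers
    - system:nodes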

Step 6: Install Karpenter with Helm

Since v0.17, Karpenter is distributed via an OCI registry — there is no helm repo add step:

bash
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version "${KARPENTER_VERSION}" \
  --namespace "${KARPENTER_NAMESPACE}" \
  --create-namespace \
  --set settings.clusterName="${CLUSTER_NAME}" \
  --set settings.interruptionQueue="${CLUSTER_NAME}" \
  --set "serviceAccount.annotations.eks\.amazonaws\.com/role-arn=arn:aws:iam::${AWS_ACCOUNT_ID}:role/KarpenterControllerRole-${CLUSTER_NAME}" \
  --set controller.resources.requests.cpu=1 \
  --set controller.resources.requests.memory=1Gi \
  --set controller.resources.limits.cpu=1 \
  --set controller.resources.limits.memory=1Gi \
  --wait

The serviceAccount.annotations line enables IRSA (IAM Roles for Service Accounts), which is what the OIDC trust policy from Step 3 sets up. If your cluster uses EKS Pod Identity instead, create a Pod Identity Association for the Karpenter service account and omit this annotation, as sketched below.
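
A minimal sketch of the Pod Identity route follows. It assumes the Amazon EKS Pod Identity Agent add-on is installed, and the controller role's trust policy must then trust pods.eks.amazonaws.com rather than the OIDC provider from Step 3:

bash
# Associate the controller role with the karpenter service account
aws eks create-pod-identity-association \
  --cluster-name "${CLUSTER_NAME}" \
  --namespace "${KARPENTER_NAMESPACE}" \
  --service-account karpenter \
  --role-arn "arn:aws:iam::${AWS_ACCOUNT_ID}:role/KarpenterControllerRole-${CLUSTER_NAME}"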

settings.interruptionQueue requires a real SQS queue named after your cluster. If you have not created it yet, either omit this flag (Spot interruption handling will be disabled) or create the queue and EventBridge rules now — see the Karpenter getting started guide for the CloudFormation template that creates both.
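
If you want a minimal CLI alternative to the CloudFormation template, the sketch below creates the queue and a single EventBridge rule for Spot interruption warnings (the rule name is illustrative). The full setup also adds rules for rebalance recommendations, instance state changes, and scheduled-change health events:

bash
# Queue name must match settings.interruptionQueue
aws sqs create-queue --queue-name "${CLUSTER_NAME}" \
  --attributes '{"MessageRetentionPeriod":"300"}'

# Route Spot interruption warnings to the queue
aws events put-rule \
  --name "Karpenter-SpotInterruption-${CLUSTER_NAME}" \
  --event-pattern '{"source":["aws.ec2"],"detail-type":["EC2 Spot Instance Interruption Warning"]}'

aws events put-targets \
  --rule "Karpenter-SpotInterruption-${CLUSTER_NAME}" \
  --targets "Id"="1","Arn"="arn:aws:sqs:${AWS_DEFAULT_REGION}:${AWS_ACCOUNT_ID}:${CLUSTER_NAME}"

# Note: the queue also needs a resource policy allowing
# events.amazonaws.com to call sqs:SendMessage on it.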

Verify the controller is running:

bash
kubectl get pods -n $KARPENTER_NAMESPACE -l app.kubernetes.io/name=karpenter

Step 7: Create an EC2NodeClass

The EC2NodeClass tells Karpenter which AMI, IAM role, subnets, and security groups to use when launching nodes:

bash
cat <<EOF | kubectl apply -f -
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiSelectorTerms:
  - alias: al2023@latest
  role: KarpenterNodeRole-${CLUSTER_NAME}
  subnetSelectorTerms:
  - tags:
      karpenter.sh/discovery: ${CLUSTER_NAME}
  securityGroupSelectorTerms:
  - tags:
      karpenter.sh/discovery: ${CLUSTER_NAME}
  tags:
    karpenter.sh/discovery: ${CLUSTER_NAME}
EOF
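
Before creating the NodePool, check that the node class resolved an AMI, subnets, and security groups; if it's not Ready, the discovery tags from Step 4 likely didn't match anything:

bash
kubectl get ec2nodeclass default
# Inspect the Status conditions if it's not Ready
kubectl describe ec2nodeclass default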

Step 8: Create a NodePool

A NodePool defines instance type preferences, capacity type (Spot vs On-Demand), and consolidation policy:

bash
cat <<EOF | kubectl apply -f -
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["spot", "on-demand"]
      - key: kubernetes.io/arch
        operator: In
        values: ["amd64"]
      - key: karpenter.k8s.aws/instance-category
        operator: In
        values: ["c", "m", "r"]
      - key: karpenter.k8s.aws/instance-generation
        operator: Gt
        values: ["2"]
  limits:
    cpu: 1000
    memory: 4000Gi
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
EOF

By default, Karpenter does not consolidate Spot-to-Spot node replacements. To enable it (saving cost by downsizing Spot nodes), add --set settings.featureGates.spotToSpotConsolidation=true to the Helm install command.

Step 9: Test node provisioning

Deploy a workload that exceeds your current cluster capacity and watch Karpenter provision a node:

bash
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: karpenter-test
spec:
  replicas: 5
  selector:
    matchLabels:
      app: karpenter-test
  template:
    metadata:
      labels:
        app: karpenter-test
    spec:
      containers:
      - name: app
        image: public.ecr.aws/amazonlinux/amazonlinux:2023
        command: ["sleep", "3600"]
        resources:
          requests:
            cpu: "1"
            memory: "1Gi"
EOF

Watch new nodes join the cluster — typically 60–90 seconds on first launch while the EC2 instance boots and runs the bootstrap script:

bash
kubectl get nodes --watch

Check Karpenter's logs to see its decision-making:

bash
kubectl logs -l app.kubernetes.io/name=karpenter \
  -n $KARPENTER_NAMESPACE --follow

Step 10: Clean up the test

bash
kubectl delete deployment karpenter-test

Karpenter will consolidate and terminate the nodes it provisioned within ~1 minute (per the consolidateAfter: 1m setting).
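
You can follow the scale-down through Karpenter's NodeClaim resources, which track each node Karpenter owns from launch to termination:

bash
kubectl get nodeclaims --watch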

What you built

Karpenter is now watching your cluster for unschedulable pods. When one appears, it selects the optimal instance type for the pending workload, launches an instance directly via the EC2 Fleet API, and has it ready within 60–90 seconds on a typical first launch. When pods are removed, Karpenter consolidates underutilised nodes and terminates them, cutting your EC2 bill proportionally. The Spot preference in the NodePool means most nodes will run at a 60–80% discount when Spot capacity is available.

We built Podscape to simplify Kubernetes workflows like this — logs, events, and cluster state in one interface, without switching tools.

Struggling with this in production?

We help teams fix these exact issues. Our engineers have deployed these patterns across production environments at scale.