Setting Up Cluster Autoscaler on EKS
Configure Kubernetes Cluster Autoscaler on EKS so nodes scale automatically with your workloads. Covers IAM setup, autoscaling group tagging, deployment, and tuning scale-down behavior.
Before you begin
- An EKS cluster with managed node groups
- kubectl configured for the cluster
- AWS CLI configured with sufficient permissions
- Helm 3 installed
- eksctl installed (optional but helpful)
Without Cluster Autoscaler, your EKS cluster has a fixed number of nodes. When pods can't be scheduled due to insufficient capacity, they stay Pending forever. With Cluster Autoscaler, AWS adds nodes automatically when needed and removes them when they're idle.
This tutorial sets it up correctly — including the IAM permissions that most guides gloss over.
How Cluster Autoscaler Works
The Cluster Autoscaler watches for unschedulable pods and checks whether adding a node would allow them to run. If yes, it increases the desired count of the matching Auto Scaling Group. It also watches for underutilised nodes and removes them after a configurable idle period.
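You can observe both halves of this loop with kubectl. Pending pods are what trigger scale-up, and the autoscaler publishes its own view of each node group in a status ConfigMap (the ConfigMap name below is the default; adjust it if you have overridden it in your deployment):

```shell
# List pods the scheduler cannot place — these are what trigger scale-up
kubectl get pods --all-namespaces --field-selector=status.phase=Pending

# Cluster Autoscaler's view of node group health and recent activity
kubectl -n kube-system get configmap cluster-autoscaler-status \
  -o jsonpath='{.data.status}'
```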
Step 1: Tag Your Node Group ASG
Cluster Autoscaler discovers node groups by looking for specific tags on Auto Scaling Groups:
# Get your ASG name
aws autoscaling describe-auto-scaling-groups \
  --query "AutoScalingGroups[?contains(Tags[?Key=='eks:cluster-name'].Value, 'my-cluster')].AutoScalingGroupName" \
  --output text
# Tag the ASG (replace values)
aws autoscaling create-or-update-tags \
  --tags \
  ResourceId=<asg-name>,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/enabled,Value=true,PropagateAtLaunch=false \
  ResourceId=<asg-name>,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/<cluster-name>,Value=owned,PropagateAtLaunch=false
If you used eksctl or Terraform to create the cluster, these tags may already be present. Verify:
aws autoscaling describe-tags \
  --filters Name=auto-scaling-group-name,Values=<asg-name> \
  --query "Tags[?Key=='k8s.io/cluster-autoscaler/enabled']"
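One more thing to check while you're here: Cluster Autoscaler only adjusts the desired count within the ASG's existing min/max bounds, so the max size must be high enough to absorb your peak load. The sizes below are examples:

```shell
# Raise the ceiling the autoscaler is allowed to scale up to (example sizes)
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name <asg-name> \
  --min-size 1 \
  --max-size 10
```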
Step 2: Create the IAM Policy
Cluster Autoscaler needs permission to describe and modify Auto Scaling Groups:
cat > cluster-autoscaler-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:DescribeLaunchConfigurations",
        "autoscaling:DescribeScalingActivities",
        "autoscaling:DescribeTags",
        "autoscaling:SetDesiredCapacity",
        "autoscaling:TerminateInstanceInAutoScalingGroup",
        "ec2:DescribeImages",
        "ec2:DescribeInstanceTypes",
        "ec2:DescribeLaunchTemplateVersions",
        "ec2:GetInstanceTypesFromInstanceRequirements",
        "eks:DescribeNodegroup"
      ],
      "Resource": "*"
    }
  ]
}
EOF
aws iam create-policy \
  --policy-name ClusterAutoscalerPolicy \
  --policy-document file://cluster-autoscaler-policy.json
Step 3: Create the IAM Role with IRSA
IRSA (IAM Roles for Service Accounts) lets the Cluster Autoscaler pod assume an IAM role without static credentials. This is the correct approach — never use node-level IAM permissions for this.
# Get your OIDC provider URL
OIDC_URL=$(aws eks describe-cluster \
  --name my-cluster \
  --query "cluster.identity.oidc.issuer" \
  --output text | sed 's|https://||')
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
POLICY_ARN="arn:aws:iam::${ACCOUNT_ID}:policy/ClusterAutoscalerPolicy"
# Create the trust policy
cat > trust-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${ACCOUNT_ID}:oidc-provider/${OIDC_URL}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${OIDC_URL}:sub": "system:serviceaccount:kube-system:cluster-autoscaler",
          "${OIDC_URL}:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}
EOF
# Create the role
aws iam create-role \
  --role-name ClusterAutoscalerRole \
  --assume-role-policy-document file://trust-policy.json
# Attach the policy
aws iam attach-role-policy \
  --role-name ClusterAutoscalerRole \
  --policy-arn $POLICY_ARN
ROLE_ARN=$(aws iam get-role \
  --role-name ClusterAutoscalerRole \
  --query "Role.Arn" --output text)
echo "Role ARN: $ROLE_ARN"
Or with eksctl:
eksctl create iamserviceaccount \
  --cluster=my-cluster \
  --namespace=kube-system \
  --name=cluster-autoscaler \
  --attach-policy-arn=$POLICY_ARN \
  --approve
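Either path assumes the cluster's OIDC issuer is already registered as an IAM identity provider; if it isn't, AssumeRoleWithWebIdentity fails silently from the pod's perspective. Check the provider list (the issuer ID is the last path segment of $OIDC_URL) and associate it if missing:

```shell
# Look for your cluster's OIDC issuer ID in this list
aws iam list-open-id-connect-providers

# If it's missing, register it (one way to do this is eksctl)
eksctl utils associate-iam-oidc-provider \
  --cluster my-cluster \
  --approve
```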
Step 4: Deploy Cluster Autoscaler via Helm
helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm repo update
helm install cluster-autoscaler autoscaler/cluster-autoscaler \
  --namespace kube-system \
  --set autoDiscovery.clusterName=my-cluster \
  --set awsRegion=ap-south-1 \
  --set rbac.serviceAccount.create=true \
  --set rbac.serviceAccount.name=cluster-autoscaler \
  --set rbac.serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=$ROLE_ARN \
  --set extraArgs.balance-similar-node-groups=true \
  --set extraArgs.skip-nodes-with-system-pods=false \
  --set extraArgs.scale-down-delay-after-add=2m \
  --set extraArgs.scale-down-unneeded-time=5m
Key flags explained:
- balance-similar-node-groups — distributes nodes evenly across AZs
- skip-nodes-with-system-pods=false — allows scale-down even if a node runs kube-proxy or DaemonSet pods
- scale-down-delay-after-add=2m — waits 2 minutes after a scale-up before evaluating scale-down
- scale-down-unneeded-time=5m — node must be underutilised for 5 minutes before being removed
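If you prefer a values file over a long chain of --set flags, the same configuration might look like this (a sketch matching the flags above; pass it with helm install -f values.yaml):

```yaml
# values.yaml: equivalent of the --set flags above
autoDiscovery:
  clusterName: my-cluster
awsRegion: ap-south-1
rbac:
  serviceAccount:
    create: true
    name: cluster-autoscaler
    annotations:
      eks.amazonaws.com/role-arn: arn:aws:iam::<account-id>:role/ClusterAutoscalerRole
extraArgs:
  balance-similar-node-groups: true
  skip-nodes-with-system-pods: false
  scale-down-delay-after-add: 2m
  scale-down-unneeded-time: 5m
```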
Step 5: Annotate the ServiceAccount (if created manually)
If you didn't use eksctl for IRSA, annotate the ServiceAccount:
kubectl annotate serviceaccount cluster-autoscaler \
  -n kube-system \
  eks.amazonaws.com/role-arn=$ROLE_ARN
Restart the deployment to pick up the annotation:
kubectl rollout restart deployment cluster-autoscaler -n kube-system
Step 6: Verify It's Working
# Check pod is running
kubectl get pods -n kube-system -l app.kubernetes.io/name=cluster-autoscaler
# Watch logs
kubectl logs -n kube-system -l app.kubernetes.io/name=cluster-autoscaler --tail=30 -f
Trigger a scale-up by creating a deployment that requests more resources than your current nodes have:
kubectl create deployment scale-test \
  --image=nginx \
  --replicas=50
kubectl get pods -w
# After a few minutes, pods will go from Pending to Running as new nodes join
Watch the node count increase:
kubectl get nodes -w
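When you're done testing, delete the deployment and watch the reverse path: once the nodes have been idle for scale-down-unneeded-time (5 minutes with the settings above), they should be drained and terminated:

```shell
# Remove the test load
kubectl delete deployment scale-test

# Nodes should start disappearing after the unneeded window elapses
kubectl get nodes -w
```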
Step 7: Configure Pod Disruption Budgets for Safe Scale-Down
Cluster Autoscaler respects PodDisruptionBudgets when removing nodes. Set one on your critical deployments to prevent downtime during scale-down:
kubectl apply -f - <<EOF
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-server-pdb
spec:
  minAvailable: 2  # At least 2 pods must remain available
  selector:
    matchLabels:
      app: api-server
EOF
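Confirm the budget is being tracked; the ALLOWED DISRUPTIONS column shows how many pods may be evicted right now, which is what Cluster Autoscaler consults before draining a node:

```shell
kubectl get pdb api-server-pdb -n default
```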
Troubleshooting
Nodes not scaling up: Check logs for IAM permission errors. The most common issue is a missing OIDC provider or incorrect trust policy condition.
Nodes not scaling down: Check if pods have cluster-autoscaler.kubernetes.io/safe-to-evict: "false" annotation. Local storage or empty-dir volumes also block eviction.
Wrong node group being scaled: Ensure the ASG tags match exactly. The k8s.io/cluster-autoscaler/<cluster-name> tag must match your cluster name.
# Annotate pods that are safe to evict (override the default)
kubectl annotate pod <pod-name> cluster-autoscaler.kubernetes.io/safe-to-evict=true
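When a scale-up seems stuck even though the autoscaler logs show it requested capacity, the ASG's own activity history usually names the blocker (insufficient capacity in an AZ, instance quota limits, launch template errors):

```shell
# Show the most recent scaling attempts and their status messages
aws autoscaling describe-scaling-activities \
  --auto-scaling-group-name <asg-name> \
  --max-items 5
```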