Cloud Engineering

Setting Up Cluster Autoscaler on EKS

Intermediate · 40 min to complete · 12 min read

Configure Kubernetes Cluster Autoscaler on EKS so nodes scale automatically with your workloads. Covers IAM setup, autoscaling group tagging, deployment, and tuning scale-down behavior.

Before you begin

  • An EKS cluster with managed node groups
  • kubectl configured for the cluster
  • AWS CLI configured with sufficient permissions
  • Helm 3 installed
  • eksctl installed (optional but helpful)
Tags: AWS, EKS, Kubernetes, Autoscaling, Cluster Autoscaler, DevOps

Without Cluster Autoscaler, your EKS cluster has a fixed number of nodes. When pods can't be scheduled due to insufficient capacity, they stay Pending forever. With Cluster Autoscaler, AWS adds nodes automatically when needed and removes them when they're idle.

This tutorial sets it up correctly — including the IAM permissions that most guides gloss over.

How Cluster Autoscaler Works

The Cluster Autoscaler watches for unschedulable pods and checks whether adding a node would allow them to run. If yes, it increases the desired count of the matching Auto Scaling Group. It also watches for underutilised nodes and removes them after a configurable idle period.
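You can observe both halves of this loop directly. Unschedulable pods show up as Pending with "Insufficient cpu/memory" events, and once the autoscaler is deployed (Step 4) it publishes its per-node-group view in a status ConfigMap:

```shell
# Pods the autoscaler would try to fix: stuck Pending for lack of capacity
kubectl get pods --all-namespaces --field-selector=status.phase=Pending

# The autoscaler's own view of each node group (health, scale-up/down candidates)
kubectl -n kube-system describe configmap cluster-autoscaler-status
```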

Step 1: Tag Your Node Group ASG

Cluster Autoscaler discovers node groups by looking for specific tags on Auto Scaling Groups:

bash
# Get your ASG name
aws autoscaling describe-auto-scaling-groups \
  --query "AutoScalingGroups[?contains(Tags[?Key=='eks:cluster-name'].Value, 'my-cluster')].AutoScalingGroupName" \
  --output text

# Tag the ASG (replace values)
aws autoscaling create-or-update-tags \
  --tags \
    ResourceId=<asg-name>,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/enabled,Value=true,PropagateAtLaunch=false \
    ResourceId=<asg-name>,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/<cluster-name>,Value=owned,PropagateAtLaunch=false

If you used eksctl or Terraform to create the cluster, these tags may already be present. Verify:

bash
aws autoscaling describe-tags \
  --filters Name=auto-scaling-group-name,Values=<asg-name> \
  --query "Tags[?Key=='k8s.io/cluster-autoscaler/enabled']"
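The autoscaler only ever scales within the ASG's configured bounds, so it is also worth confirming that MaxSize leaves headroom above your current desired count (`<asg-name>` is your ASG from above):

```shell
# Check the ASG's scaling bounds; MaxSize caps how far the autoscaler can grow it
aws autoscaling describe-auto-scaling-groups \
  --auto-scaling-group-names <asg-name> \
  --query "AutoScalingGroups[0].{Min:MinSize,Max:MaxSize,Desired:DesiredCapacity}"
```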

Step 2: Create the IAM Policy

Cluster Autoscaler needs permission to describe and modify Auto Scaling Groups:

bash
cat > cluster-autoscaler-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:DescribeLaunchConfigurations",
        "autoscaling:DescribeScalingActivities",
        "autoscaling:DescribeTags",
        "autoscaling:SetDesiredCapacity",
        "autoscaling:TerminateInstanceInAutoScalingGroup",
        "ec2:DescribeImages",
        "ec2:DescribeInstanceTypes",
        "ec2:DescribeLaunchTemplateVersions",
        "ec2:GetInstanceTypesFromInstanceRequirements",
        "eks:DescribeNodegroup"
      ],
      "Resource": "*"
    }
  ]
}
EOF

aws iam create-policy \
  --policy-name ClusterAutoscalerPolicy \
  --policy-document file://cluster-autoscaler-policy.json
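If the policy already exists from a previous run, create-policy will fail with an EntityAlreadyExists error; in that case you can fetch the existing ARN instead (same account-ID lookup as in Step 3):

```shell
# Look up the existing policy rather than recreating it
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
aws iam get-policy \
  --policy-arn "arn:aws:iam::${ACCOUNT_ID}:policy/ClusterAutoscalerPolicy" \
  --query "Policy.Arn" --output text
```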

Step 3: Create the IAM Role with IRSA

IRSA (IAM Roles for Service Accounts) lets the Cluster Autoscaler pod assume an IAM role without static credentials. This is the correct approach — never use node-level IAM permissions for this.

bash
# Get your OIDC provider URL
OIDC_URL=$(aws eks describe-cluster \
  --name my-cluster \
  --query "cluster.identity.oidc.issuer" \
  --output text | sed 's|https://||')

ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
POLICY_ARN="arn:aws:iam::${ACCOUNT_ID}:policy/ClusterAutoscalerPolicy"

# Create the trust policy
cat > trust-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${ACCOUNT_ID}:oidc-provider/${OIDC_URL}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${OIDC_URL}:sub": "system:serviceaccount:kube-system:cluster-autoscaler",
          "${OIDC_URL}:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}
EOF

# Create the role
aws iam create-role \
  --role-name ClusterAutoscalerRole \
  --assume-role-policy-document file://trust-policy.json

# Attach the policy
aws iam attach-role-policy \
  --role-name ClusterAutoscalerRole \
  --policy-arn $POLICY_ARN

ROLE_ARN=$(aws iam get-role \
  --role-name ClusterAutoscalerRole \
  --query "Role.Arn" --output text)
echo "Role ARN: $ROLE_ARN"

Or with eksctl:

bash
eksctl create iamserviceaccount \
  --cluster=my-cluster \
  --namespace=kube-system \
  --name=cluster-autoscaler \
  --attach-policy-arn=$POLICY_ARN \
  --approve
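Either path only works if the cluster's OIDC issuer is registered as an IAM identity provider in your account. A quick sanity check (compare the issuer's trailing ID against the registered providers):

```shell
# The cluster's OIDC issuer URL
aws eks describe-cluster --name my-cluster \
  --query "cluster.identity.oidc.issuer" --output text

# Providers registered in IAM; the issuer's ID should appear here
aws iam list-open-id-connect-providers --output text

# If it's missing, register it:
# eksctl utils associate-iam-oidc-provider --cluster my-cluster --approve
```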

Step 4: Deploy Cluster Autoscaler via Helm

bash
helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm repo update

helm install cluster-autoscaler autoscaler/cluster-autoscaler \
  --namespace kube-system \
  --set autoDiscovery.clusterName=my-cluster \
  --set awsRegion=ap-south-1 \
  --set rbac.serviceAccount.create=true \
  --set rbac.serviceAccount.name=cluster-autoscaler \
  --set rbac.serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=$ROLE_ARN \
  --set extraArgs.balance-similar-node-groups=true \
  --set extraArgs.skip-nodes-with-system-pods=false \
  --set extraArgs.scale-down-delay-after-add=2m \
  --set extraArgs.scale-down-unneeded-time=5m

Setting rbac.serviceAccount.name explicitly matters: the trust policy from Step 3 authorises a service account named exactly cluster-autoscaler, and without this flag the chart may generate a different name, which silently breaks IRSA.

Key flags explained:

  • balance-similar-node-groups — keeps node counts balanced across similar node groups, such as identical groups in different AZs
  • skip-nodes-with-system-pods=false — allows scale-down of nodes running non-DaemonSet kube-system pods, which by default are never removed
  • scale-down-delay-after-add=2m — waits 2 minutes after a scale-up before evaluating scale-down
  • scale-down-unneeded-time=5m — node must be underutilised for 5 minutes before being removed
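To confirm these flags actually landed in the running deployment, grep the rendered manifest (the label selector below assumes the chart's default labels):

```shell
# The extraArgs should appear as --flags on the container command line
kubectl -n kube-system get deploy \
  -l app.kubernetes.io/name=cluster-autoscaler \
  -o yaml | grep -E 'balance-similar|skip-nodes|scale-down'
```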

Step 5: Annotate the ServiceAccount (if created manually)

If you didn't use eksctl for IRSA, annotate the ServiceAccount:

bash
kubectl annotate serviceaccount cluster-autoscaler \
  -n kube-system \
  eks.amazonaws.com/role-arn=$ROLE_ARN

Restart the deployment to pick up the annotation:

bash
kubectl rollout restart deployment cluster-autoscaler -n kube-system
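When IRSA is wired up correctly, the EKS pod identity webhook injects an AWS_ROLE_ARN variable and a web identity token file into the pod. Their presence is a quick sanity check:

```shell
# Grab the running autoscaler pod
POD=$(kubectl -n kube-system get pods \
  -l app.kubernetes.io/name=cluster-autoscaler \
  -o jsonpath='{.items[0].metadata.name}')

# Both variables should be present if IRSA is working
kubectl -n kube-system exec "$POD" -- env | grep -E 'AWS_ROLE_ARN|AWS_WEB_IDENTITY'
```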

Step 6: Verify It's Working

bash
# Check pod is running
kubectl get pods -n kube-system -l app.kubernetes.io/name=cluster-autoscaler

# Watch logs
kubectl logs -n kube-system -l app.kubernetes.io/name=cluster-autoscaler --tail=30 -f

Trigger a scale-up by creating a deployment that requests more resources than your current nodes have:

bash
kubectl create deployment scale-test \
  --image=nginx \
  --replicas=50

# Pods with no resource requests won't exhaust capacity —
# add CPU/memory requests so scheduling actually fails
kubectl set resources deployment scale-test \
  --requests=cpu=250m,memory=128Mi

kubectl get pods -w
# After a few minutes, pods will go from Pending to Running as new nodes join

Watch the node count increase:

bash
kubectl get nodes -w
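When you're done, delete the test deployment. After the idle period elapses (5 minutes with the scale-down-unneeded-time value above), the extra nodes should drain and terminate on their own:

```shell
kubectl delete deployment scale-test

# Node count should drop back down after the idle period
kubectl get nodes -w
```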

Step 7: Configure Pod Disruption Budgets for Safe Scale-Down

Cluster Autoscaler respects PodDisruptionBudgets when removing nodes. Set one on your critical deployments to prevent downtime during scale-down:

bash
kubectl apply -f - <<EOF
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-server-pdb
spec:
  minAvailable: 2     # At least 2 pods must remain available
  selector:
    matchLabels:
      app: api-server
EOF
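You can confirm the budget is being tracked; the ALLOWED DISRUPTIONS column shows how many pods the autoscaler may evict right now:

```shell
# With 3 running replicas and minAvailable: 2, this should show 1 allowed disruption
kubectl get pdb api-server-pdb
```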

Troubleshooting

Nodes not scaling up: Check logs for IAM permission errors. The most common issue is a missing OIDC provider or incorrect trust policy condition.

Nodes not scaling down: Check whether pods carry the cluster-autoscaler.kubernetes.io/safe-to-evict: "false" annotation. Pods using local storage (emptyDir or hostPath volumes) also block eviction by default.
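To find pods explicitly marked as not safe to evict across the cluster, one option (assuming jq is installed) is:

```shell
# List namespace/name of every pod annotated safe-to-evict=false
kubectl get pods --all-namespaces -o json | jq -r '
  .items[]
  | select(.metadata.annotations["cluster-autoscaler.kubernetes.io/safe-to-evict"] == "false")
  | "\(.metadata.namespace)/\(.metadata.name)"'
```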

Wrong node group being scaled: Ensure the ASG tags match exactly. The k8s.io/cluster-autoscaler/<cluster-name> tag must match your cluster name.

bash
# Annotate pods that are safe to evict (override the default)
kubectl annotate pod <pod-name> cluster-autoscaler.kubernetes.io/safe-to-evict=true

We built Podscape to simplify Kubernetes workflows like this — logs, events, and cluster state in one interface, without switching tools.

Struggling with this in production?

We help teams fix these exact issues. Our engineers have deployed these patterns across production environments at scale.