EKS Cluster Upgrades: Zero-Downtime Strategy for Production
Upgrading an EKS cluster in production without downtime requires a specific order of operations: control plane first, then addons, then nodes — with validation at each step. Skipping versions, upgrading nodes before addons, or draining too aggressively during business hours are the most common causes of upgrade incidents. This is the process that avoids them.

EKS supports in-place control plane upgrades: the API server, etcd, and control plane components are upgraded by AWS without a cluster rebuild. Worker nodes are replaced — old AMIs don't run new Kubernetes versions — so the upgrade process involves a rolling node replacement that must be orchestrated to avoid disrupting running workloads.
The two most common upgrade mistakes: upgrading more than one minor version at a time (not supported), and upgrading without checking addon compatibility (CoreDNS, kube-proxy, VPC CNI all have version requirements tied to the control plane version).
EKS Version Support Timeline
EKS supports approximately 4-5 Kubernetes versions at any time. Each version is supported for approximately 14 months after its EKS release. AWS announces end-of-support dates well in advance.
Key policy: AWS does not let clusters run unsupported versions indefinitely. When a version leaves standard support, the cluster either moves into paid extended support or, if its upgrade policy is set to standard support only, AWS auto-upgrades the control plane for you. Planning your upgrades on a regular cadence (one minor version every 3-4 months) avoids the forced-upgrade scramble.
Extended support: After the standard support window closes (~14 months), EKS offers extended support for roughly another 12 months. During extended support the control plane is billed at $0.60/cluster/hour instead of the standard $0.10/hour, which works out to about $440/month per cluster instead of about $73. Plan upgrades before the standard support window ends to avoid this cost.
# List addons available for a given Kubernetes version
aws eks describe-addon-versions --kubernetes-version 1.32 \
  --query 'addons[].addonName' --output text

# Check current cluster version
aws eks describe-cluster --name my-cluster \
  --query 'cluster.version' --output text

# Check addon versions for current cluster
aws eks describe-addon --cluster-name my-cluster --addon-name aws-vpc-cni \
  --query 'addon.{version: addonVersion, status: status}'

Pre-Upgrade Checklist
Run this before any production upgrade:
# 1. Check addon compatibility matrix for the target version
# AWS publishes this at: docs.aws.amazon.com/eks/latest/userguide/managing-add-ons.html
# Or query it:
aws eks describe-addon-versions \
  --kubernetes-version 1.33 \
  --query 'addons[*].{name: addonName, versions: addonVersions[0].addonVersion}'

# 2. Check deprecated API usage (resources using APIs removed in target version)
# Use pluto (from Fairwinds):
pluto detect-helm --target-versions k8s=v1.33
pluto detect-files -d ./k8s-manifests --target-versions k8s=v1.33

# Or with the kubectl-convert plugin for specific manifests
kubectl convert -f deployment.yaml --output-version apps/v1

# 3. Verify PodDisruptionBudgets are in place for all critical services
kubectl get pdb -A

# 4. Check that all nodes are healthy
kubectl get nodes

# 5. Back up cluster state (EKS manages etcd, but a Velero backup is still recommended)
velero backup create pre-upgrade-$(date +%Y%m%d) --include-namespaces production

# 6. Review recent changes — freeze non-critical deployments for 24h before upgrade
git log --since="24 hours ago" origin/main

Upgrade Order
The required order is non-negotiable:
1. Control plane (API server, etcd) ← AWS upgrades, takes ~15 min
2. EKS managed addons ← update to versions compatible with new k8s
3. Worker nodes ← replace old AMI with new AMI
4. Self-managed addons (Karpenter, etc.) ← update after nodes are on new version
Upgrading nodes before addons can leave CoreDNS or kube-proxy at a version incompatible with the new node kubelet.
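A quick way to catch this mismatch after each step is to compare minor versions directly. The helpers below are a sketch (the function names are mine; the commented kubectl line assumes cluster access):

```shell
# Sketch: compare Kubernetes minor versions to spot kubelet/addon skew.
minor() {                     # minor v1.32.3-eksbuild.2 -> 32
  local v=${1#v}              # drop the leading "v"
  v=${v#*.}                   # drop the major version ("1.")
  printf '%s\n' "${v%%.*}"    # keep digits up to the next dot
}

skew() {                      # skew OLD NEW -> signed minor-version gap
  echo $(( $(minor "$2") - $(minor "$1") ))
}

skew v1.32.3-eksbuild.2 v1.33.0   # prints 1: one minor version behind

# Against a live cluster (illustrative): list each node's kubelet version
# kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name} {.status.nodeInfo.kubeletVersion}{"\n"}{end}'
```

Anything other than 0 or 1 between an addon's tag and the node kubelet is worth investigating before continuing.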
Step 1: Control Plane Upgrade
# AWS Console: EKS → Cluster → Update Kubernetes version
# CLI:
aws eks update-cluster-version \
  --name my-cluster \
  --kubernetes-version 1.33

# Monitor upgrade progress (takes 10-20 minutes)
aws eks describe-cluster --name my-cluster \
  --query 'cluster.{version: version, status: status}'

# Wait for ACTIVE status
aws eks wait cluster-active --name my-cluster

During the control plane upgrade, the API server may be briefly unavailable (typically under 30 seconds). Worker nodes keep running and existing pods are unaffected, because kubelet talks to the API server only for reconciliation; it is not in the data path for serving requests.
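Automation that hits the API during this brief window should tolerate the blip rather than fail outright. A minimal retry wrapper, as a sketch (the kubectl call in the comment is illustrative):

```shell
# retry N CMD...: run CMD up to N times, one second apart; succeed on the
# first success, fail only if every attempt fails.
retry() {
  local attempts=$1 i
  shift
  for (( i = 1; i <= attempts; i++ )); do
    "$@" && return 0
    sleep 1
  done
  return 1
}

# Illustrative use while the control plane updates (assumes cluster access):
# retry 12 kubectl get --raw /healthz
```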
Step 2: Addon Upgrades
# Update vpc-cni to the latest version compatible with the target K8s version
LATEST_VPC_CNI=$(aws eks describe-addon-versions \
  --kubernetes-version 1.33 \
  --addon-name aws-vpc-cni \
  --query 'addons[0].addonVersions[0].addonVersion' \
  --output text)

aws eks update-addon \
  --cluster-name my-cluster \
  --addon-name aws-vpc-cni \
  --addon-version $LATEST_VPC_CNI \
  --resolve-conflicts OVERWRITE

# Repeat for coredns and kube-proxy
LATEST_COREDNS=$(aws eks describe-addon-versions \
  --kubernetes-version 1.33 \
  --addon-name coredns \
  --query 'addons[0].addonVersions[0].addonVersion' \
  --output text)

aws eks update-addon \
  --cluster-name my-cluster \
  --addon-name coredns \
  --addon-version $LATEST_COREDNS

LATEST_PROXY=$(aws eks describe-addon-versions \
  --kubernetes-version 1.33 \
  --addon-name kube-proxy \
  --query 'addons[0].addonVersions[0].addonVersion' \
  --output text)

aws eks update-addon \
  --cluster-name my-cluster \
  --addon-name kube-proxy \
  --addon-version $LATEST_PROXY

# Wait for all addons to be ACTIVE (use the exact addon names: aws-vpc-cni, not vpc-cni)
for addon in aws-vpc-cni coredns kube-proxy; do
  aws eks wait addon-active --cluster-name my-cluster --addon-name $addon
done

--resolve-conflicts: OVERWRITE vs PRESERVE
OVERWRITE replaces any custom configuration on the addon with AWS defaults. PRESERVE keeps your user-managed field values and completes the update — it does not fail on conflicts, it simply keeps the divergent values. NONE returns an error if there are any conflicts, which is useful for detecting configuration drift. For addons with custom configuration (custom Corefile, VPC CNI env vars), use PRESERVE so your settings are retained.
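One way to combine these modes is to treat NONE as a dry-run drift probe: attempt the update with NONE first, and fall back to PRESERVE only when a conflict is reported. This is a sketch; `update_with_drift_check` is my name for the wrapper, and the aws invocation in the comment is illustrative:

```shell
# update_with_drift_check CMD...: run CMD with --resolve-conflicts NONE; if it
# fails (a conflict means custom config exists), rerun with PRESERVE so the
# custom values survive the upgrade.
update_with_drift_check() {
  if "$@" --resolve-conflicts NONE 2>/dev/null; then
    echo "no drift: update applied with defaults"
  else
    echo "drift detected: retrying with PRESERVE"
    "$@" --resolve-conflicts PRESERVE
  fi
}

# Illustrative use:
# update_with_drift_check aws eks update-addon --cluster-name my-cluster \
#   --addon-name coredns --addon-version "$LATEST_COREDNS"
```

Because the wrapper takes the command as arguments, it works for any addon and can be exercised with a stub in place of the real aws call.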
Step 3: Node Upgrades
Managed Node Groups
# Update managed node group — triggers rolling replacement
aws eks update-nodegroup-version \
  --cluster-name my-cluster \
  --nodegroup-name default \
  --kubernetes-version 1.33

# Or pin a specific AMI release:
aws eks update-nodegroup-version \
  --cluster-name my-cluster \
  --nodegroup-name default \
  --release-version "1.33.x-eksbuild.y"  # Get from EKS release notes

# Watch the update
aws eks describe-nodegroup \
  --cluster-name my-cluster \
  --nodegroup-name default \
  --query 'nodegroup.{status: status, health: health}'

The managed node group rolling update:
- Cordons and drains one node at a time (respecting PodDisruptionBudgets)
- Terminates the old node
- Launches a new node with the new AMI
- Waits for the new node to become Ready before draining the next
With maxUnavailable: 1 (default), this is a serial replacement. For faster upgrades, increase maxUnavailable (trades safety for speed):
# Change update config for faster upgrades (use with caution)
aws eks update-nodegroup-config \
  --cluster-name my-cluster \
  --nodegroup-name default \
  --update-config '{"maxUnavailable": 2}'

Karpenter Nodes
Karpenter nodes are not upgraded in-place — they're replaced. Force replacement by cordoning the old nodes and letting Karpenter provision new ones:
# Get all Karpenter-managed nodes
kubectl get nodes -l karpenter.sh/nodepool -o name

# Cordon old nodes (prevents new pods from scheduling)
kubectl get nodes -l karpenter.sh/nodepool -o name | \
  xargs kubectl cordon

# Remove any do-not-disrupt protection so Karpenter can drain and replace these nodes
# (removing an annotation that was never set reports an error; it's safe to ignore)
kubectl get nodeclaims -o name | \
  xargs -I{} kubectl annotate {} karpenter.sh/do-not-disrupt- || true

# Alternatively: delete NodeClaims directly (Karpenter reprovisions replacements)
kubectl get nodeclaims -o name | xargs kubectl delete

Or use Karpenter's drift feature — with the AMI family set to AL2023 and no pinned AMI version, Karpenter automatically replaces nodes when a new AMI is available for the target K8s version:
spec:
  template:
    spec:
      nodeClassRef:
        name: default
      # Karpenter uses the latest AL2023 AMI for the target K8s version
      # When the cluster version changes, nodes are marked drifted and replaced

Terraform-Managed EKS Upgrades
For clusters managed with Terraform using the terraform-aws-eks module:
# Change cluster_version to the target version
# terraform plan shows what will change
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0" # terraform-aws-modules/eks is in the v20.x series as of 2025

  cluster_version = "1.33" # Was "1.32"

  # Node group AMI release version (optional — leave unset for latest)
  eks_managed_node_groups = {
    default = {
      min_size     = 3
      max_size     = 10
      desired_size = 3

      # Force replacement to use new AMI
      # Changing ami_release_version triggers node group update
      ami_type = "AL2023_x86_64_STANDARD"
      # ami_release_version = "1.33.x-eksbuild.y"
    }
  }
}

# Control plane upgrade
terraform apply -target module.eks.aws_eks_cluster.this

# Verify control plane, then upgrade addons
terraform apply -target module.eks.aws_eks_addon.this

# Finally, upgrade node groups
terraform apply

Post-Upgrade Validation
# All nodes running new version
kubectl get nodes -o wide

# All pods healthy
kubectl get pods -A | grep -v Running | grep -v Completed

# CoreDNS resolving correctly
kubectl run dns-test --rm -it --image=busybox -- nslookup kubernetes.default

# API server accessible and version matches
kubectl version

# Check for any deprecation warnings in kubectl output
kubectl get deployments -A

# Monitor for 30 minutes — check error rates in Grafana
# Watch for: OOMKilled pods (memory changes between K8s versions), CrashLoopBackOff,
# service connectivity issues (kube-proxy version mismatch)

Frequently Asked Questions
Can I skip minor versions (e.g., 1.29 → 1.32)?
No. EKS requires upgrading one minor version at a time. 1.29 → 1.30 → 1.31 → 1.32. Attempting to skip versions returns an error. Plan ahead — if you're on 1.29 and want to get to 1.32, that's three separate upgrade cycles, each taking a few hours.
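For a multi-hop plan it helps to enumerate the intermediate versions explicitly. A small helper, as a sketch (the function name is mine):

```shell
# upgrade_path FROM TO: print each minor version you must pass through,
# one per line, e.g. 1.29 -> 1.32 requires 1.30, 1.31, 1.32.
upgrade_path() {
  local major=${1%%.*} from=${1#*.} to=${2#*.} v
  for (( v = from + 1; v <= to; v++ )); do
    printf '%s.%s\n' "$major" "$v"
  done
}

upgrade_path 1.29 1.32   # prints 1.30, 1.31 and 1.32 on separate lines
```

Each printed version is a full upgrade cycle: control plane, addons, nodes, validation.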
How do I handle workloads using deprecated API versions?
Use pluto before upgrading to identify resources using deprecated APIs. Update the manifests in Git before the upgrade — you have a window between when an API is deprecated (a warning is logged) and when it's removed (the API returns 404). Commonly removed APIs between versions are documented in the official Kubernetes changelog.
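If pluto isn't available, a rough grep over rendered manifests catches the obvious cases. A sketch; the function name is mine and the API list is an illustrative subset (these groups were removed around 1.25-1.26), not an exhaustive check:

```shell
# find_removed_apis DIR: list manifest files that still reference API versions
# removed in recent Kubernetes releases (illustrative subset only).
find_removed_apis() {
  local api
  for api in 'policy/v1beta1' 'batch/v1beta1' 'autoscaling/v2beta2'; do
    grep -rl "apiVersion: $api" "$1" 2>/dev/null
  done
  return 0   # grep's "no match" exit status is not an error here
}

find_removed_apis ./k8s-manifests
```

This only inspects files on disk; pluto additionally checks what is deployed via Helm releases, so prefer it when you can.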
For generic Kubernetes upgrade strategy that applies across EKS, GKE, and AKS — deprecated API detection, PDB requirements, blue-green node replacement, and rollback planning — see Kubernetes Cluster Upgrades: Zero-Downtime Strategy for Production. For Terraform-based EKS cluster management where upgrades are applied as code changes, see Terraform for EKS Infrastructure. For PodDisruptionBudgets that protect workloads during node drain operations in upgrades, see Kubernetes PodDisruptionBudget and Graceful Shutdown Patterns. For a side-by-side comparison of upgrade strategies across EKS, GKE, and AKS, see Managed Kubernetes Comparison: EKS vs GKE vs AKS.
Running EKS upgrades across multiple production clusters? Talk to us at Coding Protocols — we help platform teams build upgrade runbooks and automation that keep clusters on supported versions without unplanned downtime.


