EKS Cluster Upgrades: Zero-Downtime Strategy for Production
Upgrading an EKS cluster in production without downtime requires a specific order of operations: control plane first, then addons, then nodes — with validation at each step. Skipping versions, upgrading nodes before addons, or draining too aggressively during business hours are the most common causes of upgrade incidents. This is the process that avoids them.

EKS supports in-place control plane upgrades: the API server, etcd, and control plane components are upgraded by AWS without a cluster rebuild. Worker nodes are replaced — old AMIs don't run new Kubernetes versions — so the upgrade process involves a rolling node replacement that must be orchestrated to avoid disrupting running workloads.
The two most common upgrade mistakes: upgrading more than one minor version at a time (not supported), and upgrading without checking addon compatibility (CoreDNS, kube-proxy, VPC CNI all have version requirements tied to the control plane version).
EKS Version Support Timeline
EKS supports approximately 4-5 Kubernetes versions at any time. Each version is supported for approximately 14 months after its EKS release. AWS announces end-of-support dates well in advance.
Key policy: AWS does not let clusters run unsupported versions indefinitely. When a version leaves standard support, the cluster either moves into paid extended support or, if its upgrade policy is set to standard support only, AWS auto-upgrades the control plane for you. Planning your upgrades on a regular cadence (one minor version every 3-4 months) avoids the forced-upgrade scramble.
Extended support: After the standard support window closes (~14 months), EKS offers extended support for roughly another 12 months. During extended support the control plane is billed at $0.60/cluster/hour instead of the standard $0.10/hour, which works out to about $440/month per cluster instead of about $73. Plan upgrades before the standard support window ends to avoid this cost.
# List addons available for a given Kubernetes version
aws eks describe-addon-versions --kubernetes-version 1.32 \
  --query 'addons[].addonName' --output text

# Check current cluster version
aws eks describe-cluster --name my-cluster \
  --query 'cluster.version' --output text

# Check addon versions for current cluster
aws eks describe-addon --cluster-name my-cluster --addon-name aws-vpc-cni \
  --query 'addon.{version: addonVersion, status: status}'

Pre-Upgrade Checklist
Run this before any production upgrade:
# 1. Check addon compatibility matrix for the target version
# AWS publishes this at: docs.aws.amazon.com/eks/latest/userguide/managing-add-ons.html
# Or query it:
aws eks describe-addon-versions \
  --kubernetes-version 1.33 \
  --query 'addons[*].{name: addonName, versions: addonVersions[0].addonVersion}'

# 2. Check deprecated API usage (resources using APIs removed in target version)
# Use pluto (from Fairwinds):
pluto detect-helm --target-versions k8s=v1.33
pluto detect-files -d ./k8s-manifests --target-versions k8s=v1.33

# Or with the kubectl-convert plugin for specific manifests
kubectl convert -f deployment.yaml --output-version apps/v1

# 3. Verify PodDisruptionBudgets are in place for all critical services
kubectl get pdb -A

# 4. Check that all nodes are healthy
kubectl get nodes

# 5. Back up cluster state (EKS manages etcd, but a Velero backup is still recommended)
velero backup create pre-upgrade-$(date +%Y%m%d) --include-namespaces production

# 6. Review recent changes — freeze non-critical deployments for 24h before upgrade
git log --since="24 hours ago" origin/main

Upgrade Order
The required order is non-negotiable:
1. Control plane (API server, etcd) ← AWS upgrades, takes ~15 min
2. EKS managed addons ← update to versions compatible with new k8s
3. Worker nodes ← replace old AMI with new AMI
4. Self-managed addons (Karpenter, etc.) ← update after nodes are on new version
Upgrading nodes before addons can leave CoreDNS or kube-proxy at a version incompatible with the new node kubelet.
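A quick way to catch this mismatch after each step is to compare minor versions directly. The helpers below are a sketch (the function names are mine; the commented kubectl line assumes cluster access):

```shell
# Sketch: compare Kubernetes minor versions to spot kubelet/addon skew.
minor() {                     # minor v1.32.3-eksbuild.2 -> 32
  local v=${1#v}              # drop the leading "v"
  v=${v#*.}                   # drop the major version ("1.")
  printf '%s\n' "${v%%.*}"    # keep digits up to the next dot
}

skew() {                      # skew OLD NEW -> signed minor-version gap
  echo $(( $(minor "$2") - $(minor "$1") ))
}

skew v1.32.3-eksbuild.2 v1.33.0   # prints 1: one minor version behind

# Against a live cluster (illustrative): list each node's kubelet version
# kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name} {.status.nodeInfo.kubeletVersion}{"\n"}{end}'
```

Anything other than 0 or 1 between an addon's tag and the node kubelet is worth investigating before continuing.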
Step 1: Control Plane Upgrade
# AWS Console: EKS → Cluster → Update Kubernetes version
# CLI:
aws eks update-cluster-version \
  --name my-cluster \
  --kubernetes-version 1.33

# Monitor upgrade progress (takes 10-20 minutes)
aws eks describe-cluster --name my-cluster \
  --query 'cluster.{version: version, status: status}'

# Wait for ACTIVE status
aws eks wait cluster-active --name my-cluster

During the control plane upgrade, the API server may be briefly unavailable (typically under 30 seconds). Worker nodes keep running and existing pods are unaffected, because kubelet talks to the API server only for reconciliation; it is not in the data path for serving requests.
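Automation that hits the API during this brief window should tolerate the blip rather than fail outright. A minimal retry wrapper, as a sketch (the kubectl call in the comment is illustrative):

```shell
# retry N CMD...: run CMD up to N times, one second apart; succeed on the
# first success, fail only if every attempt fails.
retry() {
  local attempts=$1 i
  shift
  for (( i = 1; i <= attempts; i++ )); do
    "$@" && return 0
    sleep 1
  done
  return 1
}

# Illustrative use while the control plane updates (assumes cluster access):
# retry 12 kubectl get --raw /healthz
```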
Step 2: Addon Upgrades
# Update vpc-cni to the latest version compatible with the target K8s version
LATEST_VPC_CNI=$(aws eks describe-addon-versions \
  --kubernetes-version 1.33 \
  --addon-name aws-vpc-cni \
  --query 'addons[0].addonVersions[0].addonVersion' \
  --output text)

aws eks update-addon \
  --cluster-name my-cluster \
  --addon-name aws-vpc-cni \
  --addon-version $LATEST_VPC_CNI \
  --resolve-conflicts OVERWRITE

# Repeat for coredns and kube-proxy
LATEST_COREDNS=$(aws eks describe-addon-versions \
  --kubernetes-version 1.33 \
  --addon-name coredns \
  --query 'addons[0].addonVersions[0].addonVersion' \
  --output text)

aws eks update-addon \
  --cluster-name my-cluster \
  --addon-name coredns \
  --addon-version $LATEST_COREDNS

LATEST_PROXY=$(aws eks describe-addon-versions \
  --kubernetes-version 1.33 \
  --addon-name kube-proxy \
  --query 'addons[0].addonVersions[0].addonVersion' \
  --output text)

aws eks update-addon \
  --cluster-name my-cluster \
  --addon-name kube-proxy \
  --addon-version $LATEST_PROXY

# Wait for all addons to be ACTIVE (use the exact addon names: aws-vpc-cni, not vpc-cni)
for addon in aws-vpc-cni coredns kube-proxy; do
  aws eks wait addon-active --cluster-name my-cluster --addon-name $addon
done

--resolve-conflicts: OVERWRITE vs PRESERVE
OVERWRITE replaces any custom configuration on the addon with AWS defaults. PRESERVE keeps your user-managed field values and completes the update — it does not fail on conflicts, it simply keeps the divergent values. NONE returns an error if there are any conflicts, which is useful for detecting configuration drift. For addons with custom configuration (custom Corefile, VPC CNI env vars), use PRESERVE so your settings are retained.
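One way to combine these modes is to treat NONE as a dry-run drift probe: attempt the update with NONE first, and fall back to PRESERVE only when a conflict is reported. This is a sketch; `update_with_drift_check` is my name for the wrapper, and the aws invocation in the comment is illustrative:

```shell
# update_with_drift_check CMD...: run CMD with --resolve-conflicts NONE; if it
# fails (a conflict means custom config exists), rerun with PRESERVE so the
# custom values survive the upgrade.
update_with_drift_check() {
  if "$@" --resolve-conflicts NONE 2>/dev/null; then
    echo "no drift: update applied with defaults"
  else
    echo "drift detected: retrying with PRESERVE"
    "$@" --resolve-conflicts PRESERVE
  fi
}

# Illustrative use:
# update_with_drift_check aws eks update-addon --cluster-name my-cluster \
#   --addon-name coredns --addon-version "$LATEST_COREDNS"
```

Because the wrapper takes the command as arguments, it works for any addon and can be exercised with a stub in place of the real aws call.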
Step 3: Node Upgrades
Managed Node Groups
# Update managed node group — triggers rolling replacement
aws eks update-nodegroup-version \
  --cluster-name my-cluster \
  --nodegroup-name default \
  --kubernetes-version 1.33

# Or pin a specific AMI release:
aws eks update-nodegroup-version \
  --cluster-name my-cluster \
  --nodegroup-name default \
  --release-version "1.33.x-eksbuild.y"  # Get from EKS release notes

# Watch the update
aws eks describe-nodegroup \
  --cluster-name my-cluster \
  --nodegroup-name default \
  --query 'nodegroup.{status: status, health: health}'

The managed node group rolling update:
- Cordons and drains one node at a time (respecting PodDisruptionBudgets)
- Terminates the old node
- Launches a new node with the new AMI
- Waits for the new node to become Ready before draining the next
With maxUnavailable: 1 (default), this is a serial replacement. For faster upgrades, increase maxUnavailable (trades safety for speed):
# Change update config for faster upgrades (use with caution)
aws eks update-nodegroup-config \
  --cluster-name my-cluster \
  --nodegroup-name default \
  --update-config '{"maxUnavailable": 2}'

Karpenter Nodes
Karpenter nodes are not upgraded in-place — they're replaced. Force replacement by cordoning the old nodes and letting Karpenter provision new ones:
# Get all Karpenter-managed nodes
kubectl get nodes -l karpenter.sh/nodepool -o name

# Cordon old nodes (prevents new pods from scheduling)
kubectl get nodes -l karpenter.sh/nodepool -o name | \
  xargs kubectl cordon

# Remove any do-not-disrupt protection so Karpenter can drain and replace these nodes
# (removing an annotation that was never set reports an error; it's safe to ignore)
kubectl get nodeclaims -o name | \
  xargs -I{} kubectl annotate {} karpenter.sh/do-not-disrupt- || true

# Alternatively: delete NodeClaims directly (Karpenter reprovisions replacements)
kubectl get nodeclaims -o name | xargs kubectl delete

Or use Karpenter's drift feature — with the AMI family set to AL2023 and no pinned AMI version, Karpenter automatically replaces nodes when a new AMI is available for the target K8s version:
spec:
  template:
    spec:
      nodeClassRef:
        name: default
      # Karpenter uses the latest AL2023 AMI for the target K8s version
      # When the cluster version changes, nodes are marked drifted and replaced

Terraform-Managed EKS Upgrades
For clusters managed with Terraform using the terraform-aws-eks module:
# Change cluster_version to the target version
# terraform plan shows what will change
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0" # terraform-aws-modules/eks is in the v20.x series as of 2025

  cluster_version = "1.33" # Was "1.32"

  # Node group AMI release version (optional — leave unset for latest)
  eks_managed_node_groups = {
    default = {
      min_size     = 3
      max_size     = 10
      desired_size = 3

      # Force replacement to use new AMI
      # Changing ami_release_version triggers node group update
      ami_type = "AL2023_x86_64_STANDARD"
      # ami_release_version = "1.33.x-eksbuild.y"
    }
  }
}

# Control plane upgrade
terraform apply -target module.eks.aws_eks_cluster.this

# Verify control plane, then upgrade addons
terraform apply -target module.eks.aws_eks_addon.this

# Finally, upgrade node groups
terraform apply

Post-Upgrade Validation
# All nodes running new version
kubectl get nodes -o wide

# All pods healthy
kubectl get pods -A | grep -v Running | grep -v Completed

# CoreDNS resolving correctly
kubectl run dns-test --rm -it --image=busybox -- nslookup kubernetes.default

# API server accessible and version matches
kubectl version

# Check for any deprecation warnings in kubectl output
kubectl get deployments -A

# Monitor for 30 minutes — check error rates in Grafana
# Watch for: OOMKilled pods (memory changes between K8s versions), CrashLoopBackOff,
# service connectivity issues (kube-proxy version mismatch)

Frequently Asked Questions
Can I skip minor versions (e.g., 1.29 → 1.32)?
No. EKS requires upgrading one minor version at a time. 1.29 → 1.30 → 1.31 → 1.32. Attempting to skip versions returns an error. Plan ahead — if you're on 1.29 and want to get to 1.32, that's three separate upgrade cycles, each taking a few hours.
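For a multi-hop plan it helps to enumerate the intermediate versions explicitly. A small helper, as a sketch (the function name is mine):

```shell
# upgrade_path FROM TO: print each minor version you must pass through,
# one per line, e.g. 1.29 -> 1.32 requires 1.30, 1.31, 1.32.
upgrade_path() {
  local major=${1%%.*} from=${1#*.} to=${2#*.} v
  for (( v = from + 1; v <= to; v++ )); do
    printf '%s.%s\n' "$major" "$v"
  done
}

upgrade_path 1.29 1.32   # prints 1.30, 1.31 and 1.32 on separate lines
```

Each printed version is a full upgrade cycle: control plane, addons, nodes, validation.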
How do I handle workloads using deprecated API versions?
Use pluto before upgrading to identify resources using deprecated APIs. Update the manifests in Git before the upgrade — you have a window between when an API is deprecated (a warning is logged) and when it's removed (the API returns 404). Commonly removed APIs between versions are documented in the official Kubernetes changelog.
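If pluto isn't available, a rough grep over rendered manifests catches the obvious cases. A sketch; the function name is mine and the API list is an illustrative subset (these groups were removed around 1.25-1.26), not an exhaustive check:

```shell
# find_removed_apis DIR: list manifest files that still reference API versions
# removed in recent Kubernetes releases (illustrative subset only).
find_removed_apis() {
  local api
  for api in 'policy/v1beta1' 'batch/v1beta1' 'autoscaling/v2beta2'; do
    grep -rl "apiVersion: $api" "$1" 2>/dev/null
  done
  return 0   # grep's "no match" exit status is not an error here
}

find_removed_apis ./k8s-manifests
```

This only inspects files on disk; pluto additionally checks what is deployed via Helm releases, so prefer it when you can.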
For generic Kubernetes upgrade strategy that applies across EKS, GKE, and AKS — deprecated API detection, PDB requirements, blue-green node replacement, and rollback planning — see Kubernetes Cluster Upgrades: Zero-Downtime Strategy for Production. For Terraform-based EKS cluster management where upgrades are applied as code changes, see Terraform for EKS Infrastructure. For PodDisruptionBudgets that protect workloads during node drain operations in upgrades, see Kubernetes PodDisruptionBudget and Graceful Shutdown Patterns. For a side-by-side comparison of upgrade strategies across EKS, GKE, and AKS, see Managed Kubernetes Comparison: EKS vs GKE vs AKS.
Running EKS upgrades across multiple production clusters? Talk to us at Coding Protocols — we help platform teams build upgrade runbooks and automation that keep clusters on supported versions without unplanned downtime.


