14 min read · May 9, 2026

Kubernetes Cost Optimization: FinOps Patterns for EKS at Scale

Kubernetes clusters are easy to overprovision. Pods get generous resource requests that bear no relation to actual usage, nodes idle at 20% utilization, and the cloud bill grows with the team count rather than the workload. FinOps for Kubernetes is the practice of aligning cluster cost with actual value — through right-sizing, spot instances, bin-packing, and chargeback.

Coding Protocols Team
Platform Engineering

Most Kubernetes clusters run at 20-40% average CPU utilization. The gap between requested resources and actual usage drives the cost: a pod with requests.cpu: 500m that averages 100m of actual usage reserves 5x the capacity it needs — and you pay for reserved capacity, not used capacity. Across 50 services and 3 environments, that gap compounds into a cluster 3-5x larger than the workload requires.
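
You can measure the gap directly if Prometheus scrapes kube-state-metrics and cAdvisor (a sketch; metric and label names can vary by setup, and a ratio of 5 means you reserve 5x what you use):

bash
# Ratio of requested CPU to actual CPU usage, per namespace
sum by (namespace) (kube_pod_container_resource_requests{resource="cpu"})
  /
sum by (namespace) (rate(container_cpu_usage_seconds_total{container!=""}[1h]))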

Cost optimization in Kubernetes isn't one change — it's a system of reinforcing practices: accurate resource requests, efficient scheduling, spot instances where possible, and visibility into who's spending what.


The Cost Hierarchy

Understanding where Kubernetes costs come from helps prioritize what to fix:

1. Node costs (EC2 / compute)     → 70-85% of cluster cost
   ├── Overprovisioned resource requests → nodes larger than needed
   ├── Low utilization             → too many nodes for actual workload
   └── Wrong instance types        → on-demand for spiky/batch workloads

2. Storage costs (EBS, EFS)       → 5-15%
   ├── Orphaned PVCs               → data from deleted workloads
   └── Oversized volumes           → 100GB volumes for 5GB workloads

3. Data transfer costs            → 5-10%
   ├── Cross-AZ traffic            → pods in different AZs communicating
   └── NAT Gateway egress          → cluster traffic leaving the VPC

Fix node utilization first — it has the highest leverage.
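
Before optimizing, get a baseline from the CLI (a quick sketch; kubectl top requires metrics-server):

bash
# Actual usage vs capacity per node
kubectl top nodes

# Requested vs allocatable per node: the scheduler's view of "full"
kubectl describe nodes | grep -A 8 "Allocated resources"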


VPA for Right-Sizing Resource Requests

Running the Vertical Pod Autoscaler in Off mode gives you accurate resource request recommendations without automatically restarting pods:

yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: payments-api
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payments-api
  updatePolicy:
    updateMode: "Off"    # Recommendations only — no automatic restarts
  resourcePolicy:
    containerPolicies:
      - containerName: api
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: "4"
          memory: 8Gi
        controlledResources: ["cpu", "memory"]
bash
# Check VPA recommendations
kubectl get vpa payments-api -n production -o jsonpath='{.status.recommendation}'
# {"containerRecommendations":[{
#   "containerName":"api",
#   "lowerBound":{"cpu":"51m","memory":"240Mi"},
#   "target":{"cpu":"103m","memory":"320Mi"},      ← Set requests to this
#   "upperBound":{"cpu":"650m","memory":"1200Mi"}
# }]}

Apply target values as the pod's resource requests. After applying right-sizing across a cluster, typical reduction is 40-60% of total requested CPU.
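
For example, translating the target recommendation above into the Deployment's resources (values rounded up slightly for headroom; the margin is a judgment call):

yaml
containers:
  - name: api
    resources:
      requests:
        cpu: 120m        # VPA target was 103m
        memory: 384Mi    # VPA target was 320Mi
      limits:
        memory: 512Mi    # Cap memory to contain leaks; CPU limits are often omitted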

Goldilocks automates VPA recommendation collection across namespaces with a web UI:

bash
helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm install goldilocks fairwinds-stable/goldilocks --namespace goldilocks --create-namespace

# Enable VPA recommendations for a namespace
kubectl label namespace production goldilocks.fairwinds.com/enabled=true
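
The dashboard is exposed as a ClusterIP service (names follow the chart defaults):

bash
kubectl -n goldilocks port-forward svc/goldilocks-dashboard 8080:80
# Then browse http://localhost:8080 for per-namespace recommendations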

Karpenter for Node Efficiency

Karpenter provisions nodes just-in-time for pending pods and bin-packs them efficiently. The key cost configuration is the NodePool disruption policy:

yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]    # Prefer spot, fall back to on-demand
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]       # Graviton is ~20% cheaper per vCPU
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]          # Compute, general-purpose, and memory-optimized families
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["2"]

  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    # In karpenter.sh/v1, consolidateAfter may be set with either policy
    # (it was WhenEmpty-only in v1beta1). Budgets limit how aggressively
    # Karpenter consolidates, protecting against too-rapid disruption.
    budgets:
      - nodes: "20%"    # Never consolidate more than 20% of nodes at once
      - schedule: "0 9-17 * * MON-FRI"
        duration: 8h
        nodes: "10%"    # More conservative during business hours

  limits:
    cpu: "200"          # Maximum total CPU across all Karpenter-managed nodes
    memory: "400Gi"

Spot Instance Interruption Handling

Karpenter handles EC2 spot interruption notices automatically — it drains the node and reschedules pods within the 2-minute window AWS provides. Give your pods PodDisruptionBudgets so Karpenter respects availability guarantees during spot interruption drains.
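
Interruption handling depends on Karpenter's SQS interruption queue being configured. A sketch assuming the official Helm chart and a recent version (the value name has changed across chart versions, and the queue name is whatever your cluster provisioning created):

bash
helm upgrade karpenter oci://public.ecr.aws/karpenter/karpenter \
  --namespace kube-system --reuse-values \
  --set settings.interruptionQueue="Karpenter-${CLUSTER_NAME}"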

Right-Sizing Node Types

Karpenter's NodePool with multiple instance types lets it choose the most cost-efficient instance for the pending pod:

yaml
requirements:
  - key: karpenter.k8s.aws/instance-size
    operator: NotIn
    values: ["nano", "micro", "small"]    # Avoid tiny instances (high overhead ratio)
  - key: karpenter.k8s.aws/instance-family
    operator: In
    values: ["m7g", "m7i", "c7g", "c7i", "r7g", "r7i"]    # Current-gen only

Graviton (m7g, c7g) instances provide ~20% better price-performance. If your containers run on arm64, this is a straightforward win.
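
Before switching, verify your images are published for arm64 (the image name here is illustrative):

bash
docker manifest inspect ghcr.io/example/payments-api:1.4.2 | \
  jq '.manifests[].platform.architecture'
# Expect to see "arm64" alongside "amd64" for a multi-arch image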


Spot Interruption Best Practices

For workloads targeting spot nodes, prepare for interruption:

yaml
# PodDisruptionBudget — Karpenter respects these during consolidation/interruption drains
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: worker-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: batch-worker

# Toleration to run on spot-tainted nodes (assumes your spot NodePool applies this taint)
tolerations:
  - key: "karpenter.sh/capacity-type"
    operator: "Equal"
    value: "spot"
    effect: "NoSchedule"

# Node affinity to prefer spot
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
            - key: karpenter.sh/capacity-type
              operator: In
              values: ["spot"]

Kubecost for Cost Visibility

Kubecost allocates cluster costs to namespaces, labels, and teams:

bash
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm install kubecost kubecost/cost-analyzer \
  --namespace kubecost \
  --create-namespace
# No token required for Kubecost community edition (v2.x+)
# Commercial tiers: add --set kubecostToken="<token-from-kubecost.com>"

Key reports:

  • Allocation by namespace: Cost per team/environment
  • Assets: Node, storage, and network costs
  • Savings: Orphaned resources, cluster idle, right-sizing recommendations

Integrate with Slack or email for weekly cost reports:

yaml
# Kubecost alerts (configured via Helm values)
alerts:
  - type: budget
    threshold: 500    # Alert if spend over the window exceeds $500
    window: "7d"
    aggregation: namespace
    filter: "production"
    slackWebhookUrl: https://hooks.slack.com/services/xxx

Scheduled Scaling

Development and staging environments running 24/7 waste 60-70% of their cost. Scale to zero overnight and on weekends with a CronJob:

yaml
# CronJob that scales down non-production namespaces at 7 PM
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-down-dev
  namespace: platform
spec:
  schedule: "0 19 * * 1-5"    # 7 PM weekdays UTC
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: namespace-scaler    # Needs patch permission on Deployments (RBAC below)
          restartPolicy: OnFailure
          containers:
            - name: scaler
              image: bitnami/kubectl:1.29
              command:
                - /bin/sh
                - -c
                - |
                  for ns in dev-payments dev-orders dev-auth; do
                    kubectl get deployments -n $ns -o name | \
                      xargs -I{} kubectl scale {} --replicas=0 -n $ns
                  done
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-up-dev
  namespace: platform
spec:
  schedule: "0 8 * * 1-5"    # 8 AM weekdays UTC
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: namespace-scaler
          restartPolicy: OnFailure
          containers:
            - name: scaler
              image: bitnami/kubectl:1.29
              command:
                - /bin/sh
                - -c
                - |
                  for ns in dev-payments dev-orders dev-auth; do
                    kubectl get deployments -n $ns -o name | \
                      xargs -I{} kubectl scale {} --replicas=1 -n $ns
                  done
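
The namespace-scaler ServiceAccount referenced above needs permission to list Deployments and patch their scale subresource. A minimal RBAC sketch (a ClusterRole for simplicity; per-namespace RoleBindings also work):

yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: namespace-scaler
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list"]
  - apiGroups: ["apps"]
    resources: ["deployments/scale"]    # kubectl scale uses the scale subresource
    verbs: ["patch", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: namespace-scaler
subjects:
  - kind: ServiceAccount
    name: namespace-scaler
    namespace: platform
roleRef:
  kind: ClusterRole
  name: namespace-scaler
  apiGroup: rbac.authorization.k8s.io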

For more sophisticated scale-down (respecting original replica counts, supporting StatefulSets), use Kube-Downscaler or similar tooling.


Storage Cost Optimization

Unattached EBS volumes from deleted PVCs accumulate silently:

bash
# List Bound PVCs with size and storage class (cross-reference with pods to find unused ones)
kubectl get pvc -A -o json | jq '
  .items[] |
  select(.status.phase == "Bound") |
  {
    namespace: .metadata.namespace,
    name: .metadata.name,
    storage: .spec.resources.requests.storage,
    storageClass: .spec.storageClassName
  }
'

# Find PVs that are Released (PVC deleted but PV still exists)
kubectl get pv | grep Released

# AWS: find unattached EBS volumes (they keep accruing charges while detached)
aws ec2 describe-volumes \
  --filters Name=status,Values=available \
  --query 'Volumes[*].{ID:VolumeId,Size:Size,Created:CreateTime}' \
  --output table
yaml
# StorageClass with reclaimPolicy: Delete (default for EKS managed StorageClasses)
# Ensure you're NOT using reclaimPolicy: Retain unless you need data recovery
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"
reclaimPolicy: Delete          # PV deleted when PVC is deleted — no orphaned volumes
volumeBindingMode: WaitForFirstConsumer

Query Kubecost's API for cost attribution data:

bash
# Cost for the payments namespace, last 7 days
# (service name/port follow the Helm chart defaults: kubecost-cost-analyzer on 9090)
curl "http://kubecost-cost-analyzer.kubecost.svc:9090/model/allocation?window=7d&aggregate=namespace&filter=namespace:payments"

# Efficiency for all namespaces
curl "http://kubecost-cost-analyzer.kubecost.svc:9090/model/allocation?window=1d&aggregate=namespace" | \
  jq '.data[0] | to_entries[] | {namespace: .key, efficiency: .value.cpuEfficiency}'

Namespace Chargeback with Labels

Label-based cost allocation requires consistent labeling. Enforce labels with Kyverno and query with Kubecost:

yaml
# Kyverno policy: require team and env labels on all workloads
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-cost-labels
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-labels
      match:
        any:
          - resources:
              kinds: ["Deployment", "StatefulSet", "DaemonSet"]
      validate:
        message: "Labels 'team' and 'env' are required for cost allocation."
        pattern:
          metadata:
            labels:
              team: "?*"
              env: "?*"
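
With the labels enforced, Kubecost can aggregate cost by them via the same allocation API used above:

bash
# Cost per team, last 30 days
curl "http://kubecost-cost-analyzer.kubecost.svc:9090/model/allocation?window=30d&aggregate=label:team"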

Idle Resource Detection

bash
# Pods with zero CPU usage over the past 24 hours (Prometheus query;
# in practice, compare against a small threshold rather than exactly zero)
sum by (namespace, pod) (
  increase(container_cpu_usage_seconds_total[24h])
) == 0

# List Bound PVCs with their requested size
kubectl get pvc -A -o json | jq -r '
  .items[] |
  select(.status.phase == "Bound") |
  [.metadata.namespace, .metadata.name, .spec.resources.requests.storage] |
  @csv'
# Then cross-reference with running pods to find orphaned PVCs (see below)
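
One way to do the cross-reference, sketched with jq (the temp-file paths are illustrative):

bash
# PVCs claimed by running pods
kubectl get pods -A -o json | jq -r '
  .items[] | .metadata.namespace as $ns |
  .spec.volumes[]? | select(.persistentVolumeClaim) |
  "\($ns)/\(.persistentVolumeClaim.claimName)"' | sort -u > /tmp/claimed.txt

# All PVCs
kubectl get pvc -A -o json | jq -r '
  .items[] | "\(.metadata.namespace)/\(.metadata.name)"' | sort -u > /tmp/all.txt

# PVCs no pod references: candidates for deletion (verify before deleting!)
comm -13 /tmp/claimed.txt /tmp/all.txt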

Quick Wins Reference

Action                                                    | Typical Savings       | Effort
----------------------------------------------------------|-----------------------|--------------------
Apply VPA recommendations to all workloads                | 30-50% CPU            | Medium
Enable Karpenter consolidation                            | 15-30% nodes          | Low (config change)
Switch batch workloads to spot                            | 60-80% on batch nodes | Low
Switch to Graviton for arm64-compatible workloads         | 15-20% per node       | Medium
Delete orphaned PVCs                                      | Variable              | Low
Set namespace resource quotas to prevent over-requesting  | Preventative          | Low
Scheduled scale-down of dev/staging during off-hours      | 40-60% dev costs      | Medium

Frequently Asked Questions

How do I set a cluster-wide compute budget?

Use Karpenter's limits in the NodePool to cap total CPU and memory across all Karpenter-managed nodes. Combine with namespace ResourceQuota to distribute the budget across teams.
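
A per-team quota might look like this (names and numbers are illustrative; size team quotas so they sum to the cluster budget):

yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-payments-quota
  namespace: payments
spec:
  hard:
    requests.cpu: "40"       # This team's slice of the 200-CPU Karpenter limit
    requests.memory: 80Gi
    limits.cpu: "80"
    limits.memory: 160Gi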

Should I use spot for the control plane?

No — EKS manages the control plane (it's a managed service). Spot applies to worker nodes only. For system-critical workloads on worker nodes (cert-manager, CoreDNS, ingress controller), use on-demand via a dedicated NodePool with capacity-type: on-demand and appropriate taints/tolerations.
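
A dedicated on-demand NodePool for system workloads might look like this (the taint key is illustrative; matching tolerations go on the system pods):

yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: system
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
      taints:
        - key: workload-class
          value: system
          effect: NoSchedule    # Only pods that tolerate this taint land here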


For Karpenter consolidation interacting with PodDisruptionBudgets, see Kubernetes PodDisruptionBudget and Graceful Shutdown Patterns. For VPA right-sizing in detail alongside HPA interaction, see Kubernetes HPA v2: Custom Metrics, Behavior Tuning, and Scaling Patterns.

Reducing Kubernetes infrastructure costs without compromising reliability? Talk to us at Coding Protocols — we help platform teams implement FinOps practices that cut cluster costs by 40-60% while maintaining production SLOs.

Related Topics

Kubernetes
FinOps
Cost Optimization
EKS
Karpenter
VPA
Spot Instances
Platform Engineering
