Kubernetes Cost Optimisation: Spot Instances, Right-Sizing, and Namespace Budgets
Kubernetes clusters are expensive to over-provision and dangerous to under-provision. The path to cost efficiency isn't cutting resources — it's making sure the resources you pay for are actually used. Here's how to right-size workloads, use spot instances safely, and set namespace-level cost accountability.

A typical Kubernetes cluster runs at 10–30% actual CPU utilisation. The gap between what's allocated (resource requests) and what's consumed is where the waste lives — and it compounds. Oversized requests lead to underutilised nodes, which leads to more nodes than needed, which leads to cloud bills that don't match actual workload.
Cost optimisation in Kubernetes isn't about squeezing every dollar out of the infrastructure — it's about eliminating the structural waste that makes clusters 3–5x more expensive than they need to be. This post covers the main levers: workload right-sizing, spot instance strategy, node bin-packing, and namespace-level cost accountability.
The Cost Layers
Kubernetes cloud cost breaks down into:
| Layer | Typical Share | Primary Lever |
|---|---|---|
| Compute (nodes) | 70–80% | Spot instances, right-sizing, bin-packing |
| Storage (PVCs) | 10–15% | StorageClass tiering, lifecycle policies |
| Data transfer | 5–10% | Service mesh, cross-AZ traffic reduction |
| Control plane | 1–5% | Cluster consolidation (AKS free; EKS/GKE $73/mo/cluster) |
| Load balancers | 1–3% | Fewer LB Services, use Ingress/Gateway |
Compute dominates. Fix the compute layer first.
Right-Sizing Resource Requests
The most impactful single change: set resource requests that reflect actual usage rather than conservative guesses.
The Problem with Oversized Requests
Resource requests determine scheduling — the Kubernetes scheduler places pods on nodes based on requested CPU and memory, not actual usage. A pod requesting 2 CPU that uses 0.1 CPU occupies 2 CPU worth of node capacity. If every pod in your cluster is overprovisioned by 5x, you need 5x more nodes than necessary.
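A hypothetical illustration of the gap (the numbers are invented but typical):

```yaml
# Before: requests sized by guesswork — this schedules 2 full cores per replica
resources:
  requests:
    cpu: "2"      # actual P95 usage: ~100m
    memory: 4Gi   # actual P99 usage: ~400Mi

# After right-sizing: the same workload, ~10x more replicas fit per node
resources:
  requests:
    cpu: 200m
    memory: 512Mi
```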
Measuring Actual Usage
```bash
# Current usage vs requests for all pods in a namespace
kubectl top pods -n production --sort-by=cpu

# More detailed: use kubectl-resource-capacity (krew plugin)
kubectl resource-capacity --pods --sort cpu.util

# Namespace-level summary
kubectl resource-capacity --namespace production
```

For a structured view of allocation vs usage, use kube-capacity:

```bash
kubectl krew install resource-capacity
kubectl resource-capacity --namespace production --pods --util
```

The output shows REQUEST CPU vs CURRENT CPU vs LIMIT CPU per pod — the gap between request and current is the waste.
VPA for Automated Right-Sizing
Vertical Pod Autoscaler (VPA) analyzes historical CPU and memory usage and recommends or automatically adjusts resource requests.
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  updatePolicy:
    updateMode: "Off"  # Recommendation only — don't auto-apply yet
  resourcePolicy:
    containerPolicies:
      - containerName: api
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: "4"
          memory: 4Gi
```

With `updateMode: "Off"`, VPA observes without acting. Check recommendations:
```bash
kubectl describe vpa api-vpa -n production
# Shows:
#   Recommendation:
#     Container Recommendations:
#       Container Name: api
#       Lower Bound:  cpu: 100m, memory: 128Mi
#       Target:       cpu: 250m, memory: 512Mi
#       Upper Bound:  cpu: 500m, memory: 1Gi
```

Use VPA recommendations to manually update Deployment specs. Start with Off mode for all services, collect recommendations over 2–4 weeks, then apply. VPA has four update modes:
- `Off` — recommendations only, no changes applied
- `Initial` — applies recommendations at pod creation time; no restarts of running pods
- `Recreate` — evicts and recreates pods to apply new resource values
- `Auto` — uses in-place pod resource updates (beta and enabled by default since Kubernetes 1.33) where supported, falls back to eviction/recreation on older clusters

Use `Initial` for a middle ground that sets requests at pod creation without disrupting running pods.
Important: VPA in Auto or Recreate mode is incompatible with HPA when both manage the same resource metric (e.g., both acting on CPU). VPA and HPA can safely coexist when HPA scales on custom metrics while VPA manages resource requests. If you use CPU-based HPA, use VPA in Off or Initial mode only.
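To apply a recommendation by hand, `kubectl set resources` is one option — a sketch using the hypothetical Target values from the output above:

```bash
kubectl set resources deployment/api -n production \
  --requests=cpu=250m,memory=512Mi \
  --limits=cpu=500m,memory=1Gi
```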
Manual Right-Sizing Targets
If you don't want VPA, a practical starting point (example Prometheus queries follow the list):
- Set requests at P95 of observed CPU usage over 7 days × 1.2 (20% headroom)
- Set memory requests at P99 of observed memory (memory doesn't compress — OOMKills are more disruptive than CPU throttling)
- Set CPU limits 2–3× the request (allows bursting; prevents CPU throttling under brief spikes)
- Set memory limits generously rather than omitting them — omitting limits avoids limit-triggered OOMKills but exposes nodes to memory pressure from unbounded processes. Size limits at 2–3× typical steady-state usage.
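A sketch of the corresponding Prometheus queries, assuming cAdvisor container metrics are scraped (label names vary by setup; `api` is a placeholder container name):

```bash
# P95 CPU usage (cores) over 7 days — multiply by 1.2 for the request
quantile_over_time(0.95,
  rate(container_cpu_usage_seconds_total{namespace="production", container="api"}[5m])[7d:5m])

# P99 memory working set over 7 days — use directly as the memory request
quantile_over_time(0.99,
  container_memory_working_set_bytes{namespace="production", container="api"}[7d])
```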
Spot Instances
Spot instances (AWS), preemptible VMs (GCP), or Spot VMs (Azure) run at a 60–80% discount relative to on-demand pricing. The trade-off: the cloud provider can reclaim them with 2-minute (AWS) or 30-second (GCP) notice.
For stateless workloads with HPA, spot instances are a high-value change. For stateful workloads, the calculus is more complex.
Karpenter Spot Strategy
Karpenter on EKS is the most capable spot orchestration layer. Configure a NodePool with both spot and on-demand in priority order:
```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]  # Spot preferred; falls back to on-demand
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
            # Diversify across multiple instance families to reduce spot interruption risk
            - m5.large
            - m5.xlarge
            - m6i.large
            - m6i.xlarge
            - m6a.large
            - m6a.xlarge
            - m7i.large
            - m7i.xlarge
            - c5.xlarge
            - c6i.xlarge
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m  # required in the v1 API; supported with WhenEmptyOrUnderutilized since Karpenter v1
    budgets:
      - nodes: "10%"  # Karpenter will not disrupt more than 10% of nodes at once
  limits:
    cpu: "1000"
    memory: 4000Gi
```

Key principle: diversify instance families. A spot interruption event typically affects a specific instance type in a specific AZ. If you're running only m5.xlarge, an interruption event can take down a large fraction of your cluster simultaneously. Diversifying across m5, m6i, m6a, m7i, c5, c6i reduces correlated interruption risk.
Spot Interruption Handling
When AWS sends a spot interruption notice (2 minutes before termination), Karpenter receives the notification via an SQS queue and immediately begins graceful node draining — cordoning the node and evicting pods before the instance is terminated.
```yaml
# Karpenter reads interruption events from SQS
# Set up via the Karpenter controller configuration:
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiSelectorTerms:
    - alias: al2@latest  # the v1 API requires amiSelectorTerms; amiFamily alone is no longer sufficient
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  # Karpenter automatically handles spot interruption via EventBridge + SQS
  # when deployed with the correct IAM permissions and an interruption queue
  # configured (settings.interruptionQueue in recent Helm charts)
```

For the SQS queue and EventBridge rules, Karpenter's CloudFormation template sets these up automatically when you deploy via the official installation guide.
Workloads Suitable for Spot
| Workload Type | Spot Suitable? | Notes |
|---|---|---|
| Stateless API pods (with HPA) | Yes | HPA replaces interrupted pods |
| Background job workers | Yes | Jobs retry; spot is ideal for batch |
| CI/CD runners | Yes | Short-lived, isolated, naturally fault-tolerant |
| Development namespaces | Yes | Interruptions are tolerable |
| Stateful databases (primary) | No | Interruption causes failover; use on-demand |
| Long-running batch (>2h, no checkpointing) | Caution | May lose progress on interruption |
| Ingress/gateway pods | Caution | Disruption affects all traffic; keep them on on-demand or spread across 2+ AZs |
Separating On-Demand and Spot Node Pools
Use node selectors or pod topology constraints to pin sensitive workloads to on-demand nodes:
```yaml
# NodePool for on-demand only (critical workloads)
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: on-demand
spec:
  template:
    metadata:
      labels:
        node-type: on-demand
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
```

```yaml
# Pod that requires on-demand
spec:
  nodeSelector:
    node-type: on-demand
```

Or use taints to keep spot nodes clean by default and explicitly tolerate them:
```yaml
# In the spot NodePool spec:
taints:
  - key: spot
    value: "true"
    effect: NoSchedule
```

```yaml
# In pods that can tolerate spot:
tolerations:
  - key: spot
    operator: Equal
    value: "true"
    effect: NoSchedule
```

Node Bin-Packing and Consolidation
Spot strategy gets you a 60–80% discount on node cost. Consolidation reduces the number of nodes. Both together can cut compute costs dramatically.
Karpenter Consolidation
Karpenter's consolidation actively removes underutilised nodes by moving their pods to other nodes:
```yaml
disruption:
  consolidationPolicy: WhenEmptyOrUnderutilized  # Remove underutilised nodes by moving pods
  consolidateAfter: 1m  # supported with WhenEmptyOrUnderutilized since Karpenter v1
```

WhenEmptyOrUnderutilized is the more aggressive policy — Karpenter moves pods off underutilised nodes to pack them onto fewer nodes, then removes the emptied nodes. For a more conservative approach that only removes nodes that are already empty:
```yaml
disruption:
  consolidationPolicy: WhenEmpty  # Only remove completely empty nodes
```

WhenEmpty is safer for production (no pod movements, only cleanup of empty nodes). Use WhenEmptyOrUnderutilized in dev/staging for active bin-packing, WhenEmpty in production to avoid unnecessary pod disruption.
Cluster Autoscaler Expander
If using the Cluster Autoscaler (not Karpenter), configure the expander to choose the most cost-efficient node group:
```bash
# cluster-autoscaler flags
--expander=price  # Choose cheapest node group (availability varies by provider; least-waste is a common alternative)
--scale-down-enabled=true
--scale-down-delay-after-add=10m
--scale-down-unneeded-time=10m
--scale-down-utilization-threshold=0.5  # Scale down if node is <50% utilised
```

PodDisruptionBudgets for Safe Consolidation
Consolidation is safe only if your application handles pod evictions correctly. PodDisruptionBudgets (PDBs) tell the scheduler how many pods can be disrupted simultaneously:
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
  namespace: production
spec:
  minAvailable: 2  # Always keep at least 2 pods available during disruption
  selector:
    matchLabels:
      app: api
```

Without PDBs, consolidation can evict all pods of a deployment simultaneously. With a PDB of minAvailable: 2, consolidation evicts one pod at a time, waiting for it to be rescheduled before evicting the next.
Namespace Cost Allocation
Cloud billing shows total cluster cost. To understand which team or service is responsible for what share of cost, you need namespace-level cost allocation.
Kubernetes Cost Allocation Tools
Kubecost (open-source community edition available) allocates cloud costs to namespaces, controllers, and labels by matching resource usage to cloud billing data:
```bash
helm repo add cost-analyzer https://kubecost.github.io/cost-analyzer/
helm upgrade --install kubecost cost-analyzer/cost-analyzer \
  --namespace kubecost \
  --create-namespace
# No token required for Kubecost community edition (v2.x+)
```

Kubecost's namespace view shows cost per namespace over any time range — the foundation for FinOps conversations with engineering teams.
OpenCost (CNCF project, originally donated by Kubecost) provides the same cost allocation without the Kubecost UI:
```bash
helm repo add opencost https://opencost.github.io/opencost-helm-chart
helm install opencost opencost/opencost --namespace opencost --create-namespace
```

OpenCost exposes a Prometheus metrics endpoint and Grafana dashboards for cost per namespace, workload, and label.
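With those metrics scraped, basic cost queries are straightforward (metric names as exported by OpenCost; verify against your version):

```bash
# Total hourly compute cost across all nodes
sum(node_total_hourly_cost)

# CPU cores allocated per namespace, the basis for proportional cost attribution
sum(container_cpu_allocation) by (namespace)
```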
Setting Namespace Cost Budgets
ResourceQuota sets hard limits on resource consumption, which is the prerequisite for cost accountability:
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-budget
  namespace: team-a-prod
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
```

With this quota, team-a's maximum monthly cost for compute in team-a-prod is bounded. Using Kubecost's default on-demand rate assumptions (~$0.032 per vCPU-hour, ~$0.004 per GiB-hour) as a rough approximation:

- 8 CPU requested → 8 × $0.032 ≈ $0.26/hour (~$185/month)
- 16 GiB memory requested → 16 × $0.004 ≈ $0.07/hour (~$50/month)

The actual cost depends on the instance mix and spot vs on-demand ratio, but the quota gives you a defined maximum.
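One gap to close: in a namespace with a requests quota, pods that set no requests are rejected outright. A LimitRange supplies defaults so they still schedule and still count against the budget (values here are illustrative):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: team-a-defaults
  namespace: team-a-prod
spec:
  limits:
    - type: Container
      defaultRequest:   # applied when a container sets no requests
        cpu: 100m
        memory: 128Mi
      default:          # applied when a container sets no limits
        cpu: 500m
        memory: 512Mi
```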
Chargeback Labels
Add cost-allocation labels to all workloads for Kubecost/OpenCost attribution:
```yaml
metadata:
  labels:
    team: platform
    env: production
    cost-center: infra-1234
    product: api-gateway
```

These labels become dimensions in Kubecost's cost explorer, letting you break down cloud spend by team, cost center, product, and environment simultaneously.
Graviton / ARM64 Nodes
AWS Graviton (arm64) instances offer 20–40% better price-performance than equivalent x86 instances. For most containerised workloads, migrating to arm64 is a zero-effort cost reduction once images are built for both architectures.
See AWS Graviton ARM64 Migration Guide for the full migration path. From a cost perspective:
```yaml
# Karpenter NodePool allowing both Graviton and x86
requirements:
  - key: kubernetes.io/arch
    operator: In
    values: ["arm64", "amd64"]  # order carries no preference; Karpenter picks the cheapest fit, usually arm64
  - key: node.kubernetes.io/instance-type
    operator: In
    values:
      - m7g.large   # Graviton3 — best price-performance for general workloads
      - m7g.xlarge
      - m6g.large
      - m6g.xlarge
      - m5.large    # x86 fallback
      - m6i.large
```

Karpenter typically selects Graviton instances (the cheapest option that fits) and falls back to x86 when Graviton spot capacity is unavailable.
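The prerequisite is multi-arch images. A minimal buildx invocation (registry and tag are placeholders):

```bash
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  -t registry.example.com/api:1.2.3 \
  --push .
```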
Idle and Unused Resources
Right-sizing active workloads is the primary lever. The secondary lever is removing workloads that consume resources without delivering value.
Finding Idle Deployments
```bash
# Deployments with zero requests in the last 7 days (via Prometheus)
# metric: kube_deployment_status_replicas{namespace="production"} > 0
# cross: rate(http_requests_total{namespace="production"}[7d]) == 0

# Quick check — deployments scaled to zero (READY column "0/0")
kubectl get deployments -A --no-headers | awk '$3 == "0/0"'

# Pods currently using very little CPU (<10m) — strip the "m" suffix for a numeric compare
kubectl top pods -A --no-headers | awk '{cpu=$3; sub(/m$/, "", cpu); if (cpu+0 < 10) print}'
```

Expired and Orphaned Resources
```bash
# PVCs not in Bound state (Pending/Lost)
kubectl get pvc -A --no-headers | grep -v Bound
kubectl get pv --no-headers | grep Released  # Released PVs still incurring EBS cost

# Old ReplicaSets scaled to zero (kept as Deployment revision history)
kubectl get rs -A --no-headers | awk '$3 == "0" && $4 == "0"'

# LoadBalancer Services with no backend pods
kubectl get svc -A | grep LoadBalancer
# Then verify each has healthy endpoints:
kubectl get endpoints -A | grep "<none>"
```

Released PersistentVolumes with Retain reclaim policy accumulate silently — EBS volumes still exist and still cost money even when the PVC has been deleted. Audit and delete or reprovision.
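To map Released PVs back to the cloud volumes still billing, list their CSI volume handles (assumes the EBS CSI driver; adjust the field for in-tree volumes):

```bash
kubectl get pv -o json | jq -r '
  .items[]
  | select(.status.phase == "Released")
  | [.metadata.name, .spec.csi.volumeHandle // "n/a"] | @tsv'
```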
Scaling Dev Namespaces to Zero
Development namespaces often run at full capacity overnight and on weekends. An automated scale-to-zero mechanism can reduce dev cluster cost by 40–60%:
```bash
# Kube-downscaler or Kubernetes Event-Driven Autoscaling (KEDA) ScaledJob
# — scales Deployments to 0 outside business hours

# Simple bash approach for namespaces with a specific label
kubectl get deployments -n team-a-dev -o name | xargs -I{} \
  kubectl scale {} --replicas=0 -n team-a-dev

# Scale back up in the morning
kubectl get deployments -n team-a-dev -o name | xargs -I{} \
  kubectl scale {} --replicas=1 -n team-a-dev
```

Tools like kube-downscaler automate this with namespace annotations (`downscaler/uptime: Mon-Fri 07:00-20:00 Europe/London`).
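Applied to a namespace, that annotation looks like this (the namespace name is a placeholder):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: team-a-dev
  annotations:
    downscaler/uptime: "Mon-Fri 07:00-20:00 Europe/London"  # scaled to 0 outside this window
```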
Cost Optimisation Checklist
Work through these in order — each tier has higher implementation effort but also higher impact ceiling:
Tier 1: Quick Wins (days)
- Enable VPA in `Off` mode on all namespaces — collect recommendations for 2 weeks
- Apply VPA recommendations to at least the top 10 highest-CPU Deployments
- Enable Karpenter consolidation (`WhenEmptyOrUnderutilized`) on non-production clusters
- Delete orphaned PVCs and Released PVs
- Delete `LoadBalancer` Services not attached to active workloads
Tier 2: Medium Effort (weeks)
- Move stateless workloads to spot instances via Karpenter or CA spot groups
- Apply ResourceQuota to all team namespaces (prerequisite for cost accountability)
- Install Kubecost or OpenCost — establish namespace cost baselines
- Set up scale-to-zero for dev namespaces after hours
Tier 3: Strategic (quarters)
- Migrate to Graviton instances for compute-heavy workloads
- Implement FinOps review process — monthly cost review with namespace owners
- Consolidate small clusters — fewer, larger clusters have lower control-plane overhead
- Evaluate GKE Autopilot or EKS Auto Mode if operational simplicity outweighs per-pod pricing
Frequently Asked Questions
How much can I realistically save?
Teams that have never done cost optimisation typically see 40–60% reduction from the combination of right-sizing requests and adding spot instances. The first two tiers (quick wins + spot) account for most of it. Graviton migration adds another 20–30% on top. Teams with mature cost practices can get to 70%+ below a naive baseline.
Is spot risky for production?
For stateless workloads with >=2 replicas and PDBs, spot interruptions are transparent — the pod is evicted, rescheduled on another node, and HPA maintains the replica count. The risk is real but manageable. Database primaries, anything with a long startup time (>5 minutes), and anything without PDBs should stay on on-demand.
VPA and HPA conflict — how do I handle this?
Use VPA in Off or Initial mode to set baseline resource requests. HPA handles horizontal scaling. The conflict only occurs when both are managing the same resource (e.g., VPA auto-adjusting CPU requests while HPA scales on CPU) — with VPA in Off mode, there's no conflict.
How do I allocate cost to shared platform components (Prometheus, Argo CD)?
Tag the `monitoring` and `argocd` namespaces with `cost-center: platform` and attribute them to the platform team. Kubecost lets you define shared cost allocation strategies — e.g., distribute 10% of monitoring cost to each tenant namespace proportionally by their resource usage.
What about EKS add-ons (CoreDNS, kube-proxy, VPC CNI)?
These are node-level costs already included in compute pricing. The main lever here is node count — fewer, larger nodes mean fewer CoreDNS pods, fewer kube-proxy DaemonSet pods, and lower overhead as a fraction of total cost.
For right-sizing with VPA and HPA integration, see Kubernetes Resource Requests and Limits: A Production Guide. For Karpenter setup, see How to Install Karpenter on EKS. For Graviton migration, see AWS Graviton ARM64 Migration Guide. For a deep dive on Karpenter NodePool configuration, consolidation, and drift detection, see Karpenter v1: Node Provisioning, Consolidation, and Drift.
Running a cost review for a Kubernetes cluster? Talk to us at Coding Protocols — we help platform teams identify and eliminate structural waste without compromising reliability.


