Kubernetes Cost Optimisation: Spot Instances, Right-Sizing, and Namespace Budgets
Kubernetes clusters are expensive to over-provision and dangerous to under-provision. The path to cost efficiency isn't cutting resources — it's making sure the resources you pay for are actually used. Here's how to right-size workloads, use spot instances safely, and set namespace-level cost accountability.

A typical Kubernetes cluster runs at 10–30% actual CPU utilisation. The gap between what's allocated (resource requests) and what's consumed is where the waste lives — and it compounds. Oversized requests lead to underutilised nodes, which leads to more nodes than needed, which leads to cloud bills that don't match actual workload.
Cost optimisation in Kubernetes isn't about squeezing every dollar out of the infrastructure — it's about eliminating the structural waste that makes clusters 3–5x more expensive than they need to be. This post covers the main levers: workload right-sizing, spot instance strategy, node bin-packing, and namespace-level cost accountability.
The Cost Layers
Kubernetes cloud cost breaks down into:
| Layer | Typical Share | Primary Lever |
|---|---|---|
| Compute (nodes) | 70–80% | Spot instances, right-sizing, bin-packing |
| Storage (PVCs) | 10–15% | StorageClass tiering, lifecycle policies |
| Data transfer | 5–10% | Service mesh, cross-AZ traffic reduction |
| Control plane | 1–5% | Cluster consolidation (AKS free; EKS/GKE $73/mo/cluster) |
| Load balancers | 1–3% | Fewer LB Services, use Ingress/Gateway |
Compute dominates. Fix the compute layer first.
Right-Sizing Resource Requests
The most impactful single change: set resource requests that reflect actual usage rather than conservative guesses.
The Problem with Oversized Requests
Resource requests determine scheduling — the Kubernetes scheduler places pods on nodes based on requested CPU and memory, not actual usage. A pod requesting 2 CPU that uses 0.1 CPU occupies 2 CPU worth of node capacity. If every pod in your cluster is overprovisioned by 5x, you need 5x more nodes than necessary.
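A hypothetical illustration of the gap (the numbers are invented but typical):

```yaml
# Before: requests sized by guesswork — this schedules 2 full cores per replica
resources:
  requests:
    cpu: "2"      # actual P95 usage: ~100m
    memory: 4Gi   # actual P99 usage: ~400Mi

# After right-sizing: the same workload, ~10x more replicas fit per node
resources:
  requests:
    cpu: 200m
    memory: 512Mi
```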
Measuring Actual Usage
```bash
# Current usage vs requests for all pods in a namespace
kubectl top pods -n production --sort-by=cpu

# More detailed: use kubectl-resource-capacity (krew plugin)
kubectl resource-capacity --pods --sort cpu.util

# Namespace-level summary
kubectl resource-capacity --namespace production
```

For a structured view of allocation vs usage, use kube-capacity:

```bash
kubectl krew install resource-capacity
kubectl resource-capacity --namespace production --pods --util
```

The output shows REQUEST CPU vs CURRENT CPU vs LIMIT CPU per pod — the gap between request and current is the waste.
VPA for Automated Right-Sizing
Vertical Pod Autoscaler (VPA) analyzes historical CPU and memory usage and recommends or automatically adjusts resource requests.
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  updatePolicy:
    updateMode: "Off"  # Recommendation only — don't auto-apply yet
  resourcePolicy:
    containerPolicies:
      - containerName: api
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: "4"
          memory: 4Gi
```

With `updateMode: "Off"`, VPA observes without acting. Check recommendations:
```bash
kubectl describe vpa api-vpa -n production
# Shows:
#   Recommendation:
#     Container Recommendations:
#       Container Name: api
#       Lower Bound:  cpu: 100m, memory: 128Mi
#       Target:       cpu: 250m, memory: 512Mi
#       Upper Bound:  cpu: 500m, memory: 1Gi
```

Use VPA recommendations to manually update Deployment specs. Start with Off mode for all services, collect recommendations over 2–4 weeks, then apply. VPA has four update modes:
- `Off` — recommendations only, no changes applied
- `Initial` — applies recommendations at pod creation time; no restarts of running pods
- `Recreate` — evicts and recreates pods to apply new resource values
- `Auto` — uses in-place pod resource updates (beta and enabled by default since Kubernetes 1.33) where supported, falls back to eviction/recreation on older clusters

Use `Initial` for a middle ground that sets requests at pod creation without disrupting running pods.
Important: VPA in Auto or Recreate mode is incompatible with HPA when both manage the same resource metric (e.g., both acting on CPU). VPA and HPA can safely coexist when HPA scales on custom metrics while VPA manages resource requests. If you use CPU-based HPA, use VPA in Off or Initial mode only.
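To apply a recommendation by hand, `kubectl set resources` is one option — a sketch using the hypothetical Target values from the output above:

```bash
kubectl set resources deployment/api -n production \
  --requests=cpu=250m,memory=512Mi \
  --limits=cpu=500m,memory=1Gi
```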
Manual Right-Sizing Targets
If you don't want VPA, a practical starting point (example Prometheus queries follow the list):
- Set requests at P95 of observed CPU usage over 7 days × 1.2 (20% headroom)
- Set memory requests at P99 of observed memory (memory doesn't compress — OOMKills are more disruptive than CPU throttling)
- Set CPU limits 2–3× the request (allows bursting; prevents CPU throttling under brief spikes)
- Set memory limits generously rather than omitting them — omitting limits avoids limit-triggered OOMKills but exposes nodes to memory pressure from unbounded processes. Size limits at 2–3× typical steady-state usage.
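A sketch of the corresponding Prometheus queries, assuming cAdvisor container metrics are scraped (label names vary by setup; `api` is a placeholder container name):

```bash
# P95 CPU usage (cores) over 7 days — multiply by 1.2 for the request
quantile_over_time(0.95,
  rate(container_cpu_usage_seconds_total{namespace="production", container="api"}[5m])[7d:5m])

# P99 memory working set over 7 days — use directly as the memory request
quantile_over_time(0.99,
  container_memory_working_set_bytes{namespace="production", container="api"}[7d])
```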
Spot Instances
Spot instances (AWS), preemptible VMs (GCP), or Spot VMs (Azure) run at a 60–80% discount relative to on-demand pricing. The trade-off: the cloud provider can reclaim them with 2-minute (AWS) or 30-second (GCP) notice.
For stateless workloads with HPA, spot instances are a high-value change. For stateful workloads, the calculus is more complex.
Karpenter Spot Strategy
Karpenter on EKS is the most capable spot orchestration layer. Configure a NodePool with both spot and on-demand in priority order:
```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]  # Spot preferred; falls back to on-demand
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
            # Diversify across multiple instance families to reduce spot interruption risk
            - m5.large
            - m5.xlarge
            - m6i.large
            - m6i.xlarge
            - m6a.large
            - m6a.xlarge
            - m7i.large
            - m7i.xlarge
            - c5.xlarge
            - c6i.xlarge
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m  # required in the v1 API; supported with WhenEmptyOrUnderutilized since Karpenter v1
    budgets:
      - nodes: "10%"  # Karpenter will not disrupt more than 10% of nodes at once
  limits:
    cpu: "1000"
    memory: 4000Gi
```

Key principle: diversify instance families. A spot interruption event typically affects a specific instance type in a specific AZ. If you're running only m5.xlarge, an interruption event can take down a large fraction of your cluster simultaneously. Diversifying across m5, m6i, m6a, m7i, c5, c6i reduces correlated interruption risk.
Spot Interruption Handling
When AWS sends a spot interruption notice (2 minutes before termination), Karpenter receives the notification via an SQS queue and immediately begins graceful node draining — cordoning the node and evicting pods before the instance is terminated.
```yaml
# Karpenter reads interruption events from SQS
# Set up via the Karpenter controller configuration:
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiSelectorTerms:
    - alias: al2@latest  # the v1 API requires amiSelectorTerms; amiFamily alone is no longer sufficient
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  # Karpenter automatically handles spot interruption via EventBridge + SQS
  # when deployed with the correct IAM permissions and an interruption queue
  # configured (settings.interruptionQueue in recent Helm charts)
```

For the SQS queue and EventBridge rules, Karpenter's CloudFormation template sets these up automatically when you deploy via the official installation guide.
Workloads Suitable for Spot
| Workload Type | Spot Suitable? | Notes |
|---|---|---|
| Stateless API pods (with HPA) | Yes | HPA replaces interrupted pods |
| Background job workers | Yes | Jobs retry; spot is ideal for batch |
| CI/CD runners | Yes | Short-lived, isolated, naturally fault-tolerant |
| Development namespaces | Yes | Interruptions are tolerable |
| Stateful databases (primary) | No | Interruption causes failover; use on-demand |
| Long-running batch (>2h, no checkpointing) | Caution | May lose progress on interruption |
| Ingress/gateway pods | Caution | Disruption affects all traffic; keep them on on-demand or spread across 2+ AZs |
Separating On-Demand and Spot Node Pools
Use node selectors or pod topology constraints to pin sensitive workloads to on-demand nodes:
```yaml
# NodePool for on-demand only (critical workloads)
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: on-demand
spec:
  template:
    metadata:
      labels:
        node-type: on-demand
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
```

```yaml
# Pod that requires on-demand
spec:
  nodeSelector:
    node-type: on-demand
```

Or use taints to keep spot nodes clean by default and explicitly tolerate them:
```yaml
# In the spot NodePool spec:
taints:
  - key: spot
    value: "true"
    effect: NoSchedule
```

```yaml
# In pods that can tolerate spot:
tolerations:
  - key: spot
    operator: Equal
    value: "true"
    effect: NoSchedule
```

Node Bin-Packing and Consolidation
Spot strategy gets you a 60–80% discount on node cost. Consolidation reduces the number of nodes. Both together can cut compute costs dramatically.
Karpenter Consolidation
Karpenter's consolidation actively removes underutilised nodes by moving their pods to other nodes:
```yaml
disruption:
  consolidationPolicy: WhenEmptyOrUnderutilized  # Remove underutilised nodes by moving pods
  consolidateAfter: 1m  # supported with WhenEmptyOrUnderutilized since Karpenter v1
```

WhenEmptyOrUnderutilized is the more aggressive policy — Karpenter moves pods off underutilised nodes to pack them onto fewer nodes, then removes the emptied nodes. For a more conservative approach that only removes nodes that are already empty:
```yaml
disruption:
  consolidationPolicy: WhenEmpty  # Only remove completely empty nodes
```

WhenEmpty is safer for production (no pod movements, only cleanup of empty nodes). Use WhenEmptyOrUnderutilized in dev/staging for active bin-packing, WhenEmpty in production to avoid unnecessary pod disruption.
Cluster Autoscaler Expander
If using the Cluster Autoscaler (not Karpenter), configure the expander to choose the most cost-efficient node group:
```bash
# cluster-autoscaler flags
--expander=price  # Choose cheapest node group (availability varies by provider; least-waste is a common alternative)
--scale-down-enabled=true
--scale-down-delay-after-add=10m
--scale-down-unneeded-time=10m
--scale-down-utilization-threshold=0.5  # Scale down if node is <50% utilised
```

PodDisruptionBudgets for Safe Consolidation
Consolidation is safe only if your application handles pod evictions correctly. PodDisruptionBudgets (PDBs) tell the scheduler how many pods can be disrupted simultaneously:
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
  namespace: production
spec:
  minAvailable: 2  # Always keep at least 2 pods available during disruption
  selector:
    matchLabels:
      app: api
```

Without PDBs, consolidation can evict all pods of a deployment simultaneously. With a PDB of minAvailable: 2, consolidation evicts one pod at a time, waiting for it to be rescheduled before evicting the next.
Namespace Cost Allocation
Cloud billing shows total cluster cost. To understand which team or service is responsible for what share of cost, you need namespace-level cost allocation.
Kubernetes Cost Allocation Tools
Kubecost (open-source community edition available) allocates cloud costs to namespaces, controllers, and labels by matching resource usage to cloud billing data:
```bash
helm repo add cost-analyzer https://kubecost.github.io/cost-analyzer/
helm upgrade --install kubecost cost-analyzer/cost-analyzer \
  --namespace kubecost \
  --create-namespace
# No token required for Kubecost community edition (v2.x+)
```

Kubecost's namespace view shows cost per namespace over any time range — the foundation for FinOps conversations with engineering teams.
OpenCost (CNCF project, originally donated by Kubecost) provides the same cost allocation without the Kubecost UI:
```bash
helm repo add opencost https://opencost.github.io/opencost-helm-chart
helm install opencost opencost/opencost --namespace opencost --create-namespace
```

OpenCost exposes a Prometheus metrics endpoint and Grafana dashboards for cost per namespace, workload, and label.
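With those metrics scraped, basic cost queries are straightforward (metric names as exported by OpenCost; verify against your version):

```bash
# Total hourly compute cost across all nodes
sum(node_total_hourly_cost)

# CPU cores allocated per namespace, the basis for proportional cost attribution
sum(container_cpu_allocation) by (namespace)
```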
Setting Namespace Cost Budgets
ResourceQuota sets hard limits on resource consumption, which is the prerequisite for cost accountability:
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-budget
  namespace: team-a-prod
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
```

With this quota, team-a's maximum monthly cost for compute in team-a-prod is bounded. Using Kubecost's default on-demand rate assumptions (~$0.032 per vCPU-hour, ~$0.004 per GiB-hour) as a rough approximation:

- 8 CPU requested → 8 × $0.032 ≈ $0.26/hour (~$185/month)
- 16 GiB memory requested → 16 × $0.004 ≈ $0.07/hour (~$50/month)

The actual cost depends on the instance mix and spot vs on-demand ratio, but the quota gives you a defined maximum.
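One gap to close: in a namespace with a requests quota, pods that set no requests are rejected outright. A LimitRange supplies defaults so they still schedule and still count against the budget (values here are illustrative):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: team-a-defaults
  namespace: team-a-prod
spec:
  limits:
    - type: Container
      defaultRequest:   # applied when a container sets no requests
        cpu: 100m
        memory: 128Mi
      default:          # applied when a container sets no limits
        cpu: 500m
        memory: 512Mi
```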
Chargeback Labels
Add cost-allocation labels to all workloads for Kubecost/OpenCost attribution:
```yaml
metadata:
  labels:
    team: platform
    env: production
    cost-center: infra-1234
    product: api-gateway
```

These labels become dimensions in Kubecost's cost explorer, letting you break down cloud spend by team, cost center, product, and environment simultaneously.
Graviton / ARM64 Nodes
AWS Graviton (arm64) instances offer 20–40% better price-performance than equivalent x86 instances. For most containerised workloads, migrating to arm64 is a zero-effort cost reduction once images are built for both architectures.
See AWS Graviton ARM64 Migration Guide for the full migration path. From a cost perspective:
```yaml
# Karpenter NodePool allowing both Graviton and x86
requirements:
  - key: kubernetes.io/arch
    operator: In
    values: ["arm64", "amd64"]  # order carries no preference; Karpenter picks the cheapest fit, usually arm64
  - key: node.kubernetes.io/instance-type
    operator: In
    values:
      - m7g.large   # Graviton3 — best price-performance for general workloads
      - m7g.xlarge
      - m6g.large
      - m6g.xlarge
      - m5.large    # x86 fallback
      - m6i.large
```

Karpenter typically selects Graviton instances (the cheapest option that fits) and falls back to x86 when Graviton spot capacity is unavailable.
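The prerequisite is multi-arch images. A minimal buildx invocation (registry and tag are placeholders):

```bash
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  -t registry.example.com/api:1.2.3 \
  --push .
```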
Idle and Unused Resources
Right-sizing active workloads is the primary lever. The secondary lever is removing workloads that consume resources without delivering value.
Finding Idle Deployments
```bash
# Deployments with zero requests in the last 7 days (via Prometheus)
# metric: kube_deployment_status_replicas{namespace="production"} > 0
# cross: rate(http_requests_total{namespace="production"}[7d]) == 0

# Quick check — deployments scaled to zero (READY column "0/0")
kubectl get deployments -A --no-headers | awk '$3 == "0/0"'

# Pods currently using very little CPU (<10m) — strip the "m" suffix for a numeric compare
kubectl top pods -A --no-headers | awk '{cpu=$3; sub(/m$/, "", cpu); if (cpu+0 < 10) print}'
```

Expired and Orphaned Resources
```bash
# PVCs not in Bound state (Pending/Lost)
kubectl get pvc -A --no-headers | grep -v Bound
kubectl get pv --no-headers | grep Released  # Released PVs still incurring EBS cost

# Old ReplicaSets scaled to zero (kept as Deployment revision history)
kubectl get rs -A --no-headers | awk '$3 == "0" && $4 == "0"'

# LoadBalancer Services with no backend pods
kubectl get svc -A | grep LoadBalancer
# Then verify each has healthy endpoints:
kubectl get endpoints -A | grep "<none>"
```

Released PersistentVolumes with Retain reclaim policy accumulate silently — EBS volumes still exist and still cost money even when the PVC has been deleted. Audit and delete or reprovision.
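To map Released PVs back to the cloud volumes still billing, list their CSI volume handles (assumes the EBS CSI driver; adjust the field for in-tree volumes):

```bash
kubectl get pv -o json | jq -r '
  .items[]
  | select(.status.phase == "Released")
  | [.metadata.name, .spec.csi.volumeHandle // "n/a"] | @tsv'
```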
Scaling Dev Namespaces to Zero
Development namespaces often run at full capacity overnight and on weekends. An automated scale-to-zero mechanism can reduce dev cluster cost by 40–60%:
```bash
# Kube-downscaler or Kubernetes Event-Driven Autoscaling (KEDA) ScaledJob
# — scales Deployments to 0 outside business hours

# Simple bash approach for namespaces with a specific label
kubectl get deployments -n team-a-dev -o name | xargs -I{} \
  kubectl scale {} --replicas=0 -n team-a-dev

# Scale back up in the morning
kubectl get deployments -n team-a-dev -o name | xargs -I{} \
  kubectl scale {} --replicas=1 -n team-a-dev
```

Tools like kube-downscaler automate this with namespace annotations (`downscaler/uptime: Mon-Fri 07:00-20:00 Europe/London`).
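Applied to a namespace, that annotation looks like this (the namespace name is a placeholder):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: team-a-dev
  annotations:
    downscaler/uptime: "Mon-Fri 07:00-20:00 Europe/London"  # scaled to 0 outside this window
```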
Cost Optimisation Checklist
Work through these in order — each tier has higher implementation effort but also higher impact ceiling:
Tier 1: Quick Wins (days)
- Enable VPA in `Off` mode on all namespaces — collect recommendations for 2 weeks
- Apply VPA recommendations to at least the top 10 highest-CPU Deployments
- Enable Karpenter consolidation (`WhenEmptyOrUnderutilized`) on non-production clusters
- Delete orphaned PVCs and Released PVs
- Delete `LoadBalancer` Services not attached to active workloads
Tier 2: Medium Effort (weeks)
- Move stateless workloads to spot instances via Karpenter or CA spot groups
- Apply ResourceQuota to all team namespaces (prerequisite for cost accountability)
- Install Kubecost or OpenCost — establish namespace cost baselines
- Set up scale-to-zero for dev namespaces after hours
Tier 3: Strategic (quarters)
- Migrate to Graviton instances for compute-heavy workloads
- Implement FinOps review process — monthly cost review with namespace owners
- Consolidate small clusters — fewer, larger clusters have lower control-plane overhead
- Evaluate GKE Autopilot or EKS Auto Mode if operational simplicity outweighs per-pod pricing
Frequently Asked Questions
How much can I realistically save?
Teams that have never done cost optimisation typically see 40–60% reduction from the combination of right-sizing requests and adding spot instances. The first two tiers (quick wins + spot) account for most of it. Graviton migration adds another 20–30% on top. Teams with mature cost practices can get to 70%+ below a naive baseline.
Is spot risky for production?
For stateless workloads with >=2 replicas and PDBs, spot interruptions are transparent — the pod is evicted, rescheduled on another node, and HPA maintains the replica count. The risk is real but manageable. Database primaries, anything with a long startup time (>5 minutes), and anything without PDBs should stay on on-demand.
VPA and HPA conflict — how do I handle this?
Use VPA in Off or Initial mode to set baseline resource requests. HPA handles horizontal scaling. The conflict only occurs when both are managing the same resource (e.g., VPA auto-adjusting CPU requests while HPA scales on CPU) — with VPA in Off mode, there's no conflict.
How do I allocate cost to shared platform components (Prometheus, Argo CD)?
Tag the `monitoring` and `argocd` namespaces with `cost-center: platform` and attribute them to the platform team. Kubecost lets you define shared cost allocation strategies — e.g., distribute 10% of monitoring cost to each tenant namespace proportionally by their resource usage.
What about EKS add-ons (CoreDNS, kube-proxy, VPC CNI)?
These are node-level costs already included in compute pricing. The main lever here is node count — fewer, larger nodes mean fewer CoreDNS pods, fewer kube-proxy DaemonSet pods, and lower overhead as a fraction of total cost.
For right-sizing with VPA and HPA integration, see Kubernetes Resource Requests and Limits: A Production Guide. For Karpenter setup, see How to Install Karpenter on EKS. For Graviton migration, see AWS Graviton ARM64 Migration Guide. For a deep dive on Karpenter NodePool configuration, consolidation, and drift detection, see Karpenter v1: Node Provisioning, Consolidation, and Drift.
Running a cost review for a Kubernetes cluster? Talk to us at Coding Protocols — we help platform teams identify and eliminate structural waste without compromising reliability.


