Kubernetes Capacity Planning: Sizing Clusters and Managing Resources
Kubernetes resource management is deceptively complex. Setting requests too low causes OOM kills and CPU throttling. Setting them too high wastes money. Not enforcing them at all means one team's noisy workload affects everyone else. Here's a systematic approach to sizing clusters, setting defaults, and measuring utilisation over time.

Capacity planning in Kubernetes has two distinct problems that are often conflated. The first is cluster sizing: how many nodes, of what size, do you need? The second is resource management: how do you ensure workloads declare accurate resource requests so that scheduling decisions and autoscaling work correctly? Both are required — correct cluster sizing with inaccurate pod requests still results in overloaded nodes and unpredictable performance, and accurate pod requests without correct cluster sizing means pods can't be scheduled.
This guide covers both: the mechanics of how Kubernetes allocates resources, the tools for enforcing resource discipline across namespaces, and how to measure whether your sizing decisions are working.
Understanding Allocatable Resources
A node's total capacity is not fully available to workloads. Kubernetes reserves capacity for the OS, kubelet, and eviction thresholds:
Allocatable = Capacity - kube-reserved - system-reserved - eviction-threshold
For an EKS m5.xlarge (4 vCPU, 16 GiB):
```
CPU capacity:           4000m
  kube-reserved:         -80m   (EKS default: 80m CPU for the kubelet)
  system-reserved:       -80m   (EKS default: 80m for the OS)
CPU allocatable:       ≈3840m

Memory capacity:       16384 MiB
  kube-reserved:       -1024 MiB  (EKS: scales with node memory)
  system-reserved:      -512 MiB
  eviction-threshold:   -100 MiB  (kubelet evicts pods below this watermark)
Memory allocatable:  ≈14748 MiB  (≈14.4 GiB)
```
Check actual allocatable for your nodes:
```shell
kubectl get nodes -o custom-columns=\
NAME:.metadata.name,\
CPU:.status.allocatable.cpu,\
MEMORY:.status.allocatable.memory,\
PODS:.status.allocatable.pods

# NAME                         CPU    MEMORY      PODS
# ip-10-0-1-123.ec2.internal   3920m  14745672Ki  58
```

The pods field shows the maximum pod count per node — this is set by --max-pods on the kubelet and defaults to instance-type-specific limits on EKS. With VPC CNI prefix delegation, you can raise this significantly.
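Where does that pod ceiling come from? On EKS without prefix delegation, the default tracks ENI IP capacity. A quick sketch of the arithmetic — the ENI and IP-per-ENI counts here are hard-coded for m5.xlarge and vary by instance type:

```shell
# Default EKS max-pods: ENIs * (IPv4 addresses per ENI - 1) + 2
# m5.xlarge: 4 ENIs, 15 IPv4 addresses each
echo $(( 4 * (15 - 1) + 2 ))
# 58
```

The result matches the PODS column above; with prefix delegation enabled, each ENI slot carries a /28 prefix instead of a single IP, which is why the ceiling rises so sharply.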
Node Sizing Strategy
The fundamental trade-off in node sizing is blast radius versus efficiency:
| Approach | Advantages | Disadvantages |
|---|---|---|
| Fewer, larger nodes | Better bin-packing, less per-node system-pod overhead, faster pod startup (pre-warmed ENIs) | Larger blast radius on failure, harder to drain for upgrades |
| Many, smaller nodes | Better failure isolation, easier draining | More per-node overhead (DaemonSets, kube-proxy, kubelet), so a higher overhead-to-workload ratio |
Practical guidance:
- General workloads: 4-8 vCPU nodes (m5.xlarge to m5.2xlarge on AWS). Smaller than 4 vCPU means DaemonSet overhead is disproportionate. Larger than 8 vCPU is harder to bin-pack efficiently unless workloads are consistent sizes.
- Memory-intensive workloads (caches, JVM apps): 16-32 GiB memory-optimised instances (r5 family). Right-size to your median pod memory request × 8-10 pods per node.
- Spot instance pools: Use heterogeneous instance families (m5, m5a, m5n, m4) to increase availability — Karpenter does this automatically with NodePool instance-family diversity.
- System node group: Dedicate a small node group (2-3 nodes, On-Demand, small instances) for critical DaemonSets and system pods that can't tolerate spot interruption.
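As a sketch, instance-family diversity in a Karpenter NodePool looks like the fragment below (abridged — a real NodePool also needs a matching EC2NodeClass; the name default here is an assumption):

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general-spot
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default            # Assumed EC2NodeClass name
      requirements:
        # Multiple interchangeable families widen the spot capacity pool
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["m5", "m5a", "m5n", "m4"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
```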
QoS Classes and Why They Matter
Kubernetes assigns a QoS class to every pod based on its resource configuration. When a node is under memory pressure, the kubelet uses QoS to determine which pods to evict first:
| QoS Class | Condition | Eviction Priority |
|---|---|---|
| BestEffort | No requests or limits set for any container | First to evict |
| Burstable | At least one container has a request or limit | Evicted after BestEffort |
| Guaranteed | All containers have requests == limits (for both CPU and memory) | Last to evict |
```yaml
# Guaranteed QoS: requests == limits for all containers
resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: 500m        # Must match request for Guaranteed
    memory: 512Mi

# Burstable QoS: requests < limits
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi

# BestEffort: no resources set (dangerous in production)
resources: {}
```

CPU throttling vs OOM kill: CPU limits throttle the process (slows it down, doesn't kill it). Memory limits kill the container with an OOM kill when exceeded. Setting CPU limits significantly higher than requests (or omitting CPU limits entirely) reduces throttling. If the CPU limit equals the request, any burst above the request is immediately throttled by the kernel's CFS scheduler. Setting memory limits higher than requests (Burstable for memory) tolerates occasional spikes without OOM kills, while still setting an upper bound.
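The assigned class is recorded in pod status, so you can check what QoS a running pod actually landed in (a quick audit, assuming a live cluster):

```shell
# Show every pod's QoS class, as recorded in .status.qosClass
kubectl get pods -A -o custom-columns=\
NAMESPACE:.metadata.namespace,\
NAME:.metadata.name,\
QOS:.status.qosClass
```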
LimitRange: Enforcing Defaults
Without LimitRange, a pod that omits resources gets BestEffort QoS — it can consume unlimited CPU and memory, starving other workloads on the node. LimitRange sets defaults and constraints for all pods in a namespace:
```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-resource-limits
  namespace: production
spec:
  limits:
    - type: Container
      # Applied when a container doesn't specify resources
      default:
        cpu: 200m
        memory: 256Mi
      defaultRequest:
        cpu: 100m
        memory: 128Mi
      # Hard bounds — the admission controller rejects containers that exceed these
      max:
        cpu: "8"
        memory: 8Gi
      min:
        cpu: 50m
        memory: 64Mi
    - type: Pod
      # Total pod resource bounds (sum of all containers)
      max:
        cpu: "16"
        memory: 16Gi
    - type: PersistentVolumeClaim
      max:
        storage: 100Gi
      min:
        storage: 1Gi
```

With this LimitRange, a container that omits resources gets 100m CPU / 128Mi memory as requests, and 200m CPU / 256Mi memory as limits — Burstable QoS, not BestEffort. This prevents the most common resource discipline failure.
```shell
# Verify the LimitRange is applied to new pods
kubectl run test --image=nginx -n production
kubectl get pod test -n production -o jsonpath='{.spec.containers[0].resources}'
# {"limits":{"cpu":"200m","memory":"256Mi"},"requests":{"cpu":"100m","memory":"128Mi"}}
```

ResourceQuota: Namespace-Level Caps
ResourceQuota limits the total resources a namespace can consume. Without quotas, a single team can exhaust cluster-wide capacity:
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    # Compute
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    # Objects
    pods: "100"
    services: "20"
    services.loadbalancers: "2"
    services.nodeports: "0"
    # Storage
    persistentvolumeclaims: "20"
    requests.storage: 500Gi
    # Secrets and ConfigMaps (prevent Secret sprawl)
    secrets: "50"
    configmaps: "50"
```

Enforcement: When a namespace hits its quota, new pods, services, or PVCs are rejected by the admission controller with a quota exceeded error. Applications need to handle this (retry, backoff) and platform teams need alerts for quota proximity:
```shell
# Check quota usage
kubectl get resourcequota -n production
# NAME               AGE   REQUEST                                           LIMIT
# production-quota   5d    requests.cpu: 14/20, requests.memory: 28Gi/40Gi   ...

# Get utilisation detail per resource
kubectl describe resourcequota production-quota -n production
```

Alert when a namespace reaches 80% of its quota:
```yaml
- alert: NamespaceQuotaApproachingLimit
  expr: >
    kube_resourcequota{type="used"} / ignoring(type)
    kube_resourcequota{type="hard"} > 0.8
  for: 30m
  labels:
    severity: warning
  annotations:
    summary: "Namespace {{ $labels.namespace }} at {{ $value | humanizePercentage }} of {{ $labels.resource }} quota"
```

The ignoring(type) modifier is needed for the division to match: the used and hard series differ only in their type label.

VPA for Right-Sizing Recommendations
Vertical Pod Autoscaler in Off mode gives you right-sizing recommendations without actually changing anything — the safest way to use VPA for capacity planning:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  updatePolicy:
    updateMode: "Off"    # Recommendations only — no automatic updates
  resourcePolicy:
    containerPolicies:
      - containerName: api
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: "4"
          memory: 4Gi
```

Read the recommendations:
```shell
kubectl describe vpa api-vpa -n production
# Recommendation:
#   Container Recommendations:
#     Container Name:  api
#     Lower Bound:
#       cpu:     80m
#       memory:  192Mi
#     Target:              ← Use this as your requests
#       cpu:     250m
#       memory:  512Mi
#     Uncapped Target:     ← Recommendation before min/maxAllowed clamping
#       cpu:     250m
#       memory:  512Mi
#     Upper Bound:         ← Use this as your limits
#       cpu:     1500m
#       memory:  2Gi
```

The VPA Target is the recommended request based on observed usage; the Uncapped Target is the same recommendation before the resourcePolicy bounds are applied (identical here because nothing was clamped). The Lower Bound / Upper Bound give a confidence interval — useful for setting limits. Use VPA recommendations to audit pods where requests are significantly over-provisioned (Target << current request) or under-provisioned (Target >> current request).
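To run that audit across every workload rather than one Deployment at a time, the recommendations can be extracted in bulk — a sketch assuming VPA objects exist and have populated status:

```shell
# Tab-separated: namespace, VPA name, container, target CPU, target memory
kubectl get vpa -A -o json | jq -r '
  .items[] |
  .metadata.namespace as $ns | .metadata.name as $vpa |
  .status.recommendation.containerRecommendations[]? |
  [$ns, $vpa, .containerName, .target.cpu, .target.memory] | @tsv'
```

Diff this output against the requests currently declared in your manifests to build a right-sizing backlog.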
Measuring Cluster Utilisation
Target utilisation ranges vary by workload type, but a practical framework:
| Resource | Target Utilisation | Rationale |
|---|---|---|
| CPU requests / allocatable | 65–75% | Headroom for HPA burst scaling |
| Memory requests / allocatable | 70–80% | Memory doesn't compress; too-low headroom causes OOM |
| Pod count / max-pods | 70–80% | Leave room for DaemonSets and system pods |
The kube-capacity tool (installed via krew as kubectl resource-capacity) gives a quick cluster-wide view:
1# Install via krew (installs as the "resource-capacity" plugin)
2kubectl krew install resource-capacity
3
4# Show per-node CPU and memory with pod count
5kubectl resource-capacity --pods --util --sort cpu.util
6
7# OUTPUT (approximate):
8# NODE CPU REQUESTS CPU LIMITS CPU UTIL MEM REQUESTS MEM LIMITS MEM UTIL PODS
9# ip-10-0-1-100.ec2.int 3100m/3840m 6200m/3840m 41%/3840m 11Gi/14.4Gi 22Gi/14.4Gi 62% 42/58
10# ip-10-0-1-200.ec2.int 950m/3840m 1900m/3840m 25%/3840m 3Gi/14.4Gi 6Gi/14.4Gi 21% 18/58The CPU UTIL column (from metrics-server) versus CPU REQUESTS tells you whether your requests are accurate. Nodes where CPU util is 15% but CPU requests are 80% indicate significant over-provisioning — VPA recommendations will show this.
Prometheus queries for cluster-wide utilisation:
```promql
# CPU request efficiency (what fraction of requested CPU is actually used)
sum(rate(container_cpu_usage_seconds_total[5m]))
  / sum(kube_pod_container_resource_requests{resource="cpu"})

# Memory request efficiency
sum(container_memory_working_set_bytes)
  / sum(kube_pod_container_resource_requests{resource="memory"})

# Cluster CPU allocatable utilisation
sum(kube_pod_container_resource_requests{resource="cpu"})
  / sum(kube_node_status_allocatable{resource="cpu"})
```

Autoscaling Headroom Strategy
With cluster autoscaler or Karpenter, the cluster expands when pods can't be scheduled. But scale-out takes 2-5 minutes — during that window, new pods wait in Pending. For latency-sensitive workloads, this is unacceptable.
Strategy 1: Overprovisioning with a placeholder Deployment
```yaml
# Low-priority placeholder pods that consume capacity.
# Real pods preempt them immediately (higher priority).
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -1                      # Negative priority — preempted first
preemptionPolicy: Never
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning
  namespace: kube-system
spec:
  replicas: 3                  # Adjust based on desired headroom
  selector:
    matchLabels:
      app: overprovisioning
  template:
    metadata:
      labels:
        app: overprovisioning
    spec:
      priorityClassName: overprovisioning
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: "1"
              memory: 2Gi
```

Three placeholder pods consuming 3 vCPU / 6 GiB is roughly one node of headroom. When real pods are scheduled, they preempt the placeholder pods, which get evicted, triggering Karpenter to provision a new node. Scale-out completes while the real pods are already running on the reclaimed headroom.
Strategy 2: Target CPU utilisation below 100% in HPA
Set HPA target CPU at 60-70% rather than 80-90%. This causes HPA to scale out earlier, before resource exhaustion, giving the cluster autoscaler time to provision nodes before pods queue.
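A minimal autoscaling/v2 sketch of that setting (the workload name api and the replica bounds are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 3
  maxReplicas: 30
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 65   # Scale out early, leaving time to provision nodes
```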
Frequently Asked Questions
What's the right CPU request-to-limit ratio?
A common starting point: limits = 2-4× requests for CPU. This gives burstable capacity for spikes (JVM startup, request surges) while putting an upper bound on runaway processes. For memory, a tighter ratio (limits = 1.5-2× requests) is safer because memory exhaustion OOM-kills the process immediately, while CPU exhaustion "just" throttles it. Use VPA recommendations to calibrate based on observed p99 usage.
Should I set CPU limits at all?
This is debated. CPU limits in Kubernetes use CFS quota enforcement, which can cause CPU throttling even when the node has spare CPU — the CFS bandwidth control problem. Some teams remove CPU limits entirely and rely only on CPU requests for scheduling; without limits, though, a misbehaving pod can monopolise node CPU. The pragmatic approach: set CPU limits at 3-4× CPU requests, and use VPA or monitoring to identify pods that hit their CPU limit frequently (those need their requests and limits raised).
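Pods that hit their CPU limit show up in the cAdvisor CFS counters — a sketch using the standard container_cpu_cfs_* metrics; the 25% threshold is an arbitrary starting point, not a recommendation:

```promql
# Fraction of CFS scheduling periods in which a container was throttled (5m window)
sum by (namespace, pod, container) (rate(container_cpu_cfs_throttled_periods_total[5m]))
  /
sum by (namespace, pod, container) (rate(container_cpu_cfs_periods_total[5m]))
  > 0.25
```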
How do I find pods without resource requests?
```shell
# Find pods where any container is missing a CPU or memory request.
# any() ensures each pod is listed once, even with multiple containers.
kubectl get pods -A -o json | jq -r '
  .items[] |
  select(any(.spec.containers[];
    .resources.requests.cpu == null or
    .resources.requests.memory == null)) |
  [.metadata.namespace, .metadata.name] | @tsv'
```

Or use kube-score:
```shell
kubectl get deployments -A -o yaml | kube-score score -
# [WARNING] Container Resource Requests and Limits
#   · api — CPU request is not set
#   · api — Memory limit is not set
```

For autoscaling beyond what resource tuning can handle, see KEDA: Event-Driven Autoscaling for Kubernetes. For cost optimisation strategies that complement capacity planning, see Kubernetes Cost Optimisation: Spot Instances, VPA, and Karpenter. For platform-wide enforcement of resource policies across teams, see Kubernetes Admission Webhooks: Validating and Mutating Workloads.
Running a multi-team Kubernetes platform and struggling with resource sprawl? Talk to us at Coding Protocols — we help platform teams implement resource governance that keeps clusters efficient without blocking development velocity.


