
Kubernetes Capacity Planning: Sizing Clusters and Managing Resources

Kubernetes resource management is deceptively complex. Setting requests too low causes OOM kills and CPU throttling. Setting them too high wastes money. Not enforcing them at all means one team's noisy workload affects everyone else. Here's a systematic approach to sizing clusters, setting defaults, and measuring utilisation over time.

Coding Protocols Team
Platform Engineering

Capacity planning in Kubernetes has two distinct problems that are often conflated. The first is cluster sizing: how many nodes, of what size, do you need? The second is resource management: how do you ensure workloads declare accurate resource requests so that scheduling decisions and autoscaling work correctly? Both are required — correct cluster sizing with inaccurate pod requests still results in overloaded nodes and unpredictable performance, and accurate pod requests without correct cluster sizing means pods can't be scheduled.

This guide covers both: the mechanics of how Kubernetes allocates resources, the tools for enforcing resource discipline across namespaces, and how to measure whether your sizing decisions are working.


Understanding Allocatable Resources

A node's total capacity is not fully available to workloads. Kubernetes reserves capacity for the OS, kubelet, and eviction thresholds:

Allocatable = Capacity - kube-reserved - system-reserved - eviction-threshold

For an EKS m5.xlarge (4 vCPU, 16 GiB):

CPU capacity:      4000m
- kube-reserved:    -80m  (EKS default: 80m CPU for kubelet)
- system-reserved:  -80m  (EKS default: 80m for OS)
CPU allocatable:   ≈3840m

Memory capacity:   16384 MiB
- kube-reserved:   -1024 MiB (EKS: scales with node memory)
- system-reserved:  -512 MiB
- eviction-threshold: -100 MiB (kubelet evicts pods below this watermark)
Memory allocatable: ≈14748 MiB (≈14.4 GiB)

Check the actual allocatable for your nodes (reserved amounts vary by AMI and kubelet configuration, so expect small differences from the estimate above):

bash
kubectl get nodes -o custom-columns=\
  NAME:.metadata.name,\
  CPU:.status.allocatable.cpu,\
  MEMORY:.status.allocatable.memory,\
  PODS:.status.allocatable.pods

# NAME                          CPU      MEMORY        PODS
# ip-10-0-1-123.ec2.internal    3920m    14745672Ki    58

The pods field shows the maximum pod count per node — this is set by --max-pods on the kubelet and defaults to instance-type-specific limits on EKS. With VPC CNI prefix delegation, you can raise this significantly.
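
One way to check this on an EKS cluster, assuming the upstream VPC CNI DaemonSet name (aws-node) and its ENABLE_PREFIX_DELEGATION setting (verify both against your CNI version), is a quick jsonpath query:

bash
# Is prefix delegation enabled on the VPC CNI? ("true" means the higher pod ceiling applies)
kubectl get daemonset aws-node -n kube-system \
  -o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="ENABLE_PREFIX_DELEGATION")].value}'

# Per-node pod ceiling as the scheduler sees it
kubectl get nodes -o custom-columns=NAME:.metadata.name,MAX_PODS:.status.allocatable.pods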


Node Sizing Strategy

The fundamental trade-off in node sizing is blast radius versus efficiency:

Approach            | Advantages                                                                                 | Disadvantages
Fewer, larger nodes | Better bin-packing, lower total system-pod overhead, faster pod startup (pre-warmed ENIs) | Larger blast radius on failure, harder to drain for upgrades
Many, smaller nodes | Better failure isolation, easier draining                                                  | More per-node overhead (DaemonSets, kube-proxy, kubelet), higher overhead-to-capacity cost ratio

Practical guidance:

  • General workloads: 4-8 vCPU nodes (m5.xlarge to m5.2xlarge on AWS). Smaller than 4 vCPU means DaemonSet overhead is disproportionate (a way to tally that overhead is sketched after this list). Larger than 8 vCPU is harder to bin-pack efficiently unless workloads are consistent sizes.
  • Memory-intensive workloads (caches, JVM apps): 16-32 GiB memory-optimised instances (r5 family). Right-size to your median pod memory request × 8-10 pods per node.
  • Spot instance pools: Use heterogeneous instance families (m5, m5a, m5n, m4) to increase availability — Karpenter does this automatically with NodePool instance family diversity.
  • System node group: Dedicate a small node group (2-3 nodes, On-Demand, small instance) for critical DaemonSets and system pods that can't tolerate spot interruption.
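
To put a number on that per-node overhead, a rough check is to list what every DaemonSet requests, since each of those pods is repeated on (almost) every node:

bash
# CPU/memory each DaemonSet requests per node it runs on
kubectl get daemonsets -A -o custom-columns=\
  NAMESPACE:.metadata.namespace,\
  NAME:.metadata.name,\
  CPU_REQ:.spec.template.spec.containers[*].resources.requests.cpu,\
  MEM_REQ:.spec.template.spec.containers[*].resources.requests.memory

A few hundred millicores of DaemonSet requests is a far bigger share of a 2 vCPU node's allocatable than of an 8 vCPU node's, which is the overhead trade-off shown in the table above.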

QoS Classes and Why They Matter

Kubernetes assigns a QoS class to every pod based on its resource configuration. When a node is under memory pressure, the kubelet uses QoS to determine which pods to evict first:

QoS Class  | Condition                                                         | Eviction Priority
BestEffort | No requests or limits set for any container                       | First to evict
Burstable  | At least one container has a request or limit                     | Evicted after BestEffort
Guaranteed | All containers have requests == limits (for both CPU and memory)  | Last to evict
yaml
# Guaranteed QoS: requests == limits for all containers
resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: 500m    # Must match request for Guaranteed
    memory: 512Mi

# Burstable QoS: requests < limits
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi

# BestEffort: no resources set (dangerous in production)
resources: {}

CPU throttling vs OOM kill: CPU limits throttle the process (slows it down, doesn't kill it). Memory limits kill the container with OOM when exceeded. Setting CPU limits significantly HIGHER than requests (or omitting CPU limits entirely) reduces throttling. If CPU limit equals request, any burst above the request is immediately throttled by the kernel's CFS scheduler. Setting memory limits higher than requests (Burstable for memory) tolerates occasional spikes without OOM killing, while still setting an upper bound.
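
To confirm whether a container is actually hitting its CFS quota, a rough check is to read its cgroup CPU statistics (this assumes a cgroup v2 node; on cgroup v1 the file is /sys/fs/cgroup/cpu/cpu.stat, and the deploy/api target and output values are illustrative):

bash
# nr_throttled climbing relative to nr_periods means the container keeps hitting its CPU limit
kubectl exec -n production deploy/api -- cat /sys/fs/cgroup/cpu.stat
# usage_usec 812345678
# nr_periods 48231
# nr_throttled 1204
# throttled_usec 98231000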


LimitRange: Enforcing Defaults

Without LimitRange, a pod that omits resources gets BestEffort QoS — it can consume unlimited CPU and memory, starving other workloads on the node. LimitRange sets defaults and constraints for all pods in a namespace:

yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-resource-limits
  namespace: production
spec:
  limits:
    - type: Container
      # Applied when container doesn't specify resources
      default:
        cpu: 200m
        memory: 256Mi
      defaultRequest:
        cpu: 100m
        memory: 128Mi
      # Hard bounds — admission rejects containers that exceed these
      max:
        cpu: "8"
        memory: 8Gi
      min:
        cpu: 50m
        memory: 64Mi
    - type: Pod
      # Total pod resource bounds (sum of all containers)
      max:
        cpu: "16"
        memory: 16Gi
    - type: PersistentVolumeClaim
      max:
        storage: 100Gi
      min:
        storage: 1Gi

With this LimitRange, a container that omits resources gets 100m CPU / 128Mi memory as requests, and 200m CPU / 256Mi memory as limits — Burstable QoS, not BestEffort. This prevents the most common resource discipline failure.

bash
# Verify LimitRange is applied to new pods
kubectl run test --image=nginx -n production
kubectl get pod test -n production -o jsonpath='{.spec.containers[0].resources}'
# {"limits":{"cpu":"200m","memory":"256Mi"},"requests":{"cpu":"100m","memory":"128Mi"}}

ResourceQuota: Namespace-Level Caps

ResourceQuota limits the total resources a namespace can consume. Without quotas, a single team can exhaust cluster-wide capacity:

yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    # Compute
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    # Objects
    pods: "100"
    services: "20"
    services.loadbalancers: "2"
    services.nodeports: "0"
    # Storage
    persistentvolumeclaims: "20"
    requests.storage: 500Gi
    # Secrets and ConfigMaps (prevent Secret sprawl)
    secrets: "50"
    configmaps: "50"

Enforcement: When a namespace hits its quota, new pods, services, or PVCs are rejected by the admission controller with a quota exceeded error. Applications need to handle this (retry, backoff) and platform teams need alerts for quota proximity:

bash
# Check quota usage
kubectl get resourcequota -n production
# NAME               AGE   REQUEST                                           LIMIT
# production-quota   5d    requests.cpu: 14/20, requests.memory: 28Gi/40Gi   ...

# Get utilisation percentage
kubectl describe resourcequota production-quota -n production
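
What the rejection looks like once requests.cpu is exhausted (error text paraphrased; exact wording varies by Kubernetes version):

bash
kubectl run burst --image=nginx -n production
# Error from server (Forbidden): pods "burst" is forbidden: exceeded quota: production-quota,
#   requested: requests.cpu=100m, used: requests.cpu=20, limited: requests.cpu=20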

Alert when a namespace reaches 80% of its quota:

yaml
- alert: NamespaceQuotaApproachingLimit
  expr: >
    kube_resourcequota{type="used"}
      / ignoring(type)
    kube_resourcequota{type="hard"} > 0.8
  for: 30m
  labels:
    severity: warning
  annotations:
    summary: "Namespace {{ $labels.namespace }} at {{ $value | humanizePercentage }} of {{ $labels.resource }} quota"

VPA for Right-Sizing Recommendations

Vertical Pod Autoscaler in Off mode gives you right-sizing recommendations without actually changing anything — the safest way to use VPA for capacity planning:

yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  updatePolicy:
    updateMode: "Off"    # Recommendations only — no automatic updates
  resourcePolicy:
    containerPolicies:
      - containerName: api
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: "4"
          memory: 4Gi

Read the recommendations:

bash
kubectl describe vpa api-vpa -n production
# Recommendation:
#   Container Recommendations:
#     Container Name: api
#     Lower Bound:
#       cpu:     80m
#       memory:  192Mi
#     Target:                        ← Use this as your requests
#       cpu:     250m
#       memory:  512Mi
#     Uncapped Target:               (equals Target unless min/maxAllowed caps it)
#       cpu:     250m
#       memory:  512Mi
#     Upper Bound:                   ← Use this as your limits
#       cpu:     1500m
#       memory:  2Gi

The VPA Target is the recommended request based on observed usage. The Lower Bound / Upper Bound give a confidence interval — useful for setting limits. Use VPA recommendations to audit pods where requests are significantly over-provisioned (Target << current request) or under-provisioned (Target >> current request).
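
One way to pull every VPA target in a single pass for that audit (field paths follow the autoscaling.k8s.io/v1 status schema) and compare it against what the workload currently requests:

bash
# Dump Target recommendations across all namespaces
kubectl get vpa -A -o json | jq -r '
  .items[] |
  .metadata.namespace as $ns | .metadata.name as $vpa |
  .status.recommendation.containerRecommendations[]? |
  [$ns, $vpa, .containerName, .target.cpu, .target.memory] | @tsv'

# Current requests for the example Deployment
kubectl get deployment api -n production \
  -o jsonpath='{.spec.template.spec.containers[0].resources.requests}'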


Measuring Cluster Utilisation

Target utilisation ranges vary by workload type, but a practical framework:

Resource                      | Target Utilisation | Rationale
CPU requests / allocatable    | 65–75%             | Headroom for HPA burst scaling
Memory requests / allocatable | 70–80%             | Memory doesn't compress; too-low headroom causes OOM
Pod count / max-pods          | 70–80%             | Leave room for DaemonSets and system pods

The kube-capacity tool (installed via krew as kubectl resource-capacity) gives a quick cluster-wide view:

bash
# Install via krew (installs as the "resource-capacity" plugin)
kubectl krew install resource-capacity

# Show per-node CPU and memory with pod count
kubectl resource-capacity --pods --util --sort cpu.util

# OUTPUT (approximate):
# NODE                     CPU REQUESTS   CPU LIMITS    CPU UTIL   MEM REQUESTS   MEM LIMITS    MEM UTIL   PODS
# ip-10-0-1-100.ec2.int    3100m/3840m    6200m/3840m   41%        11Gi/14.4Gi    22Gi/14.4Gi   62%        42/58
# ip-10-0-1-200.ec2.int    950m/3840m     1900m/3840m   25%        3Gi/14.4Gi     6Gi/14.4Gi    21%        18/58

The CPU UTIL column (from metrics-server) versus CPU REQUESTS tells you whether your requests are accurate. Nodes where CPU util is 15% but CPU requests are 80% indicate significant over-provisioning — VPA recommendations will show this.

Prometheus queries for cluster-wide utilisation:

promql
# CPU request efficiency (what fraction of requested CPU is actually used);
# {container!=""} drops the aggregate pod/node cgroup series to avoid double counting
sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) /
sum(kube_pod_container_resource_requests{resource="cpu"})

# Memory request efficiency
sum(container_memory_working_set_bytes{container!=""}) /
sum(kube_pod_container_resource_requests{resource="memory"})

# Cluster CPU allocatable utilisation
sum(kube_pod_container_resource_requests{resource="cpu"}) /
sum(kube_node_status_allocatable{resource="cpu"})

Autoscaling Headroom Strategy

With cluster autoscaler or Karpenter, the cluster expands when pods can't be scheduled. But scale-out takes 2-5 minutes — during that window, new pods wait in Pending. For latency-sensitive workloads, this is unacceptable.
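
You can watch that window directly by checking how long pods sit in Pending and why; the Pending phase also covers image pulls and volume binding, so read the events, and <pending-pod> below is a placeholder:

bash
# Pods currently waiting, cluster-wide
kubectl get pods -A --field-selector=status.phase=Pending

# Scheduling events for one of them (look for "Insufficient cpu" / "Insufficient memory")
kubectl describe pod <pending-pod> -n production | grep -A10 Events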

Strategy 1: Overprovisioning with a placeholder Deployment

yaml
# Low-priority placeholder pods that consume capacity
# Real pods preempt them immediately (higher priority)
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -1   # Negative priority — preempted first
preemptionPolicy: Never
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning
  namespace: kube-system
spec:
  replicas: 3   # Adjust based on desired headroom
  selector:     # selector/labels are required for any Deployment
    matchLabels:
      app: overprovisioning
  template:
    metadata:
      labels:
        app: overprovisioning
    spec:
      priorityClassName: overprovisioning
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: "1"
              memory: 2Gi

Three placeholder pods consuming 3 vCPU / 6 GiB = roughly one node of headroom. When real pods are scheduled, they preempt the placeholder pods, which get evicted, triggering Karpenter to provision a new node. Scale-out completes while real pods are already running on the reclaimed headroom.
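
Headroom is then simply replicas multiplied by the placeholder's requests, so adjusting it is a one-liner; if you want headroom to track cluster size automatically, the cluster-proportional-autoscaler addon can manage the replica count instead:

bash
# Double the standing headroom to 6 vCPU / 12 GiB
kubectl scale deployment overprovisioning -n kube-system --replicas=6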

Strategy 2: Target CPU utilisation below 100% in HPA

Set HPA target CPU at 60-70% rather than 80-90%. This causes HPA to scale out earlier, before resource exhaustion, giving the cluster autoscaler time to provision nodes before pods queue.
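
A minimal version of that setting using the kubectl autoscale shortcut (the api Deployment and the 3-20 replica range are illustrative):

bash
# HPA targeting 65% average CPU utilisation (measured against requests)
kubectl autoscale deployment api -n production --cpu-percent=65 --min=3 --max=20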


Frequently Asked Questions

What's the right CPU request-to-limit ratio?

A common starting point: limits = 2-4× requests for CPU. This gives burstable capacity for spikes (JVM startup, request surges) while putting an upper bound on runaway processes. For memory, a tighter ratio (limits = 1.5-2× requests) is safer because memory exhaustion OOM-kills the process immediately, while CPU exhaustion "just" throttles it. Use VPA recommendations to calibrate based on observed p99 usage.
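
As a concrete, illustrative application of those ratios to an existing Deployment with kubectl set resources:

bash
# CPU: 250m request -> 750m limit (3x). Memory: 512Mi request -> 1Gi limit (2x).
kubectl set resources deployment api -n production \
  --requests=cpu=250m,memory=512Mi \
  --limits=cpu=750m,memory=1Gi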

Should I set CPU limits at all?

This is debated. CPU limits in Kubernetes use CFS quota enforcement, which can cause CPU throttling even when the node has spare CPU — this is the CFS bandwidth control problem. Some teams remove CPU limits entirely and rely only on CPU requests for scheduling. Without limits, a misbehaving pod can monopolise node CPU. The pragmatic approach: set CPU limits at 3-4× CPU requests, and use VPA or monitoring to identify pods that hit their CPU limit frequently (those need their requests and limits raised).

How do I find pods without resource requests?

bash
# Find pods with containers missing CPU or memory requests
kubectl get pods -A -o json | jq -r '
  .items[] |
  select(
    any(.spec.containers[];
        .resources.requests.cpu == null or
        .resources.requests.memory == null)
  ) |
  [.metadata.namespace, .metadata.name] | @tsv'

Or use kube-score:

bash
kubectl get deployments -A -o yaml | kube-score score -
# [WARNING] Container Resource Requests and Limits
#   · api — CPU request is not set
#   · api — Memory limit is not set

For autoscaling beyond what resource tuning can handle, see KEDA: Event-Driven Autoscaling for Kubernetes. For cost optimisation strategies that complement capacity planning, see Kubernetes Cost Optimisation: Spot Instances, VPA, and Karpenter. For platform-wide enforcement of resource policies across teams, see Kubernetes Admission Webhooks: Validating and Mutating Workloads.

Running a multi-team Kubernetes platform and struggling with resource sprawl? Talk to us at Coding Protocols — we help platform teams implement resource governance that keeps clusters efficient without blocking development velocity.

Related Topics

Kubernetes
Capacity Planning
ResourceQuota
LimitRange
VPA
Platform Engineering
Cost Optimisation
