Kubernetes Resource Requests and Limits: The Complete Production Guide
Resource requests and limits control how Kubernetes schedules pods and what happens when nodes run out of capacity. Set them wrong and you get OOMKilled containers, CPU throttling that adds latency, or nodes that over-commit and evict your most important workloads. Here's how to get them right.

Resource requests and limits are two numbers on every container spec that directly control your cluster's stability, cost, and performance. Most teams either skip them entirely (causing over-commit and eviction cascades) or set them conservatively high (wasting 60–70% of provisioned compute).
Getting them right requires understanding what each number actually does — which is different from what most documentation implies.
Requests vs Limits: What They Actually Do
Requests
A resource request is what Kubernetes uses for scheduling. The scheduler finds a node with enough unallocated CPU and memory to satisfy all the requests of a pod. If no node has sufficient unallocated capacity, the pod stays Pending.
Requests do not cap usage. A container with cpu: "500m" request can use 2 CPUs if the node has spare capacity. The request is a scheduling hint and a QoS floor, not a ceiling.
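To make the scheduling arithmetic concrete, here is a toy sketch of the fit check. This is illustrative only: the real scheduler also weighs taints, affinity, and scoring plugins, and the node and pod figures below are invented.

```python
# Toy model of the scheduler's fit check: requests are compared against a
# node's *allocatable* capacity minus what is already requested by scheduled
# pods. Actual usage never enters the calculation.

def fits(node_mcpu, node_mib, scheduled_requests, pod_request):
    """Return True if the pod's requests fit in the node's unallocated capacity."""
    used_mcpu = sum(r["cpu_m"] for r in scheduled_requests)
    used_mib = sum(r["mem_mi"] for r in scheduled_requests)
    return (node_mcpu - used_mcpu >= pod_request["cpu_m"]
            and node_mib - used_mib >= pod_request["mem_mi"])

# A 4-CPU / 8GiB node with 3500m CPU and 6144Mi memory already requested:
scheduled = [{"cpu_m": 3500, "mem_mi": 6144}]
print(fits(4000, 8192, scheduled, {"cpu_m": 500, "mem_mi": 512}))  # True: fits exactly
print(fits(4000, 8192, scheduled, {"cpu_m": 600, "mem_mi": 512}))  # False: pod stays Pending
```

Note that the check passes even if pods on the node are currently bursting above their requests; only the requested amounts are accounted.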
Limits
A resource limit is what the kernel enforces at runtime. CPU limits are enforced via CFS (Completely Fair Scheduler) bandwidth control — the container's CPU usage is throttled if it exceeds the limit. Memory limits are enforced via cgroups — if a container exceeds its memory limit, the kernel OOMKills it.
The key difference:
- Exceeding CPU limit → throttled (slower, but still running)
- Exceeding memory limit → OOMKilled (container killed and restarted)
QoS Classes
Kubernetes assigns each pod one of three Quality of Service classes based on its resource configuration. QoS class determines eviction priority when a node is under memory pressure.
Guaranteed
Every container in the pod has equal requests and limits set for both CPU and memory:
```yaml
resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"
```

Guaranteed pods are the last to be evicted. Use for production workloads that must not be interrupted.
Burstable
At least one container has requests set, and requests ≠ limits (or limits are not set for some containers):
```yaml
resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    cpu: "1"
    memory: "1Gi"
```

Burstable pods can use resources beyond their request when capacity is available. They're evicted after BestEffort pods but before Guaranteed pods.
BestEffort
No requests or limits set on any container:
```yaml
# No resources block — BestEffort
containers:
  - name: app
    image: myapp:latest
```

BestEffort pods are the first evicted under pressure. Never use this in production — if the node is under pressure, your pod is gone.
CPU: Throttling Is the Silent Performance Killer
CPU limits are frequently misunderstood. The common mental model: "set CPU limit to 1 CPU, container uses at most 1 CPU." The reality is more subtle.
CPU limits use CFS bandwidth control. A container with cpu: "1" gets 1 CPU of quota per 100ms scheduling period. If the container uses its 100ms quota in 40ms (burst), it's throttled for the remaining 60ms — even if other CPUs on the node are idle.
This throttling is invisible in most metrics. kubectl top pods shows CPU usage, not CPU throttling. A container that shows 500m CPU usage can be heavily throttled if it bursts periodically.
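The quota arithmetic can be sketched in a few lines. This is a deliberately simplified model (real CFS carries unfinished work into later periods, so actual throttling is usually worse), and the demand figures are invented:

```python
# Back-of-the-envelope model of CFS bandwidth control. With the default
# 100 ms period, a container with limit "1" gets 100 ms of CPU time per
# period; once that quota is spent, it sits throttled until the next period.
# Simplification: we count a period as throttled if demand exceeds quota,
# ignoring work carried over into subsequent periods.

PERIOD_MS = 100

def throttled_periods(limit_cores, demand_ms_per_period):
    """Count periods in which CPU demand exceeded the CFS quota."""
    quota_ms = limit_cores * PERIOD_MS
    return sum(1 for d in demand_ms_per_period if d > quota_ms)

# A bursty workload: mostly idle, but every third period four threads run
# concurrently and want 250 ms of CPU time inside one 100 ms window.
demand = [10, 10, 250, 10, 10, 250, 10, 10, 250, 10]
t = throttled_periods(limit_cores=1, demand_ms_per_period=demand)
print(f"throttling rate: {t / len(demand):.0%}")  # 30%, above the 25% alarm line
```

With a limit of 4 CPUs the quota per period becomes 400 ms and the same workload is never throttled, which is why bursty multi-threaded services are so sensitive to low CPU limits.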
How to check CPU throttling:
```promql
# CPU throttling rate via Prometheus (cAdvisor metrics, exposed by the kubelet)
rate(container_cpu_cfs_throttled_periods_total{container="my-app"}[5m]) /
rate(container_cpu_cfs_periods_total{container="my-app"}[5m])
```

A throttling rate above 25% indicates your CPU limit is too low for your actual usage pattern, and your application is slower than it needs to be.
The Case Against CPU Limits
A significant portion of the Kubernetes community runs without CPU limits entirely, relying only on CPU requests for scheduling. The argument:
- CPU is a compressible resource — throttling degrades performance but doesn't crash the pod
- On a well-utilised cluster, idle CPU is wasted; limits prevent pods from using spare capacity
- Throttling latency is often worse than the risk of a noisy-neighbour CPU hog
This works if your cluster has good bin-packing (most of the time, nodes are well-utilised) and you monitor for CPU-hungry pods via requests. It doesn't work on clusters where a single runaway process can starve critical workloads.
Practical guidance: set CPU requests always. Consider CPU limits optional for application pods; set them for critical system pods where a runaway process must be contained.
Memory: Always Set Both Requests and Limits
Memory is incompressible — the kernel cannot reclaim memory from a container without killing it. For memory:

- Always set requests for accurate scheduling
- Always set limits to prevent one container from consuming all node memory and triggering evictions across the node
- Set requests == limits for Guaranteed QoS on critical workloads
If a container's memory usage grows over time (a memory leak), the limit ensures the process is eventually OOMKilled rather than silently consuming node memory until the node becomes unstable.
OOMKilled: Diagnosis and Fix
When a container exceeds its memory limit, the kernel kills it. Kubernetes reports this as OOMKilled:
```shell
kubectl describe pod <pod> -n <namespace>
# Look for: Last State: Terminated, Reason: OOMKilled
```

Diagnosis steps:

1. Check the memory limit:

   ```shell
   kubectl get pod <pod> -o jsonpath='{.spec.containers[*].resources.limits.memory}'
   ```

2. Check actual memory usage over time (if you have Prometheus):

   ```promql
   max_over_time(container_memory_working_set_bytes{container="my-app"}[1h])
   ```

3. Determine whether the OOMKill was caused by a genuine memory leak, a legitimate spike (large request, batch job), or an under-sized limit.
Fixes by cause:
| Cause | Fix |
|---|---|
| Limit too low for normal usage | Increase the memory limit |
| Legitimate spike (batch processing) | Increase limit or split into smaller batches |
| Memory leak | Fix the application; limit is correctly catching it |
| JVM not respecting container limits | Add -XX:MaxRAMPercentage=75 to JVM flags |
JVM note: Java applications frequently OOMKill because of heap-sizing defaults. The default maximum heap is 25% of available RAM — and a JVM that is not container-aware (older than 8u191, or running with container support disabled) measures the node's RAM, not the container's limit. On a node with 64GB RAM, such a JVM in a 2GB-limit container sizes its heap at 16GB and exhausts the container limit almost immediately. Always set explicit heap limits for JVM workloads:
```yaml
env:
  - name: JAVA_OPTS
    value: "-XX:MaxRAMPercentage=75.0 -XX:InitialRAMPercentage=50.0"
```

-XX:MaxRAMPercentage=75 sets the heap to 75% of the container's memory limit (not the node's RAM), leaving 25% for non-heap JVM overhead. Note that JAVA_OPTS is a convention your launch script must honour; JAVA_TOOL_OPTIONS is picked up by the JVM itself.
Setting the Right Values
Step 1: Measure Actual Usage
Run the application under realistic load and observe actual resource usage:
```shell
# Current usage
kubectl top pods -n production
```

```promql
# Historical (if you have Prometheus)
# P99 CPU usage over the last 7 days
quantile_over_time(0.99, rate(container_cpu_usage_seconds_total{container="my-app"}[5m])[7d:5m])

# P99 memory usage
quantile_over_time(0.99, container_memory_working_set_bytes{container="my-app"}[7d:5m])
```

Step 2: Set Requests at the P50–P75 of Normal Usage
Requests should reflect typical usage, not peak. Over-sizing requests wastes allocatable capacity and makes scheduling harder.
Step 3: Set Memory Limits at P99 + 20–30% Buffer
Enough buffer to absorb normal spikes without OOMKill; tight enough to catch genuine leaks.
Step 4: Set CPU Requests Accurately; Decide on CPU Limits Deliberately
If you set CPU limits, set them at P99 of CPU usage or higher. Below P99 causes throttling during normal load spikes.
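Steps 2–4 reduce to simple arithmetic once you have the percentiles. A sketch with invented measurements: size_resources is not a real tool, just the rules above written out.

```python
# Turn measured usage percentiles (e.g. from the PromQL queries above)
# into request/limit values following the sizing rules:
#   - requests at P75 of normal usage (typical, not peak)
#   - memory limit at P99 plus a 25% buffer
#   - CPU limit (if set at all) at P99 or higher

def size_resources(cpu_p75_m, cpu_p99_m, mem_p75_mi, mem_p99_mi, mem_buffer=0.25):
    return {
        "cpu_request_m": cpu_p75_m,
        "cpu_limit_m": cpu_p99_m,  # optional; omit to avoid throttling entirely
        "mem_request_mi": mem_p75_mi,
        "mem_limit_mi": round(mem_p99_mi * (1 + mem_buffer)),
    }

# Example service: P75 CPU 180m, P99 CPU 650m, P75 memory 300Mi, P99 memory 480Mi
print(size_resources(cpu_p75_m=180, cpu_p99_m=650, mem_p75_mi=300, mem_p99_mi=480))
# {'cpu_request_m': 180, 'cpu_limit_m': 650, 'mem_request_mi': 300, 'mem_limit_mi': 600}
```

For a Guaranteed-QoS workload you would then raise the memory request to equal the computed limit.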
LimitRange: Namespace-Level Defaults and Constraints
LimitRange objects set default requests/limits for containers in a namespace and enforce minimum/maximum values. This prevents BestEffort pods by ensuring every container gets resources even if the developer doesn't specify them.
```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
    - type: Container
      default:          # Applied when no limit is specified
        cpu: "500m"
        memory: "512Mi"
      defaultRequest:   # Applied when no request is specified
        cpu: "100m"
        memory: "128Mi"
      max:              # Hard ceiling — pods exceeding this are rejected
        cpu: "4"
        memory: "8Gi"
      min:              # Hard floor — pods below this are rejected
        cpu: "50m"
        memory: "64Mi"
```

With this LimitRange, a pod with no resource spec gets 100m/128Mi requests and 500m/512Mi limits automatically — Burstable QoS by default.
Combine with a Kyverno policy that requires explicit resource settings for production workloads:
```yaml
# Kyverno: reject pods without explicit resource requests in production
validate:
  message: "CPU and memory requests are required."
  pattern:
    spec:
      containers:
        - resources:
            requests:
              memory: "?*"
              cpu: "?*"
```

ResourceQuota: Namespace-Level Budget
ResourceQuota caps the total resources a namespace can consume:
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    pods: "100"
```

This prevents a single namespace from consuming all cluster capacity. When the quota is reached, new pods are rejected with a clear error. Size quotas based on team allocation, not guesswork — measure actual usage first.
Vertical Pod Autoscaler (VPA) for Right-Sizing
VPA observes actual resource usage and recommends (or automatically applies) adjusted requests and limits. It's useful for right-sizing workloads where you don't know the correct values upfront.
Run VPA in Off mode to get recommendations without automatic changes:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  updatePolicy:
    updateMode: "Off"   # Recommend only, don't change
```

Check recommendations:

```shell
kubectl describe vpa api-vpa -n production
# Look for: Recommendation section with Lower Bound, Target, Upper Bound
```

VPA's Target recommendation is a reasonable starting point for requests. Apply it, observe for a week, adjust if needed.
Warning: do not run VPA in Auto mode alongside HPA on the same resource metric. When VPA rewrites requests, HPA's utilisation targets shift and it recalculates desired replicas, so the two controllers thrash. Use VPA in Off or Initial mode for production, and only apply recommendations manually after review.
Frequently Asked Questions
What happens when a node runs out of memory?
The kubelet begins evicting pods in order: BestEffort first, then Burstable (starting with those furthest over their requests), then Guaranteed last. Evicted pods are rescheduled on other nodes. If all nodes are under pressure, pods stay Pending until capacity frees up or new nodes are provisioned.
Should I set the same values for requests and limits?
For memory on production workloads: yes, set them equal for Guaranteed QoS. For CPU: equal values mean the container is always throttled if it tries to burst — often not what you want. A common pattern: equal memory requests and limits, CPU limit set 2–4x the CPU request.
Why does kubectl top show lower memory than my limit but the container still OOMKills?
kubectl top shows container_memory_working_set_bytes, which excludes file-backed pages that can be reclaimed. The OOMKill triggers on container_memory_usage_bytes, which includes those pages. Your container may be using more memory than kubectl top shows. Use container_memory_rss in Prometheus for a better picture of actual heap usage.
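The relationship between the two metrics can be sketched as follows. cAdvisor derives the working set by subtracting inactive file-backed pages from total cgroup usage; the byte counts here are invented.

```python
# container_memory_working_set_bytes ~= usage minus inactive (reclaimable)
# file-backed pages. The gap between the two is what kubectl top hides.

def working_set_bytes(usage_bytes, inactive_file_bytes):
    """Approximate cAdvisor's working-set calculation."""
    return max(0, usage_bytes - inactive_file_bytes)

MiB = 1024 * 1024

# A container doing heavy file I/O: 900 MiB of cgroup usage, of which
# 400 MiB is inactive page cache. kubectl top reports ~500 MiB, but the
# full 900 MiB counts against the memory limit.
print(working_set_bytes(900 * MiB, 400 * MiB) // MiB)  # 500
```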
Is there a rule of thumb for setting initial values?
For a typical Go or Node.js service with no prior data: start with 100m CPU request, 128Mi memory request, 512Mi memory limit. For JVM services: 250m CPU request, 512Mi memory request, 1Gi memory limit. Observe actual usage for one week, then right-size based on measured P95 values. These are starting points, not targets.
For autoscaling on top of resource configuration, see Kubernetes HPA Beyond CPU. For enforcing resource requirements cluster-wide, see Kubernetes RBAC in Practice (Kyverno patterns) and Supply Chain Security Tools for Kubernetes. For Goldilocks and VPA tooling for right-sizing workloads across namespaces, see Kubernetes Cost Optimization: FinOps Patterns for EKS at Scale.
Trying to right-size resource configuration across a large cluster? Talk to us at Coding Protocols — we help platform teams implement resource governance that reduces waste without sacrificing reliability.


