Kubernetes HPA v2: Behavior Tuning and ContainerResource Metrics
Scaling on CPU utilisation with the Horizontal Pod Autoscaler is just the starting point. HPA v2 (stable since Kubernetes 1.23) supports custom Prometheus metrics, external metrics from queues and managed services, container-level resource metrics, and fine-grained scaling behavior configuration — letting you scale on queue depth, request rate, memory, or any metric your workload exposes.

The Horizontal Pod Autoscaler graduated to autoscaling/v2 (stable) in Kubernetes 1.23, bringing custom and external metric support into the stable API. The original HPA on CPU utilisation has a fundamental problem: CPU is a proxy for load, not load itself. A CPU-bound application scales correctly on CPU. An I/O-bound application (waiting on database queries, calling external APIs) may be at 10% CPU while completely overloaded on concurrent connections.
HPA v2 lets you scale on what actually matters: request rate, queue depth, response latency, active connections, or any Prometheus metric your application exposes.
HPA v2 Metric Types
```yaml
spec:
  metrics:
  - type: Resource          # CPU or memory (built-in, via metrics-server)
  - type: ContainerResource # CPU/memory for a specific container in a pod
  - type: Pods              # Per-pod custom metric (averaged across pods)
  - type: Object            # Metric from a specific Kubernetes object
  - type: External          # Metric from an external system (SQS queue depth, etc.)
```
The HPA algorithm:
desiredReplicas = ceil(currentReplicas × (currentMetricValue / targetValue))
For example: 5 pods at 40 RPS each = 200 RPS total. Target is 50 RPS per pod. Desired = ceil(5 × (40/50)) = ceil(4.0) = 4. HPA scales down to 4 replicas. Two details refine this in practice: HPA skips scaling entirely while the current/target ratio is within tolerance (0.1 by default, set via the kube-controller-manager flag --horizontal-pod-autoscaler-tolerance), and when multiple metrics are configured, HPA computes a desired replica count for each metric and acts on the highest.
Basic CPU/Memory HPA
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: payments-api
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payments-api
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70 # Scale when avg CPU > 70% of request
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 400Mi # Scale when avg memory > 400Mi per pod
```
Important: CPU utilisation is calculated as a percentage of the pod's CPU request. A pod with requests.cpu: 100m using 80m is at 80% utilisation. Without CPU requests, HPA can't calculate utilisation — set CPU requests on all pods you want HPA to manage.
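A quick way to watch the controller act on this policy (the filename is whatever you saved the manifest above as):
```bash
kubectl apply -f payments-api-hpa.yaml

# Watch current vs. target metrics and the replica count change
kubectl get hpa payments-api -n production -w

# Per-metric status and scaling events
kubectl describe hpa payments-api -n production
```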
Scaling Behavior Configuration
Without a behavior stanza, HPA uses the defaults: scale-up has no stabilisation window and can add 4 pods or double the replica count every 15 seconds (whichever is more), while scale-down is guarded by a 300-second stabilisation window. behavior lets you tune both directions independently:
```yaml
spec:
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60 # Act on the lowest recommendation from the last 60s
      # Prevent thrashing during sudden traffic spikes
      selectPolicy: Max # Use the policy that allows the most pods
      policies:
      - type: Pods
        value: 4 # Add at most 4 pods per 60-second window
        periodSeconds: 60
      - type: Percent
        value: 100 # Or double the current pod count
        periodSeconds: 60

    scaleDown:
      stabilizationWindowSeconds: 300 # Wait 5 minutes before scaling down
      selectPolicy: Min # Use the policy that removes the fewest pods
      policies:
      - type: Pods
        value: 2 # Remove at most 2 pods per 5-minute window
        periodSeconds: 300
      - type: Percent
        value: 10 # Or remove at most 10% of pods
        periodSeconds: 300
```
selectPolicy: Max for scaleUp means HPA picks whichever policy allows scaling to more pods — useful to ensure fast scale-up during traffic spikes.
selectPolicy: Min for scaleDown means HPA picks the policy that removes the fewest pods — conservative scaling down to avoid oscillation.
stabilizationWindowSeconds prevents scaling decisions based on transient metric spikes. For scaleDown, HPA keeps a rolling history of computed desired replica counts and acts on the highest value in the window: with a 5-minute window, the desired count must stay low for the full 5 minutes before HPA actually removes pods — this prevents premature scale-down after a short traffic lull.
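One more behavior pattern worth knowing: setting selectPolicy: Disabled turns scaling off in one direction entirely. A minimal sketch for a workload where scale-down should be a deliberate human action:
```yaml
spec:
  behavior:
    scaleDown:
      selectPolicy: Disabled # HPA only ever adds replicas; removal is manual
```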
Custom Metrics via prometheus-adapter
prometheus-adapter bridges Prometheus and the Kubernetes custom metrics API (custom.metrics.k8s.io). HPA queries this API for Pods or Object type metrics.
Install prometheus-adapter
```bash
helm install prometheus-adapter prometheus-community/prometheus-adapter \
  --namespace monitoring \
  --values adapter-values.yaml
```
```yaml
# adapter-values.yaml
prometheus:
  url: http://kube-prometheus-stack-prometheus.monitoring.svc
  port: 9090

rules:
  default: false # Don't auto-generate rules; define explicitly

  custom:
  # requests_per_second: per-pod HTTP request rate from Prometheus
  - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace:
          resource: namespace
        pod:
          resource: pod
    name:
      matches: "^(.*)_total$"
      as: "${1}_per_second" # Expose as http_requests_per_second
    metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
    # <<.Series>>, <<.LabelMatchers>>, <<.GroupBy>> are template variables

  # active_connections: per-pod gauge metric
  - seriesQuery: 'nginx_active_connections{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace:
          resource: namespace
        pod:
          resource: pod
    name:
      as: "nginx_active_connections"
    metricsQuery: 'avg(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
```
Verify the custom metric is available:
```bash
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta2/namespaces/production/pods/*/http_requests_per_second"
# {"kind":"MetricValueList","apiVersion":"custom.metrics.k8s.io/v1beta2","metadata":{},"items":[{"describedObject":{"kind":"Pod","namespace":"production","name":"payments-api-xxxxx","apiVersion":"/v1"},"metricName":"http_requests_per_second","timestamp":"...","value":"250m","selector":null}]}
```
Metric values are Kubernetes quantities, so "250m" means 0.25 requests per second, not 250.
HPA Using Custom Metric
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: payments-api
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payments-api
  minReplicas: 3
  maxReplicas: 30
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second # Must match the adapter's metricsQuery output
      target:
        type: AverageValue
        averageValue: 100 # Scale when avg > 100 RPS per pod
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
```
External Metrics (SQS Queue Depth)
External metrics come from outside the cluster. The prometheus-adapter can expose these if Prometheus scrapes them, but for native AWS metrics, use KEDA's SQS trigger instead. For Prometheus-scraped external metrics:
```yaml
# In adapter-values.yaml, alongside rules.custom:
rules:
  external:
  - seriesQuery: 'aws_sqs_approximate_number_of_messages_visible{queue_name="payments-orders"}'
    name:
      as: "sqs_queue_depth_payments"
    metricsQuery: 'max(<<.Series>>)'
```
```yaml
# HPA using external metric
spec:
  metrics:
  - type: External
    external:
      metric:
        name: sqs_queue_depth_payments
      target:
        type: AverageValue
        averageValue: "5" # 5 messages per pod (scales to 100 pods if queue has 500 msgs)
```
For queue-based scaling without custom metrics adapter configuration, KEDA is significantly simpler. See KEDA: Event-Driven Autoscaling for the KEDA approach.
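As with the custom metric, you can confirm the adapter is serving it; external metrics live under their own API group (the path below assumes the metric name configured above):
```bash
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/production/sqs_queue_depth_payments"
```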
ContainerResource Metrics
When a pod has multiple containers with different resource profiles (app container + sidecar), scaling on the total pod CPU average distorts the signal. ContainerResource scales on a specific container's resource usage:
```yaml
spec:
  metrics:
  - type: ContainerResource
    containerResource:
      name: cpu
      container: app # Scale based on the app container's CPU, not the sidecar's
      target:
        type: Utilization
        averageUtilization: 70
```
This is particularly useful for Istio-sidecar-injected pods, where the Envoy proxy's CPU usage is unrelated to the application's load and shouldn't influence scaling decisions. ContainerResource metrics are stable as of Kubernetes 1.30 (enabled by default, behind the HPAContainerMetrics feature gate, since 1.27).
HPA vs KEDA
| Aspect | HPA v2 | KEDA |
|---|---|---|
| Built-in | Yes (ships with Kubernetes) | Requires separate install |
| Scale to zero | No (minReplicas ≥ 1) | Yes |
| Metric sources | Metrics APIs (custom/external via adapter) | 60+ native scalers (SQS, Kafka, Redis, Cron, etc.) |
| Multi-metric scaling | Yes | Yes (multiple triggers) |
| Configuration complexity | Medium (prometheus-adapter required for custom) | Low for common sources |
| Compatibility | Native autoscaling/v2 API | Creates HPA objects under the hood |
KEDA and HPA can coexist in the same cluster — KEDA creates HPA objects under the hood using autoscaling/v2 with external metric triggers. For scale-to-zero use cases (batch jobs, development environments) or native integrations with AWS/Azure/GCP managed services, KEDA is the better choice. For straightforward CPU/memory scaling or simple Prometheus metric scaling, native HPA v2 with prometheus-adapter avoids an additional component.
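For contrast, a minimal KEDA ScaledObject for the same SQS queue. The names and queue URL are illustrative, and the aws-sqs-queue scaler also needs AWS credentials (typically via a TriggerAuthentication or IRSA):
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: payments-worker
  namespace: production
spec:
  scaleTargetRef:
    name: payments-worker # Deployment to scale
  minReplicaCount: 0      # Scale to zero when the queue is empty
  maxReplicaCount: 100
  triggers:
  - type: aws-sqs-queue
    metadata:
      queueURL: https://sqs.eu-west-1.amazonaws.com/123456789012/payments-orders
      queueLength: "5"    # Target messages per replica
      awsRegion: eu-west-1
```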
Frequently Asked Questions
My HPA shows unknown for metrics — what's wrong?
```bash
kubectl describe hpa payments-api -n production
# Metrics: (unknown/70%)

# Check if metrics-server is running
kubectl top pods -n production

# Check if custom metrics API is available
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta2
# Should return an API discovery response, not an error
```
For unknown on CPU: metrics-server is not running or the pod doesn't have CPU requests. For unknown on custom metrics: prometheus-adapter is not running, or the metric series query doesn't match any Prometheus data (check with kubectl get --raw ... as shown above).
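If the discovery call succeeds but your metric is missing from the response, the adapter's logs usually explain why; the deployment name below assumes the Helm release name from the install step:
```bash
kubectl logs -n monitoring deploy/prometheus-adapter
# Look for series-query matches and Prometheus connection errors
```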
Can I use HPA with Argo Rollouts?
Yes. Argo Rollouts' Rollout resource implements the scale subresource, making it a valid HPA target. Set scaleTargetRef.apiVersion: argoproj.io/v1alpha1 and scaleTargetRef.kind: Rollout — HPA scales the rollout's desired replica count, and Argo Rollouts distributes those replicas across canary and stable ReplicaSets according to the rollout weight.
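A sketch of that target reference (the Rollout name is illustrative):
```yaml
spec:
  scaleTargetRef:
    apiVersion: argoproj.io/v1alpha1
    kind: Rollout
    name: payments-api
```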
What's the interaction between HPA and VPA?
HPA and VPA can coexist on the same workload if they manage different metrics:
- HPA on CPU or memory utilisation + VPA in Auto mode on the same resource: conflict (VPA rewrites the very requests that HPA's utilisation percentage is computed against)
- HPA on custom metrics (request rate) + VPA in Auto mode: can work, but VPA pod restarts during scaling events can interfere with HPA's metric accuracy
The safest combination: HPA on custom application metrics (not CPU/memory) + VPA in Off mode (recommendations only). For scale-out decisions, use HPA. For right-sizing requests, use VPA recommendations applied during maintenance windows.
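A minimal sketch of that recommendation-only setup, assuming the VPA components are installed and using an illustrative target name:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: payments-api
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payments-api
  updatePolicy:
    updateMode: "Off" # Compute recommendations but never evict or rewrite pods
```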
For event-driven scaling with scale-to-zero and native queue/event integrations, see KEDA: Event-Driven Autoscaling for Kubernetes. For VPA right-sizing that complements HPA scaling, see Kubernetes Capacity Planning. For HPA signal selection, KEDA comparison, and scaling on queue depth and latency, see Kubernetes HPA Beyond CPU: Scaling on Custom and External Metrics. For correct resource requests and limits that HPA depends on to compute utilization accurately, see Kubernetes Resource Management: Requests, Limits, QoS, and LimitRanges. For node-level scaling that responds to HPA-driven pod demand, see Kubernetes Node Autoscaling: Cluster Autoscaler vs Karpenter.
Tuning autoscaling for a latency-sensitive production workload? Talk to us at Coding Protocols — we help platform teams configure HPA behavior and custom metrics that match their workload's actual scaling signal.


