Kubernetes HPA v2: Behavior Tuning and ContainerResource Metrics
Scaling on CPU utilisation with the Horizontal Pod Autoscaler is just the starting point. HPA v2 (stable since Kubernetes 1.23) supports custom Prometheus metrics, external metrics from queues and managed services, container-level resource metrics, and fine-grained scaling behavior configuration — letting you scale on queue depth, request rate, memory, or any metric your workload exposes.

The Horizontal Pod Autoscaler graduated to autoscaling/v2 (stable) in Kubernetes 1.23, bringing custom and external metric support into the stable API. The original HPA on CPU utilisation has a fundamental problem: CPU is a proxy for load, not load itself. A CPU-bound application scales correctly on CPU. An I/O-bound application (waiting on database queries, calling external APIs) may be at 10% CPU while completely overloaded on concurrent connections.
HPA v2 lets you scale on what actually matters: request rate, queue depth, response latency, active connections, or any Prometheus metric your application exposes.
HPA v2 Metric Types
```yaml
spec:
  metrics:
  - type: Resource          # CPU or memory (built-in, via metrics-server)
  - type: ContainerResource # CPU/memory for a specific container in a pod
  - type: Pods              # Per-pod custom metric (averaged across pods)
  - type: Object            # Metric from a specific Kubernetes object
  - type: External          # Metric from an external system (SQS queue depth, etc.)
```
The HPA algorithm:
desiredReplicas = ceil(currentReplicas × (currentMetricValue / targetValue))
For example: 5 pods at 40 RPS each = 200 RPS total. Target is 50 RPS per pod. Desired = ceil(5 × (40/50)) = ceil(4.0) = 4. HPA scales down to 4 replicas. Two details refine this in practice: HPA skips scaling entirely while the current/target ratio is within tolerance (0.1 by default, set via the kube-controller-manager flag --horizontal-pod-autoscaler-tolerance), and when multiple metrics are configured, HPA computes a desired replica count for each metric and acts on the highest.
Basic CPU/Memory HPA
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: payments-api
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payments-api
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70 # Scale when avg CPU > 70% of request
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 400Mi # Scale when avg memory > 400Mi per pod
```
Important: CPU utilisation is calculated as a percentage of the pod's CPU request. A pod with requests.cpu: 100m using 80m is at 80% utilisation. Without CPU requests, HPA can't calculate utilisation — set CPU requests on all pods you want HPA to manage.
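A quick way to watch the controller act on this policy (the filename is whatever you saved the manifest above as):
```bash
kubectl apply -f payments-api-hpa.yaml

# Watch current vs. target metrics and the replica count change
kubectl get hpa payments-api -n production -w

# Per-metric status and scaling events
kubectl describe hpa payments-api -n production
```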
Scaling Behavior Configuration
Without a behavior stanza, HPA uses the defaults: scale-up has no stabilisation window and can add 4 pods or double the replica count every 15 seconds (whichever is more), while scale-down is guarded by a 300-second stabilisation window. behavior lets you tune both directions independently:
```yaml
spec:
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60 # Act on the lowest recommendation from the last 60s
      # Prevent thrashing during sudden traffic spikes
      selectPolicy: Max # Use the policy that allows the most pods
      policies:
      - type: Pods
        value: 4 # Add at most 4 pods per 60-second window
        periodSeconds: 60
      - type: Percent
        value: 100 # Or double the current pod count
        periodSeconds: 60

    scaleDown:
      stabilizationWindowSeconds: 300 # Wait 5 minutes before scaling down
      selectPolicy: Min # Use the policy that removes the fewest pods
      policies:
      - type: Pods
        value: 2 # Remove at most 2 pods per 5-minute window
        periodSeconds: 300
      - type: Percent
        value: 10 # Or remove at most 10% of pods
        periodSeconds: 300
```
selectPolicy: Max for scaleUp means HPA picks whichever policy allows scaling to more pods — useful to ensure fast scale-up during traffic spikes.
selectPolicy: Min for scaleDown means HPA picks the policy that removes the fewest pods — conservative scaling down to avoid oscillation.
stabilizationWindowSeconds prevents scaling decisions based on transient metric spikes. For scaleDown, HPA keeps a rolling history of computed desired replica counts and acts on the highest value in the window: with a 5-minute window, the desired count must stay low for the full 5 minutes before HPA actually removes pods — this prevents premature scale-down after a short traffic lull.
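One more behavior pattern worth knowing: setting selectPolicy: Disabled turns scaling off in one direction entirely. A minimal sketch for a workload where scale-down should be a deliberate human action:
```yaml
spec:
  behavior:
    scaleDown:
      selectPolicy: Disabled # HPA only ever adds replicas; removal is manual
```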
Custom Metrics via prometheus-adapter
prometheus-adapter bridges Prometheus and the Kubernetes custom metrics API (custom.metrics.k8s.io). HPA queries this API for Pods or Object type metrics.
Install prometheus-adapter
```bash
helm install prometheus-adapter prometheus-community/prometheus-adapter \
  --namespace monitoring \
  --values adapter-values.yaml
```
```yaml
# adapter-values.yaml
prometheus:
  url: http://kube-prometheus-stack-prometheus.monitoring.svc
  port: 9090

rules:
  default: false # Don't auto-generate rules; define explicitly

  custom:
  # requests_per_second: per-pod HTTP request rate from Prometheus
  - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace:
          resource: namespace
        pod:
          resource: pod
    name:
      matches: "^(.*)_total$"
      as: "${1}_per_second" # Expose as http_requests_per_second
    metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
    # <<.Series>>, <<.LabelMatchers>>, <<.GroupBy>> are template variables

  # active_connections: per-pod gauge metric
  - seriesQuery: 'nginx_active_connections{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace:
          resource: namespace
        pod:
          resource: pod
    name:
      as: "nginx_active_connections"
    metricsQuery: 'avg(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
```
Verify the custom metric is available:
```bash
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta2/namespaces/production/pods/*/http_requests_per_second"
# {"kind":"MetricValueList","apiVersion":"custom.metrics.k8s.io/v1beta2","metadata":{},"items":[{"describedObject":{"kind":"Pod","namespace":"production","name":"payments-api-xxxxx","apiVersion":"/v1"},"metricName":"http_requests_per_second","timestamp":"...","value":"250m","selector":null}]}
```
Metric values are Kubernetes quantities, so "250m" means 0.25 requests per second, not 250.
HPA Using Custom Metric
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: payments-api
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payments-api
  minReplicas: 3
  maxReplicas: 30
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second # Must match the adapter's metricsQuery output
      target:
        type: AverageValue
        averageValue: 100 # Scale when avg > 100 RPS per pod
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
```
External Metrics (SQS Queue Depth)
External metrics come from outside the cluster. The prometheus-adapter can expose these if Prometheus scrapes them, but for native AWS metrics, use KEDA's SQS trigger instead. For Prometheus-scraped external metrics:
```yaml
# In adapter-values.yaml, alongside rules.custom:
rules:
  external:
  - seriesQuery: 'aws_sqs_approximate_number_of_messages_visible{queue_name="payments-orders"}'
    name:
      as: "sqs_queue_depth_payments"
    metricsQuery: 'max(<<.Series>>)'
```
```yaml
# HPA using external metric
spec:
  metrics:
  - type: External
    external:
      metric:
        name: sqs_queue_depth_payments
      target:
        type: AverageValue
        averageValue: "5" # 5 messages per pod (scales to 100 pods if queue has 500 msgs)
```
For queue-based scaling without custom metrics adapter configuration, KEDA is significantly simpler. See KEDA: Event-Driven Autoscaling for the KEDA approach.
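As with the custom metric, you can confirm the adapter is serving it; external metrics live under their own API group (the path below assumes the metric name configured above):
```bash
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/production/sqs_queue_depth_payments"
```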
ContainerResource Metrics
When a pod has multiple containers with different resource profiles (app container + sidecar), scaling on the total pod CPU average distorts the signal. ContainerResource scales on a specific container's resource usage:
```yaml
spec:
  metrics:
  - type: ContainerResource
    containerResource:
      name: cpu
      container: app # Scale based on the app container's CPU, not the sidecar's
      target:
        type: Utilization
        averageUtilization: 70
```
This is particularly useful for Istio-sidecar-injected pods, where the Envoy proxy's CPU usage is unrelated to the application's load and shouldn't influence scaling decisions. ContainerResource metrics are stable as of Kubernetes 1.30 (enabled by default, behind the HPAContainerMetrics feature gate, since 1.27).
HPA vs KEDA
| Aspect | HPA v2 | KEDA |
|---|---|---|
| Built-in | Yes (ships with Kubernetes) | Requires separate install |
| Scale to zero | No (minReplicas ≥ 1) | Yes |
| Metric sources | Metrics APIs (custom/external via adapter) | 60+ native scalers (SQS, Kafka, Redis, Cron, etc.) |
| Multi-metric scaling | Yes | Yes (multiple triggers) |
| Configuration complexity | Medium (prometheus-adapter required for custom) | Low for common sources |
| Compatibility | Native autoscaling/v2 API | Creates HPA objects under the hood |
KEDA and HPA can coexist in the same cluster — KEDA creates HPA objects under the hood using autoscaling/v2 with external metric triggers. For scale-to-zero use cases (batch jobs, development environments) or native integrations with AWS/Azure/GCP managed services, KEDA is the better choice. For straightforward CPU/memory scaling or simple Prometheus metric scaling, native HPA v2 with prometheus-adapter avoids an additional component.
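For contrast, a minimal KEDA ScaledObject for the same SQS queue. The names and queue URL are illustrative, and the aws-sqs-queue scaler also needs AWS credentials (typically via a TriggerAuthentication or IRSA):
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: payments-worker
  namespace: production
spec:
  scaleTargetRef:
    name: payments-worker # Deployment to scale
  minReplicaCount: 0      # Scale to zero when the queue is empty
  maxReplicaCount: 100
  triggers:
  - type: aws-sqs-queue
    metadata:
      queueURL: https://sqs.eu-west-1.amazonaws.com/123456789012/payments-orders
      queueLength: "5"    # Target messages per replica
      awsRegion: eu-west-1
```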
Frequently Asked Questions
My HPA shows unknown for metrics — what's wrong?
```bash
kubectl describe hpa payments-api -n production
# Metrics: (unknown/70%)

# Check if metrics-server is running
kubectl top pods -n production

# Check if custom metrics API is available
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta2
# Should return an API discovery response, not an error
```
For unknown on CPU: metrics-server is not running or the pod doesn't have CPU requests. For unknown on custom metrics: prometheus-adapter is not running, or the metric series query doesn't match any Prometheus data (check with kubectl get --raw ... as shown above).
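If the discovery call succeeds but your metric is missing from the response, the adapter's logs usually explain why; the deployment name below assumes the Helm release name from the install step:
```bash
kubectl logs -n monitoring deploy/prometheus-adapter
# Look for series-query matches and Prometheus connection errors
```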
Can I use HPA with Argo Rollouts?
Yes. Argo Rollouts' Rollout resource implements the scale subresource, making it a valid HPA target. Set scaleTargetRef.apiVersion: argoproj.io/v1alpha1 and scaleTargetRef.kind: Rollout — HPA scales the rollout's desired replica count, and Argo Rollouts distributes those replicas across canary and stable ReplicaSets according to the rollout weight.
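A sketch of that target reference (the Rollout name is illustrative):
```yaml
spec:
  scaleTargetRef:
    apiVersion: argoproj.io/v1alpha1
    kind: Rollout
    name: payments-api
```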
What's the interaction between HPA and VPA?
HPA and VPA can coexist on the same workload if they manage different metrics:
- HPA on CPU or memory utilisation + VPA in Auto mode on the same resource: conflict (VPA rewrites the very requests that HPA's utilisation percentage is computed against)
- HPA on custom metrics (request rate) + VPA in Auto mode: can work, but VPA pod restarts during scaling events can interfere with HPA's metric accuracy
The safest combination: HPA on custom application metrics (not CPU/memory) + VPA in Off mode (recommendations only). For scale-out decisions, use HPA. For right-sizing requests, use VPA recommendations applied during maintenance windows.
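A minimal sketch of that recommendation-only setup, assuming the VPA components are installed and using an illustrative target name:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: payments-api
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payments-api
  updatePolicy:
    updateMode: "Off" # Compute recommendations but never evict or rewrite pods
```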
For event-driven scaling with scale-to-zero and native queue/event integrations, see KEDA: Event-Driven Autoscaling for Kubernetes. For VPA right-sizing that complements HPA scaling, see Kubernetes Capacity Planning. For HPA signal selection, KEDA comparison, and scaling on queue depth and latency, see Kubernetes HPA Beyond CPU: Scaling on Custom and External Metrics. For correct resource requests and limits that HPA depends on to compute utilization accurately, see Kubernetes Resource Management: Requests, Limits, QoS, and LimitRanges. For node-level scaling that responds to HPA-driven pod demand, see Kubernetes Node Autoscaling: Cluster Autoscaler vs Karpenter.
Tuning autoscaling for a latency-sensitive production workload? Talk to us at Coding Protocols — we help platform teams configure HPA behavior and custom metrics that match their workload's actual scaling signal.


