
Setting Up Horizontal Pod Autoscaler with Custom Metrics

Intermediate · 30 min to complete · 11 min read

Scale your Kubernetes workloads on business metrics — RPS, queue depth, or latency — instead of just CPU. This tutorial wires up KEDA and Prometheus Adapter to drive HPA from real application signals.

Before you begin

  • kubectl configured against a running cluster
  • Helm 3 installed
  • Basic understanding of Kubernetes Deployments and Services
Tags: Kubernetes · HPA · Autoscaling · KEDA · Prometheus · DevOps

CPU-based autoscaling is a blunt instrument. Your application might be saturating a database connection pool at 20% CPU, or sitting idle at 80% while a downstream queue builds up. Custom metrics let HPA respond to signals that actually matter for your workload.

This tutorial uses KEDA — the Kubernetes Event-Driven Autoscaler — which is the cleanest way to get custom metrics into HPA without wrestling with the Prometheus Adapter's configuration format.

What You'll Build

An HPA that scales a worker deployment based on the number of pending jobs in a Redis list. When the queue is empty, the deployment scales to zero. When jobs arrive, it scales up proportionally.

The same pattern applies to: HTTP request rate, Kafka consumer lag, RabbitMQ queue depth, or any Prometheus metric.
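For example, a Kafka consumer-lag trigger has the same shape as the Redis one used below (a sketch; the broker address, consumer group, and topic name are placeholders for your own):

```yaml
triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka.default.svc.cluster.local:9092
      consumerGroup: my-consumer-group
      topic: jobs
      lagThreshold: "50"   # target lag per replica
```

Only the trigger block changes; the rest of the ScaledObject stays identical.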

Step 1: Install KEDA

bash
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda \
  --namespace keda \
  --create-namespace \
  --version 2.13.0

Verify the KEDA operator is running:

bash
kubectl get pods -n keda
# NAME                                      READY   STATUS    RESTARTS
# keda-operator-xxxx                        1/1     Running   0
# keda-operator-metrics-apiserver-xxxx      1/1     Running   0

KEDA installs two components: the operator (watches ScaledObjects) and the metrics API server (exposes custom metrics to the HPA controller).

Step 2: Deploy the Sample Worker

bash
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: job-worker
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: job-worker
  template:
    metadata:
      labels:
        app: job-worker
    spec:
      containers:
        - name: worker
          image: busybox
          command: ["sh", "-c", "while true; do sleep 5; done"]
          resources:
            requests:
              cpu: "100m"
              memory: "64Mi"
            limits:
              cpu: "200m"
              memory: "128Mi"
EOF

Step 3: Deploy Redis

bash
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install redis bitnami/redis \
  --namespace default \
  --set auth.enabled=false \
  --set replica.replicaCount=0

Note the in-cluster Redis address:

bash
kubectl get svc redis-master -n default
# Service DNS name: redis-master.default.svc.cluster.local:6379

Step 4: Create the ScaledObject

A ScaledObject tells KEDA what metric to watch and how to translate it into a replica count:

bash
kubectl apply -f - <<EOF
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: job-worker-scaler
  namespace: default
spec:
  scaleTargetRef:
    name: job-worker
  minReplicaCount: 0        # Scale to zero when queue is empty
  maxReplicaCount: 20
  pollingInterval: 15       # Check every 15 seconds
  cooldownPeriod: 60        # Wait 60s before scaling down
  triggers:
    - type: redis
      metadata:
        address: redis-master.default.svc.cluster.local:6379
        listName: job-queue
        listLength: "5"     # Target: 5 jobs per replica
EOF

listLength: "5" means KEDA targets 5 pending jobs per replica. With 50 jobs in the queue, it scales to 10 replicas. With 0 jobs, it scales to 0.
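The arithmetic behind this is the standard HPA formula, desiredReplicas = ceil(currentMetric / target). A quick shell sketch of that calculation:

```shell
#!/bin/sh
# desiredReplicas = ceil(queue_length / list_length)
# Integer ceiling division: add (divisor - 1) before dividing.
queue_length=50
list_length=5
desired=$(( (queue_length + list_length - 1) / list_length ))
echo "$desired"   # 10 replicas for 50 queued jobs
```

With 13 jobs the same formula gives 3 replicas, and with an empty queue it gives 0.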

Step 5: Verify the ScaledObject

bash
kubectl get scaledobject job-worker-scaler
# NAME                 SCALETARGETKIND   SCALETARGETNAME   MIN   MAX   READY
# job-worker-scaler    Deployment        job-worker        0     20    True

kubectl get hpa
# NAME                         REFERENCE               TARGETS     MINPODS   MAXPODS   REPLICAS
# keda-hpa-job-worker-scaler   Deployment/job-worker   0/5 (avg)   1         20        1

KEDA creates and manages the HPA object automatically — you never touch it directly. Note that MINPODS is 1, not 0: the HPA handles scaling between 1 and N replicas, while KEDA itself handles the 0 ↔ 1 transition.

Step 6: Test the Autoscaling

Push jobs into the Redis queue:

bash
# Exec into a temp pod with redis-cli
kubectl run redis-cli --rm -it --image=redis:7 -- redis-cli \
  -h redis-master.default.svc.cluster.local

# Inside the pod:
RPUSH job-queue job1 job2 job3 job4 job5 job6 job7 job8 job9 job10
LLEN job-queue
# (integer) 10

Watch the deployment scale up:

bash
kubectl get hpa -w
# TARGETS      REPLICAS
# 0/5 (avg)    0
# 10/5 (avg)   2     ← scaling up
# 10/5 (avg)   2

Clear the queue and watch it scale back to zero (after the cooldown period):

bash
kubectl run redis-cli --rm -it --image=redis:7 -- redis-cli \
  -h redis-master.default.svc.cluster.local DEL job-queue

Step 7: Scale on a Prometheus Metric Instead

For HTTP request rate or any Prometheus metric, swap the trigger:

yaml
triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
      query: |
        sum(rate(http_requests_total{job="api-server"}[2m]))
      threshold: "100"    # Target: 100 RPS per replica

This scales based on the Prometheus query result. At 500 RPS with threshold 100, KEDA targets 5 replicas.
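Triggers can also be combined in one ScaledObject: KEDA evaluates each, and the HPA scales to whichever demands the most replicas. A sketch mixing the Redis and Prometheus triggers from this tutorial:

```yaml
triggers:
  - type: redis
    metadata:
      address: redis-master.default.svc.cluster.local:6379
      listName: job-queue
      listLength: "5"
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
      query: sum(rate(http_requests_total{job="api-server"}[2m]))
      threshold: "100"
```

This is useful when a workload should scale up for either queue pressure or request load.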

Common Issues

ScaledObject stuck in READY: False — check KEDA operator logs:

bash
kubectl logs -n keda -l app=keda-operator --tail=50

HPA not updating replicas — verify the metrics API is registered:

bash
kubectl get apiservice v1beta1.external.metrics.k8s.io
# Should show AVAILABLE: True

Scale-to-zero not working as expected — minReplicaCount: 0 requires the workload to tolerate cold starts. If your app takes 30+ seconds to boot, increase cooldownPeriod and consider keeping minReplicaCount: 1 for latency-sensitive paths.
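For that latency-sensitive case, the relevant ScaledObject fields look like this (a sketch; tune the numbers to your app's boot time):

```yaml
spec:
  minReplicaCount: 1     # keep one warm replica to avoid cold starts
  cooldownPeriod: 300    # wait 5 minutes of inactivity before scaling down
```

This trades a small amount of idle cost for predictable first-request latency.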

Cleanup

bash
kubectl delete scaledobject job-worker-scaler
kubectl delete deployment job-worker
helm uninstall redis
helm uninstall keda -n keda

We built Podscape to simplify Kubernetes workflows like this — logs, events, and cluster state in one interface, without switching tools.

Struggling with this in production?

We help teams fix these exact issues. Our engineers have deployed these patterns across production environments at scale.