
Setting Up Horizontal Pod Autoscaler with Custom Metrics

Intermediate · 30 min to complete · 11 min read

Scale your Kubernetes workloads on business metrics — RPS, queue depth, or latency — instead of just CPU. This tutorial wires up KEDA and Prometheus Adapter to drive HPA from real application signals.

Before you begin

  • kubectl configured against a running cluster
  • Helm 3 installed
  • Basic understanding of Kubernetes Deployments and Services
Tags: Kubernetes · HPA · Autoscaling · KEDA · Prometheus · DevOps

CPU-based autoscaling is a blunt instrument. Your application might be saturating a database connection pool at 20% CPU, or sitting idle at 80% while a downstream queue builds up. Custom metrics let HPA respond to signals that actually matter for your workload.

This tutorial uses KEDA — the Kubernetes Event-Driven Autoscaler — which is the cleanest way to get custom metrics into HPA without wrestling with the Prometheus Adapter's configuration format.

What You'll Build

An HPA that scales a worker deployment based on the number of pending jobs in a Redis list. When the queue is empty, the deployment scales to zero. When jobs arrive, it scales up proportionally.

The same pattern applies to: HTTP request rate, Kafka consumer lag, RabbitMQ queue depth, or any Prometheus metric.
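For example, a Kafka consumer-lag trigger has the same shape as the Redis one used below (a sketch; the broker address, consumer group, and topic name are placeholders for your own):

```yaml
triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka.default.svc.cluster.local:9092
      consumerGroup: my-consumer-group
      topic: jobs
      lagThreshold: "50"   # target lag per replica
```

Only the trigger block changes; the rest of the ScaledObject stays identical.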

Step 1: Install KEDA

bash
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda \
  --namespace keda \
  --create-namespace \
  --version 2.13.0

Verify the KEDA operator is running:

bash
kubectl get pods -n keda
# NAME                                      READY   STATUS    RESTARTS
# keda-operator-xxxx                        1/1     Running   0
# keda-operator-metrics-apiserver-xxxx      1/1     Running   0

KEDA installs two components: the operator (watches ScaledObjects) and the metrics API server (exposes custom metrics to the HPA controller).

Step 2: Deploy the Sample Worker

bash
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: job-worker
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: job-worker
  template:
    metadata:
      labels:
        app: job-worker
    spec:
      containers:
        - name: worker
          image: busybox
          command: ["sh", "-c", "while true; do sleep 5; done"]
          resources:
            requests:
              cpu: "100m"
              memory: "64Mi"
            limits:
              cpu: "200m"
              memory: "128Mi"
EOF

Step 3: Deploy Redis

bash
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install redis bitnami/redis \
  --namespace default \
  --set auth.enabled=false \
  --set replica.replicaCount=0

Note the in-cluster Redis address:

bash
kubectl get svc redis-master -n default
# Service DNS name: redis-master.default.svc.cluster.local:6379

Step 4: Create the ScaledObject

A ScaledObject tells KEDA what metric to watch and how to translate it into a replica count:

bash
kubectl apply -f - <<EOF
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: job-worker-scaler
  namespace: default
spec:
  scaleTargetRef:
    name: job-worker
  minReplicaCount: 0        # Scale to zero when queue is empty
  maxReplicaCount: 20
  pollingInterval: 15       # Check every 15 seconds
  cooldownPeriod: 60        # Wait 60s before scaling down
  triggers:
    - type: redis
      metadata:
        address: redis-master.default.svc.cluster.local:6379
        listName: job-queue
        listLength: "5"     # Target: 5 jobs per replica
EOF

listLength: "5" means KEDA targets 5 pending jobs per replica. With 50 jobs in the queue, it scales to 10 replicas. With 0 jobs, it scales to 0.
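The arithmetic behind this is the standard HPA formula, desiredReplicas = ceil(currentMetric / target). A quick shell sketch of that calculation:

```shell
#!/bin/sh
# desiredReplicas = ceil(queue_length / list_length)
# Integer ceiling division: add (divisor - 1) before dividing.
queue_length=50
list_length=5
desired=$(( (queue_length + list_length - 1) / list_length ))
echo "$desired"   # 10 replicas for 50 queued jobs
```

With 13 jobs the same formula gives 3 replicas, and with an empty queue it gives 0.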

Step 5: Verify the ScaledObject

bash
kubectl get scaledobject job-worker-scaler
# NAME                 SCALETARGETKIND   SCALETARGETNAME   MIN   MAX   READY
# job-worker-scaler    Deployment        job-worker        0     20    True

kubectl get hpa
# NAME                         REFERENCE               TARGETS     MINPODS   MAXPODS   REPLICAS
# keda-hpa-job-worker-scaler   Deployment/job-worker   0/5 (avg)   1         20        1

KEDA creates and manages the HPA object automatically — you never touch it directly. Note that MINPODS is 1, not 0: the HPA handles scaling between 1 and N replicas, while KEDA itself handles the 0 ↔ 1 transition.

Step 6: Test the Autoscaling

Push jobs into the Redis queue:

bash
# Exec into a temp pod with redis-cli
kubectl run redis-cli --rm -it --image=redis:7 -- redis-cli \
  -h redis-master.default.svc.cluster.local

# Inside the pod:
RPUSH job-queue job1 job2 job3 job4 job5 job6 job7 job8 job9 job10
LLEN job-queue
# (integer) 10

Watch the deployment scale up:

bash
kubectl get hpa -w
# TARGETS      REPLICAS
# 0/5 (avg)    0
# 10/5 (avg)   2     ← scaling up
# 10/5 (avg)   2

Clear the queue and watch it scale back to zero (after the cooldown period):

bash
kubectl run redis-cli --rm -it --image=redis:7 -- redis-cli \
  -h redis-master.default.svc.cluster.local DEL job-queue

Step 7: Scale on a Prometheus Metric Instead

For HTTP request rate or any Prometheus metric, swap the trigger:

yaml
triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
      query: |
        sum(rate(http_requests_total{job="api-server"}[2m]))
      threshold: "100"    # Target: 100 RPS per replica

This scales based on the Prometheus query result. At 500 RPS with threshold 100, KEDA targets 5 replicas.
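Triggers can also be combined in one ScaledObject: KEDA evaluates each, and the HPA scales to whichever demands the most replicas. A sketch mixing the Redis and Prometheus triggers from this tutorial:

```yaml
triggers:
  - type: redis
    metadata:
      address: redis-master.default.svc.cluster.local:6379
      listName: job-queue
      listLength: "5"
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
      query: sum(rate(http_requests_total{job="api-server"}[2m]))
      threshold: "100"
```

This is useful when a workload should scale up for either queue pressure or request load.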

Common Issues

ScaledObject stuck in READY: False — check KEDA operator logs:

bash
kubectl logs -n keda -l app=keda-operator --tail=50

HPA not updating replicas — verify the metrics API is registered:

bash
kubectl get apiservice v1beta1.external.metrics.k8s.io
# Should show AVAILABLE: True

Scale-to-zero not working as expected — minReplicaCount: 0 requires the workload to tolerate cold starts. If your app takes 30+ seconds to boot, increase cooldownPeriod and consider keeping minReplicaCount: 1 for latency-sensitive paths.
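For that latency-sensitive case, the relevant ScaledObject fields look like this (a sketch; tune the numbers to your app's boot time):

```yaml
spec:
  minReplicaCount: 1     # keep one warm replica to avoid cold starts
  cooldownPeriod: 300    # wait 5 minutes of inactivity before scaling down
```

This trades a small amount of idle cost for predictable first-request latency.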

Cleanup

bash
kubectl delete scaledobject job-worker-scaler
kubectl delete deployment job-worker
helm uninstall redis
helm uninstall keda -n keda

We built Podscape to simplify Kubernetes workflows like this — logs, events, and cluster state in one interface, without switching tools.

Struggling with this in production?

We help teams fix these exact issues. Our engineers have deployed these patterns across production environments at scale.