Setting Up Horizontal Pod Autoscaler with Custom Metrics
Scale your Kubernetes workloads on business metrics — RPS, queue depth, or latency — instead of just CPU. This tutorial wires up KEDA to drive HPA from real application signals.
Before you begin
- kubectl configured against a running cluster
- Helm 3 installed
- Basic understanding of Kubernetes Deployments and Services
CPU-based autoscaling is a blunt instrument. Your application might be saturating a database connection pool at 20% CPU, or sitting idle at 80% while a downstream queue builds up. Custom metrics let HPA respond to signals that actually matter for your workload.
This tutorial uses KEDA — the Kubernetes Event-Driven Autoscaler — which is the cleanest way to get custom metrics into HPA without wrestling with the Prometheus Adapter's configuration format.
What You'll Build
An HPA that scales a worker deployment based on the number of pending jobs in a Redis list. When the queue is empty, the deployment scales to zero. When jobs arrive, it scales up proportionally.
The same pattern applies to: HTTP request rate, Kafka consumer lag, RabbitMQ queue depth, or any Prometheus metric.
Step 1: Install KEDA
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda \
  --namespace keda \
  --create-namespace \
  --version 2.13.0
Verify the KEDA operator is running:
kubectl get pods -n keda
# NAME                                     READY   STATUS    RESTARTS
# keda-operator-xxxx                       1/1     Running   0
# keda-operator-metrics-apiserver-xxxx     1/1     Running   0
KEDA installs two components: the operator (watches ScaledObjects) and the metrics API server (exposes custom metrics to the HPA controller).
Step 2: Deploy the Sample Worker
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: job-worker
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: job-worker
  template:
    metadata:
      labels:
        app: job-worker
    spec:
      containers:
      - name: worker
        image: busybox
        command: ["sh", "-c", "while true; do sleep 5; done"]
        resources:
          requests:
            cpu: "100m"
            memory: "64Mi"
          limits:
            cpu: "200m"
            memory: "128Mi"
EOF
Step 3: Deploy Redis
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install redis bitnami/redis \
  --namespace default \
  --set auth.enabled=false \
  --set replica.replicaCount=0
Get the Redis connection string:
kubectl get svc redis-master -n default
# redis-master.default.svc.cluster.local:6379
Step 4: Create the ScaledObject
A ScaledObject tells KEDA what metric to watch and how to translate it into a replica count:
kubectl apply -f - <<EOF
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: job-worker-scaler
  namespace: default
spec:
  scaleTargetRef:
    name: job-worker
  minReplicaCount: 0    # Scale to zero when the queue is empty
  maxReplicaCount: 20
  pollingInterval: 15   # Check the queue every 15 seconds
  cooldownPeriod: 60    # Wait 60s of inactivity before scaling to zero
  triggers:
  - type: redis
    metadata:
      address: redis-master.default.svc.cluster.local:6379
      listName: job-queue
      listLength: "5"   # Target: 5 pending jobs per replica
EOF
listLength: "5" means KEDA targets 5 pending jobs per replica. With 50 jobs in the queue, it scales to 10 replicas. With 0 jobs, it scales to 0.
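The replica target is essentially ceiling division of queue length by the per-replica target, clamped to the min/max bounds. A quick shell sketch of that arithmetic (an illustration only, not KEDA's actual implementation):

```shell
#!/bin/sh
# desired_replicas QUEUE TARGET MIN MAX
# Sketch of the target-replica arithmetic for the redis trigger:
# desired = ceil(queue / target), clamped to [min, max].
desired_replicas() {
  queue=$1; target=$2; min=$3; max=$4
  # Integer ceiling division
  desired=$(( (queue + target - 1) / target ))
  [ "$desired" -lt "$min" ] && desired=$min
  [ "$desired" -gt "$max" ] && desired=$max
  echo "$desired"
}

desired_replicas 50 5 0 20    # -> 10 (50 jobs, 5 per replica)
desired_replicas 0 5 0 20     # -> 0  (empty queue scales to zero)
desired_replicas 200 5 0 20   # -> 20 (capped at maxReplicaCount)
```

In practice the generated HPA applies its own scaling behavior on top of this target, so replica counts converge over a few sync periods rather than jumping instantly.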
Step 5: Verify the ScaledObject
kubectl get scaledobject job-worker-scaler
# NAME                SCALETARGETKIND   SCALETARGETNAME   MIN   MAX   READY
# job-worker-scaler   Deployment        job-worker        0     20    True
kubectl get hpa
# NAME                         REFERENCE               TARGETS     MINPODS   MAXPODS   REPLICAS
# keda-hpa-job-worker-scaler   Deployment/job-worker   0/5 (avg)   1         20        1
KEDA creates and manages the HPA object automatically. You never touch the HPA directly.
Step 6: Test the Autoscaling
Push jobs into the Redis queue:
# Exec into a temporary pod with redis-cli
kubectl run redis-cli --rm -it --image=redis:7 -- redis-cli \
  -h redis-master.default.svc.cluster.local
# Inside the pod:
RPUSH job-queue job1 job2 job3 job4 job5 job6 job7 job8 job9 job10
LLEN job-queue
# (integer) 10
Watch the deployment scale up:
kubectl get hpa -w
# TARGETS      REPLICAS
# 0/5 (avg)    0
# 10/5 (avg)   2   ← scaling up
# 10/5 (avg)   2
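Ten hand-typed jobs only get you to two replicas. To drive a bigger burst, a small helper can generate the RPUSH arguments (a hypothetical helper; it assumes the same redis-cli pod as above):

```shell
#!/bin/sh
# generate_jobs N: print "job1 job2 ... jobN" for use as RPUSH arguments.
generate_jobs() {
  i=1; out=""
  while [ "$i" -le "$1" ]; do
    out="$out job$i"
    i=$((i + 1))
  done
  # Strip the leading space before printing
  echo "${out# }"
}

generate_jobs 5   # -> job1 job2 job3 job4 job5

# Usage from a pod with redis-cli (100 jobs should push toward maxReplicaCount):
#   RPUSH job-queue $(generate_jobs 100)
```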
Clear the queue and watch it scale back to zero (after the cooldown period):
kubectl run redis-cli --rm -it --image=redis:7 -- redis-cli \
  -h redis-master.default.svc.cluster.local DEL job-queue
Step 7: Scale on a Prometheus Metric Instead
For HTTP request rate or any Prometheus metric, swap the trigger:
triggers:
- type: prometheus
  metadata:
    serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
    metricName: http_requests_total
    query: |
      sum(rate(http_requests_total{job="api-server"}[2m]))
    threshold: "100"   # Target: 100 RPS per replica
This scales based on the Prometheus query result. At 500 RPS with threshold 100, KEDA targets 5 replicas.
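If the metric idles at a low but non-zero value, the deployment may never reach zero. Recent KEDA releases separate the scaling target from the activation point with an `activationThreshold`: below it, the workload stays at zero replicas. A sketch of the trigger with that field added (the server address and query are the same assumptions as above):

```yaml
triggers:
- type: prometheus
  metadata:
    serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
    query: |
      sum(rate(http_requests_total{job="api-server"}[2m]))
    threshold: "100"           # Target per replica once active
    activationThreshold: "5"   # Stay at zero until RPS exceeds 5
```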
Common Issues
ScaledObject stuck in READY: False — check KEDA operator logs:
kubectl logs -n keda -l app=keda-operator --tail=50
HPA not updating replicas — verify the metrics API is registered:
kubectl get apiservice v1beta1.external.metrics.k8s.io
# Should show AVAILABLE: True
Scale-to-zero not working — minReplicaCount: 0 requires the workload to tolerate cold starts. If your app takes 30+ seconds to boot, increase cooldownPeriod and consider keeping minReplicaCount: 1 for latency-sensitive paths.
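Note that cooldownPeriod only governs the final step to zero; scale-down between nonzero replica counts is controlled by the HPA that KEDA generates. The ScaledObject exposes the standard autoscaling/v2 behavior block for tuning this — a sketch with illustrative values:

```yaml
spec:
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 300   # Wait 5 min of lower demand before shrinking
          policies:
          - type: Percent
            value: 50            # Remove at most half the replicas per period
            periodSeconds: 60
```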
Cleanup
kubectl delete scaledobject job-worker-scaler
kubectl delete deployment job-worker
helm uninstall redis
helm uninstall keda -n keda
We built Podscape to simplify Kubernetes workflows like this — logs, events, and cluster state in one interface, without switching tools.
Struggling with this in production?
We help teams fix these exact issues. Our engineers have deployed these patterns across production environments at scale.