Running StatefulSets: Deploy a PostgreSQL Cluster in Kubernetes
Deploy a 3-replica PostgreSQL cluster on Kubernetes using StatefulSets, headless services, and VolumeClaimTemplates. Covers ordered startup, stable network identities, scaling, and the difference from Deployments.
Before you begin
- kubectl installed and configured
- Access to a Kubernetes cluster with dynamic storage provisioning
- Basic familiarity with Kubernetes storage (PVCs and StorageClasses)
The first rule of running databases in Kubernetes: don't use a Deployment. Deployments are designed for stateless workloads — pods are interchangeable, can be replaced in any order, get random names, and share no storage identity. Databases need the opposite: a stable hostname for each replica, ordered startup so the primary is available before replicas try to connect, and dedicated persistent storage that follows the pod if it's rescheduled.
StatefulSets provide all of this. This tutorial deploys a 3-replica PostgreSQL cluster to demonstrate each guarantee in practice.
What You'll Build
A 3-replica PostgreSQL StatefulSet named postgres with pods postgres-0, postgres-1, and postgres-2. Each pod gets its own PersistentVolumeClaim that persists independently. A headless Service enables stable DNS (postgres-0.postgres.default.svc.cluster.local) so you can always reach a specific replica.
Note: this tutorial focuses on StatefulSet mechanics. For production PostgreSQL with automatic failover, use CloudNativePG or the Zalando Postgres Operator instead of a raw StatefulSet.
Step 1: What Makes StatefulSets Different
| Feature | Deployment | StatefulSet |
|---|---|---|
| Pod names | Random (pod-abc123) | Ordered (pod-0, pod-1, pod-2) |
| Startup order | All pods start in parallel | Sequential: pod-0 must be Running and Ready before pod-1 starts |
| Shutdown order | Any order | Reverse sequential: pod-2 before pod-1 before pod-0 |
| Storage per pod | Shared PVC (or none) | Dedicated PVC per pod via VolumeClaimTemplates |
| DNS | One Service name resolving to a single ClusterIP | Stable per pod: pod-0.<service>.<namespace>.svc.cluster.local |
| Pod identity | Ephemeral — any pod can replace any other | Sticky — pod-0 is always pod-0 |
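The sticky identity is what makes databases work. postgres-0 can serve as the primary because its name, DNS record, and PVC never change, so applications (and any replicas you configure) can be pointed at it by name. If the pod is rescheduled to a different node, it still comes up as postgres-0 with the same PVC and the same DNS name. Note that the manifest in this tutorial starts three independent PostgreSQL instances; making postgres-1 and postgres-2 stream from postgres-0 requires replication setup that an operator or your own init scripts would provide.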
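The naming and ordering rules in the table are mechanical enough to express as two small shell functions. This is an illustrative sketch only: `sts_startup_order` and `sts_shutdown_order` are hypothetical names, and nothing here talks to a cluster.

```shell
# Hypothetical helper: print the pod names a StatefulSet creates, in the
# order OrderedReady starts them (ordinal 0 first).
# Usage: sts_startup_order <statefulset-name> <replicas>
sts_startup_order() {
  local name=$1 replicas=$2 i=0 out=""
  while [ "$i" -lt "$replicas" ]; do
    out="$out $name-$i"
    i=$((i + 1))
  done
  echo "${out# }"   # drop the leading space
}

# Hypothetical helper: print the order in which scale-down removes pods
# (highest ordinal first, ordinal 0 last).
# Usage: sts_shutdown_order <statefulset-name> <replicas>
sts_shutdown_order() {
  local name=$1 replicas=$2 i out=""
  i=$((replicas - 1))
  while [ "$i" -ge 0 ]; do
    out="$out $name-$i"
    i=$((i - 1))
  done
  echo "${out# }"
}
```

For example, `sts_startup_order postgres 3` prints `postgres-0 postgres-1 postgres-2`, and `sts_shutdown_order postgres 3` prints the same names in reverse.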
Step 2: Create the Headless Service
A headless Service (one with clusterIP: None) enables per-pod DNS without load balancing. Instead of routing to a single VIP, DNS resolves to individual pod IPs.
kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: postgres
  labels:
    app: postgres
spec:
  clusterIP: None
  selector:
    app: postgres
  ports:
    - name: postgres
      port: 5432
      targetPort: 5432
EOF
With this Service, each pod gets a stable DNS entry:
postgres-0.postgres.default.svc.cluster.local
postgres-1.postgres.default.svc.cluster.local
postgres-2.postgres.default.svc.cluster.local
If you also want a load-balanced endpoint (for read connections), create a second, non-headless Service alongside this one.
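The per-pod names follow a fixed pattern, so scripts can generate them instead of hard-coding each one. A minimal sketch, assuming the default cluster domain `cluster.local` (some clusters configure a different one); `pod_fqdn` and `replica_fqdns` are hypothetical helper names:

```shell
# Hypothetical helper: build the stable DNS name a headless Service gives a
# StatefulSet pod: <pod>.<service>.<namespace>.svc.<cluster-domain>.
# Assumes the default cluster domain "cluster.local" unless $4 overrides it.
# Usage: pod_fqdn <pod> <service> [namespace] [cluster-domain]
pod_fqdn() {
  local pod=$1 service=$2 namespace=${3:-default} domain=${4:-cluster.local}
  echo "$pod.$service.$namespace.svc.$domain"
}

# Hypothetical helper: print one DNS name per replica, ordinal 0 first.
# Usage: replica_fqdns <statefulset> <service> <replicas> [namespace]
replica_fqdns() {
  local sts=$1 service=$2 replicas=$3 namespace=${4:-default} i=0
  while [ "$i" -lt "$replicas" ]; do
    pod_fqdn "$sts-$i" "$service" "$namespace"
    i=$((i + 1))
  done
}
```

`replica_fqdns postgres postgres 3` prints the three hostnames listed above. Pods in the same namespace can usually use the shorter `postgres-0.postgres`, which the cluster's DNS search path expands.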
Step 3: Create the Secret
Never put database passwords in plain YAML. Create a Secret first:
kubectl create secret generic postgres-secret \
  --from-literal=password='StrongPassword123!'
kubectl get secret postgres-secret -o jsonpath='{.data.password}' | base64 -d; echo
# StrongPassword123!
Step 4: Deploy the StatefulSet
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres
  replicas: 3
  podManagementPolicy: OrderedReady
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: postgres
          image: postgres:16
          ports:
            - containerPort: 5432
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: password
            - name: PGDATA
              value: /var/lib/postgresql/data/pgdata
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
          readinessProbe:
            exec:
              command: ["pg_isready", "-U", "postgres"]
            initialDelaySeconds: 10
            periodSeconds: 5
          livenessProbe:
            exec:
              command: ["pg_isready", "-U", "postgres"]
            initialDelaySeconds: 30
            periodSeconds: 10
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: standard   # use a StorageClass that exists in your cluster
        resources:
          requests:
            storage: 10Gi
EOF
Key fields to understand:
serviceName: postgres — must match the headless Service name. This is what enables the stable DNS per pod.
podManagementPolicy: OrderedReady — the default. Kubernetes waits for postgres-0 to pass its readiness probe before starting postgres-1. Change to Parallel only if your workload doesn't need ordered startup.
terminationGracePeriodSeconds: 60 — gives PostgreSQL time to finish checkpoints and close connections cleanly before SIGKILL. The default (30 seconds) is often too short for a loaded database.
PGDATA: /var/lib/postgresql/data/pgdata — PostgreSQL initializes its data directory on first start, and initdb refuses to initialize a directory that isn't empty. Freshly formatted ext4 volumes typically contain a lost+found directory at their root, so if you set PGDATA to the mount point itself (/var/lib/postgresql/data), initialization fails on such volumes. Pointing PGDATA at a subdirectory of the mount, as the official postgres image's documentation recommends, avoids the conflict.
volumeClaimTemplates — this is the StatefulSet-specific feature. Kubernetes creates a PVC for each pod using this template:
data-postgres-0 (10Gi, used by postgres-0)
data-postgres-1 (10Gi, used by postgres-1)
data-postgres-2 (10Gi, used by postgres-2)
These PVCs are not deleted when the StatefulSet is deleted. This is intentional: it keeps your data safe.
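The PVC names above come from a fixed rule: `<volumeClaimTemplate name>-<StatefulSet name>-<ordinal>`. A small sketch that derives them without querying the cluster; `sts_pvc_names` is a hypothetical helper name:

```shell
# Hypothetical helper: list the PVC names a StatefulSet creates from one
# volumeClaimTemplate, following the <template>-<statefulset>-<ordinal> rule.
# Usage: sts_pvc_names <template-name> <statefulset-name> <replicas>
sts_pvc_names() {
  local template=$1 sts=$2 replicas=$3 i=0
  while [ "$i" -lt "$replicas" ]; do
    echo "$template-$sts-$i"
    i=$((i + 1))
  done
}
```

For example, `kubectl get pvc $(sts_pvc_names data postgres 3)` shows just this StatefulSet's three claims.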
Step 5: Verify Ordered Startup
kubectl get pods -w
# NAME         READY   STATUS              RESTARTS   AGE
# postgres-0   0/1     ContainerCreating   0          3s
# postgres-0   0/1     Running             0          8s
# postgres-0   1/1     Running             0          18s   ← readiness probe passes
# postgres-1   0/1     Pending             0          19s
# postgres-1   0/1     ContainerCreating   0          21s
# postgres-1   1/1     Running             0          35s   ← postgres-1 now ready
# postgres-2   0/1     Pending             0          36s
# postgres-2   1/1     Running             0          52s
postgres-1 doesn't even start until postgres-0 is fully ready. postgres-2 waits for postgres-1. This ordering guarantee is what lets you safely configure postgres-0 as the primary and have replicas join after the primary is up.
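When scripting a deployment, you can reproduce the same one-pod-at-a-time wait from the client side. A sketch, assuming a hypothetical `wait_ordered_ready` function: it polls until each pod object exists (postgres-1 is only created after postgres-0 is Ready) and then uses `kubectl wait` for the Ready condition.

```shell
# Hypothetical helper: wait for each StatefulSet pod to become Ready, in
# ordinal order, mirroring OrderedReady. Returns non-zero on the first failure.
# Usage: wait_ordered_ready <statefulset> <replicas> [timeout]
wait_ordered_ready() {
  local sts=$1 replicas=$2 timeout=${3:-180s} i=0 tries
  while [ "$i" -lt "$replicas" ]; do
    # The controller creates pod N only after pod N-1 is Ready, so first
    # poll (up to ~2 minutes) until the pod object exists.
    tries=60
    until kubectl get pod "$sts-$i" >/dev/null 2>&1; do
      tries=$((tries - 1))
      [ "$tries" -gt 0 ] || return 1
      sleep 2
    done
    # Then block until the pod reports the Ready condition.
    kubectl wait --for=condition=Ready "pod/$sts-$i" --timeout="$timeout" || return 1
    i=$((i + 1))
  done
}
```

`wait_ordered_ready postgres 3` should return once postgres-2 is Ready. If you only need a yes/no for the whole StatefulSet, `kubectl rollout status statefulset/postgres` also waits until all replicas are ready.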
Step 6: Verify Stable DNS
Run a debug pod and connect to postgres-0 by its DNS name:
kubectl run -it --rm debug \
  --image=postgres:16 \
  --restart=Never \
  -- psql -h postgres-0.postgres.default.svc.cluster.local -U postgres
# Password for user postgres:   (enter StrongPassword123!)
# psql (16.x)
# Type "help" for help.
# postgres=#
The hostname postgres-0.postgres.default.svc.cluster.local always resolves to the current IP of the pod named postgres-0, whichever node it's running on and however often that IP changes.
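The interactive session needs the password typed in. For scripted checks, here is a sketch that runs one query against a chosen replica by DNS name, reading the password from `postgres-secret` and passing it as `PGPASSWORD` to a throwaway pod. The function name `query_replica` is hypothetical, and `--env` places the password in the debug pod's spec, which is tolerable for a short-lived test pod but not for anything long-running.

```shell
# Hypothetical helper: run a single SQL statement against one replica,
# addressed by its stable DNS name, from a temporary postgres:16 pod.
# Usage: query_replica <pod-dns-name> <sql>
query_replica() {
  local host=$1 sql=$2 pw
  # Secret values are base64-encoded; decode to get the plaintext password.
  pw=$(kubectl get secret postgres-secret -o jsonpath='{.data.password}' | base64 -d) || return 1
  # -t: tuples only, -A: unaligned output, -c: run one command and exit.
  kubectl run "pg-query-$$" --rm -i --restart=Never --image=postgres:16 \
    --env="PGPASSWORD=$pw" -- psql -h "$host" -U postgres -tAc "$sql"
}
```

For example, `query_replica postgres-1.postgres.default.svc.cluster.local 'SELECT inet_server_addr();'` should print the IP address postgres-1 currently has, which is the address DNS returns for that name.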
Verify all three DNS entries resolve:
kubectl run -it --rm debug --image=busybox --restart=Never -- sh

# Inside the debug pod:
nslookup postgres-0.postgres.default.svc.cluster.local
# Address: 10.0.2.15
nslookup postgres-1.postgres.default.svc.cluster.local
# Address: 10.0.1.22
nslookup postgres-2.postgres.default.svc.cluster.local
# Address: 10.0.3.8
Step 7: Scaling
Scale up to 5 replicas:
kubectl scale statefulset postgres --replicas=5
# postgres-3 starts after postgres-2 is ready
# postgres-4 starts after postgres-3 is ready
Scale back down to 2:
kubectl scale statefulset postgres --replicas=2
# postgres-4 is terminated first, then postgres-3, then postgres-2
# Reverse-ordered shutdown ensures replicas are removed before the primary
Scaling down does not delete PVCs. After scaling to 2, data-postgres-2, data-postgres-3, and data-postgres-4 still exist:
kubectl get pvc
# NAME              STATUS   VOLUME       CAPACITY
# data-postgres-0   Bound    pvc-abc...   10Gi
# data-postgres-1   Bound    pvc-def...   10Gi
# data-postgres-2   Bound    pvc-ghi...   10Gi   ← still exists (pod terminated, PVC retained by design)
# data-postgres-3   Bound    pvc-jkl...   10Gi   ← still exists
# data-postgres-4   Bound    pvc-mno...   10Gi   ← still exists
This is intentional. If you later scale back up to 3, the new postgres-2 mounts data-postgres-2 again with all its data intact. Delete the orphaned PVCs manually if you want to reclaim the storage.
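To find the claims left behind after a scale-down without reading the table by eye, filter PVC names by ordinal. A sketch with a hypothetical `orphaned_pvcs` function; it assumes the `<template>-<statefulset>-<ordinal>` naming that volumeClaimTemplates use and reads one PVC name per line on stdin.

```shell
# Hypothetical helper: print PVC names whose ordinal is >= the current replica
# count, i.e. volumes no running pod will mount until you scale up again.
# Usage: orphaned_pvcs <template> <statefulset> <replicas> < pvc-names
orphaned_pvcs() {
  local prefix="$1-$2-" replicas=$3 name ordinal
  while IFS= read -r name; do
    # Ignore PVCs that don't belong to this template/StatefulSet pair.
    case "$name" in
      "$prefix"*) ordinal=${name#"$prefix"} ;;
      *) continue ;;
    esac
    # Ignore names whose suffix is not a plain number.
    case "$ordinal" in ''|*[!0-9]*) continue ;; esac
    if [ "$ordinal" -ge "$replicas" ]; then
      echo "$name"
    fi
  done
}
```

For example, piping `kubectl get pvc -o custom-columns=NAME:.metadata.name --no-headers` into `orphaned_pvcs data postgres 2` lists data-postgres-2 through data-postgres-4. Review that output before passing it to `xargs kubectl delete pvc`.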
Step 8: Inspect the Running Cluster
Connect to each pod directly:
# Connect to the primary
kubectl exec -it postgres-0 -- psql -U postgres

# List databases
postgres=# \l

# Check version
postgres=# SELECT version();
Verify each pod's dedicated storage:
kubectl describe pod postgres-0 | grep -A3 "Volumes:"
# Volumes:
#   data:
#     Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
#     ClaimName:  data-postgres-0
Each pod has its own claim. If postgres-0 is rescheduled to a different node, it mounts data-postgres-0 on the new node, so the data follows the pod, provided the volume can be attached there (zonal block storage, for example, limits rescheduling to nodes in the same zone).
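`kubectl describe` is convenient for one pod. To check all replicas at once, here is a sketch that reads the claim behind each pod's `data` volume with a jsonpath filter and compares it to the expected `<volume>-<pod>` name. `check_pod_claims` is a hypothetical helper name.

```shell
# Hypothetical helper: verify that every StatefulSet pod mounts its own PVC.
# Prints "pod -> claim" for matches; reports mismatches on stderr.
# Usage: check_pod_claims <statefulset> <replicas> [volume-name]
check_pod_claims() {
  local sts=$1 replicas=$2 vol=${3:-data} i=0 claim ok=0
  while [ "$i" -lt "$replicas" ]; do
    # jsonpath filter: pick the volume named $vol and print its claimName.
    claim=$(kubectl get pod "$sts-$i" \
      -o jsonpath="{.spec.volumes[?(@.name==\"$vol\")].persistentVolumeClaim.claimName}")
    if [ "$claim" = "$vol-$sts-$i" ]; then
      echo "$sts-$i -> $claim"
    else
      echo "$sts-$i: expected $vol-$sts-$i, got '${claim:-none}'" >&2
      ok=1
    fi
    i=$((i + 1))
  done
  return "$ok"
}
```

`check_pod_claims postgres 3` prints one `postgres-N -> data-postgres-N` line per pod and exits non-zero if any pod is missing its expected claim.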
Common Mistakes to Avoid
Using a regular (ClusterIP) Service as serviceName — serviceName must reference a headless Service. If you point it at a regular Service, the stable per-pod DNS won't work. The pods will still run, but you lose the DNS guarantee.
PGDATA at the root of the mount — set PGDATA to a subdirectory like /var/lib/postgresql/data/pgdata. Freshly formatted ext4 volumes contain a lost+found directory at their root, and initdb refuses to initialize a non-empty directory. The subdirectory avoids this collision entirely.
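Deleting the StatefulSet and expecting PVCs to be cleaned up — kubectl delete statefulset postgres does NOT delete the PVCs. This is the correct default (your data is preserved). For a full teardown, delete the PVCs separately, or opt into automatic deletion with the StatefulSet's spec.persistentVolumeClaimRetentionPolicy field (whenDeleted / whenScaled) on Kubernetes versions that support it.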
kubectl delete statefulset does not guarantee ordered pod termination — the reverse-sequential shutdown order (pod-N first, pod-0 last) only applies during scale-down operations. When you delete the StatefulSet object directly, Kubernetes does not guarantee termination order. To get ordered shutdown before deletion, scale to 0 first: kubectl scale statefulset postgres --replicas=0, wait for all pods to terminate, then delete the StatefulSet.
podManagementPolicy: Parallel for databases — this policy starts (and terminates) all pods simultaneously, which is useful for workloads whose pods don't depend on each other and need fast scale-out. If your PostgreSQL replicas bootstrap from the primary (for example with pg_basebackup in an init script), a replica may start before postgres-0 is ready, fail to connect, and crash. Use OrderedReady unless you've verified your initialization process can handle a parallel start.
No terminationGracePeriodSeconds — the default is 30 seconds. Under heavy load, PostgreSQL may need longer to finish its shutdown checkpoint and close client connections. If the grace period expires, the kubelet sends SIGKILL, and PostgreSQL has to run crash recovery (WAL replay) on its next start, which delays the restart. Set it to 60-120 seconds for production.
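The scale-to-0-then-delete sequence recommended above for ordered shutdown can be wrapped in one function. A sketch, assuming the hypothetical name `ordered_teardown` and the `app=postgres` label from this tutorial's manifest. It polls `kubectl get pods -l` until no pods remain, rather than relying on any particular `kubectl wait` behavior, and it leaves the PVCs alone.

```shell
# Hypothetical helper: scale a StatefulSet to 0 (pods stop highest ordinal
# first), wait until no pods match the selector, then delete the StatefulSet.
# PVCs are intentionally left in place.
# Usage: ordered_teardown <statefulset> <label-selector> [max-polls]
ordered_teardown() {
  local sts=$1 selector=$2 tries=${3:-150} pods
  kubectl scale statefulset "$sts" --replicas=0 || return 1
  while :; do
    # Stop on kubectl errors rather than mistaking them for "no pods left".
    pods=$(kubectl get pods -l "$selector" -o name) || return 1
    [ -z "$pods" ] && break
    tries=$((tries - 1))
    if [ "$tries" -le 0 ]; then
      echo "timed out waiting for pods matching $selector" >&2
      return 1
    fi
    sleep 2
  done
  kubectl delete statefulset "$sts"
}
```

`ordered_teardown postgres app=postgres` scales down, waits up to about five minutes (150 polls, 2 seconds apart), and then deletes the StatefulSet.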
Cleanup
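kubectl delete statefulset postgres
kubectl delete svc postgres
kubectl delete secret postgres-secret
# PVCs are NOT deleted by the above; delete them by name
# (data-postgres-3 and data-postgres-4 exist only if you ran the scale-up in Step 7):
kubectl delete pvc data-postgres-0 data-postgres-1 data-postgres-2 data-postgres-3 data-postgres-4 --ignore-not-found
What's Next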
- CloudNativePG — Kubernetes-native PostgreSQL operator with automatic failover, backup, and PITR
- Kubernetes Storage: PVCs and StorageClasses — if you need a primer on PVCs before this tutorial
Official References
- StatefulSets — complete reference for StatefulSet guarantees, pod identity, and deployment/scaling semantics
- Headless Services — how headless Services enable per-pod DNS
- VolumeClaimTemplates — how Kubernetes creates and manages per-pod PVCs
- Running Databases in Kubernetes — when it makes sense to run databases in Kubernetes vs. using managed services