Running StatefulSets: Deploy a PostgreSQL Cluster in Kubernetes

Intermediate | 50 min to complete | 15 min read

Deploy a 3-replica PostgreSQL cluster on Kubernetes using StatefulSets, headless services, and VolumeClaimTemplates. Covers ordered startup, stable network identities, scaling, and the difference from Deployments.

Before you begin

  • kubectl installed and configured
  • Access to a Kubernetes cluster with dynamic storage provisioning
  • Basic familiarity with Kubernetes storage (PVCs and StorageClasses)

The first rule of running databases in Kubernetes: don't use a Deployment. Deployments are designed for stateless workloads — pods are interchangeable, can be replaced in any order, get random names, and share no storage identity. Databases need the opposite: a stable hostname for each replica, ordered startup so the primary is available before replicas try to connect, and dedicated persistent storage that follows the pod if it's rescheduled.

StatefulSets provide all of this. This tutorial deploys a 3-replica PostgreSQL cluster to demonstrate each guarantee in practice.

What You'll Build

A 3-replica PostgreSQL StatefulSet named postgres with pods postgres-0, postgres-1, and postgres-2. Each pod gets its own PersistentVolumeClaim that persists independently. A headless Service enables stable DNS (postgres-0.postgres.default.svc.cluster.local) so you can always reach a specific replica.

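Note: this tutorial focuses on StatefulSet mechanics. The manifests below do not configure PostgreSQL replication, so each pod runs an independent database instance. For production PostgreSQL with replication and automatic failover, use an operator such as CloudNativePG or the Zalando Postgres Operator instead of a raw StatefulSet.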

Step 1: What Makes StatefulSets Different

Feature | Deployment | StatefulSet
Pod names | Random (pod-abc123) | Ordered (pod-0, pod-1, pod-2)
Startup order | All pods start in parallel | Sequential: pod-0 must be Running and Ready before pod-1 starts
Shutdown order | Any order | Reverse sequential: pod-2 before pod-1 before pod-0
Storage per pod | Shared PVC (or none) | Dedicated PVC per pod via VolumeClaimTemplates
DNS | Single ClusterIP | Stable: pod-0.<service>.<namespace>.svc.cluster.local
Pod identity | Ephemeral: any pod can replace any other | Sticky: pod-0 is always pod-0

The sticky identity is what makes databases work. Because postgres-0 keeps its name, its PVC, and its DNS entry, you can designate it as the primary by convention and point applications at that fixed hostname. If the pod is rescheduled to a different node, it still comes up as postgres-0 with the same PVC and the same DNS name.

Step 2: Create the Headless Service

A headless Service (one with clusterIP: None) enables per-pod DNS without load balancing. Instead of routing to a single VIP, DNS resolves to individual pod IPs.

bash
kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: postgres
  labels:
    app: postgres
spec:
  clusterIP: None
  selector:
    app: postgres
  ports:
    - name: postgres
      port: 5432
      targetPort: 5432
EOF

With this Service, each pod gets a stable DNS entry:

  • postgres-0.postgres.default.svc.cluster.local
  • postgres-1.postgres.default.svc.cluster.local
  • postgres-2.postgres.default.svc.cluster.local
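
These names follow a fixed pattern: <statefulset>-<ordinal>.<service>.<namespace>.svc.<cluster domain>. As a minimal sketch, the helper below builds the hostname for any ordinal, which can be handy in connection scripts. The function name pod_fqdn is hypothetical, and the default cluster domain cluster.local is assumed.

```shell
# Build the stable DNS name of one StatefulSet pod.
# Assumes the default cluster domain "cluster.local"; adjust if your cluster differs.
# Arguments: <statefulset> <ordinal> [service, defaults to statefulset] [namespace, defaults to "default"]
pod_fqdn() (
  sts="$1"; ordinal="$2"; svc="${3:-$1}"; ns="${4:-default}"
  printf '%s-%s.%s.%s.svc.cluster.local\n' "$sts" "$ordinal" "$svc" "$ns"
)

# Print the hostnames of the three replicas used in this tutorial
for i in 0 1 2; do
  pod_fqdn postgres "$i"
done
```

The ordinal is the only part that changes between replicas, so a script can target a specific pod without ever looking up its IP.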

If you also want a load-balanced endpoint (for read connections), create a second, non-headless Service alongside this one.
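
One way to add that load-balanced endpoint is a plain ClusterIP Service with the same selector. The sketch below writes the manifest to a file so you can review it before applying. The Service name postgres-read and the file name are assumptions made for illustration. Keep in mind that with no replication configured in this tutorial, each pod behind this Service would return its own data.

```shell
# Write a regular (non-headless) Service manifest that load-balances across all postgres pods.
# "postgres-read" is a hypothetical name; pick whatever fits your naming scheme.
cat > postgres-read-svc.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: postgres-read
  labels:
    app: postgres
spec:
  type: ClusterIP
  selector:
    app: postgres
  ports:
    - name: postgres
      port: 5432
      targetPort: 5432
EOF

# Review the file, then apply it:
# kubectl apply -f postgres-read-svc.yaml
```

Because this Service omits clusterIP: None, it receives a virtual IP and spreads connections across every Ready pod that matches the selector.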

Step 3: Create the Secret

Never put database passwords in plain YAML. Create a Secret first:

bash
kubectl create secret generic postgres-secret \
  --from-literal=password='StrongPassword123!'

Confirm the value was stored:

bash
kubectl get secret postgres-secret -o jsonpath='{.data.password}' | base64 -d; echo
# StrongPassword123!
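
The base64 -d step in the command above is needed because the API returns Secret values base64-encoded. That is an encoding, not encryption, as this small round trip illustrates (it runs locally and needs no cluster):

```shell
# Encode the tutorial's example password the way it appears in .data.password
encoded="$(printf '%s' 'StrongPassword123!' | base64)"
echo "$encoded"

# Decoding recovers the original text, so anyone who can read the Secret can read the password
decoded="$(printf '%s' "$encoded" | base64 -d)"
echo "$decoded"
```

In practice this means access to Secrets should be restricted with RBAC; the encoding alone protects nothing.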

Step 4: Deploy the StatefulSet

bash
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres
  replicas: 3
  podManagementPolicy: OrderedReady
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: postgres
          image: postgres:16
          ports:
            - containerPort: 5432
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: password
            - name: PGDATA
              value: /var/lib/postgresql/data/pgdata
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
          readinessProbe:
            exec:
              command: ["pg_isready", "-U", "postgres"]
            initialDelaySeconds: 10
            periodSeconds: 5
          livenessProbe:
            exec:
              command: ["pg_isready", "-U", "postgres"]
            initialDelaySeconds: 30
            periodSeconds: 10
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        # Change to a StorageClass that exists in your cluster (kubectl get storageclass)
        storageClassName: standard
        resources:
          requests:
            storage: 10Gi
EOF

Key fields to understand:

serviceName: postgres — must match the headless Service name. This is what enables the stable DNS per pod.

podManagementPolicy: OrderedReady — the default. Kubernetes waits for postgres-0 to pass its readiness probe before starting postgres-1. Change to Parallel only if your workload doesn't need ordered startup.

terminationGracePeriodSeconds: 60 — gives PostgreSQL time to finish checkpoints and close connections cleanly before SIGKILL. The default (30 seconds) is often too short for a loaded database.

PGDATA: /var/lib/postgresql/data/pgdata - PostgreSQL initializes its data directory on first start, and initdb expects that directory to be empty. Many filesystems place a lost+found directory at the root of a freshly formatted volume, so pointing PGDATA at the mount point itself (/var/lib/postgresql/data) can make initialization fail. The official postgres image documentation therefore recommends setting PGDATA to a subdirectory of the mount, which is what the /pgdata suffix does here.

volumeClaimTemplates — this is the StatefulSet-specific feature. Kubernetes creates a PVC for each pod using this template:

  • data-postgres-0 (10Gi, bound to postgres-0)
  • data-postgres-1 (10Gi, bound to postgres-1)
  • data-postgres-2 (10Gi, bound to postgres-2)

By default, these PVCs are not deleted when the StatefulSet is deleted or scaled down. This is an intentional safeguard for your data.

Step 5: Verify Ordered Startup

bash
kubectl get pods -w
# NAME         READY   STATUS              RESTARTS   AGE
# postgres-0   0/1     ContainerCreating   0          3s
# postgres-0   0/1     Running             0          8s
# postgres-0   1/1     Running             0          18s   ← readiness probe passes
# postgres-1   0/1     Pending             0          19s
# postgres-1   0/1     ContainerCreating   0          21s
# postgres-1   1/1     Running             0          35s   ← postgres-1 now ready
# postgres-2   0/1     Pending             0          36s
# postgres-2   1/1     Running             0          52s

postgres-1 doesn't even start until postgres-0 is fully ready. postgres-2 waits for postgres-1. This ordering guarantee is what lets you safely configure postgres-0 as the primary and have replicas join after the primary is up.
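
If you would rather confirm the ordering after the fact than watch it live, one option is to wait for the rollout to finish and then list the pods by creation time. The sketch below wraps the two kubectl calls in a function. The function name show_start_order and the KUBECTL override variable are hypothetical conveniences, not part of kubectl.

```shell
# KUBECTL can be overridden (for example with a wrapper script); it defaults to plain kubectl.
KUBECTL="${KUBECTL:-kubectl}"

# Wait until every replica is Ready, then list the pods oldest first.
# Arguments: <statefulset name> <label selector>
show_start_order() (
  sts="$1"; label="$2"
  "$KUBECTL" rollout status "statefulset/$sts" --timeout=300s || exit 1
  "$KUBECTL" get pods -l "$label" \
    --sort-by=.metadata.creationTimestamp \
    -o custom-columns=NAME:.metadata.name,CREATED:.metadata.creationTimestamp
)
```

Calling show_start_order postgres app=postgres on the cluster from this tutorial should list postgres-0 first and postgres-2 last, with each creation timestamp later than the one before it.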

Step 6: Verify Stable DNS

Run a debug pod and test DNS resolution:

bash
kubectl run -it --rm debug \
  --image=postgres:16 \
  --restart=Never \
  -- psql -h postgres-0.postgres.default.svc.cluster.local -U postgres
# Password for user postgres: StrongPassword123!
# psql (16.x)
# Type "help" for help.
# postgres=#

The hostname postgres-0.postgres.default.svc.cluster.local always resolves to the pod named postgres-0, regardless of which node it's running on or its current IP.

Verify all three DNS entries resolve:

bash
kubectl run -it --rm debug --image=busybox --restart=Never -- sh

# Inside the debug pod:
nslookup postgres-0.postgres.default.svc.cluster.local
# Address: 10.0.2.15
nslookup postgres-1.postgres.default.svc.cluster.local
# Address: 10.0.1.22
nslookup postgres-2.postgres.default.svc.cluster.local
# Address: 10.0.3.8

Step 7: Scaling

Scale up to 5 replicas:

bash
kubectl scale statefulset postgres --replicas=5
# postgres-3 starts after postgres-2 is ready
# postgres-4 starts after postgres-3 is ready

Scale back down to 2:

bash
kubectl scale statefulset postgres --replicas=2
# postgres-4 is terminated first, then postgres-3, then postgres-2
# Reverse-ordered shutdown ensures replicas are removed before the primary

Scaling down does not delete PVCs. After scaling to 2, data-postgres-2, data-postgres-3, and data-postgres-4 still exist:

bash
kubectl get pvc
# NAME             STATUS   VOLUME       CAPACITY
# data-postgres-0  Bound    pvc-abc...   10Gi
# data-postgres-1  Bound    pvc-def...   10Gi
# data-postgres-2  Bound    pvc-ghi...   10Gi   ← still exists (pod terminated, PVC retained by design)
# data-postgres-3  Bound    pvc-jkl...   10Gi   ← still exists
# data-postgres-4  Bound    pvc-mno...   10Gi   ← still exists

This is intentional. If you later scale back up to 3, postgres-2 will be reattached to data-postgres-2 with all its data intact. Delete the orphaned PVCs manually if you want to reclaim the storage.
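
Since PVC names are derived from the claim template name, the StatefulSet name, and the ordinal, the orphaned claims can be computed rather than typed by hand. A minimal sketch, assuming the names used in this tutorial (orphaned_pvcs is a hypothetical helper):

```shell
# Print the PVC names left behind after scaling a StatefulSet down.
# Naming rule: <claim template>-<statefulset>-<ordinal>
# Arguments: <claim template> <statefulset> <current replicas> <previous replicas>
orphaned_pvcs() (
  template="$1"; sts="$2"; current="$3"; previous="$4"
  i="$current"
  while [ "$i" -lt "$previous" ]; do
    printf '%s-%s-%s\n' "$template" "$sts" "$i"
    i=$((i + 1))
  done
)

# After scaling from 5 replicas down to 2, ordinals 2, 3, and 4 are orphaned
orphaned_pvcs data postgres 2 5
```

Once you are sure the data is no longer needed, the output can be piped into xargs kubectl delete pvc. Deleting a PVC is irreversible when the underlying volume's reclaim policy is Delete, so review the list first.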

Step 8: Inspect the Running Cluster

Connect to each pod directly:

bash
# Connect to the primary
kubectl exec -it postgres-0 -- psql -U postgres

# List databases
postgres=# \l

# Check version
postgres=# SELECT version();

Verify each pod's dedicated storage:

bash
kubectl describe pod postgres-0 | grep -A3 "Volumes:"
# Volumes:
#   data:
#     Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
#     ClaimName:  data-postgres-0

Each pod has its own claim. If postgres-0 is rescheduled to a different node, it mounts data-postgres-0 on the new node — the data follows the pod.
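
A quick way to see this for yourself is to write a row, delete the pod, and read the row back once the StatefulSet has recreated it. The following is a rough sketch under stated assumptions: the table name sts_demo, the function name check_persistence, and the KUBECTL override variable are all made up for this example, and the retry limits are arbitrary.

```shell
# KUBECTL can be overridden (for example with a wrapper script); it defaults to plain kubectl.
KUBECTL="${KUBECTL:-kubectl}"

# Write a marker row, delete the pod, and read the row back from the recreated pod.
# Argument: <pod name>
check_persistence() (
  pod="$1"
  "$KUBECTL" exec "$pod" -- psql -U postgres -c \
    "CREATE TABLE IF NOT EXISTS sts_demo (note text); INSERT INTO sts_demo VALUES ('survived');" || exit 1
  "$KUBECTL" delete pod "$pod" || exit 1

  # The replacement pod needs time to start, so retry the query for up to about 150 seconds.
  tries=0
  until "$KUBECTL" exec "$pod" -- psql -U postgres -tAc "SELECT note FROM sts_demo;"; do
    tries=$((tries + 1))
    [ "$tries" -ge 30 ] && exit 1
    sleep 5
  done
)
```

Running check_persistence postgres-0 should end by printing the inserted row, which shows that the recreated pod mounted the same data-postgres-0 claim.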

Common Mistakes to Avoid

Using a regular (ClusterIP) Service as serviceName - serviceName must reference a headless Service. If you point it at a regular Service, the stable per-pod DNS won't work. The pods will still run, but you lose the DNS guarantee.

PGDATA at the root of the mount — set PGDATA to a subdirectory like /var/lib/postgresql/data/pgdata. Some provisioners create a lost+found directory at the volume root. PostgreSQL refuses to initialize if the data directory is non-empty. The subdirectory avoids this collision entirely.

Deleting the StatefulSet and expecting PVCs to be cleaned up - kubectl delete statefulset postgres does NOT delete the PVCs. This is the correct behavior (your data is preserved). But if you want to fully tear down, you must delete the PVCs separately.

kubectl delete statefulset does not guarantee ordered pod termination — the reverse-sequential shutdown order (pod-N first, pod-0 last) only applies during scale-down operations. When you delete the StatefulSet object directly, Kubernetes does not guarantee termination order. To get ordered shutdown before deletion, scale to 0 first: kubectl scale statefulset postgres --replicas=0, wait for all pods to terminate, then delete the StatefulSet.

podManagementPolicy: Parallel for databases — this policy starts all pods simultaneously, which is useful for stateless workloads that need fast scale-out. For PostgreSQL, the replica containers may try to connect to the primary before it's ready and crash. Use OrderedReady unless you've verified your initialization process can handle a parallel start.

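No terminationGracePeriodSeconds - the default is 30 seconds. Under heavy load, PostgreSQL may need longer to finish its shutdown checkpoint and close connections. If the grace period expires first, the container receives SIGKILL, in-flight transactions are lost, and PostgreSQL has to replay the WAL in crash recovery on the next start, which delays readiness. Set it to 60-120 seconds for production.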
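
For a StatefulSet that is already running, the grace period can be raised with a merge patch instead of re-applying the full manifest. This is a sketch, not a prescription: set_grace_period and the KUBECTL override variable are hypothetical, and 120 is only an example value.

```shell
# KUBECTL can be overridden (for example with a wrapper script); it defaults to plain kubectl.
KUBECTL="${KUBECTL:-kubectl}"

# Raise terminationGracePeriodSeconds on an existing StatefulSet, then read the value back.
# Arguments: <statefulset name> <seconds>
set_grace_period() (
  sts="$1"; seconds="$2"
  "$KUBECTL" patch statefulset "$sts" --type merge \
    -p "{\"spec\":{\"template\":{\"spec\":{\"terminationGracePeriodSeconds\":$seconds}}}}" || exit 1
  "$KUBECTL" get statefulset "$sts" \
    -o jsonpath='{.spec.template.spec.terminationGracePeriodSeconds}'
)
```

For example, set_grace_period postgres 120. Because this changes the pod template, the StatefulSet controller restarts the pods one at a time to pick up the new value, so run it at a quiet moment.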

Cleanup

bash
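# Scale to 0 first so pods terminate in reverse ordinal order
kubectl scale statefulset postgres --replicas=0
kubectl wait --for=delete pod -l app=postgres --timeout=180s
kubectl delete statefulset postgres
kubectl delete svc postgres
kubectl delete secret postgres-secret
# PVCs are NOT deleted by the above, so delete them by name
# (ordinals 3 and 4 exist only if you scaled to 5 in Step 7):
kubectl delete pvc data-postgres-0 data-postgres-1 data-postgres-2
kubectl delete pvc data-postgres-3 data-postgres-4 --ignore-not-found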
