13 min read · May 3, 2026

Kubernetes StatefulSets and Persistent Storage: Patterns for Stateful Workloads

Running stateful workloads in Kubernetes — databases, message queues, caches — requires stable network identity, ordered deployment, and persistent volumes that survive pod restarts. StatefulSets provide the first two; StorageClasses and PersistentVolumeClaims handle the third. Together they make Kubernetes a viable home for workloads that traditionally required VMs.

Coding Protocols Team · Platform Engineering

A Deployment's pods are interchangeable — they can be rescheduled to any node, given any IP, and restarted in any order. For stateless services, this is ideal. For a PostgreSQL primary, a Kafka broker, or a Redis cluster, it's a problem: each node has a specific role, stores its own data, and must be addressable by a stable hostname.

StatefulSet solves the identity problem. Persistent volumes solve the data problem. Understanding both — and how they interact with storage provisioners on EKS, GKE, or bare metal — is the foundation for running stateful workloads in Kubernetes.


StatefulSet vs Deployment

| Property | Deployment | StatefulSet |
|---|---|---|
| Pod name | pod-<random> | pod-0, pod-1, pod-2 (stable, ordinal) |
| DNS hostname | No stable hostname | pod-0.service-name.namespace.svc.cluster.local |
| Scaling order | Parallel | Serial (0→1→2 on scale-up; 2→1→0 on scale-down) |
| Volume binding | Shared PVC (unusual) or ephemeral | Separate PVC per pod (volumeClaimTemplates) |
| Rolling update | Parallel with maxUnavailable | Serial, from highest to lowest ordinal |
| Pod identity | Interchangeable | Unique, persistent identity per pod |

The stable hostname matters for clustering protocols (Raft, Paxos, Kafka broker IDs) — a pod that rejoins after restart must connect to the same peer set using the same identity.
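Because the ordinal is embedded in the hostname, a pod can derive its own identity in an init script, a common pattern for wiring Kafka broker IDs or replication node names. A minimal sketch (illustrative values; inside a real pod, HOSTNAME is set automatically):

```shell
# Derive the stable ordinal from the pod hostname.
HOSTNAME=postgres-2
ORDINAL="${HOSTNAME##*-}"    # strip everything up to the last dash
# Build the stable DNS name of the first replica (a common bootstrap peer).
PEER="${HOSTNAME%-*}-0.postgres.production.svc.cluster.local"
echo "ordinal=$ORDINAL"
echo "bootstrap peer=$PEER"
```

The same pod always computes the same ordinal and peer list after every restart, which is exactly what clustering protocols need.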


StatefulSet Anatomy

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: production
spec:
  serviceName: postgres    # Must match a Headless Service (clusterIP: None)
  replicas: 3
  selector:
    matchLabels:
      app: postgres

  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 0    # Only update pods with ordinal >= partition (canary: set to N-1)

  podManagementPolicy: OrderedReady    # Default: serial. Use Parallel for independent pods.

  template:
    metadata:
      labels:
        app: postgres
    spec:
      terminationGracePeriodSeconds: 60    # Postgres needs time to flush WAL

      containers:
        - name: postgres
          image: postgres:16
          env:
            - name: PGDATA
              value: /data/pgdata
          ports:
            - containerPort: 5432
              name: postgres
          volumeMounts:
            - name: data
              mountPath: /data    # Mounts the PVC created by volumeClaimTemplates

          resources:
            requests:
              cpu: 500m
              memory: 1Gi
            limits:
              cpu: "2"
              memory: 4Gi

          readinessProbe:
            exec:
              command: ["pg_isready", "-U", "postgres"]
            initialDelaySeconds: 10
            periodSeconds: 10

  # Each pod gets its own PVC — named data-postgres-0, data-postgres-1, etc.
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: [ReadWriteOncePod]
        storageClassName: gp3
        resources:
          requests:
            storage: 100Gi
```

Headless Service

The Headless Service (clusterIP: None) is what enables stable DNS names for each pod:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: postgres
  namespace: production
spec:
  clusterIP: None    # Headless — returns all pod IPs from DNS query
  selector:
    app: postgres
  ports:
    - port: 5432
      name: postgres
```

With this Service, each pod is reachable at:

  • postgres-0.postgres.production.svc.cluster.local
  • postgres-1.postgres.production.svc.cluster.local
  • postgres-2.postgres.production.svc.cluster.local

For read-write connections, add a separate ClusterIP Service that targets only the primary, for example by selecting on a role label maintained by the replication manager (external tooling or a sidecar such as Patroni).
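A sketch of such a read-write Service, assuming the failover manager keeps a role: primary label on the current primary pod (the label key and value are assumptions here; each tool uses its own convention):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: postgres-rw          # stable read-write endpoint for clients
  namespace: production
spec:
  type: ClusterIP
  selector:
    app: postgres
    role: primary            # assumed label, updated by the failover manager
  ports:
    - port: 5432
      name: postgres
```

On failover, the manager moves the label to the new primary and the Service follows automatically, so clients never need to know which ordinal is currently primary.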


StorageClass Configuration

AWS EBS (gp3)

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com    # AWS EBS CSI Driver
volumeBindingMode: WaitForFirstConsumer    # Create volume in same AZ as pod
reclaimPolicy: Retain    # Don't delete volume when PVC is deleted (recommended for databases)
allowVolumeExpansion: true
parameters:
  type: gp3
  throughput: "200"    # MB/s (gp3 baseline: 125, max: 1000)
  iops: "4000"         # IOPS (gp3 baseline: 3000, max: 16000)
  encrypted: "true"
  kmsKeyId: "arn:aws:kms:us-east-1:123456789:key/xxxx"    # Customer-managed KMS key
```

WaitForFirstConsumer is critical for EBS — EBS volumes are AZ-specific, so binding must wait until the pod is scheduled to know which AZ to create the volume in. With Immediate binding, volumes can be created in the wrong AZ and the pod will fail to start.

AWS EFS (for ReadWriteMany)

EBS volumes are ReadWriteOnce — attachable to a single node at a time (ReadWriteOncePod tightens this to a single pod). For shared storage, where multiple pods read and write the same volume, use EFS:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap    # Access point per PVC (isolation per workload)
  fileSystemId: fs-xxxxxxxx   # EFS filesystem ID
  directoryPerms: "700"
  basePath: "/efs"
reclaimPolicy: Retain
volumeBindingMode: Immediate    # EFS is multi-AZ, no need to wait
```

Use EFS for: machine learning model storage, shared config files, content management systems. Don't use EFS for: databases (performance characteristics are wrong for random I/O), high-throughput workloads.
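A minimal PVC sketch against this class; because EFS supports ReadWriteMany, any number of pods across nodes can mount it simultaneously (names here are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-models
  namespace: production
spec:
  accessModes: [ReadWriteMany]   # concurrent readers and writers across nodes
  storageClassName: efs
  resources:
    requests:
      storage: 50Gi              # required field, but EFS is elastic and does not enforce it
```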


PersistentVolume Lifecycle

StorageClass (defines provisioner + parameters)
    ↓
PVC created (by volumeClaimTemplate or directly)
    ↓
CSI provisioner creates the underlying storage (EBS volume, NFS mount)
    ↓
PV created and bound to PVC
    ↓
Pod mounts the PVC
    ↓
Pod deleted
    ↓
PVC remains (volumeClaimTemplate PVCs are not deleted with the StatefulSet pod)
    ↓
StatefulSet scaled down (pod-2 deleted)
    PVC data-postgres-2 persists — re-used if pod-2 comes back
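Since Kubernetes 1.27 (beta, behind the StatefulSetAutoDeletePVC feature gate) you can opt into automatic PVC cleanup with persistentVolumeClaimRetentionPolicy. Note this governs the PVCs created by volumeClaimTemplates and is distinct from the PV reclaimPolicy. A sketch:

```yaml
# StatefulSet spec fragment
spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Retain    # keep PVCs if the StatefulSet itself is deleted (default)
    whenScaled: Delete     # delete a pod's PVC when its ordinal is scaled away
```

With the defaults (Retain/Retain) the behavior matches the diagram above: PVCs outlive both pods and the StatefulSet.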

Retain vs Delete Reclaim Policy

reclaimPolicy: Delete deletes the EBS volume when the PVC is deleted — fine for ephemeral test data, dangerous for production databases. Always use Retain for stateful workloads and manage volume lifecycle manually.

```bash
# PVCs for StatefulSet pods are not deleted automatically when the pod is deleted.
# You must delete them manually:
kubectl delete pvc data-postgres-2 -n production

# With reclaimPolicy: Retain, the PV (and EBS volume) becomes "Released".
# You must manually recycle or delete it:
kubectl delete pv pvc-xxxx
aws ec2 delete-volume --volume-id vol-xxxx
```

Expanding Volumes

With allowVolumeExpansion: true in the StorageClass:

```bash
kubectl patch pvc data-postgres-0 -n production \
  --type merge \
  --patch '{"spec": {"resources": {"requests": {"storage": "200Gi"}}}}'

# The CSI driver resizes the EBS volume. With online expansion (supported by
# the EBS CSI driver), the filesystem is grown while the volume stays mounted;
# only older setups need a pod restart to trigger the filesystem resize.
```

StatefulSet Patterns

Ordered Scaling with partition

partition in rollingUpdate enables canary updates for StatefulSets — update only pods with ordinal >= partition:

```yaml
updateStrategy:
  type: RollingUpdate
  rollingUpdate:
    partition: 2    # Only update pod-2 (pod-0 and pod-1 stay on old version)
```

Validate pod-2 before setting partition: 0 to update all pods. This is the StatefulSet equivalent of Argo Rollouts canary for workloads that can't use it directly.
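The selection rule the controller applies is simple: pods with ordinal >= partition receive the new revision; the rest keep the old one. A quick shell illustration of which of three replicas a partition of 2 touches:

```shell
# Which pods does partition touch? Ordinals >= partition get the new revision.
partition=2
for ordinal in 0 1 2; do
  if [ "$ordinal" -ge "$partition" ]; then
    echo "postgres-$ordinal: new revision"
  else
    echo "postgres-$ordinal: old revision"
  fi
done
```

Lowering partition step by step (2, then 1, then 0) rolls the new revision out one pod at a time, with a validation pause at each step.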

Anti-Affinity for Spread

Database replicas should not co-locate on the same node:

```yaml
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: postgres
        topologyKey: kubernetes.io/hostname    # No two postgres pods on the same node
```

For AZ spread (required for true HA):

```yaml
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: postgres
          topologyKey: topology.kubernetes.io/zone
```
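An alternative way to express zone spread is topologySpreadConstraints, which caps the imbalance between zones instead of expressing pairwise anti-affinity. A sketch:

```yaml
topologySpreadConstraints:
  - maxSkew: 1                            # at most one pod of difference between zones
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway     # prefer spread; DoNotSchedule enforces it
    labelSelector:
      matchLabels:
        app: postgres
```

This scales better than anti-affinity when replicas outnumber zones, since it tolerates multiple pods per zone while keeping the distribution even.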

Backup Patterns

Volume snapshots (CSI-native):

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: postgres-0-snapshot-2026-05-09
  namespace: production
spec:
  volumeSnapshotClassName: csi-aws-vsc   # Requires EBS CSI driver installed with volumeSnapshotClass.create=true
  source:
    persistentVolumeClaimName: data-postgres-0
```

This creates an EBS snapshot. For consistent database backups, the application must be quiesced (or use PostgreSQL's pg_backup_start() + pg_backup_stop()) before snapshotting. Velero automates backup orchestration including application hooks.
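Restoring works by creating a new PVC whose dataSource points at the snapshot; a sketch, with names assumed to match the examples above:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-postgres-0-restore
  namespace: production
spec:
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: postgres-0-snapshot-2026-05-09
  accessModes: [ReadWriteOncePod]
  storageClassName: gp3
  resources:
    requests:
      storage: 100Gi    # must be at least the size of the snapshot's source volume
```

The CSI driver provisions a fresh EBS volume from the snapshot; a pod (or a recreated StatefulSet) then mounts the restored claim like any other PVC.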


Frequently Asked Questions

Should I run databases in Kubernetes?

For teams with Kubernetes expertise, running PostgreSQL, Redis, or Kafka in Kubernetes with proper StatefulSets, backup automation, and monitoring is operationally viable. Teams that would rather not carry that burden can use managed services (AWS RDS, ElastiCache, MSK), which trade control for convenience at the cost of tighter AWS coupling. See Databases in Kubernetes: Smart Move or Unnecessary Risk? for the full analysis.

My StatefulSet pod is stuck in Pending — what's wrong?

```bash
# Check if the PVC is bound
kubectl get pvc -n production

# If the PVC is Pending, check the events
kubectl describe pvc data-postgres-0 -n production

# Common causes:
# - StorageClass not found or provisioner not running (check the EBS CSI driver)
# - volumeBindingMode: WaitForFirstConsumer and the pod isn't scheduled yet
# - Insufficient EBS quota in the AZ
# - KMS key permission denied (check the EBS CSI driver's IAM role)
```

How do I resize a StatefulSet PVC without downtime?

For ReadWriteOnce EBS volumes: patch the PVC as shown above. Recent Kubernetes versions and the EBS CSI driver support online expansion, so the filesystem is typically resized while the volume stays mounted and no restart is needed; on older setups, perform a rolling restart so each pod remounts the resized volume. Note that volumeClaimTemplates on an existing StatefulSet are immutable: patch each PVC directly rather than editing the template.


For backup and disaster recovery of StatefulSet volumes and namespace resources with Velero, see Velero: Kubernetes Backup and Disaster Recovery. For the PodDisruptionBudget configuration that ensures StatefulSet availability during node drains, see Kubernetes PodDisruptionBudget and Graceful Shutdown Patterns.

Migrating stateful workloads into Kubernetes? Talk to us at Coding Protocols — we help platform teams design storage architectures that match workload I/O requirements and recovery objectives.

Related Topics

Kubernetes
StatefulSet
Storage
PersistentVolume
EBS
EFS
Platform Engineering
Databases
