Kubernetes PodDisruptionBudget and Graceful Shutdown Patterns
Two things kill Kubernetes production reliability during maintenance: voluntary disruptions (node drains, rolling updates) that evict too many pods at once, and pods that receive SIGTERM and die immediately without finishing in-flight requests. PodDisruptionBudget and graceful shutdown hooks solve both.

Every time you upgrade a cluster, Karpenter consolidates nodes, or an Ingress controller pod restarts, Kubernetes needs to evict pods. Without a PodDisruptionBudget, it may evict all replicas of a service simultaneously — causing a complete service outage during what should be a routine maintenance operation.
Two separate problems compound this: eviction timing (too many pods evicted at once) and shutdown behaviour (evicted pods don't finish in-flight requests before terminating). Both need to be solved, and they're solved differently.
Voluntary vs Involuntary Disruptions
Involuntary disruptions: Node hardware failure, kernel panic, cloud provider preemption. Kubernetes can't prevent these — the pod just dies.
Voluntary disruptions: kubectl drain, rolling updates, Cluster Autoscaler scale-down, Karpenter consolidation, spot interruption handler drain. Kubernetes respects PodDisruptionBudget during voluntary disruptions.
PodDisruptionBudget only protects against voluntary disruptions. For involuntary, the answer is multiple replicas across multiple nodes — PDB combined with anti-affinity.
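A minimal sketch of that combination, assuming the payments-api Deployment used in the examples below (image and names are illustrative): required pod anti-affinity places each replica on a different node, so a single node failure cannot take out every replica at once.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-api
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: payments-api
  template:
    metadata:
      labels:
        app: payments-api
    spec:
      affinity:
        podAntiAffinity:
          # Require each replica to land on a different node
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: payments-api
              topologyKey: kubernetes.io/hostname
      containers:
        - name: api
          image: registry.example.com/payments-api:1.0.0  # Illustrative image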
PodDisruptionBudget
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payments-api-pdb
  namespace: production
spec:
  # Option A: minimum available (absolute or percentage)
  minAvailable: 2        # At least 2 pods must remain available
  # minAvailable: "75%"  # At least 75% of desired replicas

  # Option B: maximum unavailable (use one or the other, not both)
  # maxUnavailable: 1    # At most 1 pod can be unavailable
  # maxUnavailable: "25%"

  selector:
    matchLabels:
      app: payments-api

minAvailable: 2 with 3 replicas: at most 1 pod can be evicted at a time. With 5 replicas: at most 3 can be evicted. As the deployment scales, the PDB keeps tracking the absolute minimum.
maxUnavailable: 1 means at most 1 pod can be disrupted at a time, regardless of deployment size. In relative terms this is permissive for small deployments but increasingly strict as the deployment grows.
# Check PDB status
kubectl get pdb -n production
# NAME               MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
# payments-api-pdb   2               N/A               1                     5d

# ALLOWED DISRUPTIONS shows how many pods can be evicted right now
# (current replicas - minAvailable = allowed)

Verifying PDB enforcement:
# Drain a node — should respect PDB
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data

# If PDB blocks the drain (which is correct behaviour):
# node/node-1 cordoned
# evicting pod production/payments-api-xxxxx
# error when evicting pods/"payments-api-xxxxx" -n "production":
# Cannot evict pod as it would violate the pod's disruption budget.

# PDB enforcement means you must wait for the deployment to schedule
# a replacement before the next pod can be evicted

unhealthyPodEvictionPolicy
By default, PDB doesn't allow evicting unhealthy pods — if some pods are already down due to crashes, PDB blocks eviction even of unrelated pods to preserve availability. In Kubernetes 1.27+, you can change this:
spec:
  minAvailable: 2
  unhealthyPodEvictionPolicy: AlwaysAllow  # Allow evicting unhealthy pods
  # Default: IfHealthyBudget — evict unhealthy pods only when the disruption budget has room

AlwaysAllow prevents a failed pod from blocking a node drain indefinitely — safer for maintenance workflows.
The Pod Termination Sequence
When a pod is evicted or a rolling update removes it, Kubernetes follows this sequence:
1. Pod status → Terminating (removed from Service endpoints)
└─ terminationGracePeriodSeconds countdown begins (default: 30s)
2. preStop hook executes (if configured) — consumes from the grace period
3. SIGTERM sent to all containers — remaining grace period continues
4. If containers still running after countdown expires: SIGKILL
The gap between step 1 (endpoint removal) and when the load balancer actually stops routing traffic is the source of connection errors during rolling updates. kube-proxy and cloud load balancers update asynchronously — a pod can receive new connections seconds after it's been removed from endpoint lists.
preStop Hook
The preStop hook runs before SIGTERM and can delay termination to allow in-flight connections to drain:
containers:
  - name: api
    lifecycle:
      preStop:
        exec:
          command: ["/bin/sh", "-c", "sleep 5"]  # Wait 5s for load balancer to stop routing
    # During these 5 seconds:
    # - The pod is still running and serving requests
    # - The endpoint has been removed from the Service
    # - Load balancers are propagating the endpoint removal
    # After sleep: SIGTERM is sent; application handles in-flight requests and exits

A 5-second sleep in preStop is sufficient for most Kubernetes networking layers to propagate the endpoint removal before the application starts refusing new connections.
For applications that need explicit graceful shutdown:
lifecycle:
  preStop:
    exec:
      command:
        - /bin/sh
        - -c
        - |
          # Signal the application to stop accepting new connections
          kill -USR1 1
          # Wait for in-flight requests to complete (up to 30s)
          sleep 30

Or use an HTTP endpoint if your application supports it:
lifecycle:
  preStop:
    httpGet:
      path: /shutdown
      port: 8080

terminationGracePeriodSeconds
terminationGracePeriodSeconds is the time Kubernetes waits after sending SIGTERM before forcibly killing the container. Default is 30 seconds.
spec:
  terminationGracePeriodSeconds: 60  # Long-running requests (database migrations, large uploads)
  containers:
    - name: api
      lifecycle:
        preStop:
          exec:
            command: ["sleep", "5"]  # 5s for endpoint propagation
      # Application should handle SIGTERM and exit within the remaining ~55s
      # (terminationGracePeriodSeconds includes preStop execution time)

The total shutdown budget:
terminationGracePeriodSeconds = preStop execution time + SIGTERM handling time
If terminationGracePeriodSeconds: 30 and preStop takes 10 seconds, the application has 20 seconds to finish in-flight requests after receiving SIGTERM.
For batch jobs or long-running processes that need more time:
terminationGracePeriodSeconds: 300  # 5 minutes for batch jobs

For fast, stateless microservices where 30 seconds is excessive:
terminationGracePeriodSeconds: 10  # Faster cluster operations

Application-Level Graceful Shutdown
The application must respond to SIGTERM by stopping new request intake and draining in-flight requests:
// Go: graceful HTTP server shutdown
package main

import (
	"context"
	"errors"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	mux := http.NewServeMux()
	srv := &http.Server{Addr: ":8080", Handler: mux}

	quit := make(chan os.Signal, 1)
	signal.Notify(quit, syscall.SIGTERM, syscall.SIGINT)

	go func() {
		if err := srv.ListenAndServe(); !errors.Is(err, http.ErrServerClosed) {
			log.Fatal(err)
		}
	}()

	<-quit // Block until signal received

	// Leave headroom inside terminationGracePeriodSeconds (minus preStop time)
	ctx, cancel := context.WithTimeout(context.Background(), 25*time.Second)
	defer cancel()

	// Gracefully shut down: stop accepting new connections, finish in-flight
	if err := srv.Shutdown(ctx); err != nil {
		log.Fatal("Server forced to shutdown:", err)
	}
}

# Python (FastAPI/uvicorn): graceful shutdown via lifespan
from contextlib import asynccontextmanager
from fastapi import FastAPI

async def drain_connections():
    ...  # Placeholder: close database pools, flush queues, etc.

@asynccontextmanager
async def lifespan(app: FastAPI):
    yield  # Startup complete; everything after yield runs at shutdown
    # Shutdown: uvicorn handles SIGTERM → runs the lifespan cleanup
    await drain_connections()

app = FastAPI(lifespan=lifespan)

Rolling Update Interaction with PDB
Rolling updates and PDB interact, but not by throttling each other: a Deployment's rolling update does not go through the eviction API, so the PDB never blocks it; the rollout pace is governed solely by the strategy's maxUnavailable and maxSurge. If your Deployment uses maxUnavailable: 1 and your PDB uses minAvailable: N-1, the two express the same tolerance and stay consistent.
Pods made unavailable by the rollout still count against the disruption budget, though. If your Deployment uses maxUnavailable: 25% (an aggressive rollout) while your PDB demands minAvailable: 90%, the rollout can drive allowed disruptions to zero and block any concurrent node drain or consolidation until it completes.
A production-safe combination:
# Deployment strategy
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 0  # Never remove old pods before new ones are ready
    maxSurge: 1        # One extra pod during update

# PDB
spec:
  maxUnavailable: 0  # No disruptions allowed (only surge replacements)
  # Or equivalently:
  # minAvailable: "100%"

maxUnavailable: 0 with maxSurge: 1 in the Deployment means new pods must be Ready before old ones terminate, so no requests are dropped during rolling updates. The PDB's maxUnavailable: 0 (or minAvailable: 100%) extends the same protection to node drains and consolidation, but note that it blocks every voluntary eviction, so drains stall on these pods until they are moved by other means (for example a rollout restart).
Frequently Asked Questions
Does PDB protect against Karpenter node consolidation?
Yes. When Karpenter consolidates nodes (consolidationPolicy: WhenEmptyOrUnderutilized), it calls the Kubernetes eviction API for each pod. The eviction API respects PDB — if evicting a pod would violate the PDB, Karpenter backs off and retries later. Configure disruption.budgets in your Karpenter NodePool to control the rate of consolidation alongside the PDB, as in the sketch below.
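A sketch of that NodePool budget, assuming the karpenter.sh/v1 NodePool schema (verify field names against your installed Karpenter version; the rest of the NodePool spec is omitted):

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    budgets:
      - nodes: "10%"   # At most 10% of nodes disrupted at any one time
      - nodes: "0"     # No voluntary disruption during business hours
        schedule: "0 9 * * mon-fri"
        duration: 8h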
What should minAvailable be for different services?
- Stateless APIs with ≥3 replicas: minAvailable: 2 or maxUnavailable: 1
- Single-replica non-critical workloads: no PDB (or minAvailable: 0, which is equivalent to no PDB)
- Critical services (auth, API gateway): minAvailable: 75% or higher
- StatefulSets (databases, caches): a PDB matching quorum size — for a 3-node cluster, minAvailable: 2 preserves quorum during maintenance (see the sketch after this list)
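A minimal sketch of the quorum case, assuming a 3-replica StatefulSet labelled app: redis-cluster (name and labels are illustrative):

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: redis-cluster-pdb
  namespace: production
spec:
  minAvailable: 2         # Never drop below quorum (2 of 3) during drains
  selector:
    matchLabels:
      app: redis-cluster  # Must match the StatefulSet's pod labels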
My node drain is stuck — PDB is blocking it indefinitely
# Check which PDB is blocking
kubectl get pdb -A

# Check which pods have the PDB applied and their status
kubectl describe pdb payments-api-pdb -n production

# If pods are crashing (not the PDB's fault), delete the broken pod manually
# to restore the disruption budget, then retry the drain

If a pod is stuck in CrashLoopBackOff with a PDB minAvailable that prevents eviction, you may need to temporarily patch the PDB (minAvailable: 0) to proceed with maintenance, then restore it.
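One way to do that with kubectl patch, reusing the payments-api-pdb example from above (values are illustrative; restore the original budget as soon as the drain completes):

# Temporarily relax the PDB so the drain can proceed
kubectl patch pdb payments-api-pdb -n production \
  --type merge -p '{"spec":{"minAvailable":0}}'

# Drain the node, then restore the original budget
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data
kubectl patch pdb payments-api-pdb -n production \
  --type merge -p '{"spec":{"minAvailable":2}}'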
For how PDB interacts with Karpenter consolidation and node management, see Kubernetes Cost Optimisation: Spot Instances, VPA, and Karpenter. For the Argo Rollouts progressive delivery integration that uses PDB during canary releases, see Argo Rollouts: Progressive Delivery.
Configuring production-grade disruption protection for a Kubernetes workload? Talk to us at Coding Protocols — we help platform teams implement graceful shutdown patterns and disruption budgets that eliminate maintenance-window service degradation.


