Kubernetes
12 min read · May 9, 2026

Kubernetes PodDisruptionBudget and Graceful Shutdown Patterns

Two things kill Kubernetes production reliability during maintenance: voluntary disruptions (node drains, rolling updates) that evict too many pods at once, and pods that receive SIGTERM and die immediately without finishing in-flight requests. PodDisruptionBudget and graceful shutdown hooks solve both.

Coding Protocols Team
Platform Engineering

Every time you upgrade a cluster, Karpenter consolidates nodes, or an Ingress controller pod restarts, Kubernetes needs to evict pods. Without a PodDisruptionBudget, it may evict all replicas of a service simultaneously — causing a complete service outage during what should be a routine maintenance operation.

Two separate problems compound this: eviction timing (too many pods evicted at once) and shutdown behaviour (evicted pods don't finish in-flight requests before terminating). Both need to be solved, and they're solved differently.


Voluntary vs Involuntary Disruptions

Involuntary disruptions: Node hardware failure, kernel panic, cloud provider preemption. Kubernetes can't prevent these — the pod just dies.

Voluntary disruptions: kubectl drain, rolling updates, Cluster Autoscaler scale-down, Karpenter consolidation, spot interruption handler drain. Kubernetes respects PodDisruptionBudget during voluntary disruptions.

PodDisruptionBudget only protects against voluntary disruptions. For involuntary disruptions, the answer is redundancy: multiple replicas spread across multiple nodes via anti-affinity (or topology spread constraints), with the PDB guarding the voluntary side.
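
A minimal sketch of the anti-affinity half, assuming a Deployment whose pods carry the app: payments-api label used throughout this post:

yaml
# In the Deployment's pod template: prefer spreading replicas across nodes
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: payments-api
          topologyKey: kubernetes.io/hostname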


PodDisruptionBudget

yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payments-api-pdb
  namespace: production
spec:
  # Option A: minimum available (absolute or percentage)
  minAvailable: 2           # At least 2 pods must remain available
  # minAvailable: "75%"     # At least 75% of desired replicas

  # Option B: maximum unavailable (use one or the other, not both)
  # maxUnavailable: 1       # At most 1 pod can be unavailable
  # maxUnavailable: "25%"

  selector:
    matchLabels:
      app: payments-api

minAvailable: 2 with 3 replicas: at most 1 pod can be evicted at a time. With 5 replicas: at most 3 can be evicted. As the deployment scales, PDB tracks the absolute minimum.

maxUnavailable: 1 means at most 1 pod can be disrupted at a time, regardless of deployment size. This is stricter for small deployments but more permissive for large ones.

bash
# Check PDB status
kubectl get pdb -n production
# NAME                MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
# payments-api-pdb    2               N/A               1                     5d

# ALLOWED DISRUPTIONS shows how many pods can be evicted right now
# (currently healthy pods - minAvailable = allowed)

Verifying PDB enforcement:

bash
# Drain a node — should respect PDB
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data

# If PDB blocks the drain (which is correct behaviour):
# node/node-1 cordoned
# evicting pod production/payments-api-xxxxx
# error when evicting pods/"payments-api-xxxxx" -n "production":
# Cannot evict pod as it would violate the pod's disruption budget.

# PDB enforcement means you must wait for the deployment to schedule
# a replacement before the next pod can be evicted

unhealthyPodEvictionPolicy

By default (IfHealthyBudget), a PDB only allows evicting running-but-unhealthy pods when the budget has headroom. If crashing pods have already pushed the application below minAvailable, even those broken pods cannot be evicted, which stalls node drains. In Kubernetes 1.27+, you can change this:

yaml
spec:
  minAvailable: 2
  unhealthyPodEvictionPolicy: AlwaysAllow   # Allow evicting unhealthy pods
  # Default: IfHealthyBudget — evict unhealthy pods only when the disruption budget has room

AlwaysAllow prevents a failed pod from blocking a node drain indefinitely — safer for maintenance workflows.


The Pod Termination Sequence

When a pod is evicted or a rolling update removes it, Kubernetes follows this sequence:

1. Pod status → Terminating (removal from Service endpoints begins)
   └─ terminationGracePeriodSeconds countdown begins (default: 30s)
2. preStop hook executes (if configured) — consumes from the grace period
3. SIGTERM sent to all containers — remaining grace period continues
4. If containers still running after countdown expires: SIGKILL

The gap between step 1 (endpoint removal) and when the load balancer actually stops routing traffic is the source of connection errors during rolling updates. kube-proxy and cloud load balancers update asynchronously — a pod can receive new connections seconds after it's been removed from endpoint lists.


preStop Hook

The preStop hook runs before SIGTERM and can delay termination to allow in-flight connections to drain:

yaml
containers:
  - name: api
    lifecycle:
      preStop:
        exec:
          command: ["/bin/sh", "-c", "sleep 5"]  # Wait 5s for load balancers to stop routing
          # During these 5 seconds:
          # - The pod is still running and serving requests
          # - Endpoint removal from the Service has been initiated
          # - kube-proxy and load balancers are propagating the removal
          # After the sleep: SIGTERM is sent; the application drains in-flight requests and exits

A 5-second sleep in preStop is usually enough for kube-proxy and most load balancers to propagate the endpoint removal before the application stops accepting new connections.

For applications that need explicit graceful shutdown:

yaml
lifecycle:
  preStop:
    exec:
      command:
        - /bin/sh
        - -c
        - |
          # preStop exec runs inside the container, so PID 1 is the application.
          # Signal it to stop accepting new connections
          kill -USR1 1
          # Wait for in-flight requests to complete (up to 30s)
          sleep 30

Or use an HTTP endpoint if your application supports it:

yaml
lifecycle:
  preStop:
    httpGet:
      path: /shutdown
      port: 8080

terminationGracePeriodSeconds

terminationGracePeriodSeconds is the time Kubernetes waits after sending SIGTERM before forcibly killing the container. Default is 30 seconds.

yaml
spec:
  terminationGracePeriodSeconds: 60    # Long-running requests (database migrations, large uploads)
  containers:
    - name: api
      lifecycle:
        preStop:
          exec:
            command: ["sleep", "5"]    # 5s for endpoint propagation
      # Application should handle SIGTERM and exit within the remaining ~55s
      # (terminationGracePeriodSeconds includes preStop execution time)

The total shutdown budget:

terminationGracePeriodSeconds ≥ preStop execution time + SIGTERM handling time

If terminationGracePeriodSeconds: 30 and preStop takes 10 seconds, the application has 20 seconds to finish in-flight requests after receiving SIGTERM.

For batch jobs or long-running processes that need more time:

yaml
terminationGracePeriodSeconds: 300   # 5 minutes for batch jobs

For fast, stateless microservices where 30 seconds is excessive:

yaml
terminationGracePeriodSeconds: 10    # Faster cluster operations

Application-Level Graceful Shutdown

The application must respond to SIGTERM by stopping new request intake and draining in-flight requests:

go
// Go: graceful HTTP server shutdown
package main

import (
	"context"
	"errors"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	mux := http.NewServeMux()
	srv := &http.Server{Addr: ":8080", Handler: mux}

	quit := make(chan os.Signal, 1)
	signal.Notify(quit, syscall.SIGTERM, syscall.SIGINT)

	go func() {
		if err := srv.ListenAndServe(); !errors.Is(err, http.ErrServerClosed) {
			log.Fatal(err)
		}
	}()

	<-quit // Block until SIGTERM/SIGINT received

	// 25s leaves headroom inside a 30s terminationGracePeriodSeconds
	ctx, cancel := context.WithTimeout(context.Background(), 25*time.Second)
	defer cancel()

	// Gracefully shut down: stop accepting new connections, finish in-flight requests
	if err := srv.Shutdown(ctx); err != nil {
		log.Fatal("Server forced to shutdown:", err)
	}
}

python
# Python (FastAPI/uvicorn): graceful shutdown via lifespan
from contextlib import asynccontextmanager
from fastapi import FastAPI

@asynccontextmanager
async def lifespan(app: FastAPI):
    yield  # Startup complete; code after yield runs at shutdown
    # Shutdown: uvicorn catches SIGTERM, stops accepting connections,
    # then runs this cleanup before exiting
    await drain_connections()  # App-specific cleanup (close pools, flush queues)

app = FastAPI(lifespan=lifespan)
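
If you control how uvicorn is launched, you can also cap its drain time so shutdown finishes inside the pod's grace period. A minimal sketch, assuming uvicorn 0.20+ (which added the timeout_graceful_shutdown option):

python
# Cap uvicorn's drain phase below terminationGracePeriodSeconds
# (timeout_graceful_shutdown is an assumption: it requires uvicorn 0.20+)
import uvicorn

uvicorn.run(app, host="0.0.0.0", port=8080, timeout_graceful_shutdown=25)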

Rolling Update Interaction with PDB

Rolling updates and PDBs interact less directly than most people expect. The Deployment controller deletes old pods directly rather than through the eviction API, so a rollout is paced by the Deployment's own strategy (maxUnavailable, maxSurge) and is not throttled by the PDB.

The PDB still matters during a rollout: pods made unavailable by the update count against the budget, so a node drain or Karpenter consolidation that coincides with the rollout is blocked until enough replacement pods are Ready. Keep the Deployment strategy at least as conservative as the PDB so the two never disagree about acceptable unavailability.

A production-safe combination:

yaml
# Deployment strategy
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 0     # Never remove old pods before new ones are ready
    maxSurge: 1           # One extra pod during update

# PDB
spec:
  maxUnavailable: 0       # No voluntary disruptions allowed (only surge replacements)
  # Or equivalently (a PDB accepts one field or the other, never both):
  # minAvailable: 100%

maxUnavailable: 0 in the Deployment combined with maxUnavailable: 0 (or minAvailable: 100%) in the PDB means no requests are dropped during rolling updates: new pods must be Ready before old ones terminate. The trade-off is that node drains can never evict these pods, so expect drains to stall until a rollout or manual intervention moves the workload.
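
Surge-based replacement only protects traffic if Ready actually means the pod can serve. A minimal readiness probe sketch, assuming the app exposes a health endpoint at /healthz (path and port are illustrative):

yaml
containers:
  - name: api
    readinessProbe:
      httpGet:
        path: /healthz   # Hypothetical health endpoint; substitute your app's real path
        port: 8080
      periodSeconds: 5
      failureThreshold: 2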


Frequently Asked Questions

Does PDB protect against Karpenter node consolidation?

Yes. When Karpenter consolidates nodes (consolidationPolicy: WhenEmptyOrUnderutilized), it calls the Kubernetes eviction API for each pod. The eviction API respects PDB: if evicting a pod would violate the PDB, Karpenter backs off and retries later. Configure disruption budgets (spec.disruption.budgets) in your Karpenter NodePool to control the rate of consolidation alongside the PDB.
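
A minimal sketch of that NodePool-level rate limiting, assuming the Karpenter v1 API (values are illustrative, and spec.template is omitted for brevity):

yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    budgets:
      - nodes: "10%"              # At most 10% of nodes disrupted at once
      - nodes: "0"                # Freeze consolidation during business hours
        schedule: "0 9 * * mon-fri"
        duration: 8h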

What should minAvailable be for different services?

  • Stateless APIs with ≥3 replicas: minAvailable: 2 or maxUnavailable: 1
  • Single-replica non-critical workloads: No PDB (or minAvailable: 0 which is equivalent to no PDB)
  • Critical services (auth, API gateway): minAvailable: 75% or higher
  • StatefulSets (databases, caches): PDB matching quorum size — for a 3-node cluster, minAvailable: 2 preserves quorum during maintenance (see the sketch after this list)
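
For the quorum case, a minimal sketch, assuming a 3-replica StatefulSet labelled app: etcd (the name is illustrative):

yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: etcd-pdb
spec:
  minAvailable: 2        # 2 of 3 members keeps quorum; only one can be down at a time
  selector:
    matchLabels:
      app: etcd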

My node drain is stuck — PDB is blocking it indefinitely

bash
# Check which PDB is blocking
kubectl get pdb -A

# Check which pods have the PDB applied and their status
kubectl describe pdb payments-api-pdb -n production

# If pods are crashing (not the PDB's fault), delete the broken pod manually
# to restore disruption budget, then retry the drain

If a pod is stuck in CrashLoopBackOff with a PDB minAvailable that prevents eviction, you may need to temporarily patch the PDB (minAvailable: 0) to proceed with maintenance, then restore it.
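
A hedged example of that temporary patch, reusing the payments-api-pdb from earlier (restore your actual original value afterwards):

bash
# Temporarily relax the PDB so the drain can proceed
kubectl patch pdb payments-api-pdb -n production \
  --type merge -p '{"spec":{"minAvailable":0}}'

kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data

# Restore the original budget immediately after maintenance
kubectl patch pdb payments-api-pdb -n production \
  --type merge -p '{"spec":{"minAvailable":2}}'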


For how PDB interacts with Karpenter consolidation and node management, see Kubernetes Cost Optimisation: Spot Instances, VPA, and Karpenter. For the Argo Rollouts progressive delivery integration that uses PDB during canary releases, see Argo Rollouts: Progressive Delivery.

Configuring production-grade disruption protection for a Kubernetes workload? Talk to us at Coding Protocols — we help platform teams implement graceful shutdown patterns and disruption budgets that eliminate maintenance-window service degradation.

Related Topics

Kubernetes
PodDisruptionBudget
Graceful Shutdown
Reliability
Platform Engineering
SRE
