Debugging CrashLoopBackOff from Scratch
CrashLoopBackOff means your container keeps crashing and Kubernetes keeps restarting it. This tutorial gives you a systematic approach to find the root cause every time — no guessing.
Before you begin
- kubectl configured against a running cluster
- Basic understanding of Kubernetes Pods
CrashLoopBackOff is Kubernetes telling you: "your container started, crashed, I restarted it, it crashed again — and I'm going to keep trying with increasing delays." The container isn't broken from Kubernetes's perspective. It's just consistently failing.
The backoff timer starts at 10 seconds and doubles: 10s → 20s → 40s → 80s → 160s → 300s (max). That's why a pod can go from Error to CrashLoopBackOff — it's crashed enough times that the delay is noticeable.
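The doubling schedule is easy to model. A minimal shell sketch of the kubelet's delays (illustrative only — the real timer lives in the kubelet):

```shell
# Model the kubelet's crash backoff: start at 10s, double on each
# restart, cap at 300s. Purely illustrative of the schedule above.
backoff_schedule() {
  delay=10
  for restart in 1 2 3 4 5 6 7; do
    echo "restart $restart: wait ${delay}s"
    delay=$((delay * 2))
    if [ "$delay" -gt 300 ]; then delay=300; fi
  done
}
backoff_schedule
```

By the sixth restart you're waiting the full five minutes between attempts, which is why the status flips to CrashLoopBackOff.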
Here's the systematic approach to finding the root cause.
Step 1: Confirm the State
kubectl get pod <pod-name>
# NAME READY STATUS RESTARTS AGE
# my-app-xyz 0/1 CrashLoopBackOff 5 4m
Note the RESTARTS count. A pod that's been restarting for hours with 50+ restarts is a different problem from one that just started crashing.
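If you want the count on its own (for a script or a watch loop), it's available directly from the pod status — this assumes the first container in the pod is the crashing one:

```shell
# Pull just the restart count from the pod status (container index 0).
kubectl get pod <pod-name> \
  -o jsonpath='{.status.containerStatuses[0].restartCount}'
```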
Step 2: Read the Logs
This is always step two:
kubectl logs <pod-name>
If the pod has already restarted, the current logs might be empty (the container crashed before writing anything). Get the previous run's logs:
kubectl logs <pod-name> --previous
Read these carefully. Most crashes leave a clear error message: missing environment variable, can't connect to database, permission denied on a file, OOM kill.
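For a noisy app, it can help to trim and timestamp the output so you can line the crash up with the event history from the next step:

```shell
# Last 100 lines of the previous run, with timestamps.
kubectl logs <pod-name> --previous --tail=100 --timestamps
```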
Step 3: Describe the Pod
kubectl describe gives you the event history and the exit code:
kubectl describe pod <pod-name>
Look at two sections:
State / Last State:
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Mon, 22 Apr 2026 10:00:00
Finished: Mon, 22 Apr 2026 10:00:05
The exit code tells you a lot:
- 0 — container exited cleanly (your process is finishing instead of running)
- 1 — general error (check logs)
- 137 — killed with SIGKILL, usually OOM (out of memory)
- 139 — segmentation fault
- 143 — killed with SIGTERM (graceful shutdown signal; shouldn't loop)
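If you'd rather pull the exit code directly than scan the describe output, it's in the pod status too (again assuming container index 0):

```shell
# Read the last terminated exit code straight from the status.
kubectl get pod <pod-name> \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
```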
Events:
Events:
Warning BackOff 2m kubelet Back-off restarting failed container
Warning Failed 5m kubelet Error: failed to create containerd task:
Events show infrastructure-level failures like image pull errors, missing secrets, or OOM kills.
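Events age out of `describe` output quickly; you can also query them directly, filtered to just this pod:

```shell
# List only this pod's events, oldest first.
kubectl get events \
  --field-selector involvedObject.name=<pod-name> \
  --sort-by=.lastTimestamp
```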
Step 4: The Six Most Common Root Causes
1. Application error on startup
The app crashes immediately due to a bug, missing dependency, or bad config.
Symptoms: Exit code 1, logs show a stack trace or error message.
Fix: Read the logs from --previous. Fix the application error.
2. Missing or wrong environment variable
The app reads a required env var on startup, doesn't find it, and exits.
# Check what env vars the container is getting
kubectl exec <pod-name> -- env 2>/dev/null || \
kubectl describe pod <pod-name> | grep -A 20 "Environment:"
Fix: Add the missing env var to the deployment:
kubectl set env deployment/my-app DATABASE_URL=postgres://...
3. Can't connect to a dependency
The app tries to connect to a database or external service at startup, fails, and exits instead of retrying.
# Check if the service is reachable from inside the cluster
kubectl run debug --rm -it --image=busybox -- \
nc -zv postgres.default.svc.cluster.local 5432
Fix: Either fix the dependency (is the database running? is the Secret correct?) or make the application retry with backoff instead of exiting.
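One sketch of the "retry with backoff instead of exiting" fix, as a shell wrapper you might put in front of the container's entrypoint — the `nc` check and `/app/server` path below are placeholders, not part of any real deployment:

```shell
# Hypothetical entrypoint wrapper: retry a startup dependency check
# with doubling backoff instead of exiting on the first failure.
retry_with_backoff() {
  delay=1
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge 5 ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Placeholder usage: wait for the database, then start the real process.
# retry_with_backoff nc -z postgres.default.svc.cluster.local 5432 \
#   && exec /app/server
```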
4. Out of memory (exit code 137)
The container hits its memory limit and gets killed by the OOM killer.
kubectl describe pod <pod-name> | grep -A 5 "OOMKilled\|137"
Fix: Increase the memory limit in the deployment:
kubectl set resources deployment/my-app \
--limits=memory=512Mi \
--requests=memory=256Mi
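Before picking a new limit, check what the container is actually using — this needs metrics-server installed in the cluster:

```shell
# Current memory/CPU usage per container (requires metrics-server).
kubectl top pod <pod-name> --containers
```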
5. Command or entrypoint error
The container command doesn't exist, has wrong arguments, or the working directory is wrong.
# Check what command the pod is running
kubectl get pod <pod-name> -o jsonpath='{.spec.containers[0].command}'
kubectl get pod <pod-name> -o jsonpath='{.spec.containers[0].args}'
Fix: Override the command to drop into a shell and investigate:
kubectl debug -it <pod-name> --image=busybox --copy-to=debug-pod -- sh
6. Missing volume or ConfigMap
The app expects a file or volume that doesn't exist.
kubectl describe pod <pod-name> | grep -A 10 "Volumes\|Mounts"
# Look for "Warning: MountVolume.SetUp failed"
Fix: Create the missing ConfigMap or Secret, or fix the volume mount path.
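For example, if the mount references a ConfigMap or Secret that was never created — the names here are hypothetical, substitute whatever the volume spec references:

```shell
# Create the ConfigMap the volume mount expects (hypothetical names).
kubectl create configmap my-app-config --from-file=config.yaml

# Or a Secret, if that's what the volume references.
kubectl create secret generic my-app-secrets \
  --from-literal=DB_PASSWORD=changeme
```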
Step 5: Override the Command for Interactive Debugging
When logs don't tell you enough, override the container command to keep it running so you can exec in. Use the JSON Patch "add" op rather than "replace" — "replace" fails if the spec doesn't already set command, while "add" also overwrites an existing value:
kubectl patch deployment my-app --type=json -p='[
{"op": "add", "path": "/spec/template/spec/containers/0/command", "value": ["sleep", "3600"]},
{"op": "add", "path": "/spec/template/spec/containers/0/args", "value": []}
]'
# Wait for the new pod
kubectl get pods -w
# Exec into it
kubectl exec -it <new-pod-name> -- sh
# Inside: run your original command manually and see the error
/app/server --config /etc/app/config.yaml
After debugging, undo the override. If the deployment originally defined its own command or args, the simplest way to get them back is to roll back to the previous revision:
kubectl rollout undo deployment/my-app
Otherwise, strip the override:
kubectl patch deployment my-app --type=json -p='[
{"op": "remove", "path": "/spec/template/spec/containers/0/command"},
{"op": "remove", "path": "/spec/template/spec/containers/0/args"}
]'
Step 6: Use kubectl debug (Kubernetes 1.23+)
kubectl debug creates a copy of a crashing pod with a different image, without modifying the deployment:
kubectl debug -it <pod-name> \
--image=ubuntu:22.04 \
--copy-to=debug-pod \
--share-processes \
-- bash
Inside, you can inspect the filesystem, run the application binary manually, and check environment variables — all without touching the running deployment.
Quick Reference
| Exit Code | Likely Cause |
|---|---|
| 0 | Process exited normally — check if it should be long-running |
| 1 | App error — check logs |
| 137 | OOM kill — increase memory limit |
| 139 | Segfault — application bug |
| 143 | SIGTERM received — check why it's not handling graceful shutdown |
| 255 | Unknown error — check logs |
We built Podscape to simplify Kubernetes workflows like this — logs, events, and cluster state in one interface, without switching tools.
Struggling with this in production?
We help teams fix these exact issues. Our engineers have deployed these patterns across production environments at scale.