
Debugging CrashLoopBackOff from Scratch

Beginner · 20 min to complete · 9 min read

CrashLoopBackOff means your container keeps crashing and Kubernetes keeps restarting it. This tutorial gives you a systematic approach to find the root cause every time — no guessing.

Before you begin

  • kubectl configured against a running cluster
  • Basic understanding of Kubernetes Pods

CrashLoopBackOff is Kubernetes telling you: "your container started, crashed, I restarted it, it crashed again — and I'm going to keep trying with increasing delays." The container isn't broken from Kubernetes's perspective. It's just consistently failing.

The backoff timer starts at 10 seconds and doubles: 10s → 20s → 40s → 80s → 160s → 300s (max). That's why a pod can go from Error to CrashLoopBackOff — it's crashed enough times that the delay is noticeable.
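The schedule is easy to sketch with a plain shell loop (illustrative only — this is not how the kubelet computes it internally):

```shell
# Doubling restart delay from 10s, capped at the 300s maximum
delay=10
for crash in 1 2 3 4 5 6 7 8; do
  echo "crash $crash: next restart in ${delay}s"
  delay=$(( delay * 2 ))
  if [ "$delay" -gt 300 ]; then delay=300; fi
done
```

By the sixth crash you're already at the 300-second ceiling, which is why a long-crashing pod only restarts every five minutes.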

Here's the systematic approach to finding the root cause.

Step 1: Confirm the State

bash
kubectl get pod <pod-name>
# NAME         READY   STATUS             RESTARTS   AGE
# my-app-xyz   0/1     CrashLoopBackOff   5          4m

Note the RESTARTS count. A pod that's been restarting for hours with 50+ restarts is a different problem from one that just started crashing.
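When several pods are misbehaving, sorting by restart count surfaces the worst offender first. The field path below assumes the first container in each pod is the one of interest:

```shell
# List pods ordered by how many times their first container has restarted
kubectl get pods --sort-by='.status.containerStatuses[0].restartCount'
```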

Step 2: Read the Logs

The logs are where most root causes show up:

bash
kubectl logs <pod-name>

If the pod has already restarted, the current logs might be empty (the container crashed before writing anything). Get the previous run's logs:

bash
kubectl logs <pod-name> --previous

Read these carefully. Most crashes leave a clear error message: missing environment variable, can't connect to database, permission denied on a file, OOM kill.
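A few standard kubectl flags make crash logs easier to work with:

```shell
# Previous run's logs, timestamped, last 50 lines only
kubectl logs <pod-name> --previous --timestamps --tail=50

# Multi-container pods need the container named explicitly
kubectl logs <pod-name> --previous -c <container-name>
```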

Step 3: Describe the Pod

kubectl describe gives you the event history and the exit code:

bash
kubectl describe pod <pod-name>

Look at two sections:

State / Last State:

Last State:  Terminated
  Reason:    Error
  Exit Code: 1
  Started:   Mon, 22 Apr 2026 10:00:00
  Finished:  Mon, 22 Apr 2026 10:00:05

The exit code tells you a lot:

  • 0 — container exited cleanly (your process is finishing instead of running)
  • 1 — general error (check logs)
  • 137 — killed with SIGKILL, usually OOM (out of memory)
  • 139 — segmentation fault
  • 143 — killed with SIGTERM (graceful shutdown signal, shouldn't loop)
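The pattern behind the large codes: anything above 128 means "killed by signal (code − 128)". A hypothetical helper (the function name and messages are my own) makes that arithmetic readable:

```shell
# Hypothetical helper: map a container exit code to a likely cause.
# Codes above 128 mean "killed by signal (code - 128)".
explain_exit() {
  case "$1" in
    0)   echo "clean exit: process finished instead of running" ;;
    1)   echo "general application error: check logs" ;;
    137) echo "SIGKILL (128+9): usually OOM" ;;
    139) echo "SIGSEGV (128+11): segmentation fault" ;;
    143) echo "SIGTERM (128+15): graceful shutdown signal" ;;
    *)   echo "unrecognized code $1: check logs" ;;
  esac
}

explain_exit 137   # SIGKILL (128+9): usually OOM
```

The raw exit code is also available without `describe`, via jsonpath: `kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'`.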

Events:

Events:
  Warning  BackOff    2m   kubelet  Back-off restarting failed container
  Warning  Failed     5m   kubelet  Error: failed to create containerd task:

Events show infrastructure-level failures like image pull errors, missing secrets, or OOM kills.
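When `describe` output gets long, you can pull the event stream for just this pod:

```shell
# Only events that reference this pod, oldest first
kubectl get events \
  --field-selector involvedObject.name=<pod-name> \
  --sort-by=.lastTimestamp
```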

Step 4: The Six Most Common Root Causes

1. Application error on startup

The app crashes immediately due to a bug, missing dependency, or bad config.

Symptoms: Exit code 1, logs show a stack trace or error message.

Fix: Read the logs from --previous. Fix the application error.

2. Missing or wrong environment variable

The app reads a required env var on startup, doesn't find it, and exits.

bash
# Check what env vars the container is getting
kubectl exec <pod-name> -- env 2>/dev/null || \
  kubectl describe pod <pod-name> | grep -A 20 "Environment:"

Fix: Add the missing env var to the deployment:

bash
kubectl set env deployment/my-app DATABASE_URL=postgres://...

3. Can't connect to a dependency

The app tries to connect to a database or external service at startup, fails, and exits instead of retrying.

bash
# Check if the service is reachable from inside the cluster
kubectl run debug --rm -it --image=busybox -- \
  nc -zv postgres.default.svc.cluster.local 5432

Fix: Either fix the dependency (is the database running? is the Secret correct?) or make the application retry with backoff instead of exiting.
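Checking the Secret is worth doing before blaming the network. A sketch, assuming a Secret named `db-credentials` with a `DATABASE_URL` key (substitute your own names):

```shell
# Decode the stored value and eyeball it for typos, wrong host, wrong port
kubectl get secret db-credentials \
  -o jsonpath='{.data.DATABASE_URL}' | base64 -d
```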

4. Out of memory (exit code 137)

The container hits its memory limit and gets killed by the OOM killer.

bash
kubectl describe pod <pod-name> | grep -A 5 "OOMKilled\|137"

Fix: Increase the memory limit in the deployment:

bash
kubectl set resources deployment/my-app \
  --limits=memory=512Mi \
  --requests=memory=256Mi
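Before raising limits blindly, look at what the container actually uses (requires metrics-server in the cluster):

```shell
# Current memory and CPU usage, per container
kubectl top pod <pod-name> --containers
```

If usage sits near the limit even at idle, the limit is too tight; if usage is flat and then spikes before each crash, you're looking at a leak or a burst workload.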

5. Command or entrypoint error

The container command doesn't exist, has wrong arguments, or the working directory is wrong.

bash
# Check what command the pod is running
kubectl get pod <pod-name> -o jsonpath='{.spec.containers[0].command}'
kubectl get pod <pod-name> -o jsonpath='{.spec.containers[0].args}'

Fix: Override the command to drop into a shell and investigate:

bash
kubectl debug -it <pod-name> --image=busybox --copy-to=debug-pod -- sh

6. Missing volume or ConfigMap

The app expects a file or volume that doesn't exist.

bash
kubectl describe pod <pod-name> | grep -A 10 "Volumes\|Mounts"
# Look for "Warning: MountVolume.SetUp failed"

Fix: Create the missing ConfigMap or Secret, or fix the volume mount path.
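For example, to create the missing ConfigMap from a local file and confirm it landed (the names `app-config` and `config.yaml` are placeholders):

```shell
kubectl create configmap app-config --from-file=config.yaml
kubectl get configmap app-config -o yaml   # verify before the pod restarts
```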

Step 5: Override the Command for Interactive Debugging

When logs don't tell you enough, override the container command to keep it running so you can exec in:

bash
# "add" replaces the value if the field exists and creates it if not;
# "replace" errors out when the container spec never set command/args
kubectl patch deployment my-app --type=json -p='[
  {"op": "add", "path": "/spec/template/spec/containers/0/command", "value": ["sleep", "3600"]},
  {"op": "add", "path": "/spec/template/spec/containers/0/args", "value": []}
]'

# Wait for the new pod
kubectl get pods -w

# Exec into it
kubectl exec -it <new-pod-name> -- sh

# Inside: run your original command manually and see the error
/app/server --config /etc/app/config.yaml

After debugging, remove the override:

bash
kubectl patch deployment my-app --type=json -p='[
  {"op": "remove", "path": "/spec/template/spec/containers/0/command"},
  {"op": "remove", "path": "/spec/template/spec/containers/0/args"}
]'

Step 6: Use kubectl debug (Kubernetes 1.23+)

With --copy-to, kubectl debug creates a copy of the crashing pod and runs a debug container from a different image alongside it, without modifying the deployment:

bash
kubectl debug -it <pod-name> \
  --image=ubuntu:22.04 \
  --copy-to=debug-pod \
  --share-processes \
  -- bash

Inside, you can inspect the filesystem, run the application binary manually, and check environment variables — all without touching the running deployment.

Quick Reference

Exit Code | Likely Cause
--------- | ------------
0         | Process exited normally — check if it should be long-running
1         | App error — check logs
137       | OOM kill — increase memory limit
139       | Segfault — application bug
143       | SIGTERM received — check why it's not handling graceful shutdown
255       | Unknown error — check logs

We built Podscape to simplify Kubernetes workflows like this — logs, events, and cluster state in one interface, without switching tools.

Struggling with this in production?

We help teams fix these exact issues. Our engineers have deployed these patterns across production environments at scale.