13 min read · May 8, 2026

Kubernetes Jobs and CronJobs: Production Patterns for Batch Workloads

Jobs and CronJobs look simple until they fail at 3am in a way your alerting didn't catch. Missed schedules, zombie jobs from a mishandled concurrencyPolicy, job history piling up silently — the failure modes are consistent and avoidable. Here's how to run batch workloads reliably in Kubernetes.

Coding Protocols Team
Platform Engineering

Kubernetes Jobs run to completion. CronJobs schedule Jobs on a time-based trigger. Both are conceptually simple, but production batch workloads surface enough edge cases — what happens if a CronJob misses its schedule? what happens when two concurrent runs overlap? how do you alert on a job that succeeded but produced wrong output? — that "simple" undersells the configuration required to run them reliably.

This post covers the mechanics of Jobs and CronJobs, the failure modes to design against, and the patterns for production batch workloads from simple scheduled tasks to parallel data processing pipelines.


Job Basics

A Job creates one or more pods and ensures a specified number complete successfully:

yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
  namespace: production
spec:
  completions: 1             # How many pod completions = job success (default: 1)
  parallelism: 1             # How many pods run in parallel (default: 1)
  backoffLimit: 3            # Retry the pod up to 3 times before marking job failed
  activeDeadlineSeconds: 600 # Kill the job if it runs longer than 10 minutes
  ttlSecondsAfterFinished: 3600  # Delete the job 1 hour after completion
  template:
    spec:
      restartPolicy: Never   # Required for Jobs: Never or OnFailure
      containers:
        - name: migration
          image: my-org/api:v2.0.0
          command: ["python", "manage.py", "migrate"]
          resources:
            requests:
              cpu: 500m
              memory: 512Mi
            limits:
              memory: 1Gi

restartPolicy on Job pods must be Never or OnFailure:

  • Never: If the pod fails, a new pod is created (up to backoffLimit times). Use for idempotent jobs — each attempt is a fresh pod with clean state.
  • OnFailure: The container is restarted in place, in the same pod. Use sparingly — the pod retains its volumes and node placement across restarts, which can cause issues for non-idempotent operations.

backoffLimit: Controls how many times Kubernetes retries a failing job. The default is 6, which is too high for most use cases — a job that fails immediately retries 6 times with exponential backoff (10s, 20s, 40s, 80s, 160s, 320s) before failing. Set it based on how many retries make sense for your specific job.
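As a sanity check on those numbers, the delay schedule can be sketched (assuming the documented 10s base, doubling per retry, capped at six minutes):

```python
def backoff_delays(backoff_limit: int, base: int = 10, cap: int = 360) -> list[int]:
    """Seconds of delay before each retry attempt: 10s base, doubling,
    capped at 6 minutes (mirrors the documented Job backoff behaviour)."""
    return [min(base * 2**i, cap) for i in range(backoff_limit)]

# Default backoffLimit of 6: ~10.5 minutes of pure backoff wait before
# the Job is finally marked failed.
print(backoff_delays(6))  # [10, 20, 40, 80, 160, 320]
```

Once the doubling hits the cap, every further retry waits the full six minutes — another reason a high backoffLimit delays failure detection.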

activeDeadlineSeconds: A hard ceiling on job runtime. When the deadline is reached, all pods are terminated and the job is marked failed with DeadlineExceeded. Essential for jobs that can hang — a migration that waits for a lock indefinitely is worse than one that fails after 10 minutes and alerts.

ttlSecondsAfterFinished: Automatically deletes the Job and its pods after completion. Without this, completed Jobs accumulate in the namespace. Set to a value that gives you time to inspect logs (3600 = 1 hour) before cleanup.


Completion Modes

Jobs have two completion modes, set via completionMode:

NonIndexed (default)

All pods are equivalent. The job succeeds when completions pods complete successfully. Any pod can re-run on failure:

yaml
spec:
  completions: 5
  parallelism: 2
  completionMode: NonIndexed  # default

5 completions, 2 at a time — any 5 successful pod runs = job done. Pods are fungible. Use for embarrassingly parallel work where any pod can process any item.

Indexed

Each pod gets a unique index (0 to completions-1) via the JOB_COMPLETION_INDEX environment variable. Use for work that needs to be partitioned across pods:

yaml
spec:
  completions: 10
  parallelism: 3
  completionMode: Indexed

bash
# In the pod, read your assigned partition:
echo $JOB_COMPLETION_INDEX   # 0-9
# Process only the items assigned to this partition index

Useful for sharded data processing — pod 0 processes records 0–999, pod 1 processes 1000–1999, etc. Each index must complete exactly once.
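A minimal worker sketch that derives its partition from the injected index (the 1,000-records-per-shard figure is illustrative):

```python
import os

RECORDS_PER_SHARD = 1000  # illustrative partition size

def shard_range(index: int, per_shard: int = RECORDS_PER_SHARD) -> range:
    """Half-open record range assigned to one completion index."""
    return range(index * per_shard, (index + 1) * per_shard)

# Kubernetes injects JOB_COMPLETION_INDEX automatically for Indexed Jobs;
# default to shard 0 when running outside the cluster.
index = int(os.environ.get("JOB_COMPLETION_INDEX", "0"))
for record_id in shard_range(index):
    pass  # process the record assigned to this shard
```

Because the index-to-range mapping is deterministic, a retried pod for index 3 always reprocesses exactly the same records — no coordination needed between workers.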


CronJob Configuration

yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report
  namespace: production
spec:
  schedule: "0 2 * * *"          # 2am UTC daily
  timeZone: "UTC"                # explicit timezone (stable since K8s 1.27)
  concurrencyPolicy: Forbid      # Don't start if previous run is still running
  successfulJobsHistoryLimit: 3  # Keep last 3 successful jobs
  failedJobsHistoryLimit: 3      # Keep last 3 failed jobs
  startingDeadlineSeconds: 300   # If missed by 5+ min, don't start
  jobTemplate:
    spec:
      activeDeadlineSeconds: 3600
      backoffLimit: 2
      ttlSecondsAfterFinished: 7200
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: report
              image: my-org/reports:v1.2.0
              command: ["python", "generate_report.py"]

concurrencyPolicy

  • Allow: Multiple concurrent runs permitted (default)
  • Forbid: Skip the new run if the previous is still running
  • Replace: Kill the running job and start a new one

Allow is dangerous for jobs that write to shared state (databases, files). If your nightly report takes 2+ hours and runs daily, Allow means two runs overlap — which can cause duplicate data, deadlocks, or corrupted output. Use Forbid for any job that shouldn't run concurrently with itself.

Replace is useful for jobs that should never be stale — a metrics aggregator that should always reflect the last 24 hours. If the previous run is stuck, replace it with a fresh run rather than waiting.

startingDeadlineSeconds

If the CronJob controller misses a scheduled start (because the controller was down, the node was unavailable, or the namespace was being created), startingDeadlineSeconds controls how long after the scheduled time the job should still be started.

yaml
startingDeadlineSeconds: 300   # If missed by more than 5 minutes, skip

Without this, a CronJob recovering from a long outage will still start a catch-up run for the most recently missed schedule, however stale it is. Kubernetes also counts missed schedules: if more than 100 have been missed, the controller logs an error and stops scheduling that CronJob. Setting startingDeadlineSeconds bounds how late a missed run may still start, so stale catch-up runs are skipped instead.

Job History Limits

yaml
successfulJobsHistoryLimit: 3   # Default: 3
failedJobsHistoryLimit: 1       # Default: 1

These control how many finished Job objects are retained. Lower limits reduce etcd load; higher limits retain more log history. 0 deletes jobs immediately on completion (logs are gone too — combine with external log aggregation). The default values are reasonable for most use cases.


Failure Handling Patterns

Idempotent Job Design

The most important pattern for reliable batch jobs: every job should be safe to run multiple times with the same result. If backoffLimit: 3 causes three retries of a job that inserted records, you should not end up with 3× the records.

Idempotency patterns:

  • Upsert rather than insert: INSERT ... ON CONFLICT DO UPDATE instead of plain INSERT
  • Checkpoint-and-resume: Track progress in a persistent store; on restart, skip completed work
  • Idempotency keys: Use a unique job-run ID as part of the record key to detect duplicates
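The checkpoint-and-resume bullet can be sketched as follows; the in-memory `checkpoints` set stands in for a persistent store such as a database table or Redis set:

```python
def process_items(items, checkpoints: set, process) -> int:
    """Process each item once, skipping anything already checkpointed.

    `checkpoints` stands in for a persistent store: a retried pod sees
    the checkpoints written by earlier attempts and resumes from there.
    """
    done = 0
    for item in items:
        if item in checkpoints:
            continue              # completed by a previous attempt
        process(item)             # must succeed before we checkpoint
        checkpoints.add(item)     # persist progress only after success
        done += 1
    return done

# A retry after a partial first run reprocesses only the remainder.
seen = []
checkpoints = {"a", "b"}          # first attempt got through a and b
process_items(["a", "b", "c", "d"], checkpoints, seen.append)
print(seen)  # ['c', 'd']
```

The ordering matters: checkpoint after the work succeeds, never before, or a crash between the two steps silently drops an item.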

Pod Failure Policy

Pod Failure Policy was alpha in 1.25, beta in 1.26, and GA in Kubernetes 1.31. It gives fine-grained control over which pod failure types should count against backoffLimit:

yaml
spec:
  backoffLimit: 6
  podFailurePolicy:
    rules:
      - action: FailJob             # Immediately fail the job (no retry)
        onExitCodes:
          containerName: migration
          operator: In
          values: [42]              # Exit code 42 = "unrecoverable error, don't retry"
      - action: Ignore              # Don't count against backoffLimit
        onPodConditions:
          - type: DisruptionTarget  # Node eviction (spot interruption, drain)
      - action: Count               # Count against backoffLimit (default)
        onExitCodes:
          operator: NotIn
          values: [0, 42]

This lets you differentiate between:

  • Application errors that should be retried (transient failures, timeouts) → Count
  • Application errors that should never be retried (bad input, schema mismatch) → FailJob
  • Infrastructure interruptions that should not count as failures → Ignore

Without podFailurePolicy, spot interruptions count against backoffLimit. A job running on spot nodes that gets interrupted 4 times exhausts a backoffLimit: 3 and fails — even though the code is correct. Ignore + DisruptionTarget prevents this.


Monitoring CronJobs

Standard pod metrics don't capture CronJob health. The critical metrics:

yaml
# Prometheus alerts for CronJob failures
groups:
  - name: cronjobs
    rules:
      - alert: CronJobFailed
        expr: |
          kube_job_status_failed > 0
          * on(job_name, namespace) group_left(owner_name)
          kube_job_owner{owner_kind="CronJob"}
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "CronJob {{ $labels.owner_name }} has failed pods"

      - alert: CronJobMissedSchedule
        expr: |
          time() - kube_cronjob_next_schedule_time > 3600
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "CronJob {{ $labels.cronjob }} missed its schedule"

      - alert: CronJobSuspended
        expr: kube_cronjob_spec_suspend != 0
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "CronJob {{ $labels.cronjob }} is suspended"

kube_cronjob_next_schedule_time is the timestamp of the next scheduled run. If the current time is more than an hour past the next schedule, the job is stuck (controller issue, namespace issue, or startingDeadlineSeconds rejection).


Parallel Data Processing

For large-scale data processing, use Indexed Jobs with a work queue:

yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: data-processor
  namespace: production
spec:
  completions: 100       # 100 shards to process
  parallelism: 10        # 10 workers in parallel
  completionMode: Indexed
  backoffLimit: 5
  podFailurePolicy:
    rules:
      - action: Ignore
        onPodConditions:
          - type: DisruptionTarget
  template:
    spec:
      restartPolicy: Never
      tolerations:
        - key: spot
          operator: Equal
          value: "true"
          effect: NoSchedule    # Run on spot instances (batch = cost-tolerant)
      containers:
        - name: processor
          image: my-org/processor:v1.0.0
          env:
            - name: TOTAL_SHARDS
              value: "100"
          command:
            - python
            - process.py
            - --shard=$(JOB_COMPLETION_INDEX)
            - --total=$(TOTAL_SHARDS)

Kubernetes automatically injects $JOB_COMPLETION_INDEX into Indexed Job containers — no fieldRef boilerplate required. Reference it directly in command or args.

This pattern works well for:

  • Reprocessing historical data in date-range shards
  • Bulk export/transformation pipelines
  • Machine learning inference on large datasets

Use spot instances for batch jobs — the DisruptionTarget ignore rule means spot interruptions don't count as failures, and retried pods simply resume from a checkpoint.
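For date-range reprocessing, each completion index can map deterministically to a date window. A sketch (the start date and seven-day window size are illustrative):

```python
from datetime import date, timedelta

def shard_dates(index: int, start: date, days_per_shard: int = 7) -> tuple[date, date]:
    """Half-open [start, end) date window for one completion index."""
    lo = start + timedelta(days=index * days_per_shard)
    return lo, lo + timedelta(days=days_per_shard)

# Shard 0 covers the first week, shard 1 the next, and so on.
# The mapping is the same every run, so a retried index reprocesses
# exactly its own window and nothing else.
lo, hi = shard_dates(1, date(2026, 1, 1))
print(lo, hi)  # 2026-01-08 2026-01-15
```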


Native Sidecar Termination (K8s 1.29+)

Jobs have long struggled with sidecars — auxiliary containers (logging agents, secrets proxies, cloud credential refreshers) that don't exit when the main container finishes, causing the Job pod to hang indefinitely. The traditional workaround was pkill scripts or custom entrypoints to signal sidecars on exit.

Kubernetes 1.28 introduced native sidecar support (enabled by default from 1.29) via restartPolicy: Always on initContainers:

yaml
spec:
  template:
    spec:
      initContainers:
        - name: vault-proxy
          image: vault-proxy:1.0
          restartPolicy: Always    # This makes it a native sidecar
      containers:
        - name: main-worker
          image: worker:1.2

restartPolicy: Always on an init container turns it into a native sidecar: it starts before the main container but is terminated by the kubelet as soon as the main container exits. No pkill, no lifecycle hooks, no wrapper scripts. If you're running sidecar injection for secrets (Vault agent, AWS IRSA credential refresh) or log forwarding alongside batch Jobs, upgrade to this pattern.


Idempotency Patterns

The most important requirement for production batch jobs: every job must be safe to run multiple times with the same result. With backoffLimit causing retries and CronJobs potentially catching up on missed runs, non-idempotent jobs create duplicate records, double-charges, or corrupted state.

Database — upsert instead of insert:

sql
-- Safe to run multiple times — only inserts if no row exists for this date
INSERT INTO reconciliation_runs (date, status, result)
VALUES ('2026-05-09', 'pending', NULL)
ON CONFLICT (date) DO NOTHING;

-- Use a status guard to prevent double-processing
UPDATE payment_batches
SET status = 'processing', started_at = NOW()
WHERE batch_date = '2026-05-09'
  AND status = 'pending';    -- Only succeeds if still pending
-- Check rows_affected == 1 before proceeding

Message queues — deduplication with Redis:

python
def process_payment(payment_id: str):
    # set(nx=True, ex=...) is atomic: the key and its 24h TTL are written
    # together. The classic setnx-then-expire pair can crash between the
    # two calls, leaving a marker key that never expires.
    if redis.set(f"processed:{payment_id}", "1", nx=True, ex=86400):
        # Actually process the payment.
        # Note: marking before processing trades duplicates for possible
        # drops; if drops are worse, checkpoint after success instead.
        ...
    else:
        logger.info(f"Skipping already-processed payment {payment_id}")

For SQS FIFO queues, use message deduplication IDs — SQS will deduplicate within a 5-minute window. For Kafka, use consumer group offsets with manual commit after successful processing so failures replay from the last committed offset.
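One way to push deduplication to the queue itself: derive the SQS MessageDeduplicationId from the message content, so a retried job re-sending the same payment is dropped by SQS. A sketch of the parameter construction (the resulting dict would be passed to boto3's send_message; the queue URL and group ID are illustrative):

```python
import hashlib
import json

def fifo_send_params(queue_url: str, payment: dict) -> dict:
    """Build SQS FIFO send parameters with a content-derived dedup ID.

    SQS FIFO drops any message whose MessageDeduplicationId was seen in
    the last 5 minutes, making the send idempotent across job retries.
    """
    body = json.dumps(payment, sort_keys=True)  # canonical serialisation
    return {
        "QueueUrl": queue_url,
        "MessageBody": body,
        "MessageGroupId": "payments",
        "MessageDeduplicationId": hashlib.sha256(body.encode()).hexdigest(),
    }

# Identical payloads always produce the identical dedup ID,
# regardless of dict key order.
a = fifo_send_params("https://sqs.example/payments.fifo", {"id": "p-1", "amount": 100})
b = fifo_send_params("https://sqs.example/payments.fifo", {"amount": 100, "id": "p-1"})
assert a["MessageDeduplicationId"] == b["MessageDeduplicationId"]
```

The `sort_keys=True` canonicalisation is the important detail: without it, two logically identical payloads could hash differently and slip past deduplication.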


Common Anti-Patterns

No activeDeadlineSeconds: A job that hangs indefinitely holds cluster resources and doesn't alert. Always set a reasonable deadline.

backoffLimit: 6 (default) for non-idempotent jobs: The default allows six retries, so a non-idempotent insert job can write its records up to seven times (the original run plus six retries). Set backoffLimit based on your idempotency guarantees.

concurrencyPolicy: Allow for stateful jobs: Concurrent runs of a report generator create duplicate reports. Use Forbid for any job that shouldn't overlap.

No ttlSecondsAfterFinished: Completed Jobs accumulate indefinitely. With high-frequency CronJobs, thousands of completed Job objects fill etcd. Set TTL.

Missing resource requests: Jobs without resource requests can be scheduled on nodes with no capacity, causing them to fail or get evicted. Always set requests on Job pods, same as for Deployment pods.

restartPolicy: Always: Not valid for Jobs — Kubernetes will reject the pod spec. Never or OnFailure only.


Frequently Asked Questions

How do I run a one-off job from a CronJob template?

bash
# Create a manual job from an existing CronJob template
kubectl create job manual-run \
  --from=cronjob/nightly-report \
  -n production

# Watch it
kubectl get pods -n production -l job-name=manual-run --watch

How does CronJob handle missed runs?

startingDeadlineSeconds defines a window after each scheduled time during which a missed run may still be started. When the CronJob controller recovers, it starts at most one catch-up Job, for the most recently missed schedule, and only if that schedule falls within the window (and subject to concurrencyPolicy). If more than 100 schedules were missed, a hard-coded Kubernetes limit, the controller logs an error and does not start a job. Setting startingDeadlineSeconds prevents stale catch-up runs after long outages.

For jobs that must process every scheduled interval (daily reports, SLA-bound reconciliation), don't rely on CronJob backfill — build explicit catch-up logic in the job itself that checks for unprocessed intervals and handles gaps deterministically.
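Explicit catch-up logic can be as simple as diffing expected intervals against recorded runs; a sketch where the `processed` set stands in for a real run-history table:

```python
from datetime import date, timedelta

def unprocessed_dates(start: date, end: date, processed: set) -> list:
    """All daily intervals in [start, end] with no recorded run."""
    days = (end - start).days + 1
    expected = (start + timedelta(days=i) for i in range(days))
    return [d for d in expected if d not in processed]

# Each run processes today AND any gaps left by missed schedules.
processed = {date(2026, 5, 6), date(2026, 5, 8)}
gaps = unprocessed_dates(date(2026, 5, 6), date(2026, 5, 9), processed)
print(gaps)  # [datetime.date(2026, 5, 7), datetime.date(2026, 5, 9)]
```

Because the gap check runs inside the job, correctness no longer depends on the CronJob controller's catch-up semantics at all.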

Can a CronJob be triggered externally (webhook, event)?

CronJobs are time-triggered only. For event-driven triggers, use KEDA's ScaledJob — it creates Jobs in response to external events (queue depth, HTTP endpoint, Prometheus metric) rather than on a schedule. See KEDA: Event-Driven Autoscaling for Kubernetes.

How do I pass secrets to a Job?

Same as any pod — via environment variables from Secrets, or volume mounts. For per-job credentials (e.g., a one-time S3 access token), use IRSA/Pod Identity so the job pod gets cloud credentials from its service account rather than a long-lived secret.

How do I handle job output (reports, files)?

Options:

  1. Object storage: Write output to S3/GCS directly from the job. The job knows the output location by convention or via environment variable.
  2. Database: Write results to a database table with the job run ID.
  3. Kubernetes ConfigMap: For small outputs only (< 1MB limit) — write results to a ConfigMap for consumption by another process.

Avoid writing output to PVCs for batch jobs — PVCs add lifecycle management complexity. Object storage is simpler and cheaper for batch output.
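One way to make the location-by-convention concrete: derive a deterministic object key from the job name and run date, so retries overwrite the same object rather than duplicating output. A sketch (the key layout is illustrative):

```python
def report_key(job_name: str, run_date: str) -> str:
    """Deterministic object key for one logical run: a retried job
    PUTs to the same key, so output stays idempotent at the storage
    layer regardless of how many attempts it took."""
    return f"reports/{job_name}/dt={run_date}/report.json"

print(report_key("nightly-report", "2026-05-09"))
# reports/nightly-report/dt=2026-05-09/report.json
```

The dt= partition convention also lets downstream query engines prune by date without listing the whole bucket.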


For KEDA event-driven job scaling (scale Jobs based on queue depth), see KEDA: Event-Driven Autoscaling for Kubernetes. For cost-optimised batch processing on spot instances, see Kubernetes Cost Optimisation. For monitoring CronJob execution with Prometheus — including alerts for failed jobs, missed schedules, and stuck runs — see Kubernetes Observability: Prometheus, Grafana, and OpenTelemetry.

Building a batch processing pipeline on Kubernetes? Talk to us at Coding Protocols — we help platform teams design job architectures that handle failures, retries, and cost optimisation correctly.

Related Topics

Kubernetes
Jobs
CronJobs
Batch Processing
Platform Engineering
Reliability
Production
