Kubernetes
12 min read · May 8, 2026

KEDA ScaledJob: Event-Driven Batch Processing on Kubernetes

ScaledJob is KEDA's answer for batch workloads — it creates a new Kubernetes Job for each event (or batch of events) from a queue, rather than scaling long-running pod replicas. The mental model is different from ScaledObject, and the failure modes are different too.

Ajeet Yadav
Platform & Cloud Engineer

Most Kubernetes batch workloads start the same way: a CronJob fires every 15 minutes, pulls items from a queue, processes them, and exits. That works fine until the queue depth becomes unpredictable. When your image-processing queue sits at zero most of the day and then spikes to 10,000 items in 90 seconds, a CronJob can't respond fast enough. You either over-provision workers that sit idle 90% of the time, or you build custom controller logic to watch the queue and spawn jobs.

KEDA's ScaledJob eliminates that custom controller. Install KEDA 2.x, define a ScaledJob resource pointing at your SQS queue (or RabbitMQ, Kafka, Redis, or any of KEDA's 60+ scalers), and KEDA creates a new Kubernetes Job for each queue item — or batch of items — automatically. When the queue drains, no new jobs are created and nothing runs. You pay for exactly the compute you need, and each work item gets its own isolated Job with its own completion record.

The catch: ScaledJob is meaningfully different from ScaledObject. If you come to it expecting the HPA-style replica scaling you already know, you'll hit unexpected behavior fast. The mental model is different, and the failure modes are different.

ScaledJob vs ScaledObject — the key mental model difference

The distinction matters enough to be explicit about it before looking at any YAML.

ScaledObject targets a Deployment, StatefulSet, or similar workload. It adjusts the replica count — scaling up means more long-running pods that stay alive and poll for work. The workers are responsible for fetching work items from the queue themselves. KEDA only controls how many workers exist.

ScaledJob targets nothing that exists yet. KEDA creates a brand-new Job object (and therefore a brand-new pod) for each unit of work it detects in the queue. The Job runs, does its work, and exits. KEDA then cleans up the completed Job object (subject to history limits). There's no long-running worker pool — the job is the worker, and it lives only for the duration of one work item.

That distinction drives every architectural decision:

Use ScaledJob when:

  • Jobs must not share state. Each SQS message describes an independent unit of work (resize this image, transcode this video, run this ML inference).
  • Each work item has a hard completion requirement. You need to verify per-item success or failure in Kubernetes Job status, not just a worker's internal log.
  • You need per-job auditing. Job completion timestamps, pod logs, and exit codes are all scoped to one work item.
  • Work items have heterogeneous resource needs. You can template different resource profiles into different ScaledJobs targeting different queues.

Use ScaledObject when:

  • Workers are stateless and already built to poll a queue in a loop.
  • Polling overhead is acceptable and you prefer warm capacity over cold-start latency.
  • You want autoscaling but don't want the overhead of creating and destroying a Job object for every work item.

A concrete example: a video transcoding pipeline where each job takes 3–8 minutes is a perfect ScaledJob use case. A real-time API backend that pulls tasks from Redis and responds in under 100ms is a ScaledObject use case.

ScaledJob spec walkthrough

Here's a complete ScaledJob for an image processing workload driven by SQS:

yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: image-processor
  namespace: processing
spec:
  jobTargetRef:
    parallelism: 1
    completions: 1
    activeDeadlineSeconds: 300
    backoffLimit: 2
    template:
      spec:
        restartPolicy: Never
        containers:
          - name: processor
            image: myregistry/image-processor:v2.1.0
            resources:
              requests:
                cpu: 500m
                memory: 512Mi
              limits:
                cpu: 2
                memory: 2Gi
            env:
              - name: SQS_QUEUE_URL
                valueFrom:
                  secretKeyRef:
                    name: sqs-config
                    key: queue-url

  pollingInterval: 15
  maxReplicaCount: 50
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 5

  scalingStrategy:
    strategy: "accurate"
    pendingPodConditions:
      - "Ready"

  triggers:
    - type: aws-sqs-queue
      authenticationRef:
        name: keda-sqs-auth
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789/image-processing
        queueLength: "1"
        awsRegion: us-east-1
        identityOwner: operator

jobTargetRef is a JobSpec — everything inside maps directly to a Kubernetes Job spec. KEDA stamps a new Job from this template for each unit of work it decides to create.

parallelism: 1 / completions: 1 means each job runs exactly one pod that must succeed once. For most queue-triggered batch work, this is what you want. Parallel jobs (MapReduce style) are covered in the FAQ section.

activeDeadlineSeconds: 300 is a hard five-minute wall clock timeout per job. A job that runs longer than this is terminated by the Kubernetes Job controller, not KEDA. Without this, a stuck job holds a slot indefinitely and blocks maxReplicaCount capacity.

backoffLimit: 2 allows the job to retry twice (three total attempts) before marking itself failed. Whether you want this depends on your SQS configuration — more on that in the lifecycle section.

restartPolicy for a Job's pod template must be either Never or OnFailure (the usual Always isn't allowed for Jobs). OnFailure restarts the same container in the same pod; Never creates a new pod for each retry, which is usually what you want for queue-driven work so each attempt starts clean.

pollingInterval: 15 is how often KEDA polls the queue for its current depth, in seconds. At 15 seconds you get reasonably fast response to queue growth without hammering the SQS API. The default is 30 seconds.

maxReplicaCount: 50 is your circuit breaker. Without it, a queue that suddenly contains 10,000 messages would cause KEDA to try creating 10,000 Jobs simultaneously. Set this to a value your cluster can actually absorb.

successfulJobsHistoryLimit: 3 and failedJobsHistoryLimit: 5: KEDA retains this many completed Job objects for inspection. In high-throughput scenarios, keep these low — completed Job objects accumulate in etcd and add API server load. Three successful and five failed is a reasonable default for most workloads.

queueLength: "1" in the trigger metadata means "one job per one message." This is the strictest setting. If your jobs can efficiently process multiple messages at once, increase this value — queueLength: "5" means KEDA creates one job for every five messages in the queue.

Scaling strategies

KEDA 2.x offers three scaling strategies for ScaledJob, controlled by scalingStrategy.strategy. Picking the wrong one is a common source of duplicate job creation.

default — KEDA calculates the number of jobs to create as ceil(queueLength / targetQueueLength). Simple, fast, but it doesn't account for jobs that are already pending (scheduled but not yet running). If your cluster is slow to schedule pods, KEDA may see the queue depth unchanged on the next poll interval and create more jobs for messages that are already being processed.

custom — You tune the calculation with two additional fields:

yaml
scalingStrategy:
  strategy: custom
  customScalingQueueLengthDeduction: 0
  customScalingRunningJobPercentage: "1.0"

customScalingRunningJobPercentage: "1.0" tells KEDA to count 100% of currently running jobs against the queue depth before deciding how many new jobs to create. "0.5" would count running jobs at 50%, meaning KEDA would still create some additional jobs even while current jobs are running — useful for pipeline warmup.

accurate — The most precise option and the one I use for strict "one job per message" semantics. KEDA counts actual pending and running jobs, deducts them from the queue depth, and only creates jobs for uncovered messages. Use this when you need guaranteed one-to-one mapping between messages and jobs without duplicates.

pendingPodConditions — This field prevents a specific race condition: KEDA schedules a job, but the pod hasn't started yet, so on the next poll interval KEDA sees the queue depth unchanged and creates another job. By specifying - "Ready", KEDA waits until pods have reached the Ready condition before counting them as covering queue messages. For the accurate strategy, this is essential.

SQS trigger with IRSA

On EKS, the cleanest authentication approach is IRSA (IAM Roles for Service Accounts). You don't distribute static AWS credentials, and the KEDA operator pod uses its own role to read queue metrics.

First, annotate the KEDA operator's service account with the IAM role ARN:

bash
kubectl annotate serviceaccount keda-operator \
  -n keda \
  eks.amazonaws.com/role-arn=arn:aws:iam::123456789:role/keda-sqs-reader
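
To sanity-check the annotation (names assume the default Helm install, where the operator Deployment and ServiceAccount are both called keda-operator):

bash
# confirm the role annotation is present on the operator's service account
kubectl get serviceaccount keda-operator -n keda -o yaml | grep role-arn

# IRSA credentials are injected at pod start, so restart the operator
# if the service account already existed before you annotated it
kubectl rollout restart deployment keda-operator -n keda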

Then define a TriggerAuthentication that tells KEDA to use the pod identity provider:

yaml
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-sqs-auth
  namespace: processing
spec:
  podIdentity:
    provider: aws-eks

The IAM role attached to the KEDA operator service account needs these minimum permissions:

json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "sqs:GetQueueAttributes",
        "sqs:GetQueueUrl"
      ],
      "Resource": "arn:aws:sqs:us-east-1:123456789:image-processing"
    }
  ]
}

GetQueueAttributes is what KEDA calls to read ApproximateNumberOfMessages. GetQueueUrl is needed if you reference the queue by name rather than full URL.

Note the identityOwner: operator field in the trigger's metadata. This controls whose identity reads the queue metrics:

  • identityOwner: operator — KEDA itself reads the queue using the KEDA operator's IRSA role. Your Job pods can use a separate, less-privileged IRSA role for the actual sqs:ReceiveMessage and sqs:DeleteMessage operations.
  • identityOwner: workload — The KEDA scaler reads the queue using the identity attached to the Job pods. Useful when your platform policy requires all AWS access to be scoped to the workload identity.

For most setups, operator is cleaner — the KEDA role only needs read access to queue attributes, and the Job pods get a separate role with the permissions they actually need to process messages.
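
If you go that route, the Job pods need their own annotated service account referenced from the pod template; a minimal sketch, with illustrative names:

yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: image-processor-worker        # hypothetical worker identity
  namespace: processing
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789:role/image-processor-worker

Then set serviceAccountName: image-processor-worker under jobTargetRef.template.spec so each Job pod assumes the worker role for sqs:ReceiveMessage and sqs:DeleteMessage, while the operator's role keeps only the queue-attribute read permissions above.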

RabbitMQ trigger

If you're running RabbitMQ instead of SQS, the trigger configuration looks like this:

yaml
triggers:
  - type: rabbitmq
    metadata:
      protocol: amqp
      host: amqp://rabbitmq.messaging.svc.cluster.local:5672
      queueName: image-jobs
      mode: QueueLength
      value: "1"
    authenticationRef:
      name: rabbitmq-auth

mode: QueueLength scales based on the number of messages waiting in the queue (as opposed to mode: MessageRate, which scales on publish rate). For batch ScaledJob workloads you almost always want QueueLength.

The TriggerAuthentication for RabbitMQ typically holds connection credentials:

yaml
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: rabbitmq-auth
  namespace: processing
spec:
  secretTargetRef:
    - parameter: host
      name: rabbitmq-credentials
      key: connection-string

The conceptual behavior is identical to the SQS case — KEDA polls the queue depth and creates Jobs accordingly. The differences are in authentication and the visibility/acknowledgment semantics of each broker.

Job lifecycle and cleanup

Understanding how KEDA and Kubernetes manage Job objects prevents the most common operational headaches.

History limits: successfulJobsHistoryLimit and failedJobsHistoryLimit control how many completed Job objects KEDA keeps around after they finish. If you process 10,000 messages per hour and set successfulJobsHistoryLimit: 100, KEDA deletes old Job objects once the count exceeds 100. Keep these low — every Job object is a resource in etcd, and at high throughput you can accumulate thousands of stale objects that increase API server memory usage and slow down kubectl get jobs calls.

activeDeadlineSeconds is your single most important safety valve. A job that exceeds this wall clock time is killed by the Kubernetes Job controller with reason DeadlineExceeded. The Job is marked failed. Without it, a job that hangs waiting on an external dependency (database, downstream API, network partition) holds a slot indefinitely and counts against your maxReplicaCount. Set this to 2–3x your p99 job runtime. For jobs that routinely take 60 seconds, activeDeadlineSeconds: 300 is a reasonable default.

backoffLimit: 0 vs backoffLimit: 2 — This interacts with your message broker semantics.

With SQS: when a job fails, the message's visibility timeout eventually expires and the message reappears in the queue, and KEDA creates a new job for it on the next poll. If you set backoffLimit: 2, the Kubernetes Job controller also retries the pod up to two more times, typically while the message is still hidden by its visibility timeout. That means the same message can be processed up to three times by the job's own retries and then picked up again as a brand-new KEDA job. If your processing is idempotent, this is fine. If it isn't, use backoffLimit: 0 and let SQS redelivery handle retry logic.

With RabbitMQ using manual ack: a job that fails without acking the message causes RabbitMQ to redeliver. backoffLimit: 0 is usually the right choice here too.

Handling poison messages

A poison message is one that will always cause a job failure regardless of how many times it's retried. Left unhandled, it creates an infinite loop: message appears in queue → KEDA creates a job → job fails → visibility timeout expires → message reappears → KEDA creates another job.

SQS redrive policy is the first line of defense. Set maxReceiveCount on the source queue's redrive policy to move a message to a dead-letter queue (DLQ) after N receive attempts:

json
{
  "deadLetterTargetArn": "arn:aws:sqs:us-east-1:123456789:image-processing-dlq",
  "maxReceiveCount": 3
}

After 3 failed deliveries, SQS moves the message to the DLQ where it can't trigger new jobs. KEDA won't see it anymore.
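
If the queue isn't managed by Terraform or CloudFormation, the same policy can be applied with the AWS CLI, roughly like this (queue URL and DLQ ARN reuse the example values above):

bash
aws sqs set-queue-attributes \
  --queue-url https://sqs.us-east-1.amazonaws.com/123456789/image-processing \
  --attributes '{
    "RedrivePolicy": "{\"deadLetterTargetArn\":\"arn:aws:sqs:us-east-1:123456789:image-processing-dlq\",\"maxReceiveCount\":\"3\"}"
  }'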

In-process defense — Your processor should explicitly delete the message from SQS even on partial failures when appropriate. If your business logic determines a message is invalid (missing required fields, unrecognizable format), delete it rather than letting it timeout back to the queue. Don't make the retry machinery do work that the application already knows is futile.
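
A minimal sketch of that pattern with boto3 (queue URL, field names, and the processing function are placeholders):

python
import json
import boto3

sqs = boto3.client("sqs", region_name="us-east-1")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789/image-processing"  # placeholder

def process_image(key: str) -> None:
    """Placeholder for the real resize/transcode work."""
    ...

def handle(message: dict) -> None:
    body = json.loads(message["Body"])
    if "image_key" not in body:
        # Permanently invalid message: delete it instead of letting the
        # visibility timeout recycle it through KEDA forever.
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=message["ReceiptHandle"])
        return
    process_image(body["image_key"])  # may raise; the message then reappears after the timeout
    # Delete only after the work actually succeeded.
    sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=message["ReceiptHandle"])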

DLQ monitoring — Alert on a non-zero ApproximateNumberOfMessagesVisible in the DLQ. A message in the DLQ means something broke in a way your retry policy couldn't recover from. That's always worth a human looking at.
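
One way to get that alert is a CloudWatch alarm on the DLQ (the SNS topic ARN is a placeholder):

bash
aws cloudwatch put-metric-alarm \
  --alarm-name image-processing-dlq-not-empty \
  --namespace AWS/SQS \
  --metric-name ApproximateNumberOfMessagesVisible \
  --dimensions Name=QueueName,Value=image-processing-dlq \
  --statistic Maximum \
  --period 300 \
  --evaluation-periods 1 \
  --threshold 0 \
  --comparison-operator GreaterThanThreshold \
  --alarm-actions arn:aws:sns:us-east-1:123456789:oncall-alerts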

For RabbitMQ, the equivalent is a dead-letter exchange (DLX) configured on the source queue. Messages that are nacked or expired beyond their TTL are routed to the DLX.

Concurrency limits and burst control

maxReplicaCount is non-negotiable. I've seen it omitted in a staging environment where queue depth never got large, shipped unchanged to production where a partner sent 50,000 messages in one batch, and watched the cluster's scheduling machinery buckle under 50,000 simultaneous Job creation requests.

For CPU-intensive jobs like ML inference or video transcoding, also think about whether your node pool can actually provide the resources your Jobs request. If each job requests 2 CPU cores and maxReplicaCount is 50, you need 100 CPU cores available, plus headroom for system pods and other workloads. Either ensure your cluster autoscaler can provision that capacity, or set maxReplicaCount to match what's actually available.

For bursty queues where you want a controlled ramp rather than an immediate spike to max, strategy: custom with a conservative customScalingRunningJobPercentage is useful:

yaml
scalingStrategy:
  strategy: custom
  customScalingQueueLengthDeduction: 0
  customScalingRunningJobPercentage: "1.0"

Setting customScalingRunningJobPercentage: "1.0" means KEDA fully credits running jobs against the queue depth, so it won't over-create. Dropping it to "0.5" would cause KEDA to treat running jobs as only covering half the queue depth, creating additional jobs for the assumed "uncovered" remainder — useful when you expect jobs to fail at a high rate and want proactive replacement, but dangerous if your failure rate assumption is wrong.

Monitoring ScaledJob

A few commands I use regularly to check ScaledJob health:

bash
kubectl get jobs -n processing --sort-by=.metadata.creationTimestamp

This shows active and recently completed jobs, sorted by creation time so you can see whether new jobs are being created as expected.

bash
kubectl describe scaledjob image-processor -n processing

The Events section here is your first stop when KEDA isn't creating jobs you expect. It shows the last few scaling decisions, including the queue depth KEDA saw and how many jobs it decided to create.

bash
kubectl logs -n keda -l app=keda-operator --tail=100

KEDA operator logs show the actual scaler calculations at each poll interval. When a scaler fails to authenticate or read the queue, you'll see the error here first.

KEDA exposes Prometheus metrics at port 8080 (/metrics) on the operator pod. The two most useful for ScaledJob:

  • keda_scaler_active{namespace="processing",scaledObject="image-processor"} — 1 when KEDA considers the scaler active (queue depth > 0), 0 when idle
  • keda_scaler_metrics_value{namespace="processing",scaledObject="image-processor"} — the raw metric value KEDA read from the scaler (queue depth in the SQS case)

Wire these into your Grafana dashboard alongside kube_job_status_active and kube_job_status_failed from kube-state-metrics for a complete picture of queue depth versus job throughput.
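
As one way to wire that up, a PrometheusRule that fires when the queue has depth but nothing is running could look roughly like this (label names mirror the metrics above and can differ between KEDA versions, so verify against your /metrics output):

yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: image-processor-scaledjob
  namespace: processing
spec:
  groups:
    - name: scaledjob
      rules:
        - alert: QueueBackedUpNoJobs
          expr: |
            keda_scaler_metrics_value{namespace="processing", scaledObject="image-processor"} > 0
            and
            sum(kube_job_status_active{namespace="processing"}) == 0
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: Queue has messages but no image-processor jobs are running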

Production pitfalls

These are failure modes I've either hit directly or debugged for clients:

Not setting activeDeadlineSeconds — A job that calls an external API and that API goes down will hang indefinitely. That job holds one slot against maxReplicaCount. If the external API is down for 20 minutes and maxReplicaCount is 20, after 20 minutes you have 20 stuck jobs and zero capacity to process new messages, even after the API recovers. Always set activeDeadlineSeconds.

backoffLimit > 0 with SQS without idempotent processing — If your job fails mid-way through processing (after performing some side effect but before deleting the message) and then retries, the retry sees the same message and performs the side effect again. With backoffLimit: 2, the same message can be processed up to three times by the job's own retry mechanism. Either make your processing idempotent (safe to run twice), use backoffLimit: 0, or extend the SQS visibility timeout beyond your activeDeadlineSeconds to prevent redelivery while the job is still alive.
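
If you take the visibility-timeout route, the worker can extend the timeout as a heartbeat while it's still making progress; a sketch with boto3 (interval and timeout values are illustrative):

python
import threading
import boto3

sqs = boto3.client("sqs", region_name="us-east-1")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789/image-processing"  # placeholder

def keep_invisible(receipt_handle: str, stop: threading.Event,
                   extend_to: int = 360, every: int = 120) -> None:
    """Heartbeat: push the visibility timeout out past activeDeadlineSeconds every couple of minutes."""
    while not stop.wait(every):
        sqs.change_message_visibility(QueueUrl=QUEUE_URL,
                                      ReceiptHandle=receipt_handle,
                                      VisibilityTimeout=extend_to)

The processor starts this in a background thread after receiving the message and sets the stop event just before deleting it, so a slow-but-alive job never has its message redelivered out from under it.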

Omitting maxReplicaCount — The default is no limit. A sudden queue spike creates as many jobs as there are messages. 10,000 messages means 10,000 simultaneous job creation API calls, which can saturate the API server and degrade the entire cluster.

strategy: default with queueLength: "1" on a slow cluster — The default strategy doesn't deduct pending (scheduled but not started) jobs from the queue depth. On a cluster that takes 30 seconds to schedule a pod, KEDA polls at 15-second intervals and sees the queue depth unchanged. It creates more jobs for messages already being processed. Use accurate for strict one-to-one semantics.

High successfulJobsHistoryLimit at high throughput — Setting successfulJobsHistoryLimit: 100 sounds conservative. At 1,000 messages per minute, you accumulate 100 successful Job objects every 6 seconds. KEDA is constantly creating and cleaning up Job objects, and etcd is under continuous churn. Set this to 3–5 for high-throughput workloads.

Frequently asked questions

Can ScaledJob work with parallelism > 1?

Yes. Set parallelism and completions in jobTargetRef to values greater than 1. For example, parallelism: 4 and completions: 4 creates a job with four pods that all need to complete successfully. This is useful for MapReduce-style work where multiple pods collaborate on a single work item — one pod fetches and splits data, three others process partitions. Each KEDA-created Job is still triggered by one unit of queue depth, but the job internally runs multiple pods. Keep in mind that maxReplicaCount limits the number of Jobs, not the number of pods — with parallelism: 4 and maxReplicaCount: 10, you can have up to 40 pods running simultaneously.
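
As a sketch, the only change from the earlier spec is in jobTargetRef (image and names reused from the example above):

yaml
jobTargetRef:
  parallelism: 4            # four pods per KEDA-created Job
  completions: 4            # all four must succeed for the Job to complete
  activeDeadlineSeconds: 300
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: processor
          image: myregistry/image-processor:v2.1.0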

Does ScaledJob support scale-to-zero?

Yes, and it's the default behavior. When the queue depth reaches zero, KEDA creates no new jobs. Any running jobs finish and exit normally. There are no long-running pods consuming resources when the queue is empty — this is the primary cost advantage of ScaledJob over ScaledObject for sporadic workloads.

Can I use ScaledJob with a database query as trigger?

Yes. The postgresql and mysql scalers both work with ScaledJob. You point them at a SQL query that returns a row count — typically something like SELECT COUNT(*) FROM processing_queue WHERE status = 'pending'. KEDA uses the returned count as the queue depth metric and creates jobs accordingly. The same scaling strategy options apply. One caveat: unlike SQS where the broker manages message visibility and redelivery, with a database queue you're responsible for implementing your own visibility locking and poison message handling in application code.
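
A rough sketch of such a trigger, assuming credentials come from a Secret via TriggerAuthentication (exact metadata fields vary between KEDA versions, so check the postgresql scaler docs for yours):

yaml
triggers:
  - type: postgresql
    metadata:
      query: "SELECT COUNT(*) FROM processing_queue WHERE status = 'pending'"
      targetQueryValue: "1"              # one Job per pending row
      host: postgres.db.svc.cluster.local
      port: "5432"
      dbName: app
      userName: keda_reader
      sslmode: disable
    authenticationRef:
      name: postgres-auth                # holds the password, analogous to the examples above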

How does KEDA handle operator restarts?

ScaledJob objects are persisted in etcd as custom resources. When the KEDA operator restarts (rolling update, pod eviction, node drain), it reads all existing ScaledJob resources from the API server and resumes polling on startup. There's no gap in the record of what should be scaled — but there will be a gap in polling equal to the time KEDA is restarting (typically 10–30 seconds for a rolling restart). Jobs already running continue to run normally, since they're just Kubernetes Jobs managed by the Job controller, which is independent of KEDA. The brief polling gap means a few messages might wait slightly longer to be picked up, but no messages are lost.

Further reading

For ScaledObject — KEDA's long-running worker autoscaling mode — along with built-in scalers, the KEDA architecture, and how the HPA integration works under the hood, see KEDA Event-Driven Autoscaling on Kubernetes. If you're evaluating whether to add KEDA at all, that post covers the full install and configuration story.

For Kubernetes-native batch without KEDA — Job parallelism modes, CronJob concurrency policy, ttlSecondsAfterFinished, and the edge cases in Job completion tracking — see Kubernetes Jobs and CronJobs in Production.

For the AWS messaging layer that typically feeds ScaledJob triggers — SQS queue configuration, dead-letter queue redrive policies, SNS fan-out, and EventBridge routing — see AWS SQS, SNS, and EventBridge Architecture.

For getting resource requests right on Job pods — why requests matter more than limits for batch workloads, how VPA interacts with Jobs, and QoS class implications — see Kubernetes Resource Management: Requests, Limits, and QoS.


Processing events at scale and unsure whether ScaledJob or a long-running worker pool fits better? Talk to us at Coding Protocols — we help platform teams design event-driven batch architectures that match their throughput and latency requirements.

Related Topics

Kubernetes
KEDA
Autoscaling
Batch
Jobs
SQS
Event-Driven
