ArgoCD ApplicationSet Progressive Syncs: Controlled Multi-Cluster Rollouts
ApplicationSet progressive syncs let you roll out changes across multiple clusters in waves — sync 2 clusters, wait for health checks to pass, sync the next 4, then the remaining 20. Without this, ApplicationSet syncs all clusters simultaneously and a bad deployment takes down every environment at once.

I've seen it happen more than once: a platform team manages 30 production clusters with ApplicationSet, someone merges a config change, ArgoCD syncs all 30 clusters simultaneously, and within 5 minutes every environment is degraded. A bad Helm values override, a missing secret reference, an incorrect resource limit — it doesn't matter what the specific failure is. The outcome is the same: all clusters, all at once, all broken.
ApplicationSet is the right tool for fleet-scale GitOps. It generates one ArgoCD Application per cluster (or per tenant, or per environment) from a single template, and keeps them all reconciled to Git. The problem is that by default, when you push a change, the ApplicationSet controller immediately signals all generated Applications to sync. There's no concept of "sync the canary first, verify it's healthy, then sync the rest." Every Application races to pull the new state at the same time.
Progressive syncs fix this. They let you define rollout steps — each step targets a subset of your Applications (clusters), syncs them up to maxUpdate at a time, and then waits for all Applications in that step to reach Healthy status before advancing to the next step. If step one's Applications never become Healthy, the rollout halts. You investigate, revert the Git commit if needed, and nothing beyond step one is affected.
This feature shipped as experimental in ArgoCD 2.6, and as of ArgoCD 2.9 it is still gated behind a feature flag rather than enabled by default. Here's everything you need to configure it in production.
Enabling the Feature Gate
Progressive syncs are not enabled by default — even in ArgoCD 2.9. You must pass --enable-progressive-syncs to the argocd-applicationset-controller Deployment (or, equivalently, set the ARGOCD_APPLICATIONSET_CONTROLLER_ENABLE_PROGRESSIVE_SYNCS=true environment variable on that Deployment).
If you're managing ArgoCD with Helm (the argo-cd chart), add this to your values file:
applicationSet:
  extraArgs:
    - --enable-progressive-syncs

If you're patching the Deployment directly (or working in a cluster where Helm is not used for ArgoCD itself):
kubectl patch deployment argocd-applicationset-controller \
  -n argocd \
  --type='json' \
  -p='[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--enable-progressive-syncs"}]'

The critical thing to understand: without this flag, the strategy field in your ApplicationSet spec is silently ignored. ArgoCD does not warn you, does not emit an event, does not log at WARN level. Your ApplicationSet with a carefully crafted rollingSync block will simply behave as if strategy were absent — all Applications sync simultaneously. I've seen this trip up multiple teams who were confident they had progressive syncs configured, only to discover the flag was missing when a rollout went sideways.
Verify the flag is active after patching:
kubectl get pods -n argocd -l app.kubernetes.io/name=argocd-applicationset-controller \
  -o jsonpath='{.items[0].spec.containers[0].args}' | tr ',' '\n'

You should see --enable-progressive-syncs in the output.
RollingSync Strategy
As of ArgoCD 2.9, RollingSync is the only available strategy type for progressive syncs. The structure is straightforward but the label matching behavior warrants careful attention.
Here's a complete ApplicationSet for a payment service deployed across a production fleet:
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: payment-service
  namespace: argocd
spec:
  generators:
    - clusters:
        selector:
          matchLabels:
            env: production
  strategy:
    type: RollingSync
    rollingSync:
      steps:
        - matchExpressions:
            - key: region
              operator: In
              values:
                - us-east-1-canary
          maxUpdate: 1
        - matchExpressions:
            - key: region
              operator: In
              values:
                - us-east-1
                - us-west-2
          maxUpdate: 2
        - matchExpressions:
            - key: region
              operator: NotIn
              values:
                - us-east-1-canary
                - us-east-1
                - us-west-2
          maxUpdate: 10
  template:
    metadata:
      name: '{{name}}-payment-service'
      labels:
        region: '{{metadata.labels.region}}'
    spec:
      project: production
      source:
        repoURL: https://github.com/myorg/gitops-fleet
        targetRevision: HEAD
        path: apps/payment-service
      destination:
        server: '{{server}}'
        namespace: payments
      syncPolicy:
        automated:
          prune: true
          selfHeal: true

Walking through the key fields:
strategy.type: RollingSync — The only valid value as of 2.9. The field is present for forward compatibility with other strategy types that may be introduced later.
rollingSync.steps — An ordered list of steps. The controller processes them sequentially. Step 2 does not start until all Applications in Step 1 are Healthy.
matchExpressions — Label selectors that identify which Applications belong to this step. These expressions match against labels on the generated Applications. Those labels are not applied automatically — you set them in the Application template from generator parameters (more on this below). The syntax mirrors Kubernetes matchExpressions in pod affinity rules: In, NotIn, Exists, DoesNotExist operators are all valid.
maxUpdate — How many Applications within this step can sync simultaneously. In Step 1 above, maxUpdate: 1 means exactly one canary cluster syncs at a time. In Step 3, maxUpdate: 10 means up to 10 of the remaining clusters sync concurrently — useful for large fleets where you want parallelism but still want a controlled blast radius.
The rollout logic: when a new Git revision is detected (or ApplicationSet template changes), the controller identifies which Applications are OutOfSync. It starts with Step 1's matching Applications, syncing up to maxUpdate of them. It waits for all of them to become Healthy. Then it moves to Step 2, and so on.
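The rollout loop described above can be sketched in a few lines of Python. This is a conceptual model of the behavior, not the controller's actual implementation — `App`, `Step`, and `sync_fn` are illustrative names, and the real controller works asynchronously against the Kubernetes API:

```python
from dataclasses import dataclass

@dataclass
class App:
    name: str
    labels: dict
    health: str = "Healthy"   # Healthy | Progressing | Degraded
    synced: bool = True

@dataclass
class Step:
    match: callable            # predicate over an App's labels
    max_update: int

def run_rolling_sync(apps, steps, sync_fn):
    """Conceptual RollingSync: process steps in order, syncing at most
    max_update Applications at a time, and only advance when every
    Application in the current step is Synced and Healthy."""
    for i, step in enumerate(steps, start=1):
        members = [a for a in apps if step.match(a.labels)]
        pending = [a for a in members if not a.synced]
        while pending:
            for app in pending[:step.max_update]:
                sync_fn(app)   # assumed to complete the sync and set health
            if any(a.health != "Healthy" for a in members):
                return f"halted at step {i}"   # gate failed: rollout stops here
            pending = [a for a in members if not a.synced]
    return "complete"

# A failing canary sync halts the rollout before later steps run:
apps = [
    App("canary", {"region": "us-east-1-canary"}, synced=False),
    App("east", {"region": "us-east-1"}, synced=False),
]
steps = [
    Step(lambda l: l["region"] == "us-east-1-canary", 1),
    Step(lambda l: l["region"] != "us-east-1-canary", 2),
]
def sync(app):
    app.synced = True
    app.health = "Degraded" if app.name == "canary" else "Healthy"

print(run_rolling_sync(apps, steps, sync))  # halted at step 1
```

Note that "east" never syncs in this run — that is the blast radius protection in miniature.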
Cluster Labeling for Progressive Sync Steps
The matchExpressions in rollingSync.steps match against Application labels. For a clusters generator, the label values come from the ArgoCD cluster secrets in the argocd namespace: the generator exposes each secret's labels as {{metadata.labels.<key>}} template parameters, which you map into the generated Application's metadata.labels (e.g. region: '{{metadata.labels.region}}' in the Application template).
Label your cluster secrets to define wave membership:
kubectl label secret prod-cluster-us-east-1-canary \
  -n argocd \
  region=us-east-1-canary \
  wave=canary

kubectl label secret prod-cluster-us-east-1 \
  -n argocd \
  region=us-east-1 \
  wave=primary

kubectl label secret prod-cluster-us-west-2 \
  -n argocd \
  region=us-west-2 \
  wave=primary

kubectl label secret prod-cluster-eu-west-1 \
  -n argocd \
  region=eu-west-1 \
  wave=secondary

kubectl label secret prod-cluster-ap-southeast-1 \
  -n argocd \
  region=ap-southeast-1 \
  wave=secondary

Verify the labels are present on generated Applications after the ApplicationSet reconciles:
kubectl get applications -n argocd \
  -l applicationset-name=payment-service \
  --show-labels

You should see the region and wave labels on each generated Application, carried over from the cluster secret via the template. If labels are missing, first check that the Application template actually maps them (they are not copied automatically); otherwise the ApplicationSet controller may not have reconciled since you added them — trigger a reconcile by making a trivial annotation change on the ApplicationSet.
One important note: the clusters generator only considers secrets that carry the argocd.argoproj.io/secret-type: cluster label. Verify this label is present on all your cluster secrets:
kubectl get secrets -n argocd \
  -l argocd.argoproj.io/secret-type=cluster \
  --show-labels

Health Check Gates Between Steps
The ApplicationSet controller will not advance to the next step until every Application in the current step reports Healthy. This is the core of the blast radius protection.
What Healthy means in practice: ArgoCD evaluates the health of every resource owned by the Application — Deployments, StatefulSets, DaemonSets, Services, Ingresses, HPAs, PersistentVolumeClaims, and more. The built-in health assessments are defined per resource kind. For a Deployment, Healthy means all desired replicas are available and no pods are in a crash loop. For an Ingress, Healthy means at least one IP/hostname is assigned. For an HPA, Healthy means the current replica count is within min/max bounds.
Failure scenarios that block step advancement:
- A rollout to the canary cluster causes pods to OOMKill — the Deployment becomes Degraded
- A new Ingress resource has a typo in the hostname — the Ingress never gets an IP, stays Progressing
- A CRD-backed resource (like a Rollout object from Argo Rollouts) reports a failed AnalysisRun
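The built-in Deployment assessment essentially compares desired against updated/available replicas. A simplified Python rendering of that rule — the real implementation lives in ArgoCD's health-check library and handles more edge cases (paused deployments, generation mismatches), so treat this as an approximation:

```python
def deployment_health(obj: dict) -> str:
    """Simplified Deployment health rule: Degraded on a stalled rollout,
    Progressing until the new ReplicaSet is fully rolled out and
    available, otherwise Healthy."""
    desired = obj.get("spec", {}).get("replicas", 1)
    status = obj.get("status", {})
    # A Progressing condition with reason ProgressDeadlineExceeded means
    # the rollout stalled (e.g. crash-looping pods) -> Degraded.
    for cond in status.get("conditions", []):
        if (cond.get("type") == "Progressing"
                and cond.get("reason") == "ProgressDeadlineExceeded"):
            return "Degraded"
    if status.get("updatedReplicas", 0) < desired:
        return "Progressing"   # new pods still rolling out
    if status.get("availableReplicas", 0) < desired:
        return "Progressing"   # pods created but not yet available
    return "Healthy"

print(deployment_health({
    "spec": {"replicas": 3},
    "status": {"updatedReplicas": 3, "availableReplicas": 3},
}))  # Healthy
```

A step gate only opens when every such per-resource assessment in every Application of the step resolves to Healthy.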
The controller polls Application health every 10 seconds by default. You can tune this with the --applicationset-controller-requeue-after flag on the argocd-applicationset-controller Deployment. For large fleets, be conservative — polling 500 Applications every 10 seconds is non-trivial work.
If a step's health check never passes (the canary cluster never recovers), the rollout stays stuck at that step indefinitely. The controller does not time out and does not automatically roll back. You must intervene manually: either fix the root cause and push a new commit, or revert the breaking commit and let ArgoCD resync the failed cluster back to the previous state.
Custom health checks are defined in the argocd-cm ConfigMap under resource.customizations.health.<group/Kind> as Lua scripts. If you have custom CRDs (like internal platform resources), you can write Lua health assessments that the ApplicationSet controller will use as gate conditions:
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  resource.customizations.health.internal.myplatform.io_ServiceDeployment: |
    hs = {}
    if obj.status ~= nil then
      if obj.status.phase == "Ready" then
        hs.status = "Healthy"
        hs.message = "Service deployment is ready"
        return hs
      end
      if obj.status.phase == "Failed" then
        hs.status = "Degraded"
        hs.message = obj.status.message
        return hs
      end
    end
    hs.status = "Progressing"
    hs.message = "Waiting for service deployment"
    return hs

maxUpdate: Percentage vs Absolute Count
maxUpdate accepts either an integer (absolute count) or a quoted percentage string:
maxUpdate: "30%"
maxUpdate: 5

Use percentage when your fleet size changes frequently — if you have 100 clusters in the secondary wave today and 120 next quarter, "30%" scales automatically without touching the ApplicationSet spec. Use absolute count when you need determinism — maxUpdate: 1 guarantees exactly one canary cluster syncs at a time, regardless of how many clusters match that step's selector.
Percentage is calculated against the number of Applications matching the step's matchExpressions, not the total fleet. If step 3 matches 80 clusters and maxUpdate is "25%", the controller syncs up to 20 clusters simultaneously, waits for all 20 to become Healthy, then picks the next 20, and so on — until all 80 are done and Healthy.
Fractional results are rounded down. "33%" of 10 Applications is 3, not 3.3.
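The arithmetic is simple enough to verify in a few lines. This mirrors the behavior as described above (absolute counts are capped at the step size; percentages are a floored fraction of the step's matching Applications) — `resolve_max_update` is an illustrative helper, not an ArgoCD API:

```python
import math

def resolve_max_update(max_update, step_size: int) -> int:
    """Resolve a maxUpdate value (int or '<n>%' string) to a concrete
    number of Applications for a step with step_size matching Apps."""
    if isinstance(max_update, int):
        return min(max_update, step_size)
    # Quoted percentage string, e.g. "30%": fraction of the step, floored
    pct = int(max_update.rstrip("%"))
    return math.floor(step_size * pct / 100)

print(resolve_max_update("33%", 10))  # 3  (3.3 rounded down)
print(resolve_max_update("25%", 80))  # 20
print(resolve_max_update(5, 3))       # 3  (can't sync more Apps than exist)
```

One practical consequence of flooring: a small percentage on a small step can resolve to zero, so sanity-check the resolved value against your smallest wave.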
Composing with Argo Rollouts for Two-Level Blast Radius Control
ApplicationSet progressive syncs and Argo Rollouts operate at different levels and compose naturally.
- ApplicationSet RollingSync controls blast radius at the cluster level: "Sync US clusters before EU clusters."
- Argo Rollouts Canary controls blast radius within a single cluster at the pod level: "Send 10% of traffic to the new version, analyze metrics, then increase to 50%, then 100%."
You get two independent blast radius gates. A change must pass the Argo Rollouts canary within cluster N before the ApplicationSet advances to cluster N+1.
The integration is implicit — no additional wiring required. When an Argo Rollouts Rollout resource fails its AnalysisRun, it marks itself as Degraded. ArgoCD sees the Degraded resource, marks the Application as Degraded. The ApplicationSet controller sees a Degraded Application in the current step, halts step advancement.
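That propagation chain is worst-status-wins aggregation at each level. A conceptual sketch — the severity ordering here is simplified (ArgoCD's full ordering also covers Suspended, Missing, and Unknown):

```python
# Simplified severity ordering for aggregating resource health into
# Application health: the worst resource status wins.
SEVERITY = {"Healthy": 0, "Progressing": 1, "Degraded": 2}

def aggregate(statuses):
    """Application health = worst health among its resources."""
    return max(statuses, key=lambda s: SEVERITY[s])

# A Degraded Rollout (failed AnalysisRun) drags the whole App down,
# which in turn blocks the ApplicationSet's current step:
print(aggregate(["Healthy", "Healthy", "Degraded"]))  # Degraded
```

The ApplicationSet controller then applies the same logic one level up: any non-Healthy Application in the current step blocks advancement.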
A minimal Rollout with AnalysisTemplate integration that will block the ApplicationSet from advancing if the error rate exceeds threshold:
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: payment-service
  namespace: payments
spec:
  replicas: 10
  strategy:
    canary:
      steps:
        - setWeight: 10
        - analysis:
            templates:
              - templateName: error-rate-check
            args:
              - name: service-name
                value: payment-service
        - setWeight: 50
        - pause:
            duration: 5m
        - setWeight: 100
      canaryMetadata:
        labels:
          deployment: canary
      stableMetadata:
        labels:
          deployment: stable
---
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: error-rate-check
  namespace: payments
spec:
  args:
    - name: service-name
  metrics:
    - name: error-rate
      interval: 60s
      failureLimit: 3
      provider:
        prometheus:
          address: http://prometheus.monitoring.svc.cluster.local:9090
          query: |
            sum(rate(http_requests_total{service="{{args.service-name}}",status=~"5.."}[2m]))
            /
            sum(rate(http_requests_total{service="{{args.service-name}}"}[2m]))
      successCondition: result[0] < 0.05

If the error rate exceeds 5% for more than 3 measurement intervals, the AnalysisRun fails, the Rollout is marked Degraded, the Application is marked Degraded, and the ApplicationSet stops. No manual intervention required to block the next wave — the health model handles it.
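The gating arithmetic of that metric can be sketched as follows. This models failureLimit semantics as described here (run fails once more than failureLimit measurements have failed the successCondition), not the Argo Rollouts source itself:

```python
def analysis_outcome(error_rates, threshold=0.05, failure_limit=3):
    """Evaluate successCondition `result[0] < threshold` per measurement;
    the run fails once more than failure_limit measurements have failed."""
    failures = 0
    for rate in error_rates:
        if not (rate < threshold):          # successCondition violated
            failures += 1
            if failures > failure_limit:
                return "Failed"             # Rollout -> Degraded -> wave halts
    return "Successful"

# Three bad measurements are tolerated; the fourth fails the run:
print(analysis_outcome([0.01, 0.02, 0.01]))        # Successful
print(analysis_outcome([0.09, 0.10, 0.12, 0.08]))  # Failed
```

With interval: 60s and failureLimit: 3, that means roughly four minutes of sustained 5%+ error rate before the canary cluster blocks the fleet rollout.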
Pausing and Resuming Progressive Rollouts
Sometimes you need to halt a rollout mid-flight: a change freeze kicks in, an on-call engineer wants to investigate a canary anomaly before proceeding, or a downstream team requests a hold.
Pause the ApplicationSet with an annotation:
kubectl annotate applicationset payment-service \
  -n argocd \
  "argocd.argoproj.io/pause-rollout=true"

The controller stops advancing to new steps immediately. Applications already syncing in the current step are not interrupted — they continue to completion. But no new Applications are picked up for syncing.
Resume by removing the annotation:
kubectl annotate applicationset payment-service \
  -n argocd \
  "argocd.argoproj.io/pause-rollout-"

The trailing - in the annotation key is the kubectl annotate syntax for removal. After the annotation is removed, the controller resumes from where it left off.
Pausing does not roll back already-synced clusters. If step 1 and step 2 completed before you paused, those clusters stay on the new version. Pause is a forward-progress hold, not a rollback mechanism.
Observing Progressive Sync Progress
The ApplicationSet controller exposes rollout state in the ApplicationSet's .status field and propagates it through ArgoCD's standard Application health model.
Check per-Application status for your fleet:
kubectl get applications -n argocd \
  -l applicationset-name=payment-service \
  -o custom-columns='NAME:.metadata.name,HEALTH:.status.health.status,SYNC:.status.sync.status'

Sample output during a rollout:
NAME                                      HEALTH        SYNC
prod-cluster-canary-payment-service       Healthy       Synced
prod-cluster-us-east-1-payment-service    Progressing   OutOfSync
prod-cluster-us-west-2-payment-service    Unknown       OutOfSync
prod-cluster-eu-west-1-payment-service    Unknown       OutOfSync
The ApplicationSet's own status field tracks the current step:
kubectl get applicationset payment-service -n argocd -o yaml | grep -A 20 "^status:"

You'll see output like:
status:
  conditions:
    - lastTransitionTime: "2026-05-10T14:32:00Z"
      message: "ApplicationSet has successfully synced"
      reason: ApplicationSetUpToDate
      status: "True"
      type: ResourcesUpToDate
  applicationStatus:
    - application: prod-cluster-canary-payment-service
      lastTransitionTime: "2026-05-10T14:30:00Z"
      message: "Application resource is Healthy"
      status: Healthy
      step: "1"
    - application: prod-cluster-us-east-1-payment-service
      lastTransitionTime: "2026-05-10T14:31:00Z"
      message: "Application syncing"
      status: Progressing
      step: "2"

In the ArgoCD UI (2.9+), the ApplicationSet detail view shows a step progress indicator — each step is displayed with a count of synced/healthy Applications out of total. This is useful for operations teams who don't want to use kubectl during a live rollout.
Rollback: What to Do When a Step Fails
Progressive syncs do not auto-rollback. When a step fails (an Application stays Degraded or Progressing indefinitely), the rollout halts at that step. You have three options:
Option 1: Fix the root cause in Git and push.
If the issue was a misconfigured value, fix it, push the corrected commit. ArgoCD detects the new revision, the ApplicationSet treats this as a new change that needs to propagate. The controller re-evaluates which Applications are OutOfSync and restarts the progressive rollout from Step 1. The failed cluster gets the fixed version, recovers to Healthy, and the rollout proceeds.
Option 2: Revert the Git commit. If the root cause isn't obvious, or if you need to restore service quickly, revert:
git revert HEAD --no-edit
git push origin main

ArgoCD detects the revert as a new HEAD, treats the cluster as OutOfSync (now behind the previous HEAD), and syncs it back to the last known-good state. Once the cluster returns to Healthy, the ApplicationSet considers the rollout complete — there's nothing left to advance through because all Applications are now synced to HEAD (which is the revert commit, equivalent to the pre-change state).
Clusters that already received the bad update before the rollout halted also get the revert automatically — GitOps propagates the revert to all synced clusters, not just the failed one.
Option 3: Manual Application sync bypass.
In emergencies, you can bypass the progressive rollout by manually syncing an Application via argocd app sync or the UI. This lets the Application proceed regardless of the ApplicationSet's rollout state. Use this sparingly — it breaks the wave ordering guarantees and can leave the ApplicationSet's state model inconsistent until the next reconciliation cycle.
Progressive Syncs with Matrix and List Generators
The clusters generator is the most common use case, but RollingSync works with any generator type. The matchExpressions in step selectors match against whatever labels the generator outputs.
A matrix generator combining cluster selection with a list of services:
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: platform-services
  namespace: argocd
spec:
  generators:
    - matrix:
        generators:
          - clusters:
              selector:
                matchLabels:
                  env: production
          - list:
              elements:
                - service: payments
                  port: "8080"
                - service: orders
                  port: "8081"
                - service: inventory
                  port: "8082"
  strategy:
    type: RollingSync
    rollingSync:
      steps:
        - matchExpressions:
            - key: wave
              operator: In
              values:
                - canary
          maxUpdate: 2
        - matchExpressions:
            - key: wave
              operator: In
              values:
                - primary
          maxUpdate: "25%"
        - matchExpressions:
            - key: wave
              operator: NotIn
              values:
                - canary
                - primary
          maxUpdate: "50%"
  template:
    metadata:
      name: '{{name}}-{{service}}'
      labels:
        service: '{{service}}'
        wave: '{{metadata.labels.wave}}'
    spec:
      project: production
      source:
        repoURL: https://github.com/myorg/gitops-fleet
        targetRevision: HEAD
        path: 'apps/{{service}}'
      destination:
        server: '{{server}}'
        namespace: '{{service}}'
      syncPolicy:
        automated:
          prune: true
          selfHeal: true

With the matrix generator, each generated Application carries labels from both the cluster secret (wave, templated via {{metadata.labels.wave}}) and the list element (service). The matchExpressions in rollingSync.steps can match on any of these. This means you can have step selectors like "all canary-wave clusters, for all services" — the step matches by cluster label, and maxUpdate limits concurrent syncs across the combined Applications.
One subtlety: with a matrix generator producing N clusters × M services = N×M Applications, your maxUpdate limit applies to the total count of Applications in that step, not the count of clusters. maxUpdate: 2 means 2 Applications simultaneously — which might be 2 services on the same canary cluster, or 1 service on each of 2 canary clusters.
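To make the Application-count arithmetic concrete, here's a quick sketch of the matrix expansion with hypothetical cluster and service values:

```python
from itertools import product

clusters = [
    {"name": "prod-canary", "wave": "canary"},
    {"name": "prod-us-east-1", "wave": "primary"},
    {"name": "prod-eu-west-1", "wave": "secondary"},
]
services = ["payments", "orders", "inventory"]

# Matrix generator: one Application per (cluster, service) pair
apps = [
    {"name": f"{c['name']}-{s}", "labels": {"wave": c["wave"], "service": s}}
    for c, s in product(clusters, services)
]
print(len(apps))  # 9 Applications from 3 clusters x 3 services

# Step 1 (wave=canary) with maxUpdate: 2 limits *Applications*, not clusters:
canary_apps = [a for a in apps if a["labels"]["wave"] == "canary"]
print(len(canary_apps))  # 3 canary-wave Apps, only 2 syncing concurrently
```

With a single canary cluster and three services, maxUpdate: 2 could mean two services syncing on that one cluster at the same time — decide whether that's acceptable before choosing the limit.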
Limitations and Known Issues (ArgoCD 2.9)
Progressive syncs require automated sync policy. If your Applications have syncPolicy.automated omitted or enabled: false, progressive syncs have no effect. Applications with manual sync only sync when a human triggers argocd app sync — the wave ordering is meaningless in that case. Every Application in the fleet must have syncPolicy.automated set for RollingSync to function.
Applications must match exactly one step. An Application that matches zero steps is treated as outside the rollout — it syncs immediately, bypassing wave ordering. An Application matching multiple steps causes undefined behavior (typically the first matching step wins, but this is undocumented and not guaranteed). Design your matchExpressions so each Application matches exactly one step, and use NotIn selectors in later steps to explicitly exclude earlier-wave clusters.
Controller restart resets in-progress rollout state. If the argocd-applicationset-controller pod restarts during a live rollout, it re-evaluates which Applications are OutOfSync from scratch. This may cause already-synced Applications to be re-evaluated, potentially re-triggering syncs. In practice this is rarely harmful (re-syncing a healthy Application to the same revision is a no-op), but it can cause noise in your rollout progress metrics.
Large fleet performance. With 500+ Applications, the controller's step health evaluation loop becomes non-trivial. Each poll cycle fetches Application status from the Kubernetes API for all Applications in the current step. Tune --applicationset-controller-requeue-after to a longer interval (e.g., --applicationset-controller-requeue-after=30s) if you see high API server load during large rollouts. The default is 10 seconds.
No native timeout for stuck steps. If a step's Applications are permanently stuck in Progressing (a webhook timeout, a stuck controller), the rollout waits indefinitely. There's no stepTimeout field — you're responsible for monitoring and manually intervening. Set up ArgoCD notifications to alert on Application health degradation so you catch stuck rollouts before they become incidents.
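Absent a native stepTimeout, a simple watchdog over the ApplicationSet's status fills the gap. A sketch that flags long-stuck Applications using the lastTransitionTime fields from status.applicationStatus — the field names follow the status format shown earlier, and the 30-minute threshold is an arbitrary choice:

```python
from datetime import datetime, timedelta, timezone

def stuck_applications(app_statuses, max_age=timedelta(minutes=30), now=None):
    """Flag Applications sitting in Progressing longer than max_age,
    based on ApplicationSet status.applicationStatus entries."""
    now = now or datetime.now(timezone.utc)
    stuck = []
    for entry in app_statuses:
        if entry["status"] != "Progressing":
            continue
        ts = datetime.fromisoformat(
            entry["lastTransitionTime"].replace("Z", "+00:00"))
        if now - ts > max_age:
            stuck.append(entry["application"])
    return stuck

status = [
    {"application": "prod-cluster-us-east-1-payment-service",
     "status": "Progressing",
     "lastTransitionTime": "2026-05-10T13:00:00Z"},
]
# An hour in Progressing against a 30-minute threshold -> flagged:
print(stuck_applications(
    status, now=datetime(2026, 5, 10, 14, 0, tzinfo=timezone.utc)))
```

Wire the output into whatever pages your on-call rotation — the point is that detection is your responsibility, not the controller's.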
Frequently Asked Questions
Can I use progressive syncs without a dedicated canary cluster?
Yes. The canary concept is just a label. Step 1 can be any subset of your fleet — a pre-production cluster, a low-traffic regional cluster, a dedicated staging environment that mirrors production configuration. The key requirement is that the cluster is representative enough that its health gives you meaningful signal before you proceed to the next wave. A dev cluster that runs at 1/100th of production traffic is a weaker signal than a pre-prod cluster that mirrors production service topology.
Does progressive sync work with auto-sync disabled?
No. As covered above, progressive syncs require syncPolicy.automated. With auto-sync disabled, Applications only sync on manual trigger. The wave ordering the ApplicationSet controller imposes only works because it controls when automated syncs are initiated. If you trigger a manual sync via argocd app sync, it bypasses the wave ordering entirely.
How does progressive sync interact with ApplicationSet's syncPolicy at the ApplicationSet level?
The ApplicationSet-level syncPolicy (the one directly under spec.syncPolicy in the ApplicationSet resource, not inside spec.template) controls lifecycle behaviors: whether the ApplicationSet controller deletes generated Applications when they're removed from the generator output (preserveResourcesOnDeletion), and whether it creates Applications automatically. This is separate from the strategy.rollingSync configuration and does not affect sync timing. They coexist independently — strategy controls rollout sequencing; the ApplicationSet-level syncPolicy controls Application lifecycle management.
Can I set different sync timeouts per step?
Not natively in ApplicationSet. All generated Applications use the same Application-level sync timeout, configured at the ArgoCD project level or in the Application spec. If you need per-step timeout gates, implement them in Argo Rollouts AnalysisTemplate definitions — set consecutiveErrorLimit and analysis interval so that a long-running step eventually fails its analysis, marks the Application Degraded, and halts the rollout. This gives you effective timeout behavior at the cost of requiring Argo Rollouts in your stack.
Related Reading
For a deep dive into ApplicationSet generators — the clusters, matrix, git, and list generators, Application templating, and fleet-scale management patterns — see ArgoCD ApplicationSet Multi-Cluster. Progressive syncs only make sense once you have a working ApplicationSet fleet; that post covers the foundation.
For Argo Rollouts canary deployments within a single cluster that compose naturally with ApplicationSet progressive syncs, see Argo Rollouts Progressive Delivery. The two-level blast radius model (cluster-level via ApplicationSet, pod-level via Rollouts) is where the real power is.
For ArgoCD architecture, Application resource structure, sync policies, and how ArgoCD manages GitOps reconciliation, see ArgoCD GitOps Kubernetes. Progressive syncs sit on top of ArgoCD's core sync machinery.
For the production ArgoCD setup — HA controller, Redis Sentinel, notifications, RBAC, and repo server configuration — see GitOps ArgoCD Production Setup. The ApplicationSet controller is part of this stack; running it in HA mode matters for progressive sync reliability.
For Flux CD's approach to multi-cluster GitOps — Kustomization dependencies, dependsOn ordering, and FluxCD's progressive delivery alternative — see Flux CD GitOps Kubernetes. If you're evaluating GitOps tooling or running Flux alongside ArgoCD, the comparison is useful context.
Running ApplicationSet across many clusters and taking the risk of simultaneous rollouts? Talk to us at Coding Protocols — we help platform teams configure progressive sync strategies that match their risk tolerance and release velocity.


