Multi-Cluster Kubernetes: Fleet Management with Flux and Argo CD
A single EKS cluster has practical limits: regulatory requirements isolate prod from dev, regional latency requires clusters per region, and blast radius containment means you don't want a bad deployment to take down all environments. Managing 10+ clusters requires GitOps fleet management — a pattern where a control plane cluster applies consistent configuration across all workload clusters from a single Git source of truth.

Running a single cluster per environment (dev, staging, prod) becomes a fleet once you add regional clusters (us-east-1, eu-west-1, ap-southeast-1) or tenant clusters (one per business unit or compliance boundary). At 5+ clusters, manual configuration management — pointing kubectl at each cluster and applying changes by hand — becomes untenable. Fleet management is the operational model where a control plane (hub) pushes consistent configuration to all clusters (spokes) from a single Git repository.
Two patterns dominate: Flux with its hub-spoke bootstrap and Kustomization dependency model, and Argo CD with ApplicationSet generators that enumerate clusters from labels or cluster inventory. Both work. The choice is largely about whether you prefer Flux's pull-everywhere model or Argo CD's centralized visibility.
Fleet Topology
The hub-spoke model:
Control Plane Cluster (hub)
├── Runs: Argo CD / Flux management plane
├── Holds: Cluster secrets, fleet config repo access
└── Pushes to:
    ├── dev-us-east-1 (spoke)
    ├── staging-us-east-1 (spoke)
    ├── prod-us-east-1 (spoke)
    ├── prod-eu-west-1 (spoke)
    └── prod-ap-southeast-1 (spoke)
Each spoke cluster runs only workloads — the GitOps agent on the hub manages configuration — so a failure in one spoke doesn't take down the control plane or the other spokes.
Fleet Management with Argo CD
Registering Clusters
# Register spoke clusters with Argo CD (labels are applied separately with argocd cluster set below)
argocd cluster add arn:aws:eks:us-east-1:123456789:cluster/prod-us-east-1 \
  --name prod-us-east-1

argocd cluster add arn:aws:eks:eu-west-1:123456789:cluster/prod-eu-west-1 \
  --name prod-eu-west-1

argocd cluster add arn:aws:eks:us-east-1:123456789:cluster/staging-us-east-1 \
  --name staging-us-east-1

# Set labels after registration with argocd cluster set
argocd cluster set prod-us-east-1 --label region=us-east-1 --label env=prod --label tier=production
argocd cluster set prod-eu-west-1 --label region=eu-west-1 --label env=prod --label tier=production
argocd cluster set staging-us-east-1 --label region=us-east-1 --label env=staging --label tier=non-production
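Cluster registration can also be kept declarative: Argo CD treats any Secret in its namespace labeled argocd.argoproj.io/secret-type: cluster as a registered cluster, which lets the fleet repo (rather than a CLI session) own the inventory. A minimal sketch for an EKS spoke; the API endpoint, CA data, and IAM role ARN are placeholders:
apiVersion: v1
kind: Secret
metadata:
  name: prod-us-east-1-cluster
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster   # Marks this Secret as a cluster registration
    region: us-east-1
    env: prod
    tier: production                          # Same labels the generators below select on
type: Opaque
stringData:
  name: prod-us-east-1
  server: https://ABC123.gr7.us-east-1.eks.amazonaws.com   # Placeholder EKS API endpoint
  config: |
    {
      "awsAuthConfig": {
        "clusterName": "prod-us-east-1",
        "roleARN": "arn:aws:iam::123456789:role/argocd-deployer"
      },
      "tlsClientConfig": {
        "caData": "<base64-encoded-cluster-CA>"
      }
    }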
ApplicationSet: Deploy to All Clusters by Label
ApplicationSet generates Application resources dynamically from a generator:
# Deploy Prometheus stack to all production clusters
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: prometheus-fleet
  namespace: argocd
spec:
  generators:
    - clusters:
        selector:
          matchLabels:
            tier: production   # Only production clusters

  template:
    metadata:
      name: "prometheus-{{name}}"   # {{name}} = cluster name
    spec:
      project: platform

      source:
        repoURL: https://github.com/my-org/fleet-config
        targetRevision: main
        path: "platform/prometheus/overlays/{{metadata.labels.env}}"   # env-specific overlay

      destination:
        server: "{{server}}"   # Cluster API endpoint (injected by generator)
        namespace: monitoring

      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true

      ignoreDifferences:
        - group: apps
          kind: Deployment
          jsonPointers:
            - /spec/replicas   # Don't revert HPA-managed replica counts
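Before trusting automated sync across the fleet, it's worth confirming what the generator actually produced. Generated Applications are ordinary resources in the argocd namespace, so either of the following (names follow the template above) will show them:
# Applications created by the ApplicationSet controller
kubectl get applications -n argocd | grep prometheus-

# Or through the CLI
argocd app list --output wide | grep prometheus-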
Progressive Rollout Across Environments
Deploy to dev first, then staging, then prod with sync waves (see the note after the example):
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: payments-api-fleet
  namespace: argocd
spec:
  generators:
    - list:
        elements:
          - cluster: dev-us-east-1
            env: dev
            wave: "1"
          - cluster: staging-us-east-1
            env: staging
            wave: "2"
          - cluster: prod-us-east-1
            env: prod
            wave: "3"
          - cluster: prod-eu-west-1
            env: prod
            wave: "4"

  template:
    metadata:
      name: "payments-api-{{cluster}}"
      annotations:
        argocd.argoproj.io/sync-wave: "{{wave}}"   # Wave ordering

    spec:
      project: payments-team

      source:
        repoURL: https://github.com/my-org/payments-api
        targetRevision: main
        path: "k8s/overlays/{{env}}"

      destination:
        name: "{{cluster}}"   # Cluster by registered name
        namespace: payments

      syncPolicy:
        automated:
          prune: true
          selfHeal: true
Note: Sync-wave annotations on Application templates within an ApplicationSet do not control cross-Application ordering out of the box — they only affect ordering within a single Application's resource tree. Cross-cluster progressive rollout requires the Argo CD ApplicationSet Progressive Syncs feature (alpha/beta, enabled with ARGOCD_APPLICATIONSET_CONTROLLER_ENABLE_PROGRESSIVE_SYNCS=true) combined with a RollingSync strategy. Without Progressive Syncs enabled, all generated Applications sync simultaneously regardless of wave annotations.
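With the flag enabled, ordering is declared as a rollingSync strategy on the ApplicationSet rather than via wave annotations, and the steps match labels on the generated Applications. A sketch under those assumptions (only the strategy and the extra template labels are shown; the feature is still maturing, so check the docs for your Argo CD version):
# Fragment of the ApplicationSet above, with Progressive Syncs enabled
spec:
  strategy:
    type: RollingSync
    rollingSync:
      steps:
        - matchExpressions:            # Step 1: dev
            - key: env
              operator: In
              values: [dev]
        - matchExpressions:            # Step 2: staging
            - key: env
              operator: In
              values: [staging]
        - matchExpressions:            # Step 3: prod, one cluster at a time
            - key: env
              operator: In
              values: [prod]
          maxUpdate: 1
  template:
    metadata:
      name: "payments-api-{{cluster}}"
      labels:
        env: "{{env}}"                 # RollingSync steps match on these Application labels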
App of Apps Pattern
For bootstrapping a cluster's full platform stack (not just one app):
# Root Application — deploys everything else
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: platform-root
  namespace: argocd
spec:
  project: platform
  source:
    repoURL: https://github.com/my-org/fleet-config
    targetRevision: main
    path: clusters/prod-us-east-1   # Cluster-specific app manifests

  destination:
    server: https://kubernetes.default.svc
    namespace: argocd

  syncPolicy:
    automated:
      prune: true
      selfHeal: true
The clusters/prod-us-east-1/ directory contains Application manifests for each platform component (cert-manager, external-dns, prometheus, velero, etc.) — all synced by the root Application.
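Each child is itself a plain Application. A sketch of one such manifest, assuming it lives at clusters/prod-us-east-1/cert-manager.yaml (the file name is illustrative) and reuses the shared overlay from the fleet repo:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cert-manager-prod-us-east-1
  namespace: argocd
spec:
  project: platform
  source:
    repoURL: https://github.com/my-org/fleet-config
    targetRevision: main
    path: platform/cert-manager/overlays/prod   # Shared overlay for prod clusters
  destination:
    name: prod-us-east-1        # The spoke cluster, by its registered name
    namespace: cert-manager
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true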
Fleet Management with Flux
Multi-Cluster Bootstrap
Flux's hub-spoke model uses Kustomization with kubeconfig secrets:
# Bootstrap the management (hub) cluster
flux bootstrap github \
  --owner=my-org \
  --repository=fleet-config \
  --path=clusters/management \
  --personal

# On the hub cluster, create kubeconfig secrets for each spoke
# (these are created by the fleet onboarding automation)
kubectl create secret generic prod-us-east-1-kubeconfig \
  --from-file=value=./kubeconfig-prod-us-east-1 \
  -n flux-system
Kustomization with Remote kubeconfig
# Deploy cert-manager to prod-us-east-1 from the hub cluster
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: cert-manager-prod-us-east-1
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: fleet-config

  path: ./platform/cert-manager/overlays/prod

  kubeConfig:
    secretRef:
      name: prod-us-east-1-kubeconfig   # Kubeconfig for the spoke cluster

  healthChecks:
    - apiVersion: apps/v1
      kind: Deployment
      name: cert-manager
      namespace: cert-manager
    - apiVersion: apps/v1
      kind: Deployment
      name: cert-manager-webhook
      namespace: cert-manager

  timeout: 5m
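The sourceRef above assumes a GitRepository named fleet-config exists in flux-system. flux bootstrap creates a GitRepository named flux-system for the repo it manages, so a sourceRef called fleet-config has to resolve to a GitRepository you define yourself (pointing at the same repo or at a dedicated config repo). A minimal sketch, with the credentials secret name as an assumption:
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: fleet-config
  namespace: flux-system
spec:
  interval: 5m
  url: https://github.com/my-org/fleet-config
  ref:
    branch: main
  secretRef:
    name: fleet-config-auth   # HTTPS token or deploy key; name is illustrative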
Fleet Kustomization Hierarchy
fleet-config/
├── clusters/
│   ├── management/              # Hub cluster bootstrap
│   │   └── flux-system/
│   ├── dev-us-east-1/
│   │   └── kustomization.yaml   # Points to platform/ and apps/dev
│   ├── staging-us-east-1/
│   │   └── kustomization.yaml
│   └── prod-us-east-1/
│       └── kustomization.yaml
│
├── platform/                    # Platform components (shared)
│   ├── cert-manager/
│   │   ├── base/                # Base HelmRelease
│   │   └── overlays/
│   │       ├── dev/
│   │       ├── staging/
│   │       └── prod/
│   ├── prometheus/
│   ├── velero/
│   └── external-dns/
│
└── apps/                        # Application workloads
    ├── base/
    ├── dev/
    ├── staging/
    └── prod/
Each cluster's kustomization.yaml patches the platform components for that environment:
# clusters/prod-us-east-1/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../platform/cert-manager/overlays/prod
  - ../../platform/prometheus/overlays/prod
  - ../../platform/velero/overlays/prod
  - ../../apps/prod

patches:
  - patch: |
      - op: replace
        path: /spec/values/global/clusterName
        value: prod-us-east-1
    target:
      kind: HelmRelease
      name: prometheus-stack
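Because this is plain Kustomize, the overlay for a cluster can be rendered locally before the hub ever reconciles it, which is a quick way to confirm a patch like the clusterName override actually lands:
# Render what the hub will apply for this cluster and check the patched value
kustomize build clusters/prod-us-east-1 | grep -B2 -A2 clusterName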
Progressive Delivery with Flux
Flux doesn't have built-in wave ordering, but dependsOn achieves the same effect: a Kustomization won't reconcile until the Kustomizations it depends on are Ready, and Ready only implies healthy when the dependency defines healthChecks or sets wait: true (see the sketch after this example):
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: payments-staging
  namespace: flux-system
spec:
  dependsOn:
    - name: payments-dev   # Staging won't reconcile until dev is healthy

  interval: 5m
  sourceRef:
    kind: GitRepository
    name: fleet-config
  path: ./apps/staging/payments
  prune: true

---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: payments-prod
  namespace: flux-system
spec:
  dependsOn:
    - name: payments-staging   # Prod waits for staging

  interval: 5m
  sourceRef:
    kind: GitRepository
    name: fleet-config
  path: ./apps/prod/payments
  prune: true
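For the dev gate to mean "healthy" rather than just "applied", the payments-dev Kustomization that staging depends on needs healthChecks (or wait: true). A sketch, assuming the workload is a Deployment named payments-api and the spoke kubeconfig secret follows the naming used earlier:
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: payments-dev
  namespace: flux-system
spec:
  interval: 5m
  sourceRef:
    kind: GitRepository
    name: fleet-config
  path: ./apps/dev/payments
  prune: true
  kubeConfig:
    secretRef:
      name: dev-us-east-1-kubeconfig   # Spoke kubeconfig, as in the cert-manager example
  healthChecks:
    - apiVersion: apps/v1
      kind: Deployment
      name: payments-api               # Assumed workload name
      namespace: payments
  timeout: 5m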
Cluster Inventory and Discovery
For large fleets, cluster discovery needs to be automated:
# Argo CD AppProject — scope a group of clusters for bulk operations via wildcard destinations
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: production
  namespace: argocd
spec:
  destinations:
    - name: "prod-*"   # Wildcard — all clusters starting with "prod-"
      namespace: "*"

  sourceRepos:
    - https://github.com/my-org/*

  clusterResourceWhitelist:
    - group: "*"
      kind: "*"
For Flux, cluster secrets can be generated by Terraform/Crossplane when a new cluster is provisioned:
# Terraform: create kubeconfig secret in flux-system namespace when cluster is ready
resource "kubernetes_secret" "cluster_kubeconfig" {
  metadata {
    name      = "${var.cluster_name}-kubeconfig"
    namespace = "flux-system"
  }

  data = {
    value = module.eks.kubeconfig_yaml
  }
}
Drift Detection Across the Fleet
# Argo CD: check sync status for all applications
argocd app list --output wide | grep -v Synced

# Get OutOfSync applications only
argocd app list --selector tier=production -o json | \
  jq '.[] | select(.status.sync.status != "Synced") | .metadata.name'

# Flux: check Kustomization health across all clusters
flux get kustomizations -A --status-selector ready=false
Frequently Asked Questions
Should I use one Git repo per cluster or one repo for the fleet?
One monorepo for the fleet (the fleet-config repo used in the examples above) is the recommended pattern for platform components — all clusters share the same base configuration, with overlays for environment differences. Application teams keep their apps in their own repos; the fleet-config repo only references them. This gives central visibility into what's running where without coupling team repo access to platform access.
How do I handle secrets that differ per cluster?
Use External Secrets Operator with a single AWS Secrets Manager path that includes cluster name: my-org/{cluster-name}/database-password. ESO on each cluster reads from its own path. The ESO ClusterSecretStore configuration is deployed via the fleet GitOps — each cluster has the same store config, but the secret paths differ by convention. Avoid baking environment-specific secrets into the Git repo.
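A sketch of that convention (API versions vary by ESO release; the store name, service account, and target secret name here are assumptions):
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: aws-secrets
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-east-1
      auth:
        jwt:
          serviceAccountRef:
            name: external-secrets        # IRSA-annotated service account (assumed)
            namespace: external-secrets
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: database-password
  namespace: payments
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: aws-secrets
  target:
    name: database-password              # Kubernetes Secret that ESO creates
  data:
    - secretKey: password
      remoteRef:
        key: my-org/prod-us-east-1/database-password   # Cluster name in the path, per the convention above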
What's the right number of clusters?
Fewer is better for operational overhead. Common splits: one cluster per region (not per AZ), one for production and one for non-production per region. Teams that need blast radius isolation (compliance, multi-tenant SaaS) get their own clusters. Resist the impulse to create a cluster for every team — namespace-based multi-tenancy (see Kubernetes Multi-Tenancy Patterns) handles most isolation requirements without the overhead.
For provisioning clusters with Cluster API (CAPI) — declarative cluster lifecycle, ClusterClass templates, and GitOps-driven cluster creation — see Cluster API: Declarative Kubernetes Cluster Lifecycle Management. For the Argo CD ApplicationSet and GitOps details referenced here, see Argo CD for GitOps-Based Kubernetes Deployments. For Flux's Kustomization and HelmRelease primitives, see Flux CD: GitOps for Kubernetes. For platform selection — EKS vs GKE vs AKS trade-offs that inform which managed platform to run a multi-cluster fleet on — see Managed Kubernetes Comparison: EKS vs GKE vs AKS.
Building a GitOps platform for a fleet of 10+ Kubernetes clusters? Talk to us at Coding Protocols — we help platform teams design fleet management architectures that scale without becoming operational nightmares.


