Platform Engineering · 14 min read · May 6, 2026

Multi-Cluster Kubernetes: Fleet Management with Flux and Argo CD

A single EKS cluster has practical limits: regulatory requirements isolate prod from dev, regional latency requires clusters per region, and blast radius containment means you don't want a bad deployment to take down all environments. Managing 10+ clusters requires GitOps fleet management — a pattern where a control plane cluster applies consistent configuration across all workload clusters from a single Git source of truth.

Coding Protocols Team

Running a single cluster per environment (dev, staging, prod) becomes a fleet once you add regional clusters (us-east-1, eu-west-1, ap-southeast-1) or tenant clusters (one per business unit or compliance boundary). At 5+ clusters, manual configuration management — pointing kubectl at each cluster and applying changes by hand — becomes untenable. Fleet management is the operational model where a control plane (hub) pushes consistent configuration to all clusters (spokes) from a single Git repository.

Two patterns dominate: Flux with its hub-spoke bootstrap and Kustomization dependency model, and Argo CD with ApplicationSet generators that enumerate clusters from labels or cluster inventory. Both work. The choice is largely about whether you prefer Flux's pull-everywhere model or Argo CD's centralized visibility.


Fleet Topology

The hub-spoke model:

Control Plane Cluster (hub)
├── Runs: Argo CD / Flux management plane
├── Holds: Cluster secrets, fleet config repo access
└── Pushes to:
    ├── dev-us-east-1 (spoke)
    ├── staging-us-east-1 (spoke)
    ├── prod-us-east-1 (spoke)
    ├── prod-eu-west-1 (spoke)
    └── prod-ap-southeast-1 (spoke)

Each spoke cluster runs only workloads — the GitOps agent on the hub manages configuration. A failure in one spoke therefore doesn't affect the control plane or the other spokes.


Fleet Management with Argo CD

Registering Clusters

bash
# Register spoke clusters with Argo CD and attach fleet labels at add time
# (assumes an Argo CD CLI version whose `cluster add` supports --label)
argocd cluster add arn:aws:eks:us-east-1:123456789:cluster/prod-us-east-1 \
  --name prod-us-east-1 \
  --label region=us-east-1 --label env=prod --label tier=production

argocd cluster add arn:aws:eks:eu-west-1:123456789:cluster/prod-eu-west-1 \
  --name prod-eu-west-1 \
  --label region=eu-west-1 --label env=prod --label tier=production

argocd cluster add arn:aws:eks:us-east-1:123456789:cluster/staging-us-east-1 \
  --name staging-us-east-1 \
  --label region=us-east-1 --label env=staging --label tier=non-production

# The labels land on the Argo CD cluster Secret, which is what the
# ApplicationSet cluster generator selects on.
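
The CLI flow works for a handful of clusters, but it is imperative. Argo CD also accepts clusters declaratively: a Secret in the argocd namespace labeled argocd.argoproj.io/secret-type: cluster, which onboarding automation can create or commit. A minimal sketch, with the API endpoint and credentials as placeholders:

yaml
# Declarative cluster registration: the ApplicationSet cluster generator
# matches against this Secret's labels.
apiVersion: v1
kind: Secret
metadata:
  name: prod-us-east-1
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
    region: us-east-1
    env: prod
    tier: production
type: Opaque
stringData:
  name: prod-us-east-1
  server: https://EXAMPLE1234567890.gr7.us-east-1.eks.amazonaws.com   # Placeholder endpoint
  config: |
    {
      "awsAuthConfig": {
        "clusterName": "prod-us-east-1"
      },
      "tlsClientConfig": {
        "caData": "<base64-encoded-ca-cert>"
      }
    }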

ApplicationSet: Deploy to All Clusters by Label

ApplicationSet generates Application resources dynamically from a generator:

yaml
# Deploy Prometheus stack to all production clusters
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: prometheus-fleet
  namespace: argocd
spec:
  generators:
    - clusters:
        selector:
          matchLabels:
            tier: production    # Only production clusters

  template:
    metadata:
      name: "prometheus-{{name}}"    # {{name}} = cluster name
    spec:
      project: platform

      source:
        repoURL: https://github.com/my-org/fleet-config
        targetRevision: main
        path: "platform/prometheus/overlays/{{metadata.labels.env}}"    # env-specific overlay

      destination:
        server: "{{server}}"    # Cluster API endpoint (injected by generator)
        namespace: monitoring

      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true

      ignoreDifferences:
        - group: apps
          kind: Deployment
          jsonPointers:
            - /spec/replicas    # Don't revert HPA-managed replica counts
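
One caveat: ignoreDifferences only affects diffing by default, so an automated selfHeal sync can still overwrite the HPA-managed replica count. Argo CD's RespectIgnoreDifferences sync option makes the sync phase honor the ignored fields as well; a small addition to the syncOptions above:

yaml
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true
          - RespectIgnoreDifferences=true    # Sync (not just diff) skips the ignored fields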

Progressive Rollout Across Environments

Deploy to dev first, then staging, then prod with sync waves:

yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: payments-api-fleet
  namespace: argocd
spec:
  generators:
    - list:
        elements:
          - cluster: dev-us-east-1
            env: dev
            wave: "1"
          - cluster: staging-us-east-1
            env: staging
            wave: "2"
          - cluster: prod-us-east-1
            env: prod
            wave: "3"
          - cluster: prod-eu-west-1
            env: prod
            wave: "4"

  template:
    metadata:
      name: "payments-api-{{cluster}}"
      annotations:
        argocd.argoproj.io/sync-wave: "{{wave}}"    # Wave ordering

    spec:
      project: payments-team

      source:
        repoURL: https://github.com/my-org/payments-api
        targetRevision: main
        path: "k8s/overlays/{{env}}"

      destination:
        name: "{{cluster}}"    # Cluster by registered name
        namespace: payments

      syncPolicy:
        automated:
          prune: true
          selfHeal: true

Note: Sync-wave annotations on Application templates within an ApplicationSet do not control cross-Application ordering out of the box — they only affect ordering within a single Application's resource tree. Cross-cluster progressive rollout requires the Argo CD ApplicationSet Progressive Syncs feature (alpha/beta, enabled with ARGOCD_APPLICATIONSET_CONTROLLER_ENABLE_PROGRESSIVE_SYNCS=true) combined with a RollingSync strategy. Without Progressive Syncs enabled, all generated Applications sync simultaneously regardless of wave annotations.
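
With Progressive Syncs enabled, a RollingSync strategy on the ApplicationSet sequences the generated Applications by label. A minimal sketch, assuming the template above also sets an env label on each Application (labels: env: "{{env}}") so the steps have something to match:

yaml
# Sketch only: requires ARGOCD_APPLICATIONSET_CONTROLLER_ENABLE_PROGRESSIVE_SYNCS=true
spec:
  strategy:
    type: RollingSync
    rollingSync:
      steps:
        - matchExpressions:          # Step 1: dev
            - key: env
              operator: In
              values: [dev]
        - matchExpressions:          # Step 2: staging
            - key: env
              operator: In
              values: [staging]
        - matchExpressions:          # Step 3: prod, one cluster at a time
            - key: env
              operator: In
              values: [prod]
          maxUpdate: 1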

App of Apps Pattern

For bootstrapping a cluster's full platform stack (not just one app):

yaml
# Root Application — deploys everything else
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: platform-root
  namespace: argocd
spec:
  project: platform
  source:
    repoURL: https://github.com/my-org/fleet-config
    targetRevision: main
    path: clusters/prod-us-east-1    # Cluster-specific app manifests

  destination:
    server: https://kubernetes.default.svc
    namespace: argocd

  syncPolicy:
    automated:
      prune: true
      selfHeal: true

The clusters/prod-us-east-1/ directory contains Application manifests for each platform component (cert-manager, external-dns, prometheus, velero, etc.) — all synced by the root Application.
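
Each file in that directory is an ordinary Application for one component. As an illustration (this exact file is hypothetical, but the paths follow the conventions above), the cert-manager child might look like:

yaml
# clusters/prod-us-east-1/cert-manager.yaml (one child of platform-root)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cert-manager-prod-us-east-1
  namespace: argocd
spec:
  project: platform
  source:
    repoURL: https://github.com/my-org/fleet-config
    targetRevision: main
    path: platform/cert-manager/overlays/prod
  destination:
    name: prod-us-east-1        # Child targets the spoke; the root stays on the hub
    namespace: cert-manager
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true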


Fleet Management with Flux

Multi-Cluster Bootstrap

Flux's hub-spoke model uses Kustomization with kubeconfig secrets:

bash
# Bootstrap the management (hub) cluster
flux bootstrap github \
  --owner=my-org \
  --repository=fleet-config \
  --path=clusters/management \
  --personal

# On the hub cluster, create kubeconfig secrets for each spoke
# (these are created by the fleet onboarding automation)
kubectl create secret generic prod-us-east-1-kubeconfig \
  --from-file=value=./kubeconfig-prod-us-east-1 \
  -n flux-system

Kustomization with Remote kubeconfig

yaml
# Deploy cert-manager to prod-us-east-1 from the hub cluster
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: cert-manager-prod-us-east-1
  namespace: flux-system
spec:
  interval: 10m
  prune: true    # Required field; enables garbage collection of removed resources
  sourceRef:
    kind: GitRepository
    name: fleet-config

  path: ./platform/cert-manager/overlays/prod

  kubeConfig:
    secretRef:
      name: prod-us-east-1-kubeconfig    # Kubeconfig for the spoke cluster

  healthChecks:
    - apiVersion: apps/v1
      kind: Deployment
      name: cert-manager
      namespace: cert-manager
    - apiVersion: apps/v1
      kind: Deployment
      name: cert-manager-webhook
      namespace: cert-manager

  timeout: 5m
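
The sourceRef above points at a GitRepository named fleet-config in flux-system; flux bootstrap only creates the flux-system source, so the fleet source is declared alongside it. A minimal sketch, with the auth secret name as an assumption:

yaml
# Git source that the fleet Kustomizations reconcile from
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: fleet-config
  namespace: flux-system
spec:
  interval: 5m
  url: https://github.com/my-org/fleet-config
  ref:
    branch: main
  secretRef:
    name: fleet-config-git-auth    # Token or deploy key for a private repo (name assumed)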

Fleet Kustomization Hierarchy

fleet-config/
├── clusters/
│   ├── management/                    # Hub cluster bootstrap
│   │   └── flux-system/
│   ├── dev-us-east-1/
│   │   └── kustomization.yaml         # Points to platform/ and apps/dev
│   ├── staging-us-east-1/
│   │   └── kustomization.yaml
│   └── prod-us-east-1/
│       └── kustomization.yaml
│
├── platform/                          # Platform components (shared)
│   ├── cert-manager/
│   │   ├── base/                      # Base HelmRelease
│   │   └── overlays/
│   │       ├── dev/
│   │       ├── staging/
│   │       └── prod/
│   ├── prometheus/
│   ├── velero/
│   └── external-dns/
│
└── apps/                              # Application workloads
    ├── base/
    ├── dev/
    ├── staging/
    └── prod/

Each cluster's kustomization.yaml patches the platform components for that environment:

yaml
# clusters/prod-us-east-1/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../platform/cert-manager/overlays/prod
  - ../../platform/prometheus/overlays/prod
  - ../../platform/velero/overlays/prod
  - ../../apps/prod

patches:
  - patch: |
      - op: replace
        path: /spec/values/global/clusterName
        value: prod-us-east-1
    target:
      kind: HelmRelease
      name: prometheus-stack

Progressive Delivery with Flux

Flux doesn't have Argo CD-style sync waves, but dependsOn achieves the same ordering: a Kustomization won't reconcile until the Kustomizations it depends on report Ready. For Ready to reflect actual workload health rather than just a successful apply, give each Kustomization healthChecks or wait: true.

yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: payments-staging
  namespace: flux-system
spec:
  dependsOn:
    - name: payments-dev    # Staging won't reconcile until dev is healthy

  interval: 5m
  sourceRef:
    kind: GitRepository
    name: fleet-config
  path: ./apps/staging/payments
  prune: true
  wait: true    # Ready only once workloads are healthy, so prod's dependsOn gates on health

---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: payments-prod
  namespace: flux-system
spec:
  dependsOn:
    - name: payments-staging    # Prod waits for staging

  interval: 5m
  sourceRef:
    kind: GitRepository
    name: fleet-config
  path: ./apps/prod/payments
  prune: true
  wait: true

Cluster Inventory and Discovery

For large fleets, cluster grouping and onboarding need to be automated rather than maintained by hand. On the Argo CD side, an AppProject with wildcard destinations scopes bulk operations to a group of clusters:

yaml
# Argo CD AppProject: group clusters for bulk operations via wildcard destinations
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: production
  namespace: argocd
spec:
  destinations:
    - name: "prod-*"    # Wildcard — all clusters registered with a name starting with "prod-"
      namespace: "*"

  sourceRepos:
    - https://github.com/my-org/*

  clusterResourceWhitelist:
    - group: "*"
      kind: "*"

For Flux, cluster secrets can be generated by Terraform/Crossplane when a new cluster is provisioned:

hcl
# Terraform: create kubeconfig secret in flux-system namespace when cluster is ready
resource "kubernetes_secret" "cluster_kubeconfig" {
  metadata {
    name      = "${var.cluster_name}-kubeconfig"
    namespace = "flux-system"
  }

  data = {
    value = module.eks.kubeconfig_yaml    # Output name varies by EKS module version
  }
}

Drift Detection Across the Fleet

bash
# Argo CD: check sync status for all applications
argocd app list --output wide | grep -v Synced

# Get OutOfSync applications only
# (assumes the ApplicationSet template labels its apps with tier=production)
argocd app list --selector tier=production -o json | \
  jq '.[] | select(.status.sync.status != "Synced") | .metadata.name'

# Flux: check Kustomization health across the fleet (run on the hub)
flux get kustomizations -A --status-selector ready=false

Frequently Asked Questions

Should I use one Git repo per cluster or one repo for the fleet?

One mono-repo for the fleet (the fleet-config repo used throughout this post) is the recommended pattern for platform components — all clusters share the same base configuration, with overlays for environment differences. Application teams keep their apps in their own repos; the fleet-config repo only references them, as the sketch below shows. This gives central visibility into what's running where without coupling team repo access to platform access.
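
Concretely, "references them" means the fleet repo holds only a thin pointer at the team repo: with Argo CD, an Application (or ApplicationSet) whose source is the team repo; with Flux, a GitRepository plus Kustomization. A sketch of the Argo CD variant (the file path is illustrative; the repo URL and overlay path follow the payments example above):

yaml
# fleet-config/clusters/prod-us-east-1/payments-api.yaml
# The fleet repo carries only this pointer; the manifests live in the team's repo.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payments-api-prod-us-east-1
  namespace: argocd
spec:
  project: payments-team
  source:
    repoURL: https://github.com/my-org/payments-api    # Team-owned repo
    targetRevision: main
    path: k8s/overlays/prod
  destination:
    name: prod-us-east-1
    namespace: payments
  syncPolicy:
    automated:
      prune: true
      selfHeal: true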

How do I handle secrets that differ per cluster?

Use External Secrets Operator (ESO) with an AWS Secrets Manager path convention that includes the cluster name: my-org/{cluster-name}/database-password. ESO on each cluster reads from its own path. The ClusterSecretStore configuration is deployed via the fleet GitOps repo — every cluster gets the same store config, and only the secret paths differ by convention. Avoid baking environment-specific secrets into the Git repo.
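
A minimal ExternalSecret sketch of that convention; the ClusterSecretStore name and the Secrets Manager path are assumptions, with the cluster name substituted per overlay:

yaml
# ExternalSecret deployed by the fleet; only the Secrets Manager path differs per cluster
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: database-password
  namespace: payments
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: aws-secrets-manager        # Assumed store name, deployed fleet-wide
  target:
    name: database-password          # Kubernetes Secret created on the workload cluster
  data:
    - secretKey: password
      remoteRef:
        key: my-org/prod-us-east-1/database-password    # Cluster-specific path by convention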

What's the right number of clusters?

Fewer is better for operational overhead. Common splits: one cluster per region (not per AZ), one for production and one for non-production per region. Teams that need blast radius isolation (compliance, multi-tenant SaaS) get their own clusters. Resist the impulse to create a cluster for every team — namespace-based multi-tenancy (see Kubernetes Multi-Tenancy Patterns) handles most isolation requirements without the overhead.


For provisioning clusters with Cluster API (CAPI) — declarative cluster lifecycle, ClusterClass templates, and GitOps-driven cluster creation — see Cluster API: Declarative Kubernetes Cluster Lifecycle Management. For the Argo CD ApplicationSet and GitOps details referenced here, see Argo CD for GitOps-Based Kubernetes Deployments. For Flux's Kustomization and HelmRelease primitives, see Flux CD: GitOps for Kubernetes. For platform selection — EKS vs GKE vs AKS trade-offs that inform which managed platform to run a multi-cluster fleet on — see Managed Kubernetes Comparison: EKS vs GKE vs AKS.

Building a GitOps platform for a fleet of 10+ Kubernetes clusters? Talk to us at Coding Protocols — we help platform teams design fleet management architectures that scale without becoming operational nightmares.

Related Topics

Multi-Cluster
Kubernetes
Fleet Management
Flux
Argo CD
GitOps
Platform Engineering
EKS
