Cluster API: Declarative Kubernetes Cluster Lifecycle Management
Cluster API (CAPI) brings Kubernetes cluster lifecycle — provisioning, upgrading, scaling, and deprovisioning — under the same declarative API model used for applications. You declare the desired cluster state as Kubernetes custom resources; CAPI controllers reconcile the actual infrastructure against it. For organizations managing 10+ clusters, this is the alternative to Terraform-per-cluster or manual console operations.

Managing a fleet of Kubernetes clusters with Terraform or console operations doesn't scale — every cluster is a bespoke configuration, upgrades require coordination, and there's no continuous reconciliation to catch configuration drift. Cluster API (CAPI, CNCF Graduated) applies the Kubernetes operator pattern to cluster management: you describe the cluster you want as Kubernetes custom resources, and CAPI controllers provision and maintain it.
The architecture has a management cluster (a standard Kubernetes cluster, often EKS) that runs CAPI controllers, and workload clusters (the clusters CAPI manages). All cluster lifecycle operations — create, upgrade, scale, delete — are Kubernetes API calls against the management cluster.
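Because clusters and nodes are ordinary custom resources, fleet state is inspectable with plain kubectl against the management cluster. A couple of illustrative read-only commands (the CRDs are installed by clusterctl init, shown below):

```bash
# Each workload cluster is a Cluster object on the management cluster
kubectl get clusters -A

# Each node in each workload cluster is a Machine object
kubectl get machines -A
```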
Core Concepts
| Object | Description |
|---|---|
| Cluster | Defines a workload cluster (control plane + infrastructure) |
| MachineDeployment | Manages a pool of worker nodes (like a Deployment for Machines) |
| Machine | Represents a single node (control plane or worker) |
| KubeadmControlPlane | Configures a kubeadm-based control plane |
| AWSCluster | AWS-specific infrastructure for the cluster (VPC, subnets, security groups) |
| AWSMachineTemplate | AWS EC2 configuration for nodes (AMI, instance type, security groups) |
The CAPI core controllers (cluster-api) are provider-agnostic. Infrastructure providers (CAPA for AWS, CAPZ for Azure, CAPG for GCP) handle cloud-specific resource provisioning.
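After clusterctl init (next section), the core and provider controllers run as separate Deployments, which makes the split visible. Listing them (namespace names are the installer's defaults):

```bash
# Provider-agnostic core controllers
kubectl get deployments -n capi-system

# AWS-specific infrastructure provisioning (CAPA)
kubectl get deployments -n capa-system
```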
Management Cluster Setup
```bash
# Install clusterctl
curl -L https://github.com/kubernetes-sigs/cluster-api/releases/latest/download/clusterctl-linux-amd64 \
  -o clusterctl && chmod +x clusterctl && mv clusterctl /usr/local/bin/

# Credentials for the AWS provider (CAPA)
export AWS_REGION=us-east-1
export AWS_ACCESS_KEY_ID=<your-access-key>
export AWS_SECRET_ACCESS_KEY=<your-secret-key>
export AWS_B64ENCODED_CREDENTIALS=$(clusterawsadm bootstrap credentials encode-as-profile)

# Bootstrap IAM resources required by CAPA
clusterawsadm bootstrap iam create-cloudformation-stack

# Initialize the management cluster with the AWS provider
clusterctl init --infrastructure aws
```

For production management clusters, use IAM roles instead of access keys. Run the management cluster on EKS with appropriate IAM permissions for CAPA:
```bash
# With EKS Pod Identity for the CAPA controller ServiceAccounts
clusterctl init --infrastructure aws \
  --config clusterctl-config.yaml  # Contains provider versions and overrides
```
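The config file is where provider versions get pinned instead of resolving "latest" at init time. A minimal sketch of what such a file might contain; the release URL and version below are placeholders, not pinned recommendations:

```bash
# Hypothetical clusterctl-config.yaml: pin CAPA to a known release.
# Replace vX.Y.Z with the version you have validated.
cat > clusterctl-config.yaml <<'EOF'
providers:
  - name: "aws"
    type: "InfrastructureProvider"
    url: "https://github.com/kubernetes-sigs/cluster-api-provider-aws/releases/vX.Y.Z/infrastructure-components.yaml"
EOF
```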
Generating a Cluster Template
clusterctl generate cluster creates a YAML manifest from a template:
```bash
export AWS_REGION=us-east-1
export AWS_SSH_KEY_NAME=my-key-pair
export AWS_CONTROL_PLANE_MACHINE_TYPE=m5.xlarge
export AWS_NODE_MACHINE_TYPE=m5.2xlarge
export KUBERNETES_VERSION=v1.33.0
export CONTROL_PLANE_MACHINE_COUNT=3
export WORKER_MACHINE_COUNT=5

clusterctl generate cluster payments-cluster \
  --kubernetes-version ${KUBERNETES_VERSION} \
  --control-plane-machine-count ${CONTROL_PLANE_MACHINE_COUNT} \
  --worker-machine-count ${WORKER_MACHINE_COUNT} \
  > payments-cluster.yaml
```

This generates all required objects: Cluster, AWSCluster, KubeadmControlPlane, an AWSMachineTemplate for the control plane, a MachineDeployment, a KubeadmConfigTemplate, and an AWSMachineTemplate for the workers.
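To sanity-check the output before applying it, list the generated kinds and run a server-side dry run. Plain kubectl, nothing CAPI-specific:

```bash
# List the object kinds in the generated manifest
grep '^kind:' payments-cluster.yaml

# Validate against the management cluster's API without creating anything
kubectl apply --dry-run=server -f payments-cluster.yaml
```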
Cluster Manifest
The generated manifest has this structure:
```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: payments-cluster
  namespace: default
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: payments-cluster-control-plane
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
    kind: AWSCluster
    name: payments-cluster
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSCluster
metadata:
  name: payments-cluster
  namespace: default
spec:
  region: us-east-1
  sshKeyName: my-key-pair
  network:
    vpc:
      availabilityZoneUsageLimit: 3
    subnets:
      - availabilityZone: us-east-1a
        cidrBlock: "10.0.0.0/24"
        isPublic: false  # Private subnets for nodes
      - availabilityZone: us-east-1b
        cidrBlock: "10.0.1.0/24"
        isPublic: false
      - availabilityZone: us-east-1c
        cidrBlock: "10.0.2.0/24"
        isPublic: false
---
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: payments-cluster-control-plane
  namespace: default
spec:
  replicas: 3
  version: v1.33.0
  machineTemplate:
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
      kind: AWSMachineTemplate
      name: payments-cluster-control-plane
  kubeadmConfigSpec:
    initConfiguration:
      nodeRegistration:
        name: "{{ ds.meta_data.local_hostname }}"
    clusterConfiguration:
      apiServer:
        extraArgs:
          audit-log-path: /var/log/audit.log
          audit-log-maxage: "30"
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: payments-cluster-md-0
  namespace: default
spec:
  clusterName: payments-cluster
  replicas: 5
  selector:
    matchLabels:
      cluster.x-k8s.io/cluster-name: payments-cluster
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: payments-cluster-md-0
      clusterName: payments-cluster
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
        kind: AWSMachineTemplate
        name: payments-cluster-md-0
      version: v1.33.0
```

Cluster Operations
1# Apply the cluster manifest
2kubectl apply -f payments-cluster.yaml
3
4# Watch cluster provisioning
5clusterctl describe cluster payments-cluster
6
7# Get kubeconfig for the workload cluster
8clusterctl get kubeconfig payments-cluster > payments-cluster.kubeconfig
9export KUBECONFIG=payments-cluster.kubeconfig
10
11# Install CNI (required before nodes become Ready)
12kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.29.2/manifests/calico.yamlScaling Worker Nodes
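Provisioning progress is visible through the Machine objects on the management cluster. A couple of read-only checks; the label and names match the manifest above:

```bash
# Machines move through Pending -> Provisioning -> Running phases
kubectl get machines -l cluster.x-k8s.io/cluster-name=payments-cluster -w

# Control plane rollout: initialized, ready replicas, version
kubectl get kubeadmcontrolplane payments-cluster-control-plane
```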
Scaling Worker Nodes

```bash
# Scale workers by patching MachineDeployment replicas
kubectl scale machinedeployment payments-cluster-md-0 --replicas=8
```

Or declaratively:
```yaml
spec:
  replicas: 8  # Was 5; CAPI reconciles the actual instance count
```
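Convergence is observable on the MachineDeployment itself, which reports replica counts the way a Deployment does:

```bash
# REPLICAS and READY converge to 8 as new Machines join the cluster
kubectl get machinedeployment payments-cluster-md-0
```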
Upgrading the Cluster

```bash
# Upgrade the control plane first
kubectl patch kubeadmcontrolplane payments-cluster-control-plane \
  --type merge \
  --patch '{"spec": {"version": "v1.34.0"}}'

# After the control plane is upgraded, upgrade the worker nodes
kubectl patch machinedeployment payments-cluster-md-0 \
  --type merge \
  --patch '{"spec": {"template": {"spec": {"version": "v1.34.0"}}}}'
```

CAPI performs rolling upgrades: old Machines are replaced by new Machines running the target version, at a pace governed by the MachineDeployment's rollout strategy.
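The rollout strategy is tunable per MachineDeployment with Deployment-style fields. A sketch with illustrative values:

```bash
# Illustrative tuning: create one surge Machine during rollout and keep
# all existing workers available until their replacements are Ready
kubectl patch machinedeployment payments-cluster-md-0 \
  --type merge \
  --patch '{"spec": {"strategy": {"type": "RollingUpdate", "rollingUpdate": {"maxSurge": 1, "maxUnavailable": 0}}}}'
```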
ClusterClass: Fleet-Scale Cluster Templates
ClusterClass (GA since CAPI v1.4) provides topology-based cluster templates for fleet-scale management: define the cluster shape once, then instantiate many clusters as small Cluster objects that set only topology values such as the Kubernetes version and replica counts, without duplicating manifests per cluster.
```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: eks-production-class
  namespace: default
spec:
  controlPlane:
    ref:
      apiVersion: controlplane.cluster.x-k8s.io/v1beta1
      kind: KubeadmControlPlaneTemplate
      name: eks-production-cp-template
  infrastructure:
    ref:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
      kind: AWSClusterTemplate
      name: eks-production-infra-template
  workers:
    machineDeployments:
      - class: default-worker
        template:
          bootstrap:
            ref:
              apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
              kind: KubeadmConfigTemplate
              name: eks-production-worker-template
          infrastructure:
            ref:
              apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
              kind: AWSMachineTemplate
              name: eks-production-worker-template
```

With a ClusterClass defined, new clusters reference the class topology instead of duplicating all the underlying templates:
```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: analytics-cluster
  namespace: default
spec:
  topology:
    class: eks-production-class  # Reference the ClusterClass
    version: v1.33.0
    controlPlane:
      replicas: 3
    workers:
      machineDeployments:
        - class: default-worker
          name: workers
          replicas: 5
```

This reduces each new cluster to a simple Cluster object that references the shared topology, far less duplication than generating full manifests per cluster. Version upgrades and configuration changes to the ClusterClass propagate to all clusters that reference it.
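Day-2 changes go through the topology too. Upgrading a topology-managed cluster is a single patch on the Cluster object, which CAPI rolls out to the control plane and then the workers:

```bash
# Upgrade the entire cluster defined above to v1.34.0
kubectl patch cluster analytics-cluster \
  --type merge \
  --patch '{"spec": {"topology": {"version": "v1.34.0"}}}'
```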
GitOps Integration
With Flux or Argo CD managing the management cluster, the entire cluster lifecycle becomes GitOps-driven. The Git repo contains:

```
├── clusters/
│   ├── payments-cluster/
│   │   ├── cluster.yaml         # Cluster, AWSCluster
│   │   ├── control-plane.yaml   # KubeadmControlPlane, AWSMachineTemplate
│   │   └── workers.yaml         # MachineDeployment, AWSMachineTemplate
│   └── analytics-cluster/
│       └── ...
└── kustomization.yaml           # Flux Kustomization pointing at clusters/

# Upgrade = PR that changes the version field in control-plane.yaml
# Scale = PR that changes replicas in workers.yaml
# New cluster = PR adding a new directory under clusters/
```
```yaml
# Flux Kustomization watching the management cluster's cluster definitions
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: cluster-definitions
  namespace: flux-system
spec:
  interval: 5m
  path: "./clusters"
  sourceRef:
    kind: GitRepository
    name: fleet-infra
  prune: true  # Delete cluster objects when removed from Git
```
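Because prune: true means a deleted directory deprovisions a cluster, it is worth watching reconciliation explicitly. Standard Flux CLI commands:

```bash
# Check that the cluster definitions reconciled cleanly
flux get kustomizations cluster-definitions

# Trigger an immediate reconcile after merging a PR
flux reconcile kustomization cluster-definitions --with-source
```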
Frequently Asked Questions
When should I use Cluster API vs Terraform for cluster management?
Terraform is better for initial provisioning of a small number of clusters where fine-grained control over every AWS resource is needed. CAPI is better for managing a fleet of similar clusters at scale — upgrades, scaling, and new cluster creation are Kubernetes API calls rather than Terraform plans. The tradeoff: CAPI adds a management cluster dependency; Terraform has no such dependency. For organizations managing 10+ clusters with frequent lifecycle operations, CAPI's operational model is significantly simpler.
Does CAPI work with EKS?
There's a separate EKS provider (CAPA EKS mode) that provisions EKS clusters via the EKS API rather than kubeadm. This gives you CAPI's declarative model with AWS-managed control planes. The control plane object is AWSManagedControlPlane instead of KubeadmControlPlane. EKS mode is the recommended approach for AWS — it eliminates the operational burden of managing kubeadm control planes.
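A minimal sketch of what changes in EKS mode, assuming CAPA's EKS support is enabled at init time; the name here is hypothetical and the field set is trimmed to the basics, so consult the CAPA documentation for the full spec:

```bash
# Hypothetical EKS-mode control plane: AWSManagedControlPlane replaces
# KubeadmControlPlane, and AWS operates the control plane itself.
cat <<'EOF' | kubectl apply -f -
apiVersion: controlplane.cluster.x-k8s.io/v1beta2
kind: AWSManagedControlPlane
metadata:
  name: payments-eks-control-plane
  namespace: default
spec:
  region: us-east-1
  version: v1.33  # EKS minor version rather than a full patch version
EOF
```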
For multi-cluster fleet management using GitOps with Argo CD and Flux — hub-spoke topology, ApplicationSet generators, and progressive rollout across environments — see Multi-Cluster Kubernetes: Fleet Management with Flux and Argo CD. For Flux-based GitOps that manages CAPI cluster manifests, see Flux CD: GitOps for Kubernetes. For Terraform-based EKS cluster management as an alternative approach, see Terraform for EKS Infrastructure. For namespace-based multi-tenancy patterns (Capsule, HNC, virtual clusters) when a single cluster serves multiple teams, see Kubernetes Multi-Tenancy Patterns.
Managing a fleet of Kubernetes clusters? Talk to us at Coding Protocols — we help platform teams evaluate and implement cluster fleet management strategies that scale with organizational growth.


