Kubernetes
13 min read · May 6, 2026

Cluster API: Declarative Kubernetes Cluster Lifecycle Management

Cluster API (CAPI) brings Kubernetes cluster lifecycle — provisioning, upgrading, scaling, and deprovisioning — under the same declarative API model used for applications. You declare the desired cluster state as Kubernetes custom resources; CAPI controllers reconcile the actual infrastructure against it. For organizations managing 10+ clusters, this is the alternative to Terraform-per-cluster or manual console operations.

Coding Protocols Team
Platform Engineering

Managing a fleet of Kubernetes clusters with Terraform or console operations doesn't scale — every cluster is a bespoke configuration, upgrades require coordination, and there's no continuous reconciliation to catch configuration drift. Cluster API (CAPI, CNCF Graduated) applies the Kubernetes operator pattern to cluster management: you describe the cluster you want as Kubernetes custom resources, and CAPI controllers provision and maintain it.

The architecture has a management cluster (a standard Kubernetes cluster, often EKS) that runs CAPI controllers, and workload clusters (the clusters CAPI manages). All cluster lifecycle operations — create, upgrade, scale, delete — are Kubernetes API calls against the management cluster.
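
Because every lifecycle operation is a custom resource, fleet state is visible with ordinary kubectl against the management cluster. For example:

bash
# All cluster lifecycle state lives in the management cluster's API
kubectl get clusters -A                      # one Cluster object per workload cluster
kubectl get machinedeployments,machines -A   # node pools and individual nodes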


Core Concepts

Object | Description
------ | -----------
Cluster | Defines a workload cluster (control plane + infrastructure)
MachineDeployment | Manages a pool of worker nodes (like a Deployment for Machines)
Machine | Represents a single node (control plane or worker)
KubeadmControlPlane | Configures a kubeadm-based control plane
AWSCluster | AWS-specific infrastructure for the cluster (VPC, subnets, security groups)
AWSMachineTemplate | AWS EC2 configuration for nodes (AMI, instance type, security groups)

The CAPI core controllers (cluster-api) are provider-agnostic. Infrastructure providers (CAPA for AWS, CAPZ for Azure, CAPG for GCP) handle cloud-specific resource provisioning.


Management Cluster Setup

bash
# Install clusterctl
curl -L https://github.com/kubernetes-sigs/cluster-api/releases/latest/download/clusterctl-linux-amd64 \
  -o clusterctl && chmod +x clusterctl && mv clusterctl /usr/local/bin/

# Initialize the management cluster with the AWS provider (CAPA)
# Note: clusterawsadm ships with CAPA releases and is installed the same way
export AWS_REGION=us-east-1
export AWS_ACCESS_KEY_ID=<your-access-key>
export AWS_SECRET_ACCESS_KEY=<your-secret-key>
export AWS_B64ENCODED_CREDENTIALS=$(clusterawsadm bootstrap credentials encode-as-profile)

# Bootstrap IAM resources required by CAPA
clusterawsadm bootstrap iam create-cloudformation-stack

# Initialize management cluster with AWS provider
clusterctl init --infrastructure aws
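
Once init completes, you can verify what the management cluster is running; clusterctl records each installed provider as a Provider object:

bash
# List installed CAPI providers and their versions
kubectl get providers -A

# Check whether newer provider versions are available
clusterctl upgrade plan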

For production management clusters, use IAM roles instead of access keys. Run the management cluster on EKS with appropriate IAM permissions for CAPA:

bash
# With EKS Pod Identity for the CAPA controller ServiceAccounts
clusterctl init --infrastructure aws \
  --config clusterctl-config.yaml   # Contains provider versions and overrides

Generating a Cluster Template

clusterctl generate cluster creates a YAML manifest from a template:

bash
export AWS_REGION=us-east-1
export AWS_SSH_KEY_NAME=my-key-pair
export AWS_CONTROL_PLANE_MACHINE_TYPE=m5.xlarge
export AWS_NODE_MACHINE_TYPE=m5.2xlarge
export KUBERNETES_VERSION=v1.33.0
export CONTROL_PLANE_MACHINE_COUNT=3
export WORKER_MACHINE_COUNT=5

clusterctl generate cluster payments-cluster \
  --kubernetes-version ${KUBERNETES_VERSION} \
  --control-plane-machine-count ${CONTROL_PLANE_MACHINE_COUNT} \
  --worker-machine-count ${WORKER_MACHINE_COUNT} \
  > payments-cluster.yaml

This generates all required objects: Cluster, AWSCluster, KubeadmControlPlane, an AWSMachineTemplate for the control plane, and a MachineDeployment with its KubeadmConfigTemplate and AWSMachineTemplate for the workers.
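
Before applying, a quick sanity check of the generated manifest costs nothing:

bash
# List the object kinds the template produced
grep '^kind:' payments-cluster.yaml

# Validate against the management cluster's API without creating anything
kubectl apply --dry-run=server -f payments-cluster.yaml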


Cluster Manifest

The generated manifest has this structure:

yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: payments-cluster
  namespace: default
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: payments-cluster-control-plane
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
    kind: AWSCluster
    name: payments-cluster
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSCluster
metadata:
  name: payments-cluster
  namespace: default
spec:
  region: us-east-1
  sshKeyName: my-key-pair
  network:
    vpc:
      availabilityZoneUsageLimit: 3
    subnets:
      - availabilityZone: us-east-1a
        cidrBlock: "10.0.0.0/24"
        isPublic: false    # Private subnets for nodes
      - availabilityZone: us-east-1b
        cidrBlock: "10.0.1.0/24"
        isPublic: false
      - availabilityZone: us-east-1c
        cidrBlock: "10.0.2.0/24"
        isPublic: false
---
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: payments-cluster-control-plane
  namespace: default
spec:
  replicas: 3
  version: v1.33.0
  machineTemplate:
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
      kind: AWSMachineTemplate
      name: payments-cluster-control-plane
  kubeadmConfigSpec:
    initConfiguration:
      nodeRegistration:
        name: "{{ ds.meta_data.local_hostname }}"
    clusterConfiguration:
      apiServer:
        extraArgs:
          audit-log-path: /var/log/audit.log
          audit-log-maxage: "30"
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: payments-cluster-md-0
  namespace: default
spec:
  clusterName: payments-cluster
  replicas: 5
  selector:
    matchLabels:
      cluster.x-k8s.io/cluster-name: payments-cluster
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: payments-cluster-md-0
      clusterName: payments-cluster
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
        kind: AWSMachineTemplate
        name: payments-cluster-md-0
      version: v1.33.0

Cluster Operations

bash
# Apply the cluster manifest
kubectl apply -f payments-cluster.yaml

# Watch cluster provisioning
clusterctl describe cluster payments-cluster

# Get kubeconfig for the workload cluster
clusterctl get kubeconfig payments-cluster > payments-cluster.kubeconfig
export KUBECONFIG=payments-cluster.kubeconfig

# Install CNI (required before nodes become Ready)
kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.29.2/manifests/calico.yaml
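
Provisioning takes several minutes. From the management cluster, you can watch the Machines converge (CAPI sets the cluster-name label on every Machine it creates):

bash
# Watch node Machines reach the Running phase
kubectl get machines -l cluster.x-k8s.io/cluster-name=payments-cluster -w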

Scaling Worker Nodes

bash
# Scale workers by patching MachineDeployment replicas
kubectl scale machinedeployment payments-cluster-md-0 --replicas=8

Or declaratively:

yaml
spec:
  replicas: 8    # Was 5 — CAPI reconciles the actual instance count
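
Either way, the MachineDeployment controller converges the actual node count; you can watch the rollout from the management cluster:

bash
# READY/AVAILABLE counts update as new Machines join
kubectl get machinedeployment payments-cluster-md-0 -w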

Upgrading the Cluster

bash
# Upgrade control plane first
kubectl patch kubeadmcontrolplane payments-cluster-control-plane \
  --type merge \
  --patch '{"spec": {"version": "v1.34.0"}}'

# After control plane is upgraded, upgrade worker nodes
kubectl patch machinedeployment payments-cluster-md-0 \
  --type merge \
  --patch '{"spec": {"template": {"spec": {"version": "v1.34.0"}}}}'

CAPI performs rolling upgrades: the KubeadmControlPlane replaces control plane Machines one at a time, while the MachineDeployment replaces worker Machines with new ones running the target version according to its rollout strategy (maxSurge/maxUnavailable).
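
The rollout behavior is tunable on the MachineDeployment itself; a minimal sketch of the strategy fields (values illustrative):

yaml
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # create one replacement Machine at a time
      maxUnavailable: 0    # never drop below the desired replica count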


ClusterClass: Fleet-Scale Cluster Templates

ClusterClass (CAPI v1.4+): for fleet-scale management, ClusterClass provides topology-based cluster templates — define the control plane, infrastructure, and worker classes once, then instantiate many clusters as small Cluster objects that set only a version and replica counts, rather than duplicating full manifests per cluster. Note that ClusterClass requires the CLUSTER_TOPOLOGY feature gate to be enabled on the management cluster.

yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: eks-production-class
  namespace: default
spec:
  controlPlane:
    ref:
      apiVersion: controlplane.cluster.x-k8s.io/v1beta1
      kind: KubeadmControlPlaneTemplate
      name: eks-production-cp-template
  infrastructure:
    ref:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
      kind: AWSClusterTemplate
      name: eks-production-infra-template
  workers:
    machineDeployments:
      - class: default-worker
        template:
          bootstrap:
            ref:
              apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
              kind: KubeadmConfigTemplate
              name: eks-production-worker-template
          infrastructure:
            ref:
              apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
              kind: AWSMachineTemplate
              name: eks-production-worker-template

With a ClusterClass defined, new clusters reference the class topology instead of duplicating all the underlying templates:

yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: analytics-cluster
  namespace: default
spec:
  topology:
    class: eks-production-class    # Reference the ClusterClass
    version: v1.33.0
    controlPlane:
      replicas: 3
    workers:
      machineDeployments:
        - class: default-worker
          name: workers
          replicas: 5

This reduces each new cluster to a simple Cluster object that references the shared topology — far less duplication than generating full manifests per cluster. Version upgrades and configuration changes to the ClusterClass propagate to all clusters that reference it.
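
Upgrades for ClusterClass-based clusters are a single field change; bumping the topology version rolls the control plane and then the workers:

bash
# Upgrade a topology-managed cluster by changing one version field
kubectl patch cluster analytics-cluster --type merge \
  --patch '{"spec": {"topology": {"version": "v1.34.0"}}}'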


GitOps Integration

With Flux or Argo CD managing the management cluster, cluster lifecycle is fully GitOps:

Git repo contains:
├── clusters/
│   ├── payments-cluster/
│   │   ├── cluster.yaml          # Cluster, AWSCluster
│   │   ├── control-plane.yaml    # KubeadmControlPlane, AWSMachineTemplate
│   │   └── workers.yaml          # MachineDeployment, AWSMachineTemplate
│   └── analytics-cluster/
│       └── ...
└── kustomization.yaml            # Flux Kustomization pointing at clusters/

# Upgrade = PR that changes version field in control-plane.yaml
# Scale = PR that changes replicas in workers.yaml
# New cluster = PR adding a new directory under clusters/

The Flux side is a single Kustomization that reconciles the clusters/ directory against the management cluster:

yaml
# Flux Kustomization watching the management cluster's cluster definitions
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: cluster-definitions
  namespace: flux-system
spec:
  interval: 5m
  path: "./clusters"
  sourceRef:
    kind: GitRepository
    name: fleet-infra
  prune: true    # Delete cluster objects when removed from Git (CAPI then deprovisions the cluster)
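
The Kustomization references a GitRepository source named fleet-infra, which is not shown above. A minimal sketch (the repository URL is illustrative):

yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: fleet-infra
  namespace: flux-system
spec:
  interval: 5m
  url: https://github.com/example/fleet-infra   # illustrative URL
  ref:
    branch: main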

Frequently Asked Questions

When should I use Cluster API vs Terraform for cluster management?

Terraform is better for initial provisioning of a small number of clusters where fine-grained control over every AWS resource is needed. CAPI is better for managing a fleet of similar clusters at scale — upgrades, scaling, and new cluster creation are Kubernetes API calls rather than Terraform plans. The tradeoff: CAPI adds a management cluster dependency; Terraform has no such dependency. For organizations managing 10+ clusters with frequent lifecycle operations, CAPI's operational model is significantly simpler.

Does CAPI work with EKS?

EKS support is built into CAPA as a distinct mode that provisions clusters through the EKS API rather than kubeadm. This gives you CAPI's declarative model with AWS-managed control planes: the control plane object is AWSManagedControlPlane instead of KubeadmControlPlane. EKS mode is the recommended approach on AWS — it eliminates the operational burden of managing kubeadm control planes.
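
As a rough sketch of the shape (fields abbreviated and values illustrative; check the CAPA documentation for the current schema), an EKS-mode control plane object looks like:

yaml
apiVersion: controlplane.cluster.x-k8s.io/v1beta2
kind: AWSManagedControlPlane
metadata:
  name: payments-eks-control-plane
spec:
  region: us-east-1
  version: v1.33        # EKS control plane version
  sshKeyName: my-key-pair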


For multi-cluster fleet management using GitOps with Argo CD and Flux — hub-spoke topology, ApplicationSet generators, and progressive rollout across environments — see Multi-Cluster Kubernetes: Fleet Management with Flux and Argo CD. For Flux-based GitOps that manages CAPI cluster manifests, see Flux CD: GitOps for Kubernetes. For Terraform-based EKS cluster management as an alternative approach, see Terraform for EKS Infrastructure. For namespace-based multi-tenancy patterns (Capsule, HNC, virtual clusters) when a single cluster serves multiple teams, see Kubernetes Multi-Tenancy Patterns.

Managing a fleet of Kubernetes clusters? Talk to us at Coding Protocols — we help platform teams evaluate and implement cluster fleet management strategies that scale with organizational growth.

Related Topics

Cluster API
CAPI
Kubernetes
Infrastructure
AWS
Platform Engineering
Multi-Cluster
GitOps
