Kubernetes
14 min read · May 9, 2026

Kubernetes Operators: Building Controllers with Kubebuilder

A Kubernetes operator is a controller that knows how to manage a specific application — encoding operational knowledge (how to scale, back up, upgrade, and recover) into the reconciliation loop. Kubebuilder scaffolds the boilerplate so you focus on the domain logic: defining a Custom Resource Definition and writing a reconciler in Go.

Coding Protocols Team
Platform Engineering

Kubernetes operators extend the Kubernetes API with domain-specific resources and controllers. Instead of documenting how to deploy a database (scale the StatefulSet, take a PVC snapshot before upgrades, re-elect the primary after a failure), you encode that knowledge in a controller that Kubernetes continuously runs. The result: humans declare what they want; the operator figures out how to achieve it.

Kubebuilder (maintained by the Kubernetes SIGs) is the standard scaffolding tool for building operators. It generates the boilerplate (CRD manifests, RBAC markers, webhook scaffolding, test setup) so you write the reconciler logic, not the plumbing.


Core Concepts

Custom Resource Definition (CRD): Extends the Kubernetes API with a new resource type. After the CRD is installed, kubectl get databases works like any built-in resource.

Custom Resource (CR): An instance of the CRD. A Database object with apiVersion: db.example.com/v1alpha1, kind: Database, and metadata.name: payments-db is a CR.

Controller: A control loop that watches CRs and reconciles the actual cluster state to match the desired state declared in the CR. The reconciler runs on every create/update/delete event and on a periodic resync.

Reconciliation loop:

Observe: Get the current state of the CR and related resources
Diff: Compare desired state (spec) to actual state
Act: Create/update/delete resources to move toward desired state
Report: Update CR status to reflect current state

The reconciler must be idempotent — calling it multiple times with the same inputs must produce the same result. The controller is invoked redundantly (duplicate events, periodic resyncs, restarts after crashes), and non-idempotent logic causes thrashing: endless create/update cycles that never converge.


Setup

```bash
# Install kubebuilder
curl -L -o kubebuilder "https://go.kubebuilder.io/dl/latest/$(go env GOOS)/$(go env GOARCH)"
chmod +x kubebuilder && mv kubebuilder /usr/local/bin/

# Initialize a new operator project
mkdir database-operator && cd database-operator
kubebuilder init --domain example.com --repo github.com/my-org/database-operator

# Create an API (generates CRD types + controller scaffold)
kubebuilder create api --group db --version v1alpha1 --kind Database
# Create Resource? Y
# Create Controller? Y
```

This generates:

  • api/v1alpha1/database_types.go — CRD type definitions
  • internal/controller/database_controller.go — reconciler scaffold
  • config/crd/ — CRD manifest (generated from Go markers)
  • config/rbac/ — RBAC manifests (generated from controller markers)

Defining the CRD

```go
// api/v1alpha1/database_types.go

// DatabaseSpec defines the desired state of Database
type DatabaseSpec struct {
    // +kubebuilder:validation:Minimum=1
    // +kubebuilder:validation:Maximum=7
    Replicas int32 `json:"replicas"`

    // +kubebuilder:validation:Enum=postgres;mysql;redis
    Engine string `json:"engine"`

    // +kubebuilder:validation:Pattern=`^\d+Gi$`
    StorageSize string `json:"storageSize"`

    // +optional
    Version string `json:"version,omitempty"`
}

// DatabaseStatus defines the observed state of Database
type DatabaseStatus struct {
    // +optional
    Phase string `json:"phase,omitempty"` // Creating, Running, Degraded, Failed

    // +optional
    ReadyReplicas int32 `json:"readyReplicas,omitempty"`

    // +optional
    // +listType=map
    // +listMapKey=type
    Conditions []metav1.Condition `json:"conditions,omitempty"`
}

// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// +kubebuilder:printcolumn:name="Engine",type="string",JSONPath=".spec.engine"
// +kubebuilder:printcolumn:name="Replicas",type="integer",JSONPath=".spec.replicas"
// +kubebuilder:printcolumn:name="Phase",type="string",JSONPath=".status.phase"
// +kubebuilder:printcolumn:name="Age",type="date",JSONPath=".metadata.creationTimestamp"

// Database is the Schema for the databases API
type Database struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`

    Spec   DatabaseSpec   `json:"spec,omitempty"`
    Status DatabaseStatus `json:"status,omitempty"`
}
```

Kubebuilder markers (// +kubebuilder:...) control CRD generation:

  • +kubebuilder:subresource:status — enables the /status subresource, so spec and status are written through separate API endpoints: a status update cannot accidentally change the spec and does not bump metadata.generation
  • +kubebuilder:validation:Minimum=1 — adds OpenAPI validation to the CRD
  • +kubebuilder:printcolumn — adds columns to kubectl get database output

The Reconciler

```go
// internal/controller/database_controller.go

type DatabaseReconciler struct {
    client.Client
    Scheme *runtime.Scheme
}

// +kubebuilder:rbac:groups=db.example.com,resources=databases,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=db.example.com,resources=databases/status,verbs=get;update;patch
// +kubebuilder:rbac:groups=apps,resources=statefulsets,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=core,resources=services,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=core,resources=persistentvolumeclaims,verbs=get;list;watch

func (r *DatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    log := log.FromContext(ctx)

    // 1. Fetch the Database CR
    db := &dbv1alpha1.Database{}
    if err := r.Get(ctx, req.NamespacedName, db); err != nil {
        if apierrors.IsNotFound(err) {
            // CR was deleted — nothing to do (owned resources are garbage collected)
            return ctrl.Result{}, nil
        }
        return ctrl.Result{}, err
    }

    // 2. Set owner references on all created resources for garbage collection
    // (controller-runtime's controllerutil.SetControllerReference handles this —
    // see reconcileStatefulSet below)

    // 3. Reconcile the StatefulSet
    result, err := r.reconcileStatefulSet(ctx, db)
    if err != nil {
        log.Error(err, "Failed to reconcile StatefulSet")
        return result, err
    }

    // 4. Update status
    db.Status.Phase = "Running"
    if err := r.Status().Update(ctx, db); err != nil {
        return ctrl.Result{}, err
    }

    // 5. Requeue after 5 minutes for drift detection
    return ctrl.Result{RequeueAfter: 5 * time.Minute}, nil
}

func (r *DatabaseReconciler) reconcileStatefulSet(ctx context.Context, db *dbv1alpha1.Database) (ctrl.Result, error) {
    desired := r.buildStatefulSet(db)

    // Set controller reference (ensures the StatefulSet is garbage collected
    // when the Database is deleted)
    if err := controllerutil.SetControllerReference(db, desired, r.Scheme); err != nil {
        return ctrl.Result{}, err
    }

    // Check if the StatefulSet exists
    existing := &appsv1.StatefulSet{}
    err := r.Get(ctx, client.ObjectKeyFromObject(desired), existing)
    if apierrors.IsNotFound(err) {
        // Create
        return ctrl.Result{}, r.Create(ctx, desired)
    }
    if err != nil {
        return ctrl.Result{}, err
    }

    // Update if spec changed (compare relevant fields)
    if *existing.Spec.Replicas != *desired.Spec.Replicas {
        existing.Spec.Replicas = desired.Spec.Replicas
        return ctrl.Result{}, r.Update(ctx, existing)
    }

    return ctrl.Result{}, nil
}
```

Finalizers for Cleanup

When the CR is deleted, Kubernetes garbage-collects owned resources (via owner references). For external resources (RDS instances, Route53 records, S3 buckets), use a finalizer:

```go
const databaseFinalizer = "db.example.com/finalizer"

func (r *DatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    db := &dbv1alpha1.Database{}
    if err := r.Get(ctx, req.NamespacedName, db); err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // Handle deletion
    if !db.DeletionTimestamp.IsZero() {
        if controllerutil.ContainsFinalizer(db, databaseFinalizer) {
            // Perform cleanup (delete external resources)
            if err := r.deleteExternalResources(ctx, db); err != nil {
                return ctrl.Result{}, err
            }
            controllerutil.RemoveFinalizer(db, databaseFinalizer)
            return ctrl.Result{}, r.Update(ctx, db)
        }
        return ctrl.Result{}, nil
    }

    // Add finalizer on creation
    if !controllerutil.ContainsFinalizer(db, databaseFinalizer) {
        controllerutil.AddFinalizer(db, databaseFinalizer)
        return ctrl.Result{}, r.Update(ctx, db)
    }

    // ... rest of reconciliation
}
```

Testing with envtest

Kubebuilder includes envtest — a real Kubernetes API server (without kubelet) for integration testing:

```go
// internal/controller/suite_test.go — kubebuilder v4 / Ginkgo v2 pattern
var (
    testEnv   *envtest.Environment
    k8sClient client.Client
    ctx       context.Context
    cancel    context.CancelFunc
)

var _ = BeforeSuite(func() {
    testEnv = &envtest.Environment{
        CRDDirectoryPaths: []string{filepath.Join("..", "..", "config", "crd", "bases")},
    }
    cfg, err := testEnv.Start()
    Expect(err).NotTo(HaveOccurred())

    // Register the Database types so the client and manager can decode them
    Expect(dbv1alpha1.AddToScheme(scheme)).To(Succeed())

    k8sClient, err = client.New(cfg, client.Options{Scheme: scheme})
    Expect(err).NotTo(HaveOccurred())

    ctx, cancel = context.WithCancel(context.Background())
    go func() {
        defer GinkgoRecover()
        mgr, err := ctrl.NewManager(cfg, ctrl.Options{Scheme: scheme})
        Expect(err).NotTo(HaveOccurred())
        Expect((&DatabaseReconciler{
            Client: mgr.GetClient(),
            Scheme: mgr.GetScheme(),
        }).SetupWithManager(mgr)).To(Succeed())
        Expect(mgr.Start(ctx)).To(Succeed())
    }()
})

var _ = AfterSuite(func() {
    cancel()
    Expect(testEnv.Stop()).To(Succeed())
})
```
```go
// internal/controller/database_controller_test.go
It("should create a StatefulSet for a Database CR", func() {
    db := &dbv1alpha1.Database{
        ObjectMeta: metav1.ObjectMeta{Name: "test-db", Namespace: "default"},
        Spec: dbv1alpha1.DatabaseSpec{
            Engine: "postgres", Replicas: 3, StorageSize: "10Gi",
        },
    }
    Expect(k8sClient.Create(ctx, db)).To(Succeed())

    sts := &appsv1.StatefulSet{}
    Eventually(func() error {
        return k8sClient.Get(ctx, types.NamespacedName{Name: "test-db", Namespace: "default"}, sts)
    }, timeout, interval).Should(Succeed())

    Expect(*sts.Spec.Replicas).To(Equal(int32(3)))
})
```
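The reconciler earlier hard-codes Phase to "Running"; deriving it from observed state keeps status honest and makes it easy to assert on in tests like the one above. A small, illustrative helper — the function name is ours, but the phase values follow the comment in DatabaseStatus (Failed would need extra signals, such as a crash-looping pod, and is omitted):

```go
package main

import "fmt"

// computePhase maps observed replica counts to a DatabaseStatus phase.
// Pure function: no client, no context, trivially unit-testable.
func computePhase(desired, ready int32) string {
	switch {
	case ready == 0:
		return "Creating"
	case ready < desired:
		return "Degraded"
	default:
		return "Running"
	}
}

func main() {
	fmt.Println(computePhase(3, 3)) // Running
	fmt.Println(computePhase(3, 1)) // Degraded
	fmt.Println(computePhase(3, 0)) // Creating
}
```

In the reconciler, the inputs would come from the owned StatefulSet's status.readyReplicas before calling r.Status().Update.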

Frequently Asked Questions

When should I build an operator vs use Helm?

Helm is the right tool for packaging and deploying software that doesn't need lifecycle management. An operator is appropriate when you need to encode operational knowledge that can't be expressed declaratively: failure recovery (re-elect primary, restore from backup), upgrade orchestration (upgrade followers before leader), or managing external resources (provision RDS alongside the application). If the operational complexity can be handled by a human following a runbook once a month, Helm is simpler. If it needs to happen automatically on any failure, build an operator.

How do I handle rate limits and back-off in the reconciler?

The controller-runtime framework handles exponential back-off automatically when your reconciler returns an error. For rate limiting external API calls (cloud provider APIs), use controller-runtime's RateLimiter when registering the controller:

```go
ctrl.NewControllerManagedBy(mgr).
    For(&dbv1alpha1.Database{}).
    WithOptions(controller.Options{
        RateLimiter: workqueue.NewTypedItemExponentialFailureRateLimiter[reconcile.Request](5*time.Second, 5*time.Minute),
    }).
    Complete(r)
```

For production-grade controller-runtime patterns (leader election, Server-Side Apply, cache scoping, and reconciliation performance), see Kubernetes Operators: Building Controllers with controller-runtime. For Crossplane as an alternative to custom operators for cloud infrastructure provisioning, see Crossplane: Cloud Infrastructure as Kubernetes Resources. For OPA Gatekeeper and Kyverno operators that validate custom resources from operators like this one, see Kubernetes Admission Webhooks: OPA Gatekeeper and Kyverno.

Building a custom Kubernetes operator for your platform? Talk to us at Coding Protocols — we help platform teams design operator architectures that encode operational knowledge without becoming maintenance burdens.

Related Topics

Kubernetes
Operators
Kubebuilder
CRD
Go
Platform Engineering
CNCF
Controller
