Security
16 min read · May 1, 2026

Kubernetes Security Hardening: A Production Checklist

Default Kubernetes settings are not secure defaults — they're permissive defaults designed to get clusters running quickly. Hardening for production means working through the API server, node, workload, network, and supply chain layers systematically. Here's the checklist.

Ajeet Yadav
Platform & Cloud Engineer

Default Kubernetes is permissive by design. The defaults get you to a running cluster quickly, but they allow anonymous API discovery, pods that run as root with privilege escalation available, and unrestricted pod-to-pod traffic. Hardening is the process of closing those defaults systematically.

This post works through the full hardening stack: API server security, node hardening, workload security (PSA), network isolation, RBAC, secrets management, and supply chain controls. Each section includes the audit commands to check your current state and the remediation for common gaps.


Threat Model

Before hardening, understand what you're protecting against:

External attacker (internet): Kubernetes API server and etcd should never be internet-accessible. Cloud load balancers in front of the API server with IP allowlisting are the minimum for managed clusters.

Compromised workload: A pod running malicious code (via supply chain attack, RCE in the application) should be contained — it shouldn't be able to reach other pods, read secrets it doesn't own, or escalate to cluster-admin.

Insider threat / misconfigured access: An overly permissive RBAC binding should not give a developer cluster-admin access. Namespace boundaries should be enforced.

Supply chain attack: A malicious container image or CI/CD pipeline modification shouldn't be deployable to production.

Work through hardening in order: workload isolation first (highest risk reduction), then network, then RBAC, then supply chain.


1. API Server Hardening

Audit Logging

Enable comprehensive audit logging to track who did what:

yaml
# /etc/kubernetes/audit-policy.yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Log full request bodies for sensitive resources
  - level: Request
    resources:
      - group: ""
        resources: ["secrets", "configmaps", "serviceaccounts"]
  # Log metadata for read operations on most resources
  - level: Metadata
    resources:
      - group: ""
        resources: ["pods", "services"]
      - group: "apps"   # deployments live in the apps API group, not core
        resources: ["deployments"]
  # Ignore routine control plane reads
  - level: None
    users: ["system:kube-controller-manager", "system:kube-scheduler"]
    verbs: ["get", "list", "watch"]
    resources:
      - group: ""
        resources: ["endpoints", "services"]

For managed clusters (EKS, GKE, AKS), enable audit logs via the platform:

bash
# EKS: enable audit logs
aws eks update-cluster-config \
  --name my-cluster \
  --logging '{"clusterLogging":[{"types":["api","audit","authenticator","controllerManager","scheduler"],"enabled":true}]}'

Disable Anonymous Authentication

On self-managed clusters, verify anonymous auth is disabled:

bash
# On a kubeadm control plane node, check the API server's static pod manifest
grep anonymous-auth /etc/kubernetes/manifests/kube-apiserver.yaml
# Should see: --anonymous-auth=false (the flag defaults to true when absent)

On managed clusters you can't set this flag yourself; anonymous requests are typically limited by RBAC to public discovery endpoints (the system:public-info-viewer binding). Verify what the anonymous user can actually do:
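bash
# List the permissions granted to unauthenticated requests
kubectl auth can-i --list --as=system:anonymous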

Restrict API Server Access

bash
# For EKS: restrict the public API endpoint to specific IP ranges
aws eks update-cluster-config \
  --name my-cluster \
  --resources-vpc-config \
    endpointPublicAccess=true,endpointPrivateAccess=true,publicAccessCidrs="203.0.113.0/32"   # only your IPs

The ideal: endpointPublicAccess=false with endpointPrivateAccess=true — API access only from within the VPC, via a VPN or bastion.
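For example, for the same cluster as above:

bash
# Private-only endpoint: the API is reachable only from inside the VPC
aws eks update-cluster-config \
  --name my-cluster \
  --resources-vpc-config endpointPublicAccess=false,endpointPrivateAccess=true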


2. Node Hardening

Use Minimal OS Images

Use purpose-built container OS images (Bottlerocket on EKS, Container-Optimized OS on GKE) rather than general-purpose Linux such as stock Amazon Linux 2023. These images:

  • Have no package manager (can't apt install tools after compromise)
  • Run a minimal surface area
  • Enable seccomp by default
  • Have immutable root filesystems (Bottlerocket)

Bottlerocket for EKS:

hcl
# Terraform EKS module
eks_managed_node_groups = {
  default = {
    ami_type = "BOTTLEROCKET_x86_64"
    # Bottlerocket has no SSH by default — access via SSM or admin container
  }
}

Node Security Groups

Restrict node-to-node traffic to only what Kubernetes needs:

  • Kubelet port (10250) from the API server IP range only
  • Container runtime socket never exposed
  • No SSH open to the internet (use SSM or a bastion)
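As a Terraform sketch, assuming node and control plane security groups defined elsewhere in your configuration (both resource names are illustrative):

hcl
# Allow kubelet traffic only from the control plane security group
resource "aws_security_group_rule" "kubelet_from_control_plane" {
  type                     = "ingress"
  from_port                = 10250
  to_port                  = 10250
  protocol                 = "tcp"
  security_group_id        = aws_security_group.nodes.id          # hypothetical
  source_security_group_id = aws_security_group.control_plane.id  # hypothetical
}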

IMDSv2 Enforcement

On AWS, enforce IMDSv2 (token-required metadata service) to prevent SSRF attacks from reaching the instance metadata:

hcl
# In Terraform node group launch template
metadata_options {
  http_endpoint               = "enabled"
  http_tokens                 = "required"   # Enforces IMDSv2
  http_put_response_hop_limit = 1            # Prevents pod-level SSRF to metadata
}

With http_put_response_hop_limit = 1, the IMDSv2 token response has an IP TTL of 1, so it cannot cross the extra network hop into a pod's own network namespace. Only host-level processes (and hostNetwork pods, which share the node's network namespace) can complete the token handshake.
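To verify from inside the cluster, a throwaway curl pod can attempt the IMDSv2 token handshake; with the hop limit enforced, the request should time out:

bash
# A timeout here means pods cannot obtain an IMDSv2 token (good)
kubectl run imds-test --rm -it --restart=Never \
  --image=curlimages/curl --command -- \
  curl -s -m 2 -X PUT "http://169.254.169.254/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 60"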


3. Pod Security Admission

PSA enforces security profiles at the namespace level. Production namespaces should run restricted:

bash
# Dry-run against every namespace: the API server reports existing pods
# that would violate restricted, without changing anything
kubectl label --dry-run=server --overwrite namespace --all \
  pod-security.kubernetes.io/enforce=restricted

# Or label namespaces in audit mode; violations are recorded as
# pod-security.kubernetes.io annotations in the API server audit log
kubectl get namespaces -o name | xargs -I {} \
  kubectl label {} \
  pod-security.kubernetes.io/audit=restricted \
  pod-security.kubernetes.io/audit-version=latest \
  --overwrite

Enforce profiles per namespace:

bash
# Production: strict
kubectl label namespace production \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/enforce-version=latest \
  pod-security.kubernetes.io/warn=restricted

# Dev/staging: baseline
kubectl label namespace staging \
  pod-security.kubernetes.io/enforce=baseline \
  pod-security.kubernetes.io/warn=restricted

# Platform/system: must stay privileged for DaemonSets with host access
kubectl label namespace kube-system \
  pod-security.kubernetes.io/enforce=privileged

restricted profile requirements (your pods must satisfy all):

yaml
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault   # or Localhost
  containers:
    - securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]
        readOnlyRootFilesystem: true    # Recommended but not required by restricted
  volumes:
    # Allowed types: configMap, emptyDir, secret, downwardAPI, projected, csi, persistentVolumeClaim
    # Blocked: hostPath. Host namespaces (hostNetwork, hostPID, hostIPC) are
    # also blocked, as pod-level fields rather than volume types.

For workloads that can't immediately meet restricted (legacy apps running as root), use baseline as a stepping stone — it blocks the most dangerous configurations (privileged containers, hostPath, host namespaces) without requiring non-root.


4. RBAC Hardening

Audit for Over-Permissioned Bindings

bash
# Find all ClusterRoleBindings that grant cluster-admin
kubectl get clusterrolebindings -o json | \
  jq '.items[] | select(.roleRef.name == "cluster-admin") |
  {name: .metadata.name, subjects: .subjects}'

# Find ClusterRoleBindings with edit or admin rights (non-platform subjects)
kubectl get clusterrolebindings -o json | \
  jq '.items[] | select(.roleRef.name == "edit" or .roleRef.name == "admin") |
  {name: .metadata.name, subjects: .subjects}'

Investigate any cluster-admin binding not attached to the platform team or system components. Every cluster-admin binding extends the blast radius of a single compromised credential to every resource in the cluster.

Disable Default Service Account Token Automount

Every namespace has a default service account, and its token is automatically mounted into every pod that doesn't opt out. The token carries whatever permissions the default service account holds; often minimal, but it's an attack surface that shouldn't exist by default:

bash
# Patch the default SA in each namespace to not automount tokens
kubectl patch serviceaccount default -n production \
  -p '{"automountServiceAccountToken": false}'

# Verify
kubectl get serviceaccount default -n production -o jsonpath='{.automountServiceAccountToken}'

For pods that need API access, create dedicated service accounts with minimal permissions:

yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app
  namespace: production
automountServiceAccountToken: false   # Set at SA level

# Override per-pod only when needed:
# spec.automountServiceAccountToken: true

Least-Privilege Service Account Pattern

yaml
# Create a dedicated SA
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-reader
  namespace: production
---
# Create a minimal Role
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: metrics-reader
  namespace: production
rules:
  - apiGroups: [""]
    resources: ["pods"]   # nodes are cluster-scoped; reading them needs a ClusterRole
    verbs: ["get", "list"]
---
# Bind them
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: metrics-reader
  namespace: production
subjects:
  - kind: ServiceAccount
    name: metrics-reader
    namespace: production
roleRef:
  kind: Role
  name: metrics-reader
  apiGroup: rbac.authorization.k8s.io

Audit RBAC with Tools

bash
# kubectl who-can: who can perform an action?
# Install via `kubectl krew install who-can`, or use the standalone `kubectl-who-can` binary
kubectl who-can create pods -n production

# rakkess: access matrix for a user or service account
kubectl rakkess --sa production:my-app

# rbac-lookup: which roles are bound to a subject?
kubectl rbac-lookup my-app --kind serviceaccount

5. Network Isolation

Default-Deny NetworkPolicy

Apply to every namespace:

yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53

Automate this for all new namespaces using Kyverno generate rules so it's applied automatically on namespace creation.
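A sketch of such a rule, modelled on Kyverno's add-networkpolicy sample policy (extend the generated spec with the DNS egress exception from above):

yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-default-deny
spec:
  rules:
    - name: generate-default-deny
      match:
        any:
          - resources:
              kinds: ["Namespace"]
      generate:
        apiVersion: networking.k8s.io/v1
        kind: NetworkPolicy
        name: default-deny-all
        namespace: "{{request.object.metadata.name}}"
        synchronize: true   # re-create the policy if someone deletes it
        data:
          spec:
            podSelector: {}
            policyTypes:
              - Ingress
              - Egress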

Egress to External IPs

If pods should not reach arbitrary internet destinations, add CIDR-based egress policies:

yaml
# Allow internal VPC traffic plus explicitly listed external destinations
spec:
  egress:
    - to:
        - ipBlock:
            cidr: 10.0.0.0/8        # Internal VPC
    - to:
        - ipBlock:
            cidr: 203.0.113.0/24    # A specific external API range
    # Avoid 0.0.0.0/0 with except blocks: it allows the entire internet
    # and defeats the purpose of egress control. Be explicit.

For zero-trust egress, combine NetworkPolicy with a proxy (Squid, mitmproxy, or a cloud NAT gateway with allowlisting).


6. Secrets Management

Never Use ConfigMaps for Secrets

Kubernetes Secrets are only base64-encoded (not encrypted) at the API level, but unlike ConfigMaps they can be encrypted at rest in etcd (when an encryption provider is configured), and they are a distinct RBAC resource, so read access can be locked down independently of ConfigMaps.

bash
# Self-managed: check whether the API server was started with an encryption config
grep encryption-provider-config /etc/kubernetes/manifests/kube-apiserver.yaml

# For EKS: enable envelope encryption with a KMS key
aws eks associate-encryption-config \
  --cluster-name my-cluster \
  --encryption-config '[{"resources":["secrets"],"provider":{"keyArn":"arn:aws:kms:..."}}]'

Use External Secrets Operator

ESO pulls secrets from Vault, AWS SSM/Secrets Manager, or GCP Secret Manager and creates Kubernetes Secrets automatically:

yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: database-credentials
  namespace: production
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secretsmanager
    kind: ClusterSecretStore
  target:
    name: database-credentials
    creationPolicy: Owner
  data:
    - secretKey: DB_PASSWORD
      remoteRef:
        key: production/postgres/password
        property: password

ESO keeps the Kubernetes Secret in sync with the external store. When you rotate the secret in Secrets Manager, ESO updates the Kubernetes Secret within the refreshInterval.
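For completeness, a sketch of the ClusterSecretStore referenced above, assuming IRSA-based authentication (names and region are illustrative):

yaml
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: aws-secretsmanager
spec:
  provider:
    aws:
      service: SecretsManager
      region: eu-west-1               # illustrative
      auth:
        jwt:
          serviceAccountRef:
            name: external-secrets    # SA annotated with the IAM role (IRSA)
            namespace: external-secrets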

See Secrets Management in Kubernetes: Vault vs External Secrets Operator for the full comparison.


7. Supply Chain Security

Image Signing and Verification

Sign images with cosign and enforce signature verification in production:

bash
# Sign image after build
cosign sign --key cosign.key gcr.io/my-project/myapp:v1.0.0

Then enforce signature verification at admission with a Kyverno policy:
yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-signed-images
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-image-signature
      match:
        any:
          - resources:
              kinds: ["Pod"]
              namespaces: ["production"]
      verifyImages:
        - imageReferences: ["gcr.io/my-project/*"]
          attestors:
            - count: 1
              entries:
                - keys:
                    publicKeys: |-
                      -----BEGIN PUBLIC KEY-----
                      ...
                      -----END PUBLIC KEY-----

For CI/CD pipelines, consider keyless signing via Sigstore's Fulcio CA and Rekor transparency log — no private key to manage, identity is bound to the OIDC token of the pipeline runner:

bash
# Keyless signing (GitHub Actions, GitLab CI, etc.)
cosign sign --yes $IMAGE_DIGEST
# Verification
cosign verify --certificate-identity-regexp="https://github.com/your-org/your-repo" \
  --certificate-oidc-issuer="https://token.actions.githubusercontent.com" \
  $IMAGE_DIGEST

Keyless signing is the recommended approach for automated pipelines in 2026 — the signing identity is cryptographically tied to the pipeline's OIDC token rather than a long-lived private key.

SBOM and Vulnerability Scanning

bash
# Generate SBOM
syft gcr.io/my-project/myapp:v1.0.0 -o spdx-json > sbom.json

# Scan for CVEs
trivy image gcr.io/my-project/myapp:v1.0.0 \
  --severity CRITICAL,HIGH \
  --exit-code 1   # Fail pipeline on critical CVEs

Integrate scanning into CI/CD as a gate before push to production registry.
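A minimal sketch of such a gate as a GitHub Actions step (the IMAGE variable is assumed to be set earlier in the workflow):

yaml
- name: Scan image for critical CVEs
  run: |
    # A non-zero exit code fails the job and blocks the registry push
    trivy image "$IMAGE" --severity CRITICAL,HIGH --exit-code 1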

Admission Control with Kyverno

Kyverno policies enforce security requirements at admission time — before workloads are created:

yaml
# Require non-root containers
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-non-root
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-run-as-non-root
      match:
        any:
          - resources:
              kinds: ["Pod"]
              namespaces: ["production"]
      validate:
        message: "Containers must run as non-root"
        # =() is a conditional anchor: the check applies only when the field
        # is present. Pair with PSA restricted, which requires runAsNonRoot.
        pattern:
          spec:
            =(securityContext):
              =(runAsNonRoot): true
            containers:
              - =(securityContext):
                  =(runAsNonRoot): true

---
# Require resource requests on all containers
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-resource-requests
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-cpu-request
      match:
        any:
          - resources:
              kinds: ["Pod"]
      validate:
        message: "CPU and memory requests are required"
        pattern:
          spec:
            containers:
              - resources:
                  requests:
                    cpu: "?*"
                    memory: "?*"

8. CIS Benchmark Scanning

Run the CIS Kubernetes Benchmark against your cluster to get a scored baseline:

bash
# kube-bench: runs CIS benchmark checks locally
kubectl apply -f https://raw.githubusercontent.com/aquasecurity/kube-bench/main/job.yaml
kubectl logs job/kube-bench

# Or run directly (requires node access)
kube-bench run --targets node,master

kube-bench output scores each CIS control as PASS, FAIL, or WARN with remediation instructions. Prioritise FAIL items in the master and etcd sections — these are the highest-severity controls.

For managed clusters (EKS, GKE, AKS), many control plane controls are FAIL because the platform doesn't expose the relevant flags. This is expected — managed clusters have different security models. Focus on the node and policies sections for actionable findings.
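To triage at scale, kube-bench can emit JSON that is easy to filter (a sketch; the output schema may differ between kube-bench versions):

bash
# List failing checks with their IDs and descriptions
kube-bench run --targets node --json | \
  jq '.Controls[].tests[].results[] | select(.status=="FAIL") | {id: .test_number, desc: .test_desc}'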


Security Hardening Priority Order

If you're starting from scratch, work in this order for maximum risk reduction per unit of effort:

  1. PSA on production namespaces (restricted or baseline) — prevents the most common container escape vectors
  2. Default-deny NetworkPolicy on all namespaces — contains blast radius of a compromised pod
  3. Disable default SA token automount — removes unnecessary API access from all pods
  4. RBAC audit — identify and revoke cluster-admin bindings not owned by platform team
  5. IMDSv2 enforcement (AWS) — blocks pod-level credential theft via SSRF
  6. Image signing + Kyverno verification — prevents tampered images from reaching production
  7. ESO for secrets — removes plaintext secrets from ConfigMaps and git history
  8. Audit logging to SIEM — enables post-incident forensics

Frequently Asked Questions

What's the difference between PSA and Kyverno for pod security?

PSA (Pod Security Admission) is built into Kubernetes and enforces the three built-in profiles (privileged/baseline/restricted). Kyverno is a policy engine that can enforce arbitrary policies including PSA-equivalent rules plus custom ones (require labels, block specific images, require resource requests). Use both: PSA for the standard baseline, Kyverno for organisation-specific policies.

Is Falco necessary for security?

Falco provides runtime threat detection — it alerts when a pod unexpectedly opens a network connection, reads /etc/passwd, or runs bash. PSA and NetworkPolicy are preventive; Falco is detective. For organisations with a security operations team that can respond to Falco alerts, it adds real value. For teams without dedicated security response capacity, the alert noise may outweigh the benefit.

How do I handle privileged containers that my workload legitimately needs?

Some workloads (CNI plugins, CSI drivers, logging agents, monitoring agents) legitimately need elevated privileges. Use namespace-scoped PSA exemptions for the kube-system and platform namespaces where these run (enforce=privileged), and enforce restricted strictly on application namespaces. Don't relax application namespace security for platform DaemonSets.

What's the fastest win if I have zero hardening right now?

Apply default-deny NetworkPolicy to production namespaces and label them with PSA baseline enforcement. These two changes take under an hour, require no application code changes (baseline is permissive enough for most apps), and dramatically reduce the blast radius of a compromised pod.
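Concretely, assuming the default-deny manifest from section 5 is saved as default-deny.yaml:

bash
# 1. Contain pod-to-pod traffic
kubectl apply -n production -f default-deny.yaml

# 2. Block privileged pods, hostPath, and host namespaces
kubectl label namespace production \
  pod-security.kubernetes.io/enforce=baseline \
  pod-security.kubernetes.io/warn=restricted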


For RBAC patterns and tooling, see Kubernetes RBAC in Practice. For PSA migration from PodSecurityPolicy, see Kubernetes Pod Security Admission: The PodSecurityPolicy Replacement Guide. For supply chain security, see Supply Chain Security Tools for Kubernetes. For container supply chain security covering image signing, Cosign, and Kyverno admission enforcement, see Container Image Security: Supply Chain from Build to Production. For NetworkPolicy patterns that implement the network security layer, see Kubernetes NetworkPolicy: Zero-Trust Networking for Multi-Team Clusters.

Hardening a Kubernetes cluster for a security audit or compliance requirement? Talk to us at Coding Protocols — we help platform teams build security baselines that satisfy auditors without blocking developers.

Related Topics

Kubernetes
Security
Hardening
CIS Benchmark
RBAC
Pod Security
Network Policy
DevSecOps
Platform Engineering
