13 min read · May 1, 2026

Cilium Mutual Authentication: mTLS Without a Service Mesh

Cilium 1.14 introduced native mutual TLS authentication using SPIFFE/SPIRE as the identity provider — cryptographic proof that both the client and server are who they claim to be, enforced at the eBPF layer without sidecar proxies. This is different from WireGuard encryption, which encrypts node-to-node tunnels but doesn't provide per-service workload identity.

Ajeet Yadav
Platform & Cloud Engineer

Most teams who want mutual TLS between services reach for Istio or Linkerd because Kubernetes has no native mTLS story. The CNI just moves packets. If you want cryptographic proof that the caller is who it claims to be, you need something that can issue identities, terminate TLS, and enforce that both sides verified. Service meshes do that — at the cost of a sidecar proxy on every pod, ~50MB of memory overhead per workload, L7 proxy latency on every request, and a whole new control plane to operate.

Cilium 1.14 introduced a different answer: mutual authentication enforced at the eBPF layer, using SPIFFE/SPIRE as the identity provider, with zero sidecar proxies. But "mTLS without sidecars" is a statement that needs unpacking carefully, because Cilium actually has two distinct features that are both described as "encryption" or "security," and they solve entirely different problems.

The first is WireGuard node-to-node encryption (encryption.type: wireguard). The second is mutual authentication (authentication.mode: required in a CiliumNetworkPolicy). They are not the same thing. Running WireGuard gives you no per-service identity assertions. Running mutual auth gives you identity verification but not confidentiality by default. Most production deployments want both, but they are configured independently, and conflating them leads to security gaps you won't discover until an audit or an incident.

This post is about mutual auth — the SPIFFE-based workload identity piece — how to deploy it, how it works at the packet level, and where it fits relative to Istio.


WireGuard vs. Mutual Auth: The Conceptual Distinction

Both features show up in Cilium's security documentation, and both get described as "encrypting traffic between pods." They are not interchangeable.

WireGuard (encryption.type: wireguard in Cilium Helm values) encrypts all IP traffic between nodes at the Linux network layer, below the pod network stack. From the pod's perspective, its traffic is plaintext — the pod sends a normal TCP packet, and the Cilium WireGuard device on the node encrypts it before it leaves the node's network interface. On the receiving node, the WireGuard device decrypts it before delivering it to the destination pod.

The critical limitation: WireGuard provides no workload identity. It encrypts the tunnel, but any pod on node A can talk to any pod on node B through that encrypted tunnel. If you have a compromised pod on node A, its traffic to node B is encrypted — but so is legitimate traffic. The encryption doesn't help you distinguish the legitimate workload from the compromised one.

Mutual Auth (authentication.mode: required in a CiliumNetworkPolicy) works at a completely different layer. Before a connection is allowed to proceed, both the client and server must present a valid SPIFFE identity — an X.509 certificate signed by the SPIRE Server that cryptographically proves which pod (specifically: which Kubernetes service account, in which namespace, in which cluster) is making the request. The connection is allowed only if both identities are valid and the requesting identity is permitted by policy.

This is the zero-trust model in the strict sense: "I know exactly who you are and you know exactly who I am, and the network policy says this connection is permitted between us." It's enforced per-connection, not per-node.

You can run WireGuard and mutual auth simultaneously — WireGuard for encryption confidentiality at the node level, mutual auth for identity verification at the connection level. They're complementary. But each is configured and verified separately.


SPIFFE and SPIRE: Just Enough Background

SPIFFE (Secure Production Identity Framework For Everyone) is an open standard for workload identity. Every workload that participates in SPIFFE gets a SPIFFE ID — a URI of the form:

spiffe://trust-domain/ns/namespace/sa/service-account

So an orders-service pod running in the payments namespace using the orders-sa service account gets the SPIFFE ID:

spiffe://cluster.local/ns/payments/sa/orders-sa

This identity is encoded in an X.509 certificate called an SVID (SPIFFE Verifiable Identity Document). SVIDs are short-lived (default TTL: 1 hour) and automatically rotated.
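The mapping from Kubernetes identity to SPIFFE ID is mechanical, which makes it easy to predict exactly which SVID a workload will receive. A quick shell sketch of the template (the variable values here are illustrative):

```shell
# Build the SPIFFE ID that SPIRE will assign, from the Kubernetes identity fields
trust_domain="cluster.local"
namespace="payments"
service_account="orders-sa"

spiffe_id="spiffe://${trust_domain}/ns/${namespace}/sa/${service_account}"
echo "${spiffe_id}"
# → spiffe://cluster.local/ns/payments/sa/orders-sa
```

Because the ID is derived entirely from namespace and service account, two pods sharing a service account share an identity, which is worth keeping in mind when you design service accounts for policy granularity.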

SPIRE (the SPIFFE Runtime Environment) is the reference implementation of SPIFFE. It has two components:

  • SPIRE Server: the CA that issues and signs SVIDs. It stores the root CA private key and is the trust root for the entire cluster. Must be highly available.
  • SPIRE Agent: a DaemonSet that runs on every node. It attests workloads — it talks to the Kubernetes API to verify that a requesting pod actually has the service account it claims — and then fetches an SVID from the SPIRE Server on the workload's behalf.

Cilium uses SPIRE as its identity backend. The Cilium agent on each node talks to the local SPIRE Agent socket to get SVIDs, and uses those SVIDs to authenticate connections between workloads according to CiliumNetworkPolicy rules.


Prerequisites and Version Requirements

Cilium mutual auth has a specific set of requirements that aren't all obvious from the documentation:

  • Cilium 1.14+: mutual auth was introduced as an experimental feature in 1.14. It reached stable status in Cilium 1.15, which is the minimum version I'd recommend for production use. As of Cilium 1.16+, the SPIRE integration is in the stable channel.
  • SPIRE Server and SPIRE Agent deployed in the cluster. You can let Cilium install SPIRE for you (authentication.mutual.spire.install.enabled: true), but I prefer managing SPIRE independently for operational flexibility.
  • kube-proxy replacement mode: Cilium must be running in kubeProxyReplacement: true mode (or the equivalent eBPF-based mode). The mutual auth feature requires Cilium's full eBPF datapath and does not work when running alongside kube-proxy.
  • Exclusive CNI: mutual auth is not supported when Cilium is chained with Flannel or Calico. Cilium must be the sole CNI plugin.
  • Node-to-node connectivity: SPIRE Agents on each node must be able to reach the SPIRE Server. If you have aggressive NetworkPolicies or firewall rules blocking intra-cluster traffic, check port 8081 (SPIRE Server gRPC).
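If you do run restrictive cluster-wide policies, a narrowly scoped allow rule for the SPIRE control traffic looks roughly like this. This is a sketch: the pod labels are assumptions based on typical Helm chart defaults, so match them to what your SPIRE pods actually carry:

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-spire-agent-to-server
  namespace: spire-system
spec:
  endpointSelector:
    matchLabels:
      app.kubernetes.io/name: server    # SPIRE Server pods (label assumed; verify in your cluster)
  ingress:
    - fromEndpoints:
        - matchLabels:
            app.kubernetes.io/name: agent    # SPIRE Agent DaemonSet pods (label assumed)
      toPorts:
        - ports:
            - port: "8081"
              protocol: TCP
```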

Installing SPIRE

I use the official SPIFFE Helm chart from the hardened chart repository, which is maintained by the SPIFFE project and supports HA SPIRE Server deployments:

bash
helm repo add spiffe https://spiffe.github.io/helm-charts-hardened/
helm repo update

helm install spire spiffe/spire \
  --namespace spire-system \
  --create-namespace \
  --set global.spire.trustDomain="cluster.local" \
  --set spire-server.replicaCount=3 \
  --set spire-agent.enabled=true

The trustDomain must match what you configure in Cilium. cluster.local is conventional for single-cluster setups; for multi-cluster deployments, use a unique domain per cluster.

Verify the rollout:

bash
kubectl get pods -n spire-system

You should see 3 spire-server-* pods (StatefulSet) and one spire-agent-* pod per node (DaemonSet). If any SPIRE Agent pods are in Pending or CrashLoopBackOff, mutual auth will fail for pods on those nodes.

The SPIRE Server automatically registers a Kubernetes workload attestor. When a pod requests an SVID, the SPIRE Agent on its node calls the Kubernetes API to confirm that the pod exists, is running, and belongs to the service account encoded in the requested SPIFFE ID. This workload attestation is what makes the identity binding trustworthy — the SPIRE Agent won't issue an SVID for a service account that a pod doesn't actually hold.
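Under the hood, this is the k8s WorkloadAttestor plugin in the SPIRE Agent configuration. The Helm chart enables it by default; in a hand-rolled agent.conf the relevant fragment looks something like this (values illustrative, check your SPIRE version's plugin reference):

```hcl
# agent.conf fragment: Kubernetes workload attestation
plugins {
  WorkloadAttestor "k8s" {
    plugin_data {
      # Whether to verify the kubelet's serving certificate when querying pod info;
      # often disabled on clusters with self-signed kubelet certs
      skip_kubelet_verification = true
    }
  }
}
```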


Enabling Cilium Mutual Auth

With SPIRE running, upgrade Cilium to enable the mutual auth integration. Create a values file:

yaml
authentication:
  mutual:
    spire:
      enabled: true
      install:
        enabled: false
      agentSocketPath: "/run/spire/sockets/agent.sock"
      serverAddress: "spire-server.spire-system.svc:8081"
      trustDomain: "cluster.local"
      adminSocketPath: "/run/spire/sockets/admin.sock"

The agentSocketPath is the Unix socket that the SPIRE Agent exposes on each node. Cilium's agent process talks to this socket directly rather than making network calls. The serverAddress is used for Cilium to register its own workload entries with the SPIRE Server.

Apply the upgrade:

bash
helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --reuse-values \
  -f cilium-mutual-auth-values.yaml

This performs a rolling restart of the Cilium DaemonSet. After the rollout completes, verify the SPIRE integration is healthy:

bash
cilium status | grep -i spire

Expected output:

SPIRE integration:   OK

If you see SPIRE integration: Unavailable, the most common causes are: the SPIRE Agent socket path doesn't match, the socket isn't mounted into the Cilium DaemonSet pod, or the SPIRE Agent pod on that node is unhealthy.


CiliumNetworkPolicy with Authentication

Here is where mutual auth becomes actionable. The authentication stanza in a CiliumNetworkPolicy specifies that a particular traffic path requires SPIFFE identity verification before the connection is allowed.

A basic policy requiring mutual auth on ingress to the payments-api:

yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: payments-require-mtls
  namespace: payments
spec:
  endpointSelector:
    matchLabels:
      app: payments-api
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: orders-service
      authentication:
        mode: required

The authentication.mode: required field is the key addition. Without it, the policy allows traffic from orders-service to payments-api based on label matching alone — that's standard Cilium network policy behavior. With it, the connection is only permitted if both sides have valid SVIDs and those certificates verify against the SPIRE trust bundle. A pod with the right labels but no valid SVID (or an expired SVID) will have its connection dropped.

Before enforcing in production, use mode: test-always to audit what would break:

yaml
authentication:
  mode: test-always

test-always is an audit mode. Connections are allowed regardless of authentication status, but Cilium logs authentication failures. This lets you deploy the policy and observe what traffic paths would fail before you flip to required. Run test-always for at least one SVID rotation cycle (default: 1 hour) to see any rotation-related edge cases.

The valid modes are:

  • disabled — no authentication requirement (default if the authentication stanza is omitted)
  • required — connection dropped if SPIFFE handshake fails
  • test-always — connection allowed but authentication failures logged
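Concretely, the audit-phase version of the earlier policy just swaps the mode; everything else stays identical:

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: payments-require-mtls
  namespace: payments
spec:
  endpointSelector:
    matchLabels:
      app: payments-api
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: orders-service
      authentication:
        mode: test-always   # audit only: log auth failures, allow traffic
```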

How the Handshake Works

When a connection requires mutual auth, the process from TCP SYN to established connection goes through these steps:

  1. Pod A (orders-service) initiates a TCP connection to Pod B (payments-api).
  2. The outgoing SYN packet is intercepted by Cilium's eBPF program in the kernel.
  3. Cilium checks the policy map: does this source-destination pair require authentication?
  4. Since the policy says authentication.mode: required, Cilium does not forward the SYN. The datapath drops it with an "authentication required" verdict and signals Cilium's userspace auth handler.
  5. Cilium's auth handler initiates an out-of-band SPIFFE handshake. It fetches Pod A's SVID from the local SPIRE Agent, establishes a mutually-authenticated TLS session with Cilium on the destination node, and they exchange and validate each other's SVIDs against the SPIRE trust bundle.
  6. If both SVIDs are valid, the certificates haven't expired, and the trust chain verifies, Cilium marks this connection pair as authenticated in its kernel policy map.
  7. The client's TCP stack retransmits the SYN. The retransmitted packet now matches the authenticated entry in the policy map, is forwarded, and the TCP connection proceeds normally.
  8. If the SPIFFE handshake fails at any step — invalid certificate, expired SVID, untrusted issuer — the SYN is dropped. The application never knows an attempt was made.

The entire SPIFFE handshake happens out-of-band and is transparent to the application. Pod A's application code calls connect(), and either the connection succeeds or it gets a timeout/refused error. There is no TLS termination at the pod — the application traffic itself remains at whatever protocol the application uses (plain TCP, HTTP, gRPC, whatever).

This is the fundamental architectural difference from a sidecar proxy: the authentication is enforced in the kernel via eBPF, not by an L7 proxy sitting between the application and the network.


SVID Rotation and Certificate Lifetime

SPIRE issues SVIDs with a configurable TTL. The default is 1 hour, which means every workload in your cluster is getting a new certificate every hour. This is intentional — short-lived certificates reduce the blast radius of a compromised credential.

The SPIRE Agent handles rotation automatically. Before an SVID expires, the Agent fetches a new one from the SPIRE Server and delivers it to all local workloads that requested it. Cilium's auth handler caches the current SVID and invalidates the cache when SPIRE rotates it, picking up the new certificate transparently.

Rotation is transparent to running workloads. Existing connections that were established while the old SVID was valid remain open — the SVID is only checked at connection establishment time, not continuously on established connections. New connections after rotation use the new SVID. No pod restarts are needed.
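If the 1-hour default doesn't fit, for example if you want a shorter credential window for sensitive namespaces, the TTL is set in the SPIRE Server configuration. In raw server.conf it's default_x509_svid_ttl; the corresponding Helm value depends on the chart version, so check your chart's reference before relying on this sketch:

```hcl
# server.conf fragment: shorten SVID lifetime; rotation frequency rises accordingly
server {
  trust_domain          = "cluster.local"
  default_x509_svid_ttl = "30m"
}
```

Shorter TTLs shrink the exposure window of a leaked credential but increase SPIRE Server load and make the cluster more sensitive to SPIRE Server outages, since workloads need to renew more often.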

To inspect the current SVID for a node:

bash
kubectl exec -n spire-system -it daemonset/spire-agent -- \
  /opt/spire/bin/spire-agent api fetch x509 \
  -socketPath /run/spire/sockets/agent.sock

The output shows the SVID's SPIFFE ID, the expiry time, and the certificate chain. If you're seeing authentication failures you can't explain, checking SVID expiry here is the first diagnostic step.


Verifying Mutual Auth Is Working

The canonical verification path is through Hubble. With Hubble enabled, authenticated flows carry an auth_type field in the flow metadata:

bash
hubble observe \
  --namespace payments \
  --follow \
  --output json | \
  jq 'select(.flow.auth_type != null) | {
    src: .flow.source.pod_name,
    dst: .flow.destination.pod_name,
    auth: .flow.auth_type
  }'

For connections that successfully authenticated, you should see:

json
{"src": "orders-service-6d9f7b-xxx", "dst": "payments-api-5c8d4f-xxx", "auth": "SPIFFE"}

Flows that lack the auth_type field either don't require mutual auth (no policy) or failed authentication and were dropped before appearing in the flow log.
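You can sanity-check the jq filter offline before pointing it at live traffic by piping a mock flow record through it. The JSON below is a hand-built stand-in, not real Hubble output:

```shell
# Dry-run the jq filter against a minimal mock flow record
echo '{"flow":{"auth_type":"SPIFFE","source":{"pod_name":"orders-service-6d9f7b-xxx"},"destination":{"pod_name":"payments-api-5c8d4f-xxx"}}}' \
  | jq -c 'select(.flow.auth_type != null) | {src: .flow.source.pod_name, dst: .flow.destination.pod_name, auth: .flow.auth_type}'
# → {"src":"orders-service-6d9f7b-xxx","dst":"payments-api-5c8d4f-xxx","auth":"SPIFFE"}
```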

For lower-level packet-by-packet visibility, cilium monitor shows policy verdicts including auth status:

bash
cilium monitor --type policy-verdict

Each line shows the source/destination endpoint IDs, the verdict (ALLOWED or DENIED), and for policies with authentication requirements, whether the authentication was AUTH_OK or AUTH_FAIL.

If connections from orders-service to payments-api are being dropped and you see AUTH_FAIL in the monitor output, the SVID exchange is failing. Check: SPIRE Agent pod health on both source and destination nodes, SVID expiry, and clock synchronization between nodes.


Comparison with Istio and Linkerd mTLS

Istio handles mTLS through sidecar Envoy proxies. Every pod in the mesh gets an Envoy sidecar injected. When Pod A communicates with Pod B, the traffic goes: Pod A application → Pod A Envoy (TLS origination) → network → Pod B Envoy (TLS termination) → Pod B application. Istio's PeerAuthentication policy controls whether mTLS is required for a given namespace or workload. The traffic between the pod and its own Envoy sidecar is plaintext over localhost — only the inter-pod traffic is TLS.

Advantages of Istio mTLS: full L7 observability (header inspection, request tracing, metrics by HTTP method/status), traffic management features (retries, circuit breaking, fault injection, canary routing by header). Resource overhead: ~50MB memory per Envoy sidecar, plus L7 proxy latency on every request (~0.5-2ms per hop).

Linkerd uses an ultra-lightweight Rust-based sidecar (linkerd2-proxy, a purpose-built micro-proxy rather than Envoy). Lower memory overhead than Istio (~10-20MB per proxy), still L7-capable. Same fundamental architecture: proxy-per-pod.

Cilium mutual auth: no sidecars. Authentication enforced in the eBPF kernel layer. Zero proxy latency on established connections. No L7 visibility from mTLS itself — Hubble provides L4 flow visibility (source, destination, protocol, verdict) but not HTTP-level metrics. No traffic management features — Cilium is not trying to replace L7 proxies for retries or circuit breaking.

The practical decision:

  • Use Cilium mutual auth when you want zero-trust connection authentication without service mesh overhead, you're already running Cilium as your CNI, and you don't need L7 traffic management policies.
  • Use Istio or Linkerd when you need per-route traffic management (canary deployments, retries, circuit breaking), L7 observability by HTTP path/method, or you're not on Cilium.
  • Don't try to run both Istio and Cilium mutual auth on the same traffic paths. Two systems managing connection-level identity creates policy conflicts that are extremely difficult to reason about. Pick one approach per cluster.

Production Considerations

SPIRE Server availability. The SPIRE Server is your cluster's CA. If it's down, SVID rotation fails, and workloads with expiring SVIDs lose the ability to authenticate new connections. Run 3 replicas with persistent storage — the SPIRE Server's database contains the CA private key and must survive pod restarts. Use a StorageClass backed by replicated storage (EBS gp3 with multi-AZ node groups risks single-AZ storage loss; prefer EFS or a storage system with HA guarantees for the SPIRE Server StatefulSet).

SPIRE Agent as a hard dependency. The SPIRE Agent runs as a DaemonSet, and if it's unhealthy on a node, every pod on that node loses the ability to get or renew SVIDs. This is a hard availability dependency: a failed SPIRE Agent pod causes all new connections from pods on that node to fail mutual auth. Set up alerting on SPIRE Agent pod health separately from your general pod health monitoring. Treat a SPIRE Agent crash like a node-level incident.

Clock synchronization. SVID validation is time-sensitive. X.509 certificate validation rejects certs where the current time is outside the notBefore/notAfter window. If a node's clock drifts more than a few minutes relative to other nodes, SVID validation begins failing. Ensure chrony or NTP is configured and synchronized on every node. Clock skew greater than 5 minutes will cause systematic auth failures that look like SPIRE is broken.
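The 5-minute figure is easy to encode as a guardrail. This toy function just compares two epoch timestamps against the threshold; the timestamps are made up, and in practice you'd feed it `date +%s` readings collected from each node:

```shell
# Flag clock skew above the ~5-minute danger threshold for X.509 validation
max_skew_seconds=300

check_skew() {
  local t1="$1" t2="$2"
  # Absolute difference between the two epoch timestamps
  local diff=$(( t1 > t2 ? t1 - t2 : t2 - t1 ))
  if [ "$diff" -gt "$max_skew_seconds" ]; then
    echo "SKEW ${diff}s"
  else
    echo "OK ${diff}s"
  fi
}

# Example: node B drifted 6 minutes ahead of node A
check_skew 1700000000 1700000360
# → SKEW 360s
```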

Connection establishment latency. Mutual auth adds latency to the first packet of each new connection — the out-of-band SPIFFE handshake typically takes 1-3ms. Subsequent packets on an established connection have no added overhead. This matters for workloads that open many short-lived connections. If your service opens a new TCP connection per request (rather than connection pooling), you'll see that 1-3ms per request. For gRPC (which multiplexes over a long-lived connection) or HTTP/1.1 with keep-alive, the overhead is amortized and effectively invisible.

Rollout strategy. Use mode: test-always before mode: required. Deploy the policy in audit mode, observe for a full SVID rotation cycle (at least 1 hour, ideally a full day), and confirm that the flows you expect to succeed are authenticating correctly. Then switch to required. This prevents a policy misconfiguration from causing a production outage.


Frequently Asked Questions

Is Cilium mutual auth GA? The feature has been stable since Cilium 1.15. The SPIRE integration is in the stable channel as of Cilium 1.16. I would not run it in production on anything older than 1.15.

Does Cilium mutual auth encrypt traffic? No — authentication only. Mutual auth verifies identity; it does not encrypt the application payload. For confidentiality, add WireGuard encryption separately (encryption.type: wireguard in Helm values). The two features are independent and complementary.

Can I use Cilium mutual auth alongside Istio in the same cluster? Technically you can run both, but I wouldn't recommend it. Two systems asserting and enforcing workload identity on the same connections creates ambiguity about which system is authoritative. If you have services migrating from Istio to Cilium, keep them in separate namespaces with a clear cutover boundary, not concurrent enforcement on the same traffic paths.

Does this work with EKS? Yes. Cilium runs on EKS in either CNI chaining mode (alongside the AWS VPC CNI) or replacement mode. SPIRE runs entirely in-cluster and doesn't depend on any AWS-specific identity primitives. The SPIRE Server just needs persistent storage — an EBS volume or EFS mount works.

What happens if SPIRE is down during a deployment? New pods that come up while SPIRE is down won't be able to get SVIDs. They'll be unable to authenticate connections that require mutual auth. Existing connections on pods that already have valid (non-expired) SVIDs will continue to work until their SVIDs expire. This is another reason to treat SPIRE Server availability as a critical SLO.


For Cilium's core networking — the eBPF datapath, service load balancing, and baseline network policy enforcement that mutual auth builds on — see Cilium and eBPF: Kubernetes Networking Without kube-proxy. Mutual auth sits on top of Cilium's policy engine; understanding the base policy model first makes the authentication extension much clearer.

For Cilium WireGuard encryption, Hubble network observability, and advanced CiliumNetworkPolicy patterns that complement mutual auth (FQDN-based egress, L7 HTTP policy, DNS-aware policy) — Cilium Advanced Networking and Observability covers those in depth.

For the broader eBPF platform engineering context — how Tetragon extends Cilium for runtime security observability (syscall-level visibility, process identity, kernel-level alerts) alongside Cilium's network security — see eBPF Platform Engineering with Cilium and Tetragon.

If your workloads need L7 features alongside mTLS — retries, circuit breaking, header-based routing — Istio Service Mesh on Kubernetes covers Istio's ambient mode (ztunnel-based, sidecarless) as the closest Istio analog to Cilium's architecture.

For CiliumNetworkPolicy patterns beyond mutual auth — FQDN-based egress rules, L7 HTTP path matching, cluster-wide policy via CiliumClusterwideNetworkPolicy — Kubernetes Network Policy Patterns covers those patterns and how they compose with the authentication requirements shown here.


Evaluating zero-trust networking for a multi-team Kubernetes platform? Talk to us at Coding Protocols — we help platform teams implement identity-based authentication that doesn't require sidecar proxies or a full service mesh adoption.

Related Topics

Cilium
mTLS
SPIFFE
SPIRE
Security
Service Mesh
eBPF
Zero Trust
Kubernetes
