Security

SPIFFE/SPIRE: Workload Identity and Zero-Trust mTLS

Advanced · 90 min to complete · 22 min read

Service-to-service mTLS without a service mesh, without hardcoded certificates, and without manual rotation. SPIFFE/SPIRE issues short-lived X.509 SVIDs to workloads based on their Kubernetes identity. This tutorial shows you how to deploy SPIRE, register workloads, and wire up mutual TLS between two services.

Before you begin

  • Kubernetes cluster
  • kubectl and Helm installed
  • Understanding of TLS/X.509 certificates and mutual TLS
  • Basic Kubernetes RBAC knowledge
Tags: Kubernetes, SPIFFE, SPIRE, Security, mTLS, Zero Trust, Workload Identity

Every service-to-service authentication scheme I've seen falls into one of three traps: shared secrets that rotate every quarter at best (and get leaked), service account tokens that are long-lived and often overprivileged, or a full service mesh that adds significant operational overhead to solve what is fundamentally an identity problem. SPIFFE/SPIRE solves the identity problem cleanly. Each workload gets a cryptographic identity — an SVID — that is short-lived (1 hour by default), automatically renewed, and tied to that workload's Kubernetes identity. If a process isn't registered with SPIRE, it gets nothing. No fallback, no exception.

This tutorial walks through deploying SPIRE on Kubernetes, registering two workloads, and wiring up genuine mTLS between them using the go-spiffe library. No service mesh, no hardcoded certs, no manual rotation.

Concepts First

SPIFFE (Secure Production Identity Framework For Everyone) is the standard — an open specification for workload identity. SPIRE is the reference implementation. They're often used together, but the spec allows other implementations. Here I'll use SPIRE because it's the most production-ready option for Kubernetes.

Key terms you need to internalize before touching any configuration:

  • SPIFFE ID: a URI that identifies a workload: spiffe://trust-domain/path. Example: spiffe://production.example.com/ns/default/sa/my-app. The path is arbitrary but by convention encodes the namespace and ServiceAccount.
  • SVID (SPIFFE Verifiable Identity Document): an X.509 certificate containing the SPIFFE ID as a Subject Alternative Name URI field. This is what services exchange during a TLS handshake to prove identity. The certificate is signed by your SPIRE Server acting as a CA.
  • Trust Domain: the scope of SPIFFE IDs your SPIRE Server manages — usually your organization's domain (e.g., example.org). IDs from different trust domains require explicit federation to be mutually trusted.
  • SPIRE Server: the CA. Issues SVIDs to registered workloads, maintains the entry registry, and manages the trust bundle (the root CA cert that all workloads use to verify peer SVIDs).
  • SPIRE Agent: runs as a DaemonSet on every node. Attests the workload's identity using the Kubernetes workload attestor and fetches SVIDs from the Server on the workload's behalf.
  • Workload API: a Unix socket at /tmp/spiffe-workload-api/agent.sock. Applications call this socket to get their SVID. The go-spiffe library abstracts this entirely — you never touch the socket directly.
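
To make the SPIFFE ID structure concrete, here's a small standalone Go sketch (deliberately not using go-spiffe) that splits an ID into its trust domain and workload path. In real code you'd use spiffeid.FromString from go-spiffe, which also enforces the spec's character rules:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// SPIFFEID is a minimal illustration of the two components of a SPIFFE ID.
type SPIFFEID struct {
	TrustDomain string // e.g. "example.org"
	Path        string // e.g. "/ns/demo/sa/service-a"
}

// parseSPIFFEID splits a spiffe:// URI into trust domain and path.
// Illustrative only — use spiffeid.FromString from go-spiffe in real code.
func parseSPIFFEID(s string) (SPIFFEID, error) {
	u, err := url.Parse(s)
	if err != nil {
		return SPIFFEID{}, err
	}
	if u.Scheme != "spiffe" || u.Host == "" {
		return SPIFFEID{}, fmt.Errorf("not a SPIFFE ID: %q", s)
	}
	if strings.Contains(u.Host, ":") {
		return SPIFFEID{}, fmt.Errorf("trust domain must not contain a port: %q", s)
	}
	return SPIFFEID{TrustDomain: u.Host, Path: u.Path}, nil
}

func main() {
	id, err := parseSPIFFEID("spiffe://example.org/ns/demo/sa/service-a")
	if err != nil {
		panic(err)
	}
	fmt.Printf("trust domain: %s, path: %s\n", id.TrustDomain, id.Path)
}
```

The trust domain is the authority component of the URI; everything after it is the path that, by convention, encodes namespace and ServiceAccount.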

How it all fits together:

┌─────────────────────────────────┐
│          SPIRE Server           │
│  (StatefulSet, SQL datastore)   │
│     CA / Entry Registry         │
└──────────────┬──────────────────┘
               │ mTLS bootstrap
               ▼
┌─────────────────────────────────┐
│  SPIRE Agent (DaemonSet)        │  ← one per node
│  - Attests node identity        │
│  - Holds node SVID              │
│  - Exposes Workload API socket  │
└──────────────┬──────────────────┘
               │ Unix socket
               ▼
┌─────────────────────────────────┐
│  Your Application (Pod)         │
│  - go-spiffe client             │
│  - Calls Workload API           │
│  - Gets SVID (X.509 cert)       │
│  - Does mTLS with peer using it │
└─────────────────────────────────┘

The Agent is the critical intermediary. It never gives the Server direct access to your workload — it mediates on your workload's behalf, after verifying the workload's identity using the Kubernetes API and host-level PID inspection.

What You'll Build

By the end of this tutorial you'll have:

  • SPIRE Server (StatefulSet) and SPIRE Agent (DaemonSet) deployed and healthy
  • Two workload entries registered: service-a and service-b, each with a unique SPIFFE ID
  • service-a fetching its SVID via the go-spiffe library and presenting it to service-b during the TLS handshake
  • service-b verifying service-a's SPIFFE ID and rejecting connections from anything else
  • A demonstration that an unregistered workload gets no SVID — and therefore cannot participate in mTLS at all

Step 1: Deploy SPIRE Server

The official SPIRE Helm chart handles both the Server (StatefulSet) and the Agent (DaemonSet) in a single install. Add the repo and install:

bash
helm repo add spiffe https://spiffe.github.io/helm-charts-hardened/
helm repo update

helm install spire spiffe/spire \
  --namespace spire \
  --create-namespace \
  --set global.openshift=false \
  --set spire-server.replicaCount=1 \
  --set spire-server.dataStore.sql.databaseType=sqlite3

For production you'd swap SQLite for a proper database (PostgreSQL is the standard choice) and run multiple Server replicas sharing that datastore for high availability. For this tutorial, SQLite is fine.

Verify the Server came up:

bash
kubectl get statefulset spire-server -n spire
kubectl get pods -n spire

You should see spire-server-0 Running and spire-agent-* pods Running — one per node. If agents are in CrashLoopBackOff, check whether your cluster allows hostPID: true (addressed in Step 2).

Confirm the Server reports healthy:

bash
kubectl exec -n spire spire-server-0 -- \
  spire-server healthcheck

Expected output:

Server is healthy.

If you get anything else, check the Server logs: kubectl logs -n spire spire-server-0. The most common issue at this stage is a misconfigured trust domain in the Helm values.

Step 2: Verify the SPIRE Agent

The Helm chart deploys the Agent DaemonSet automatically. But it's worth understanding what the Agent is doing, because it's where most production problems originate.

bash
kubectl get ds spire-agent -n spire
kubectl logs -n spire ds/spire-agent | grep "SVID is current"

The Agent performs two distinct attestations:

  1. Node attestation: The Agent proves to the Server that it's running on a legitimate Kubernetes node. With the Kubernetes Service Account Token attestor (k8s_sat), the Agent presents a service account token that the Server validates against the Kubernetes API. The projected-token variant, k8s_psat, uses audience-bound projected tokens and is the recommended successor to k8s_sat.

  2. Workload attestation: When a pod calls the Workload API socket, the Agent identifies the calling process by its PID on the host, looks up the container ID from /proc/<pid>/cgroup, then calls the Kubernetes API to get the pod's namespace and ServiceAccount. This is how the Agent maps a process to a SPIFFE ID entry.

This is why the Agent DaemonSet requires hostPID: true. Without host PID visibility, the Agent cannot see the pod's process in the host PID namespace, and workload attestation fails silently — the pod calls the socket and gets nothing back, with a confusing error like "no identity issued."

Check the Agent DaemonSet spec to confirm:

bash
kubectl get ds spire-agent -n spire -o jsonpath='{.spec.template.spec.hostPID}'

Expected:

true

If it's not set, you'll need to patch the DaemonSet or update the Helm values and re-deploy.

Step 3: Register Workload Entries

An entry is the mapping that tells the SPIRE Server: "any pod in namespace X with ServiceAccount Y gets SVID spiffe://trust-domain/Z." Entries are the heart of the access model — if there's no entry, there's no SVID, period.

First, create the namespace and service accounts:

bash
kubectl create namespace demo
kubectl create serviceaccount service-a -n demo
kubectl create serviceaccount service-b -n demo

Now register the entries on the Server. The -parentID field ties each workload entry to the Agent that serves it, and it must match the Agent's actual SPIFFE ID as assigned during node attestation. For the k8s_sat attestor that ID ends in a generated UUID rather than the node name, so the reliable way to get it is to ask the Server:

bash
# Capture the Agent's SPIFFE ID (assumes a single agent, i.e. one node)
AGENT_ID=$(kubectl exec -n spire spire-server-0 -- \
  spire-server agent list | awk '/SPIFFE ID/ {print $4; exit}')

# service-a entry
kubectl exec -n spire spire-server-0 -- \
  spire-server entry create \
    -spiffeID spiffe://example.org/ns/demo/sa/service-a \
    -parentID "$AGENT_ID" \
    -selector k8s:ns:demo \
    -selector k8s:sa:service-a

# service-b entry
kubectl exec -n spire spire-server-0 -- \
  spire-server entry create \
    -spiffeID spiffe://example.org/ns/demo/sa/service-b \
    -parentID "$AGENT_ID" \
    -selector k8s:ns:demo \
    -selector k8s:sa:service-b

Two things to note here. First, -selector k8s:ns:demo -selector k8s:sa:service-a means both selectors must match — namespace demo AND ServiceAccount service-a. Either selector alone is insufficient. Second, the -parentID ties each entry to a single Agent. In a multi-node cluster you'd create entries per Agent or, better, define a node alias that groups all Agents under one parent ID. For this tutorial one node is enough.
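
The all-selectors-must-match rule amounts to a subset check: an entry matches a workload only when every selector on the entry appears in the set of selectors the Agent discovered during attestation. A hypothetical illustration (not SPIRE's actual code):

```go
package main

import "fmt"

// entryMatches reports whether every selector on a registration entry is
// present in the set of selectors discovered for the workload during
// attestation. One missing selector means no match — and no SVID.
func entryMatches(entrySelectors []string, discovered map[string]bool) bool {
	for _, s := range entrySelectors {
		if !discovered[s] {
			return false
		}
	}
	return true
}

func main() {
	// Selectors the Agent discovered for a pod in namespace "demo"
	// running as ServiceAccount "service-a".
	discovered := map[string]bool{
		"k8s:ns:demo":      true,
		"k8s:sa:service-a": true,
	}

	entryA := []string{"k8s:ns:demo", "k8s:sa:service-a"}
	fmt.Println(entryMatches(entryA, discovered)) // both selectors present

	entryB := []string{"k8s:ns:demo", "k8s:sa:service-b"}
	fmt.Println(entryMatches(entryB, discovered)) // sa selector missing
}
```

This is why adding more selectors to an entry always narrows, never widens, the set of workloads that can receive the SVID.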

Verify both entries registered:

bash
kubectl exec -n spire spire-server-0 -- spire-server entry show

You should see two entries, each with its SPIFFE ID and selectors. Unless you set a per-entry TTL, issued SVIDs use the Server's default X.509 SVID TTL of 1 hour.

Step 4: service-a Fetches Its SVID

Here's a minimal Go program that connects to the Workload API, fetches its SVID, logs the identity, and sets up a TLS client config for calling service-b:

go
package main

import (
    "context"
    "fmt"
    "log"
    "net/http"

    "github.com/spiffe/go-spiffe/v2/spiffeid"
    "github.com/spiffe/go-spiffe/v2/spiffetls/tlsconfig"
    "github.com/spiffe/go-spiffe/v2/workloadapi"
)

func main() {
    ctx := context.Background()

    // Connect to the Workload API via the Unix socket. go-spiffe reads the
    // socket address from the SPIFFE_ENDPOINT_SOCKET env var
    // (e.g. unix:///tmp/spiffe-workload-api/agent.sock); it can also be set
    // explicitly via workloadapi.WithClientOptions(workloadapi.WithAddr(...)).
    source, err := workloadapi.NewX509Source(ctx)
    if err != nil {
        log.Fatalf("Failed to create X509Source: %v", err)
    }
    defer source.Close()

    // Inspect our own SVID — useful for debugging.
    svid, err := source.GetX509SVID()
    if err != nil {
        log.Fatalf("Failed to get SVID: %v", err)
    }

    fmt.Printf("Got SVID: %s\n", svid.ID)
    fmt.Printf("Valid until: %s\n", svid.Certificates[0].NotAfter)

    // Build a TLS client config that:
    // 1. Presents our SVID as the client certificate
    // 2. Verifies that the server's SVID is exactly service-b's SPIFFE ID
    serverID := spiffeid.RequireIDFromString("spiffe://example.org/ns/demo/sa/service-b")
    tlsConfig := tlsconfig.MTLSClientConfig(source, source, tlsconfig.AuthorizeID(serverID))

    client := &http.Client{
        Transport: &http.Transport{
            TLSClientConfig: tlsConfig,
        },
    }

    resp, err := client.Get("https://service-b.demo.svc.cluster.local:8443/")
    if err != nil {
        log.Fatalf("Failed to call service-b: %v", err)
    }
    defer resp.Body.Close()

    fmt.Printf("service-b responded: %s\n", resp.Status)
}

workloadapi.NewX509Source connects to the socket and returns a source that automatically rotates the SVID before it expires. This is the key operational advantage over any manual cert management: you never call renewCert(), you never watch a TTL, you never restart the process to pick up a new cert. The source handles renewal in a background goroutine and the tlsconfig helpers always pull the current certificate from the source on each new TLS connection.

The tlsconfig.AuthorizeID(serverID) call is doing real work. It configures the TLS handshake to verify that the server presents an SVID whose SPIFFE ID exactly matches spiffe://example.org/ns/demo/sa/service-b. If service-b is misconfigured or impersonated, the handshake fails before any application data is exchanged.

Step 5: service-b Verifies the SVID

service-b runs an mTLS server that requires a client certificate and authorizes only service-a's SPIFFE ID:

go
package main

import (
    "context"
    "fmt"
    "log"
    "net/http"

    "github.com/spiffe/go-spiffe/v2/spiffeid"
    "github.com/spiffe/go-spiffe/v2/spiffetls/tlsconfig"
    "github.com/spiffe/go-spiffe/v2/workloadapi"
)

func main() {
    ctx := context.Background()

    source, err := workloadapi.NewX509Source(ctx)
    if err != nil {
        log.Fatalf("Failed to create X509Source: %v", err)
    }
    defer source.Close()

    // Require that clients present an SVID with exactly service-a's SPIFFE ID.
    // Any other client — including one with a valid SVID from the same trust domain
    // but a different SPIFFE ID — will be rejected at the TLS handshake.
    clientID := spiffeid.RequireIDFromString("spiffe://example.org/ns/demo/sa/service-a")
    tlsConfig := tlsconfig.MTLSServerConfig(source, source, tlsconfig.AuthorizeID(clientID))

    server := &http.Server{
        Addr:      ":8443",
        TLSConfig: tlsConfig,
        Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            fmt.Fprintln(w, "Hello from service-b — your identity is verified")
        }),
    }

    log.Println("service-b listening on :8443 with mTLS")
    // Empty strings for cert/key — the TLS config provides them via the source.
    if err := server.ListenAndServeTLS("", ""); err != nil {
        log.Fatal(err)
    }
}

tlsconfig.MTLSServerConfig sets tls.RequireAnyClientCert and installs a custom VerifyPeerCertificate callback that validates the client's certificate chain against the SPIRE trust bundle and then checks the SPIFFE ID. No shared secrets. No API keys. No Kubernetes API call at request time. The verification is entirely local, using the trust bundle the Agent maintains.

When service-a connects, the handshake flow is:

  1. service-b presents its SVID (signed by the SPIRE CA) as the server certificate
  2. service-a verifies service-b's SPIFFE ID matches spiffe://example.org/ns/demo/sa/service-b
  3. service-a presents its SVID as the client certificate
  4. service-b verifies service-a's SPIFFE ID matches spiffe://example.org/ns/demo/sa/service-a
  5. Both verifications succeed — the connection is established

If step 2 or 4 fails, the connection is terminated before any application payload is sent. This is genuine mutual authentication, not "I trust any cert signed by this CA" — the SPIFFE ID check adds the workload identity layer on top of the cryptographic trust chain.

Step 6: Show What an Unregistered Workload Gets

This step is worth running explicitly to see the access control model in action. Deploy a pod using the default ServiceAccount in the demo namespace — there's no entry registered for it:

bash
kubectl run unregistered \
  --image=curlimages/curl \
  -n demo \
  -- sleep 3600

No ServiceAccount is specified, so the pod runs as demo/default, which is what we want here. (The --serviceaccount flag was removed from kubectl run in v1.24.)

Wait for the pod to start, then try to call the Workload API socket:

bash
kubectl exec -n demo unregistered -- \
  curl -sS --unix-socket /tmp/spiffe-workload-api/agent.sock http://localhost/

(The image is curlimages/curl, so curl is available; BusyBox wget has no Unix-socket support.) Expect a failure. If the socket isn't mounted into the pod at all — the usual case, since the socket volume is only mounted where you request it — curl reports something like:

curl: (7) Couldn't connect to server

If the socket is reachable, the request still yields nothing useful: the Workload API speaks gRPC, not plain HTTP, and more importantly there is no registered entry for this pod.

The Agent receives the request, performs workload attestation (looks up the pod's namespace and ServiceAccount), finds no matching entry in the registry, and returns nothing. There's no way to force the Agent to issue an SVID for an unregistered workload. This is the model: identity is asserted by the platform, not by the workload itself.

Now try to call service-b directly with plain HTTP or a self-signed cert — service-b's ListenAndServeTLS with MTLSServerConfig requires a valid SPIFFE client certificate. Without one, the TLS handshake fails immediately.

Verification

Run these after the full deployment to confirm everything is wired up:

bash
# List all registered entries with their selectors and SPIFFE IDs
kubectl exec -n spire spire-server-0 -- spire-server entry show

# Confirm agents are connected and reporting healthy
kubectl exec -n spire spire-server-0 -- spire-server agent list

# Fetch an X.509 SVID via the Agent's Workload API and inspect it.
# The Workload API attests the *calling* process, so to see service-a's SVID
# this must run in a pod matching service-a's selectors; exec'ing into the
# agent container (below) only returns an SVID if an entry matches the
# agent pod itself.
kubectl exec -n spire ds/spire-agent -- \
  /opt/spire/bin/spire-agent api fetch x509 \
  -socketPath /tmp/spiffe-workload-api/agent.sock \
  -write /tmp 2>/dev/null && \
kubectl exec -n spire ds/spire-agent -- cat /tmp/svid.0.pem | \
  openssl x509 -noout -text | grep -E "URI:|Not After"

When the fetch succeeds, the openssl output includes a URI: line with the workload's SPIFFE ID (for service-a, spiffe://example.org/ns/demo/sa/service-a) and a Not After roughly 1 hour from now. That short TTL is intentional — if an SVID leaks, it's worthless in an hour.

SPIFFE/SPIRE vs Alternatives

vs cert-manager: cert-manager is excellent for Ingress TLS and internal PKI. It issues certificates, but managing which service gets which certificate is a manual provisioning step — you create a Certificate resource, reference the right Issuer, and deploy the resulting Secret to the right namespace. SPIRE ties certificate issuance directly to workload identity by observing Kubernetes-native signals (namespace, ServiceAccount). There's nothing to manually provision per workload. The tradeoff: cert-manager is simpler to operate and doesn't require a DaemonSet or a Workload API; SPIRE has more moving parts but gives you genuine, automatically-provisioned workload identity rather than manually-managed certs.

vs Istio mTLS: Istio uses SPIFFE internally — sidecar-to-sidecar mTLS in Istio uses certificates from Citadel/istiod that contain SPIFFE-format URIs. The difference is in where the TLS handling happens. Istio intercepts traffic at the network layer via the Envoy sidecar proxy; your application code is completely unaware of mTLS. SPIRE requires your application to explicitly call the Workload API (via go-spiffe or equivalent) and manage the TLS connection itself. Istio adds proxy latency on every hop (typically low single-digit milliseconds) and significant operational complexity; SPIRE adds zero proxying overhead but requires application-level integration. If you're already running Istio, SPIRE is probably redundant. If you're not, SPIRE gives you workload identity without the full mesh.

vs Kubernetes service account tokens: Kubernetes projected service account tokens (bound tokens) are short-lived JWTs — a meaningful improvement over the old long-lived tokens. But they're authentication tokens, not identity documents for mTLS. Service B can't verify Service A's identity from a JWT without calling the Kubernetes TokenReview API at request time — which means a network call to the API server per authentication event, under the API server's rate limits. SVIDs are self-contained X.509 certificates; service-b verifies service-a's identity entirely locally using the trust bundle. No API server call, no latency dependency on the control plane. For high-throughput service-to-service calls this matters.

Common Mistakes

1. Forgetting hostPID: true on the Agent DaemonSet. The Kubernetes workload attestor identifies a workload by its PID in the host PID namespace. Without hostPID: true, the Agent cannot see the pod's process and workload attestation silently fails. The pod's call to the Workload API socket will hang or return an error with no clear indication that the problem is hostPID.

2. Trust domain mismatch between Server and Agent. The trust domain is configured in both the Server's trust_domain field and the Agent's trust_domain field — they must be identical. A mismatch results in "invalid SVID" or "certificate not valid for trust domain" errors that are surprisingly opaque. Always set both from the same Helm value (global.trustDomain in the official chart).

3. Using the same SPIFFE ID for multiple distinct workloads. If service-a and service-c share the same SPIFFE ID, they're cryptographically indistinguishable to any peer that verifies SPIFFE IDs. Service-b configured to accept spiffe://example.org/ns/demo/sa/service-a would also accept connections from service-c. Every distinct workload — defined by namespace and ServiceAccount — should have a unique SPIFFE ID. This is the whole point of the identity model.

4. Caching the SVID at startup without renewal. If your application calls the Workload API once at startup, stores the cert bytes, and never refreshes, it will fail after approximately 1 hour when the SVID expires. The correct pattern is workloadapi.NewX509Source, which maintains an internal watcher and rotates the SVID in the background. Always use the go-spiffe source abstraction rather than calling the gRPC Workload API directly unless you implement rotation yourself.

5. Getting the parentID wrong in entry registration. The Agent's SPIFFE ID (the parentID used when registering workload entries) has the form spiffe://<trust-domain>/spire/agent/<attestor>/…, and the trailing components depend on the attestor: a generated UUID for k8s_sat, the node UID for k8s_psat — not the node name. If the parentID doesn't match a real Agent, entry registration still succeeds (no validation error), but no Agent ever picks up the entry and the workload gets no SVID. Always verify by running spire-server entry show and cross-referencing the parentID against spire-server agent list.

Cleanup

bash
helm uninstall spire -n spire
kubectl delete namespace spire demo

The StatefulSet uses a PersistentVolumeClaim for the SQLite datastore. Delete it explicitly if you want to clean up the PVC:

bash
kubectl delete pvc -n spire --all

SPIFFE/SPIRE is one of those pieces of infrastructure that feels like overhead until the day you're dealing with a compromised credential, an overprivileged service account, or a cert that expired on a Sunday night. Short-lived, automatically-rotated, workload-specific identity solves all three. The operational complexity is real — the DaemonSet, the Server, the entry registry — but it's bounded complexity. Once it's running, the day-to-day is essentially zero: workloads get identity automatically, SVIDs rotate without restarts, and access is controlled by the entry registry rather than scattered across Secrets and ConfigMaps.

The next step from here is federation: configuring SPIRE to trust SVIDs from a different trust domain, which is how you extend this model across clusters or to external services. That's a separate tutorial. For now, get the single-cluster setup running and get a feel for how the Workload API changes the way you think about service-to-service authentication.

Official References

  • SPIFFE Overview — The SPIFFE standard: SVIDs, trust domains, the Workload API, and the SPIFFE spec
  • SPIRE Documentation — SPIRE architecture, Server, Agent, node attestation, and workload attestation
  • go-spiffe Library — The official Go client library for the SPIFFE Workload API (workloadapi, tlsconfig, spiffeid packages)
  • SPIRE Kubernetes Workload Attestor — How the Kubernetes workload attestor identifies pods via PID, container ID, and Kubernetes API
  • SPIRE Helm Charts — Production-hardened Helm charts for deploying SPIRE Server and Agent on Kubernetes
  • SPIFFE Federation — How to federate trust domains across clusters so SVIDs from one domain are trusted in another
