Platform Engineering
14 min read · May 6, 2026

Istio Service Mesh on Kubernetes: mTLS, Traffic Management, and Observability

Istio adds mTLS encryption between services, fine-grained traffic control (canary deployments, circuit breaking, retries), and distributed tracing without application code changes. The control plane (Istiod) pushes Envoy proxy configuration to sidecars injected into each pod. Production Istio requires understanding the Ambient mesh mode (sidecarless, GA in Istio 1.24), PeerAuthentication for mTLS enforcement, and AuthorizationPolicy for zero-trust service-to-service access control.

Coding Protocols Team

A service mesh operates at layer 4/7 between your services, handling concerns that shouldn't be in application code: mutual TLS (so every service-to-service call is authenticated and encrypted), retries and circuit breaking (so a slow dependency doesn't take down the caller), and distributed tracing (so you can see the full call graph across services).

Istio is the most widely deployed service mesh. It runs Envoy proxies alongside your application containers — either as injected sidecars (classic mode) or as a shared node-level proxy layer (Ambient mode, GA in Istio 1.24). The control plane (Istiod) configures those proxies via xDS without touching your application.


Sidecar vs Ambient Mode

  • Proxy placement: Sidecar runs one Envoy per pod (injected); Ambient runs a shared ztunnel per node plus an optional waypoint proxy.
  • Resource overhead: roughly 50-100MB RAM plus 0.5 CPU per pod for sidecars; about 30MB per node for ztunnel, with waypoints deployed only where needed.
  • Pod restart on enable: yes for sidecar (injection requires a pod restart); no for Ambient (ztunnel picks up traffic without pod restarts).
  • L7 features: sidecars intercept all traffic, so all L7 features apply everywhere; in Ambient, ztunnel handles L4 mTLS only, and a waypoint proxy adds L7 per service.
  • GA status: sidecar mode has been GA since Istio 1.0; Ambient is GA since Istio 1.24 (Beta in 1.22).

For new deployments: Ambient mode is the recommended path for most clusters — lower overhead, no pod churn to enable, and the waypoint proxy brings L7 features when needed. Sidecar mode is still supported and required for L7 features on all traffic without waypoint proxies.


Installation

bash
# Install Istio with Ambient mode (recommended)
curl -L https://istio.io/downloadIstio | sh -
cd istio-1.24.0    # Check https://github.com/istio/istio/releases for the latest release
export PATH=$PWD/bin:$PATH

istioctl install --set profile=ambient -y

# Verify installation
istioctl verify-install
kubectl get pods -n istio-system

The ambient profile installs:

  • istiod — control plane
  • ztunnel — DaemonSet on every node (L4 mTLS and L7 forwarding to waypoints)
  • istio-cni — node agent and CNI plugin that redirects pod traffic to ztunnel (no per-pod iptables setup required)

Enrolling Namespaces in Ambient Mesh

bash
# Enroll the payments namespace in Ambient mesh
kubectl label namespace payments istio.io/dataplane-mode=ambient

# All existing pods in the namespace are immediately enrolled — no restart needed
# Verify: pods should show as enrolled
kubectl get pods -n payments -o json | jq '.items[].metadata.annotations["ambient.istio.io/redirection"]'

For sidecar mode (if you need L7 features without waypoint proxies):

bash
kubectl label namespace payments istio-injection=enabled
# Existing pods need to be restarted to inject the sidecar
kubectl rollout restart deployment -n payments

mTLS: Zero-Trust Service Authentication

By default in Ambient mode, all enrolled workloads communicate over mTLS automatically. To enforce that no plaintext traffic is accepted (block non-mesh services from calling mesh services):

yaml
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: payments-mtls
  namespace: payments
spec:
  mtls:
    mode: STRICT    # STRICT | PERMISSIVE | DISABLE
    # STRICT: reject all non-mTLS connections
    # PERMISSIVE: accept both mTLS and plaintext (useful during migration)
    # DISABLE: disable mTLS

STRICT mode means a service outside the mesh cannot call services in the payments namespace without going through Istio-managed mTLS. Apply cluster-wide:

yaml
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: cluster-mtls-strict
  namespace: istio-system    # istio-system namespace = cluster-wide policy
spec:
  mtls:
    mode: STRICT

AuthorizationPolicy: Zero-Trust Service Access

mTLS authenticates peers; AuthorizationPolicy authorizes which services can call which. Without an AuthorizationPolicy, all authenticated mesh services can call each other:

yaml
# Default deny-all in the payments namespace
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: deny-all
  namespace: payments
spec: {}    # Empty spec = deny all traffic

---
# Allow only the orders service to call payments-api
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-orders-to-payments
  namespace: payments
spec:
  selector:
    matchLabels:
      app: payments-api
  action: ALLOW
  rules:
    - from:
        - source:
            principals:
              # ServiceAccount identity: cluster.local/ns/<namespace>/sa/<serviceaccount>
              - "cluster.local/ns/orders/sa/orders-api"
      to:
        - operation:
            methods: ["POST"]
            paths: ["/payments/*"]

The principals field is the SPIFFE identity of the caller's service account — it's derived from the mTLS certificate that Istiod provisions for each workload. This is stronger than IP-based network policies because it's tied to identity, not network address.
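Getting a principal string wrong silently denies traffic, so it helps to see how the identity is assembled. A hypothetical helper, assuming the default cluster.local trust domain (configurable in MeshConfig):

```python
# Build the SPIFFE-style principal Istio assigns to a workload's service account.
# Assumes the default trust domain "cluster.local".
def spiffe_principal(namespace: str, service_account: str,
                     trust_domain: str = "cluster.local") -> str:
    return f"{trust_domain}/ns/{namespace}/sa/{service_account}"

# The principal used in the AuthorizationPolicy above:
print(spiffe_principal("orders", "orders-api"))
# → cluster.local/ns/orders/sa/orders-api
```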


Waypoint Proxy: L7 Features in Ambient Mode

For HTTP routing, retries, and circuit breaking in Ambient mode, deploy a waypoint proxy for the namespace:

bash
# Deploy a waypoint proxy for the payments namespace
istioctl waypoint apply --namespace payments
# Label the namespace to route traffic through the waypoint
kubectl label namespace payments istio.io/use-waypoint=waypoint

yaml
# VirtualService: traffic splitting with waypoint proxy
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: payments-api
  namespace: payments
spec:
  hosts:
    - payments-api    # Kubernetes Service name

  http:
    # Canary: send 10% to v2
    - route:
        - destination:
            host: payments-api
            subset: v1
          weight: 90
        - destination:
            host: payments-api
            subset: v2
          weight: 10

---
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: payments-api
  namespace: payments
spec:
  host: payments-api
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 50
        http2MaxRequests: 100
    outlierDetection:
      consecutiveGatewayErrors: 5    # Eject host after 5 consecutive gateway errors (502/503/504)
      interval: 10s
      baseEjectionTime: 30s
      maxEjectionPercent: 50         # Never eject more than 50% of hosts
  subsets:
    - name: v1
      labels:
        version: v1
    - name: v2
      labels:
        version: v2

Retries and Timeouts

yaml
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: payments-api
  namespace: payments
spec:
  hosts:
    - payments-api
  http:
    - timeout: 10s    # Request timeout
      retries:
        attempts: 3
        perTryTimeout: 3s
        retryOn: "5xx,connect-failure,reset"    # Retry conditions
      route:
        - destination:
            host: payments-api

The retryOn field accepts a comma-separated list: 5xx (retry on 5xx responses), connect-failure, retriable-4xx, reset, gateway-error. For payment services, be careful with retries: only retry operations that are idempotent. A retried POST that creates a charge can double-bill a customer.
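How timeout, attempts, and perTryTimeout interact is worth working through. Istio's attempts counts retries in addition to the original request, and the route-level timeout caps total time regardless of remaining attempts; a quick illustrative sketch (ignoring the automatic backoff between retries):

```python
def worst_case_latency(retry_attempts: int, per_try_timeout: float,
                       overall_timeout: float) -> float:
    # Up to (1 + retry_attempts) tries, each bounded by perTryTimeout;
    # the route-level timeout caps the total regardless.
    return min((1 + retry_attempts) * per_try_timeout, overall_timeout)

# The VirtualService above: attempts=3, perTryTimeout=3s, timeout=10s
print(worst_case_latency(3, 3.0, 10.0))   # → 10.0 (4 tries x 3s = 12s, capped at 10s)
```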


Observability: Prometheus Metrics

Istio's data-plane proxies expose Prometheus metrics automatically: the Envoy sidecar in sidecar mode, and ztunnel (L4) plus the waypoint proxy (HTTP) in Ambient mode. Key PromQL queries for Istio-instrumented services:

promql
# Request rate to payments-api (all source services)
sum(rate(istio_requests_total{destination_service_name="payments-api"}[5m])) by (source_workload)

# Request success rate (non-5xx) for payments-api
sum(rate(istio_requests_total{
  destination_service_name="payments-api",
  response_code!~"5.*"
}[5m])) /
sum(rate(istio_requests_total{
  destination_service_name="payments-api"
}[5m]))

# P99 latency for payments-api
histogram_quantile(0.99,
  sum(rate(istio_request_duration_milliseconds_bucket{
    destination_service_name="payments-api"
  }[5m])) by (le)
)

# Error rate (5xx responses) — alert when > 1%
sum(rate(istio_requests_total{
  destination_service_name="payments-api",
  response_code=~"5.*"
}[5m])) /
sum(rate(istio_requests_total{
  destination_service_name="payments-api"
}[5m]))

These metrics require no application instrumentation. The destination_service_name label matches the Kubernetes Service name; source_workload identifies the calling workload. In sidecar mode they come from the Envoy sidecar; in Ambient mode, ztunnel emits L4 (TCP) metrics, and HTTP metrics such as istio_requests_total come from the waypoint proxy's Envoy instance.
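What the success-rate query actually computes can be sketched with plain arithmetic; the per-code rates below are made-up numbers, not real metrics:

```python
# Mirror the PromQL success-rate query: sum(rate of non-5xx) / sum(rate of all),
# using hypothetical per-response-code request rates (req/s) for payments-api.
rates_by_code = {"200": 95.0, "201": 3.0, "404": 1.0, "500": 0.5, "503": 0.5}

total = sum(rates_by_code.values())
non_5xx = sum(r for code, r in rates_by_code.items() if not code.startswith("5"))

success_rate = non_5xx / total
print(f"{success_rate:.3f}")   # → 0.990
```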


Distributed Tracing

Istio's proxies report spans automatically and understand the common propagation formats (W3C TraceContext, B3, Jaeger). What they cannot do is correlate an inbound request with the outbound calls your application makes, so your code must forward the incoming trace headers on every outbound call:

python
# Forward trace headers from incoming request to outbound calls
import requests

TRACE_HEADERS = [
    "x-request-id",
    "x-b3-traceid",
    "x-b3-spanid",
    "x-b3-parentspanid",
    "x-b3-sampled",
    "x-b3-flags",
    "traceparent",    # W3C TraceContext
    "tracestate",
]

def call_downstream(incoming_headers: dict, path: str):
    # Compare header names case-insensitively; forward only the trace headers
    forwarded = {k: v for k, v in incoming_headers.items() if k.lower() in TRACE_HEADERS}
    return requests.post(f"http://payments-api/{path}", headers=forwarded)

Integrate with Grafana Tempo or Jaeger via Istio's telemetry API:

yaml
apiVersion: telemetry.istio.io/v1
kind: Telemetry
metadata:
  name: payments-tracing
  namespace: payments
spec:
  tracing:
    - providers:
        - name: tempo    # Defined in Istio's MeshConfig
      randomSamplingPercentage: 10    # 10% sampling

Istio and the Kubernetes Gateway API

In 2026, the Kubernetes Gateway API is the preferred way to manage ingress and egress for the Istio mesh. The Istio ingressgateway is increasingly replaced by standard Gateway resources:

yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: istio-gateway
  namespace: istio-system
spec:
  gatewayClassName: istio
  listeners:
    - name: http
      port: 80
      protocol: HTTP
      allowedRoutes:
        namespaces:
          from: All

yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: payments-route
  namespace: payments
spec:
  parentRefs:
    - name: istio-gateway
      namespace: istio-system
  rules:
    - matches:
        - path: { type: PathPrefix, value: /v1 }
      backendRefs:
        - name: payments-api
          port: 8080

Using the Gateway API provides a unified, cross-vendor configuration model while leveraging Istio's high-performance Envoy data plane for advanced routing and security.


Frequently Asked Questions

Should I use Ambient mode or sidecar mode for a new cluster?

Ambient mode for most new clusters. The lower resource overhead (no per-pod Envoy) is significant at scale — a 100-pod namespace saves ~5-10GB RAM. The waypoint proxy provides all the L7 features when needed. The main reason to stick with sidecar mode: existing tooling deeply integrated with sidecar proxy semantics, or if you need per-pod L7 features without deploying a waypoint proxy per namespace.
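The ~5-10GB figure follows directly from the per-pod numbers in the comparison table; a back-of-the-envelope check, using the table's assumed figures rather than measurements:

```python
# Rough memory-overhead comparison: per-pod sidecars vs per-node ztunnel.
# The MB figures are the assumptions from the table above, not measurements.
pods, nodes = 100, 10
sidecar_mb_low, sidecar_mb_high = 50, 100
ztunnel_mb_per_node = 30

sidecar_total_mb = (pods * sidecar_mb_low, pods * sidecar_mb_high)
ztunnel_total_mb = nodes * ztunnel_mb_per_node

print(sidecar_total_mb)   # → (5000, 10000)  i.e. ~5-10GB for 100 sidecars
print(ztunnel_total_mb)   # → 300            i.e. ~0.3GB for ztunnel on 10 nodes
```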

Does Istio replace NetworkPolicy?

No — they operate at different layers. NetworkPolicy is enforced by the CNI plugin at the IP/port level, before traffic reaches the application. Istio AuthorizationPolicy operates at the identity/HTTP level, after mTLS authentication. For defense in depth, use both: NetworkPolicy blocks traffic at the network layer; AuthorizationPolicy enforces identity-based access at the application layer.

How does Istio handle database connections?

Istio sidecars intercept all TCP traffic, including database connections. This causes a problem with protocols that use client certificates for authentication (some PostgreSQL configurations, MongoDB x.509 auth, etc.) — the sidecar intercepts the connection and presents its own mTLS certificate, which the database doesn't recognise as a valid client certificate. Three options:

  1. Exclude database pods from the mesh: Annotate database pods with sidecar.istio.io/inject: "false" (sidecar mode) or exclude them from the ambient namespace. The simplest option when you control the database deployment.
  2. Disable mTLS for database traffic: Add a DestinationRule with tls.mode: DISABLE for the database service — the sidecar passes traffic through without mTLS wrapping. The application's own TLS configuration to the database is preserved.
  3. Use PASSTHROUGH for external databases: For databases outside the cluster (RDS, Cloud SQL), use ServiceEntry with tls.mode: PASSTHROUGH so the sidecar does not terminate the TLS session.

In Ambient mode, ztunnel handles mTLS at the TCP layer without per-pod sidecars — but the same intercept behaviour applies. Use the same DestinationRule tls.mode: DISABLE approach for in-cluster databases that handle their own TLS.

How do I debug mTLS issues?

bash
# Describe a pod: shows its mTLS mode plus the VirtualServices, DestinationRules,
# and AuthorizationPolicies that apply to it
istioctl x describe pod <pod-name> -n payments

# Inspect the workload certificate the proxy is using for mTLS
istioctl proxy-config secret <pod-name> -n payments

# Check Envoy proxy configuration (sidecar mode)
istioctl proxy-config cluster <pod-name> -n payments | grep payments-api
istioctl proxy-config route <pod-name> -n payments

# Watch the proxy logs for TLS handshake errors
kubectl logs <pod-name> -n payments -c istio-proxy | grep -i tls

For zero-trust network policies that complement Istio's service mesh security, see Kubernetes Network Policies: Zero-Trust Networking. For distributed tracing backends that receive Istio's trace data, see OpenTelemetry on Kubernetes: Collector, Auto-Instrumentation, and the Operator.

Deploying Istio on EKS for the first time or migrating from a simpler network policy approach to full zero-trust? Talk to us at Coding Protocols — we help platform teams implement service mesh architectures that deliver mTLS and traffic management without destabilizing existing workloads.

Related Topics

Istio
Service Mesh
Kubernetes
mTLS
Envoy
Traffic Management
Observability
Security
EKS

Read Next