OpenTelemetry on Kubernetes: Collector, Auto-Instrumentation, and the Operator
OpenTelemetry (OTel) is the CNCF standard for collecting telemetry — traces, metrics, and logs — from Kubernetes workloads. The OpenTelemetry Operator automates two things: deploying and configuring the Collector as a sidecar or DaemonSet, and injecting auto-instrumentation into application pods without code changes. This covers the production setup: Collector as DaemonSet with OTLP export, auto-instrumentation for Java/Python/Node.js, and the pipelines that send data to your backend.

OpenTelemetry isn't a backend — it's the pipeline between your application and your observability backend. The OTel Collector receives telemetry (over OTLP, Jaeger, Zipkin, or Prometheus), processes it (batching, sampling, enriching), and exports it to wherever you're storing it (Tempo, Jaeger, Honeycomb, Datadog, AWS X-Ray).
On Kubernetes, the OpenTelemetry Operator manages this entire stack: it deploys Collectors, injects auto-instrumentation agents, and handles the configuration lifecycle. The result: applications emit OTLP telemetry without code changes, the Collector enriches spans with Kubernetes metadata, and the backend gets clean, consistently-labeled data.
OpenTelemetry Operator Installation
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update

helm install opentelemetry-operator open-telemetry/opentelemetry-operator \
  --namespace opentelemetry-operator-system \
  --create-namespace \
  --version 0.78.0 \
  --set admissionWebhooks.certManager.enabled=true   # Uses cert-manager for webhook TLS

The Operator requires cert-manager for its admission webhook certificates. Alternatively, admissionWebhooks.autoGenerateCert.enabled=true uses a self-signed cert (simpler but less robust).
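If cert-manager is not available in the cluster, a minimal sketch of the self-signed fallback looks like this (same chart and version as above, only the webhook cert flags change):

# Alternative: let the chart generate a self-signed webhook cert (no cert-manager needed)
helm install opentelemetry-operator open-telemetry/opentelemetry-operator \
  --namespace opentelemetry-operator-system \
  --create-namespace \
  --version 0.78.0 \
  --set admissionWebhooks.certManager.enabled=false \
  --set admissionWebhooks.autoGenerateCert.enabled=true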
OpenTelemetryCollector: DaemonSet Mode
A DaemonSet Collector runs one pod per node — it collects telemetry from all applications on that node and forwards it to a backend. This is the most common production pattern:
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: otel-collector
  namespace: opentelemetry-operator-system
spec:
  mode: daemonset  # daemonset | deployment | sidecar | statefulset

  resources:
    limits:
      cpu: 500m
      memory: 512Mi
    requests:
      cpu: 100m
      memory: 128Mi

  config:
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
      # Collect host metrics from the node
      hostmetrics:
        collection_interval: 30s
        scrapers:
          cpu: {}
          disk: {}
          filesystem: {}
          memory: {}
          network: {}

    processors:
      # Batch spans before exporting — reduces export calls
      batch:
        timeout: 5s
        send_batch_size: 512

      # Enrich spans with Kubernetes metadata from the pod's environment
      k8sattributes:
        auth_type: serviceAccount
        passthrough: false
        extract:
          metadata:
            - k8s.namespace.name
            - k8s.pod.name
            - k8s.pod.uid
            - k8s.deployment.name
            - k8s.node.name
            - k8s.container.name
          labels:
            - tag_name: app
              key: app
              from: pod
          annotations:
            - tag_name: version
              key: app.kubernetes.io/version
              from: pod
        pod_association:
          - sources:
              - from: resource_attribute
                name: k8s.pod.ip
          - sources:
              - from: connection

      # Memory limiter — prevent OOM in the Collector itself
      memory_limiter:
        check_interval: 1s
        limit_percentage: 75
        spike_limit_percentage: 30

    exporters:
      # Export traces to Grafana Tempo
      otlp/tempo:
        endpoint: tempo.monitoring.svc.cluster.local:4317
        tls:
          insecure: true  # In-cluster; use TLS for cross-cluster

      # Export metrics to Prometheus (remote write or push gateway)
      prometheusremotewrite:
        endpoint: http://kube-prometheus-stack-prometheus.monitoring.svc.cluster.local:9090/api/v1/write

      # Debug: log to stdout (remove in production)
      # debug:
      #   verbosity: detailed

    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, k8sattributes, batch]
          exporters: [otlp/tempo]
        metrics:
          receivers: [otlp, hostmetrics]
          processors: [memory_limiter, k8sattributes, batch]
          exporters: [prometheusremotewrite]

The k8sattributes processor queries the Kubernetes API to enrich every span with the pod's namespace, deployment name, and node — the metadata you need to filter traces by service in Grafana.
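That API lookup needs RBAC: the Collector's ServiceAccount must be able to read pods (and related objects). The Operator does not always create these permissions for a hand-written config, so a ClusterRole bound to the Collector's ServiceAccount is a common companion manifest. A sketch, assuming the Operator's default ServiceAccount name of otel-collector-collector:

# RBAC so the k8sattributes processor can look up pod metadata
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-collector-k8sattributes
rules:
  - apiGroups: [""]
    resources: ["pods", "namespaces", "nodes"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["apps"]
    resources: ["replicasets"]   # Needed to resolve k8s.deployment.name
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: otel-collector-k8sattributes
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: otel-collector-k8sattributes
subjects:
  - kind: ServiceAccount
    name: otel-collector-collector   # Assumes the Operator default <collector-name>-collector
    namespace: opentelemetry-operator-system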
Instrumentation: Auto-Injection
The Instrumentation CRD configures which auto-instrumentation library to inject:
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: otel-instrumentation
  namespace: payments
spec:
  exporter:
    endpoint: http://otel-collector-collector.opentelemetry-operator-system.svc.cluster.local:4318

  propagators:
    - tracecontext  # W3C Trace Context (standard)
    - baggage       # W3C Baggage

  sampler:
    type: parentbased_traceidratio
    argument: "0.1"  # 10% sampling — adjust per traffic volume

  java:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:2.9.0
    env:
      - name: OTEL_INSTRUMENTATION_JDBC_ENABLED
        value: "true"
      - name: OTEL_INSTRUMENTATION_SPRING_WEB_ENABLED
        value: "true"

  python:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:0.49b0
    env:
      - name: OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED
        value: "true"

  nodejs:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-nodejs:0.55.0

  go:
    image: ghcr.io/open-telemetry/opentelemetry-go-instrumentation/autoinstrumentation-go:0.11.0-alpha
    env:
      - name: OTEL_GO_AUTO_TARGET_EXE
        value: "/app/server"  # The path to your Go binary

Go Auto-Instrumentation (eBPF)
Go is the exception: because it compiles to a static binary with no runtime agent hook (there is no JVM-style bytecode instrumentation), it traditionally required manual instrumentation. The OTel Go eBPF agent addresses this by using eBPF probes to trace Go function calls from outside the process:
metadata:
  annotations:
    instrumentation.opentelemetry.io/inject-go: "payments/otel-instrumentation"
    instrumentation.opentelemetry.io/otel-go-auto-target-exe: "/app/payments-api"

The agent runs as a privileged sidecar that attaches to your Go binary, producing OTLP spans for HTTP, gRPC, and database calls without a single line of tracing code in your Go source.
Enabling Auto-Instrumentation on a Pod
Add the annotation to the Deployment's pod template — no code changes needed:
# payments-api/deployment.yaml
spec:
  template:
    metadata:
      annotations:
        instrumentation.opentelemetry.io/inject-java: "payments/otel-instrumentation"
        # Or for Python: instrumentation.opentelemetry.io/inject-python: "payments/otel-instrumentation"
        # Or for Node.js: instrumentation.opentelemetry.io/inject-nodejs: "payments/otel-instrumentation"

The Operator's mutating webhook injects an init container that copies the agent library, then sets JAVA_TOOL_OPTIONS (or equivalent) to load it. The application starts with OTel auto-instrumentation active — no JAR changes, no code modifications.
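For reference, the mutated pod ends up looking roughly like the sketch below for the Java path. This is illustrative only — the exact container, volume, and path names are Operator internals and may differ between versions:

# Rough shape of the pod after webhook injection (illustrative, not exact Operator output)
spec:
  initContainers:
    - name: opentelemetry-auto-instrumentation
      image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:2.9.0
      command: ["cp", "/javaagent.jar", "/otel-auto-instrumentation/javaagent.jar"]
      volumeMounts:
        - name: opentelemetry-auto-instrumentation
          mountPath: /otel-auto-instrumentation
  containers:
    - name: payments-api
      env:
        - name: JAVA_TOOL_OPTIONS
          value: "-javaagent:/otel-auto-instrumentation/javaagent.jar"
        - name: OTEL_EXPORTER_OTLP_ENDPOINT
          value: "http://otel-collector-collector.opentelemetry-operator-system.svc.cluster.local:4318"
        - name: OTEL_SERVICE_NAME
          value: "payments-api"
      volumeMounts:
        - name: opentelemetry-auto-instrumentation
          mountPath: /otel-auto-instrumentation
  volumes:
    - name: opentelemetry-auto-instrumentation
      emptyDir: {}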
Sidecar Mode for High-Volume Services
For services that generate a lot of telemetry, a sidecar Collector co-located with the pod avoids the DaemonSet becoming a bottleneck:
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: payments-sidecar
  namespace: payments
spec:
  mode: sidecar

  config:
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: localhost:4317  # Sidecar: bind to localhost only

    processors:
      batch:
        timeout: 2s
        send_batch_size: 256

    exporters:
      otlp:
        endpoint: otel-collector-collector.opentelemetry-operator-system.svc.cluster.local:4317
        tls:
          insecure: true

    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [otlp]

Enable the sidecar on the Deployment:
metadata:
  annotations:
    sidecar.opentelemetry.io/inject: "payments-sidecar"

The application sends OTLP to localhost:4317 — no DNS resolution needed — and the sidecar Collector forwards it to the DaemonSet Collector.
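Pointing the SDK (or auto-instrumentation) at the sidecar is just an endpoint override on the application container; a minimal sketch using the standard OTLP environment variables:

# Application container: send telemetry to the co-located sidecar instead of a Service DNS name
env:
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://localhost:4317"
  - name: OTEL_EXPORTER_OTLP_PROTOCOL
    value: "grpc"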
Tail Sampling: Keeping Only Interesting Traces
With 10% head-based sampling, you miss 90% of failing requests. Tail sampling evaluates the complete trace before deciding to keep it:
processors:
  tail_sampling:
    decision_wait: 10s                 # Wait 10s for all spans before making the sampling decision
    num_traces: 100000                 # Max traces held in memory awaiting a decision
    expected_new_traces_per_sec: 1000

    policies:
      # Always keep error traces
      - name: errors-policy
        type: status_code
        status_code:
          status_codes: [ERROR]

      # Always keep slow traces (latency > 1s)
      - name: slow-traces-policy
        type: latency
        latency:
          threshold_ms: 1000

      # Sample 1% of the remaining (successful, fast) traces
      - name: probabilistic-policy
        type: probabilistic
        probabilistic:
          sampling_percentage: 1

Tail sampling requires the Collector to run in deployment mode (not DaemonSet) — all spans from a trace must arrive at the same Collector instance for the decision to work. This typically means a two-tier Collector setup: DaemonSet Collectors on each node forward to a central deployment Collector that runs tail sampling.
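When the central tier runs more than one replica, the node-level Collectors also have to route consistently by trace ID so every span of a trace lands on the same replica; the loadbalancing exporter is the usual way to do that, with the DaemonSet tier exporting traces to the gateway instead of directly to Tempo. A sketch of the DaemonSet tier's config — the otel-gateway-collector-headless Service name is an assumption for this example:

# DaemonSet tier: route spans to the gateway replicas, keyed by trace ID,
# so every span of a trace reaches the same tail-sampling Collector.
exporters:
  loadbalancing:
    routing_key: traceID
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      dns:
        # Headless Service of the deployment-mode (gateway) Collector — name assumed for this sketch
        hostname: otel-gateway-collector-headless.opentelemetry-operator-system.svc.cluster.local
        port: 4317

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, k8sattributes, batch]
      exporters: [loadbalancing]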
Frequently Asked Questions
What's the difference between OTLP/gRPC (port 4317) and OTLP/HTTP (port 4318)?
OTLP/gRPC (4317) uses HTTP/2 multiplexing and is more efficient for high-volume, server-side telemetry. OTLP/HTTP (4318) uses HTTP/1.1 (or HTTP/2) and is more compatible with browser clients and environments where gRPC is blocked (some corporate proxies). For Kubernetes workloads, prefer gRPC. For browser-side telemetry (RUM), use HTTP.
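For an SDK or auto-instrumented app, the choice comes down to the standard OTLP environment variables; for example:

# gRPC
OTEL_EXPORTER_OTLP_PROTOCOL=grpc
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector-collector.opentelemetry-operator-system.svc.cluster.local:4317

# HTTP/protobuf (browsers, or proxies that block gRPC)
OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector-collector.opentelemetry-operator-system.svc.cluster.local:4318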
Does auto-instrumentation work with all frameworks?
Java auto-instrumentation covers 150+ frameworks (Spring, Micronaut, Quarkus, JDBC, gRPC, Kafka, Redis, etc.). Python covers Django, Flask, FastAPI, SQLAlchemy, gRPC, and others. Node.js covers Express, Fastify, HTTP, gRPC, and database clients. Unsupported frameworks need manual SDK instrumentation. The agent won't break unsupported frameworks — it just won't produce spans for them.
How do I debug why spans aren't appearing?
- Check the Collector is receiving data: kubectl logs -n opentelemetry-operator-system daemonset/otel-collector-collector | grep -i error
- Check auto-instrumentation was injected: kubectl describe pod <pod> -n payments | grep -i OTEL — you should see the OTEL_EXPORTER_OTLP_ENDPOINT env var
- Check the Instrumentation endpoint matches the Collector service: kubectl get instrumentation otel-instrumentation -n payments -o yaml | grep endpoint
- Enable debug logging in the Collector: temporarily add a debug: exporter and wire it into the traces pipeline (see the snippet after this list)
- Check the k8sattributes processor has the correct RBAC (it needs pods read access in the namespace)
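A sketch of that temporary debug wiring in the Collector config (remove it once spans are flowing):

exporters:
  debug:
    verbosity: detailed   # Logs every span to the Collector's stdout

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, k8sattributes, batch]
      exporters: [otlp/tempo, debug]   # Keep the real exporter; add debug alongside it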
For Prometheus metrics that complement OTel traces, see Prometheus and Grafana on Kubernetes: Production Monitoring Stack. For migrating from vendor-specific agents (Datadog, New Relic) to OTel, see OpenTelemetry: Migrating from Vendor Agents to the Collector. For manual SDK instrumentation across Go, Node.js, and Python — including sampling strategies and context propagation — see OpenTelemetry Instrumentation Guide. For Collector architecture in depth (DaemonSet vs Deployment, multi-backend routing, tail sampling deployment patterns), see OpenTelemetry Collector: Unified Telemetry Pipeline for Kubernetes.
Instrumenting a microservices platform on Kubernetes with OpenTelemetry? Talk to us at Coding Protocols — we help platform teams design observability pipelines that give developers distributed traces without requiring per-service code changes.


