Observability
12 min read · May 4, 2026

OpenTelemetry on Kubernetes: Collector, Auto-Instrumentation, and the Operator

OpenTelemetry (OTel) is the CNCF standard for collecting telemetry — traces, metrics, and logs — from Kubernetes workloads. The OpenTelemetry Operator automates two things: deploying and configuring the Collector as a sidecar or DaemonSet, and injecting auto-instrumentation into application pods without code changes. This guide covers the production setup: the Collector as a DaemonSet with OTLP export, auto-instrumentation for Java/Python/Node.js, and the pipelines that send data to your backend.

Coding Protocols Team
Platform Engineering

OpenTelemetry isn't a backend — it's the pipeline between your application and your observability backend. The OTel Collector receives telemetry (over OTLP, Jaeger, Zipkin, or Prometheus), processes it (batching, sampling, enriching), and exports it to wherever you're storing it (Tempo, Jaeger, Honeycomb, Datadog, AWS X-Ray).

On Kubernetes, the OpenTelemetry Operator manages this entire stack: it deploys Collectors, injects auto-instrumentation agents, and handles the configuration lifecycle. The result: applications emit OTLP telemetry without code changes, the Collector enriches spans with Kubernetes metadata, and the backend gets clean, consistently-labeled data.


OpenTelemetry Operator Installation

bash
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update

helm install opentelemetry-operator open-telemetry/opentelemetry-operator \
  --namespace opentelemetry-operator-system \
  --create-namespace \
  --version 0.78.0 \
  --set admissionWebhooks.certManager.enabled=true    # Uses cert-manager for webhook TLS

The Operator requires cert-manager for its admission webhook certificates. Alternatively, admissionWebhooks.autoGenerateCert.enabled=true uses a self-signed cert (simpler but less robust).


OpenTelemetryCollector: DaemonSet Mode

A DaemonSet Collector runs one pod per node — it collects telemetry from all applications on that node and forwards it to a backend. This is the most common production pattern:

yaml
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: otel-collector
  namespace: opentelemetry-operator-system
spec:
  mode: daemonset    # daemonset | deployment | sidecar | statefulset

  resources:
    limits:
      cpu: 500m
      memory: 512Mi
    requests:
      cpu: 100m
      memory: 128Mi

  config:    # v1beta1 takes structured YAML here, not a string block
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
      # Collect host metrics from the node
      hostmetrics:
        collection_interval: 30s
        scrapers:
          cpu: {}
          disk: {}
          filesystem: {}
          memory: {}
          network: {}

    processors:
      # Batch spans before exporting — reduces export calls
      batch:
        timeout: 5s
        send_batch_size: 512

      # Enrich spans with Kubernetes metadata from the API server
      k8sattributes:
        auth_type: serviceAccount
        passthrough: false
        extract:
          metadata:
            - k8s.namespace.name
            - k8s.pod.name
            - k8s.pod.uid
            - k8s.deployment.name
            - k8s.node.name
            - k8s.container.name
          labels:
            - tag_name: app
              key: app
              from: pod
          annotations:
            - tag_name: version
              key: app.kubernetes.io/version
              from: pod
        pod_association:
          - sources:
              - from: resource_attribute
                name: k8s.pod.ip
          - sources:
              - from: connection

      # Memory limiter — prevent OOM in the Collector itself
      memory_limiter:
        check_interval: 1s
        limit_percentage: 75
        spike_limit_percentage: 30

    exporters:
      # Export traces to Grafana Tempo
      otlp/tempo:
        endpoint: tempo.monitoring.svc.cluster.local:4317
        tls:
          insecure: true    # In-cluster; use TLS for cross-cluster

      # Export metrics to Prometheus via remote write
      # (Prometheus must run with --web.enable-remote-write-receiver)
      prometheusremotewrite:
        endpoint: http://kube-prometheus-stack-prometheus.monitoring.svc.cluster.local:9090/api/v1/write

      # Debug: log to stdout (remove in production)
      # debug:
      #   verbosity: detailed

    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, k8sattributes, batch]
          exporters: [otlp/tempo]
        metrics:
          receivers: [otlp, hostmetrics]
          processors: [memory_limiter, k8sattributes, batch]
          exporters: [prometheusremotewrite]
The k8sattributes processor queries the Kubernetes API to enrich every span with the pod's namespace, deployment name, and node — the metadata you need to filter traces by service in Grafana.
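That enrichment needs RBAC: the processor must be able to list and watch pods, namespaces, and (to resolve `k8s.deployment.name`) ReplicaSets. A minimal sketch of the ClusterRole the Collector's service account would need — names here are illustrative, and the Operator's Helm chart may already create an equivalent:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-collector-k8sattributes
rules:
  - apiGroups: [""]
    resources: ["pods", "namespaces"]
    verbs: ["get", "watch", "list"]
  - apiGroups: ["apps"]
    resources: ["replicasets"]      # needed to resolve k8s.deployment.name
    verbs: ["get", "watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: otel-collector-k8sattributes
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: otel-collector-k8sattributes
subjects:
  - kind: ServiceAccount
    name: otel-collector-collector    # SA the Operator creates for the CR above
    namespace: opentelemetry-operator-system
```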


Instrumentation: Auto-Injection

The Instrumentation CRD configures which auto-instrumentation library to inject:

yaml
apiVersion: opentelemetry.io/v1alpha1    # Instrumentation is still v1alpha1
kind: Instrumentation
metadata:
  name: otel-instrumentation
  namespace: payments
spec:
  exporter:
    endpoint: http://otel-collector-collector.opentelemetry-operator-system.svc.cluster.local:4318

  propagators:
    - tracecontext    # W3C Trace Context (standard)
    - baggage         # W3C Baggage

  sampler:
    type: parentbased_traceidratio
    argument: "0.1"    # 10% sampling — adjust per traffic volume

  java:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:2.9.0
    env:
      - name: OTEL_INSTRUMENTATION_JDBC_ENABLED
        value: "true"
      - name: OTEL_INSTRUMENTATION_SPRING_WEB_ENABLED
        value: "true"

  python:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:0.49b0
    env:
      - name: OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED
        value: "true"

  nodejs:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-nodejs:0.55.0

  go:
    image: ghcr.io/open-telemetry/opentelemetry-go-instrumentation/autoinstrumentation-go:0.11.0-alpha
    env:
      - name: OTEL_GO_AUTO_TARGET_EXE
        value: "/app/server"    # The path to your Go binary

Go Auto-Instrumentation (eBPF)

Go is the exception: a compiled Go binary offers no runtime hook for agent injection — there is no JVM or interpreter to attach to — so Go traditionally required manual instrumentation. The OTel Go auto-instrumentation agent works around this with eBPF, attaching uprobes to functions in the running binary:

yaml
metadata:
  annotations:
    instrumentation.opentelemetry.io/inject-go: "payments/otel-instrumentation"
    instrumentation.opentelemetry.io/otel-go-auto-target-exe: "/app/payments-api"

The agent runs as a privileged sidecar that attaches to your Go binary, producing OTLP spans for HTTP, gRPC, and database calls without a single line of tracing code in your Go source.


Enabling Auto-Instrumentation on a Pod

Add the annotation to the Deployment's pod template — no code changes needed:

yaml
# payments-api/deployment.yaml
spec:
  template:
    metadata:
      annotations:
        instrumentation.opentelemetry.io/inject-java: "payments/otel-instrumentation"
        # Or for Python: instrumentation.opentelemetry.io/inject-python: "payments/otel-instrumentation"
        # Or for Node.js: instrumentation.opentelemetry.io/inject-nodejs: "payments/otel-instrumentation"

The Operator's mutating webhook injects an init container that copies the agent into a shared emptyDir volume, then sets JAVA_TOOL_OPTIONS (or the language's equivalent) on the application container to load it. The application starts with OTel auto-instrumentation active — no JAR changes, no code modifications.
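For Java, the mutated pod ends up looking roughly like this — a simplified sketch of what the webhook produces, not an exact rendering; image tags and volume names vary by Operator version:

```yaml
spec:
  initContainers:
    - name: opentelemetry-auto-instrumentation-java
      image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:2.9.0
      # Copy the agent JAR into the shared volume before the app starts
      command: ["cp", "/javaagent.jar", "/otel-auto-instrumentation-java/javaagent.jar"]
      volumeMounts:
        - name: opentelemetry-auto-instrumentation-java
          mountPath: /otel-auto-instrumentation-java
  containers:
    - name: payments-api
      env:
        - name: JAVA_TOOL_OPTIONS
          value: " -javaagent:/otel-auto-instrumentation-java/javaagent.jar"
        - name: OTEL_EXPORTER_OTLP_ENDPOINT
          value: http://otel-collector-collector.opentelemetry-operator-system.svc.cluster.local:4318
        - name: OTEL_SERVICE_NAME
          value: payments-api
      volumeMounts:
        - name: opentelemetry-auto-instrumentation-java
          mountPath: /otel-auto-instrumentation-java
  volumes:
    - name: opentelemetry-auto-instrumentation-java
      emptyDir: {}
```

Checking for these injected pieces with `kubectl describe pod` is the quickest way to confirm the webhook fired.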


Sidecar Mode for High-Volume Services

For services that generate a lot of telemetry, a sidecar Collector co-located with the pod avoids the DaemonSet becoming a bottleneck:

yaml
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: payments-sidecar
  namespace: payments
spec:
  mode: sidecar

  config:    # v1beta1 takes structured YAML here, not a string block
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: localhost:4317    # Sidecar: bind to localhost only

    processors:
      batch:
        timeout: 2s
        send_batch_size: 256

    exporters:
      otlp:
        endpoint: otel-collector-collector.opentelemetry-operator-system.svc.cluster.local:4317
        tls:
          insecure: true

    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [otlp]

Enable the sidecar on the Deployment:

yaml
metadata:
  annotations:
    sidecar.opentelemetry.io/inject: "payments-sidecar"

The application sends OTLP to localhost:4317 with no DNS resolution or network hop on the hot path; the sidecar batches the spans and forwards them to the central Collector.
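If the service configures its SDK manually rather than relying on auto-instrumentation, the exporter setup reduces to two standard OTel SDK environment variables (values assume the sidecar defaults above):

```yaml
env:
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: http://localhost:4317
  - name: OTEL_EXPORTER_OTLP_PROTOCOL
    value: grpc
```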


Tail Sampling: Keeping Only Interesting Traces

With 10% head-based sampling, you miss 90% of failing requests. Tail sampling evaluates the complete trace before deciding to keep it:

yaml
processors:
  tail_sampling:
    decision_wait: 10s      # Wait 10s for all spans before making the sampling decision
    num_traces: 100000      # Max traces held in memory awaiting a decision
    expected_new_traces_per_sec: 1000

    policies:
      # Always keep error traces
      - name: errors-policy
        type: status_code
        status_code:
          status_codes: [ERROR]

      # Always keep slow traces (total duration over 1s)
      - name: slow-traces-policy
        type: latency
        latency:
          threshold_ms: 1000

      # Sample 1% of successful, fast traces
      - name: probabilistic-policy
        type: probabilistic
        probabilistic:
          sampling_percentage: 1

Tail sampling requires every span of a trace to arrive at the same Collector instance, so it cannot run on a per-node DaemonSet. This typically means a two-tier setup: DaemonSet Collectors on each node forward to a central deployment-mode Collector that runs tail sampling, and if that central tier has multiple replicas, the first tier must route spans by trace ID (the loadbalancing exporter does exactly this).
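A sketch of the first tier's export config under that pattern, using the loadbalancing exporter's Kubernetes resolver to route by trace ID to a Service in front of the tail-sampling tier — the `otel-tail-sampler` service name is an assumption, and the pipeline is abbreviated:

```yaml
exporters:
  loadbalancing:
    routing_key: traceID           # all spans of one trace go to the same backend instance
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      k8s:
        service: otel-tail-sampler.opentelemetry-operator-system

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [loadbalancing]
```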


Frequently Asked Questions

What's the difference between OTLP/gRPC (port 4317) and OTLP/HTTP (port 4318)?

OTLP/gRPC (4317) uses HTTP/2 multiplexing and is more efficient for high-volume, server-side telemetry. OTLP/HTTP (4318) uses HTTP/1.1 (or HTTP/2) and is more compatible with browser clients and environments where gRPC is blocked (some corporate proxies). For Kubernetes workloads, prefer gRPC. For browser-side telemetry (RUM), use HTTP.
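When the SDK is configured through environment variables, the protocol choice is a single variable — these are the standard OTel SDK env vars, and the endpoint shown assumes the DaemonSet Collector from earlier:

```yaml
env:
  - name: OTEL_EXPORTER_OTLP_PROTOCOL
    value: grpc                    # or http/protobuf for OTLP/HTTP
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: http://otel-collector-collector.opentelemetry-operator-system.svc.cluster.local:4317
```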

Does auto-instrumentation work with all frameworks?

Java auto-instrumentation covers 150+ frameworks (Spring, Micronaut, Quarkus, JDBC, gRPC, Kafka, Redis, etc.). Python covers Django, Flask, FastAPI, SQLAlchemy, gRPC, and others. Node.js covers Express, Fastify, HTTP, gRPC, and database clients. Unsupported frameworks need manual SDK instrumentation. The agent won't break unsupported frameworks — it just won't produce spans for them.

How do I debug why spans aren't appearing?

  1. Check the Collector is receiving: kubectl logs -n opentelemetry-operator-system daemonset/otel-collector-collector | grep -i error
  2. Check auto-instrumentation injected: kubectl describe pod <pod> -n payments | grep -i OTEL — should see OTEL_EXPORTER_OTLP_ENDPOINT env var
  3. Check the Instrumentation endpoint matches the Collector service: kubectl get instrumentation otel-instrumentation -n payments -o yaml | grep endpoint
  4. Enable debug logging in the Collector: temporarily add the debug exporter to the traces pipeline's exporters list
  5. Check the k8sattributes processor has the correct RBAC (needs pods read access in the namespace)
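For step 4, a sketch of wiring the debug exporter in alongside the existing one (remove it again once you've confirmed spans are flowing):

```yaml
exporters:
  debug:
    verbosity: detailed
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, k8sattributes, batch]
      exporters: [otlp/tempo, debug]    # spans now also logged to the Collector's stdout
```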

For Prometheus metrics that complement OTel traces, see Prometheus and Grafana on Kubernetes: Production Monitoring Stack. For migrating from vendor-specific agents (Datadog, New Relic) to OTel, see OpenTelemetry: Migrating from Vendor Agents to the Collector. For manual SDK instrumentation across Go, Node.js, and Python — including sampling strategies and context propagation — see OpenTelemetry Instrumentation Guide. For Collector architecture in depth (DaemonSet vs Deployment, multi-backend routing, tail sampling deployment patterns), see OpenTelemetry Collector: Unified Telemetry Pipeline for Kubernetes.

Instrumenting a microservices platform on Kubernetes with OpenTelemetry? Talk to us at Coding Protocols — we help platform teams design observability pipelines that give developers distributed traces without requiring per-service code changes.

Related Topics

OpenTelemetry
Kubernetes
Observability
Tracing
Metrics
Logs
OTEL
Platform Engineering
