Observability
16 min read · May 4, 2026

OpenTelemetry Instrumentation: Traces, Metrics, and Logs Without Vendor Lock-In

OpenTelemetry is the standard for telemetry data collection — traces, metrics, and logs through a single SDK and wire protocol. Here's how to instrument applications, configure the Collector pipeline, and route signals to any observability backend without changing application code.

Coding Protocols Team
Platform Engineering

The observability landscape before OpenTelemetry was fragmented. Each vendor had its own SDK, its own agent, its own wire protocol. Migrating from Datadog to Grafana or adding a new backend meant re-instrumenting applications. Application developers made instrumentation decisions based on which backend the ops team happened to use.

OpenTelemetry is the CNCF project that standardised this: one SDK, one wire protocol (OTLP), one Collector. Instrument your application once and route telemetry to any backend — Jaeger, Grafana Tempo, Zipkin, Datadog, Honeycomb — by reconfiguring the Collector, not the application.

This post covers the full OTel stack: auto-instrumentation, manual SDK instrumentation, Collector configuration, sampling, and backend routing.


The OpenTelemetry Stack

Application
    ↓ (OTLP or SDK)
OTel Collector (receive → process → export)
    ↓ (OTLP, Jaeger, Prometheus, etc.)
Backends: Grafana Tempo (traces) + Prometheus (metrics) + Loki (logs)

SDK: Language-specific library that captures traces and metrics from application code.

Auto-instrumentation: Instruments libraries (HTTP clients, database drivers, gRPC) automatically without application code changes.

Collector: A standalone agent/gateway that receives telemetry, processes it (sampling, enrichment, filtering), and exports to backends. Runs as a sidecar or DaemonSet in Kubernetes.

OTLP: OpenTelemetry Protocol — the wire format for sending telemetry between SDK → Collector → backend. Over gRPC or HTTP.


Auto-Instrumentation

The fastest way to get traces without code changes: zero-code auto-instrumentation injects the OTel SDK into your application at startup via a Kubernetes operator.

OpenTelemetry Operator

bash
# Install the OTel Operator (requires cert-manager in the cluster)
kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml

# Or via Helm
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm upgrade --install opentelemetry-operator \
  open-telemetry/opentelemetry-operator \
  --namespace observability \
  --set "manager.collectorImage.repository=otel/opentelemetry-collector-contrib"

Instrumentation CR

The Instrumentation CRD tells the operator how to auto-instrument applications in a namespace:

yaml
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: auto-instrumentation
  namespace: production
spec:
  exporter:
    endpoint: http://otel-collector.observability.svc.cluster.local:4317

  propagators:
    - tracecontext    # W3C TraceContext (standard)
    - baggage         # Baggage propagation for cross-service metadata
    - b3              # B3 (Zipkin compatibility)

  sampler:
    type: parentbased_traceidratio
    argument: "0.1"   # 10% sampling

  java:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:latest
    env:
      - name: OTEL_INSTRUMENTATION_JDBC_ENABLED
        value: "true"

  nodejs:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-nodejs:latest

  python:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:latest

  go:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-go:latest
    # Go auto-instrumentation uses eBPF — requires privileged access

Enabling Auto-Instrumentation Per Pod

Add the annotation to the pod template inside the Deployment, not the Deployment's own metadata: the operator's admission webhook matches on pods, so annotations at the Deployment level are ignored:

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  namespace: production
spec:
  template:
    metadata:
      annotations:
        instrumentation.opentelemetry.io/inject-java: "true"
        # or: inject-nodejs, inject-python, inject-go

The operator injects an init container that downloads the OTel agent into a shared volume. The main container starts with the agent attached — all HTTP, database, and gRPC calls are automatically traced.

What Gets Auto-Instrumented

For Java (via javaagent):

  • Spring MVC, Spring WebFlux, Quarkus, Micronaut HTTP requests
  • JDBC, MongoDB, Redis, Elasticsearch, Cassandra
  • gRPC client and server
  • Kafka producer and consumer
  • AWS SDK v1/v2

For Node.js:

  • Express, Fastify, NestJS, Koa HTTP routes
  • pg, mysql2, mongodb, redis, ioredis database clients
  • gRPC, axios, node-fetch HTTP clients
  • Kafka (kafkajs)

For Python:

  • Django, Flask, FastAPI, aiohttp
  • SQLAlchemy, psycopg2, pymongo, redis
  • requests, aiohttp HTTP clients
  • gRPC

Manual SDK Instrumentation

Auto-instrumentation captures framework-level spans. For business-logic-level tracing (what happened inside a specific handler), you need manual instrumentation:

Go

go
package main

import (
    "context"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/codes"
    "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
    "go.opentelemetry.io/otel/sdk/resource"
    sdktrace "go.opentelemetry.io/otel/sdk/trace"
    semconv "go.opentelemetry.io/otel/semconv/v1.21.0"
    "go.opentelemetry.io/otel/trace"
)

var tracer trace.Tracer

func initTracer(ctx context.Context) (*sdktrace.TracerProvider, error) {
    exporter, err := otlptracegrpc.New(ctx,
        otlptracegrpc.WithEndpoint("otel-collector:4317"),
        otlptracegrpc.WithInsecure(), // plaintext inside the cluster; use TLS credentials across trust boundaries
    )
    if err != nil {
        return nil, err
    }

    res, err := resource.New(ctx,
        resource.WithAttributes(
            semconv.ServiceName("payment-service"),
            semconv.ServiceVersion("v2.1.0"),
            semconv.DeploymentEnvironment("production"),
        ),
    )
    if err != nil {
        return nil, err
    }

    tp := sdktrace.NewTracerProvider(
        sdktrace.WithBatcher(exporter),
        sdktrace.WithResource(res),
        sdktrace.WithSampler(sdktrace.ParentBased(
            sdktrace.TraceIDRatioBased(0.1), // 10% sampling
        )),
    )
    otel.SetTracerProvider(tp)
    tracer = otel.Tracer("payment-service")
    return tp, nil
}

// Instrument a function with a span
func processPayment(ctx context.Context, amount float64, currency string) error {
    ctx, span := tracer.Start(ctx, "processPayment",
        trace.WithAttributes(
            attribute.Float64("payment.amount", amount),
            attribute.String("payment.currency", currency),
        ),
    )
    defer span.End()

    // Add events to the span
    span.AddEvent("validating payment")

    if err := validatePayment(ctx, amount, currency); err != nil {
        span.RecordError(err)
        span.SetStatus(codes.Error, err.Error())
        return err
    }

    span.AddEvent("charging card")
    if err := chargeCard(ctx, amount, currency); err != nil {
        span.RecordError(err)
        span.SetStatus(codes.Error, err.Error())
        return err
    }

    return nil
}

Node.js (TypeScript)

typescript
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-grpc';
import { Resource } from '@opentelemetry/resources';
// SemanticResourceAttributes is deprecated in recent semantic-conventions
// releases — newer SDK versions export ATTR_SERVICE_NAME etc. instead.
import { SemanticResourceAttributes } from '@opentelemetry/semantic-conventions';
import { trace, SpanStatusCode, context } from '@opentelemetry/api';
import { BatchSpanProcessor } from '@opentelemetry/sdk-trace-base';

// Initialize the SDK before the rest of the application loads
// (e.g. in a tracing.ts loaded first via node --require)
const sdk = new NodeSDK({
  resource: new Resource({
    [SemanticResourceAttributes.SERVICE_NAME]: 'api-service',
    [SemanticResourceAttributes.SERVICE_VERSION]: process.env.APP_VERSION ?? 'unknown',
    [SemanticResourceAttributes.DEPLOYMENT_ENVIRONMENT]: process.env.NODE_ENV ?? 'development',
  }),
  spanProcessor: new BatchSpanProcessor(
    new OTLPTraceExporter({
      url: 'http://otel-collector:4317',
    })
  ),
});

sdk.start();
process.on('SIGTERM', () => sdk.shutdown());

// Use the tracer in application code
const tracer = trace.getTracer('api-service');

async function processOrder(orderId: string): Promise<void> {
  const span = tracer.startSpan('processOrder', {
    attributes: { 'order.id': orderId },
  });

  const ctx = trace.setSpan(context.active(), span);

  try {
    await context.with(ctx, async () => {
      await validateInventory(orderId);  // Child span if validateInventory creates one
      await chargeCustomer(orderId);
      await fulfillOrder(orderId);
    });
    span.setStatus({ code: SpanStatusCode.OK });
  } catch (err) {
    span.recordException(err as Error);
    span.setStatus({ code: SpanStatusCode.ERROR, message: (err as Error).message });
    throw err;
  } finally {
    span.end();
  }
}

OpenTelemetry Collector Configuration

The Collector is the routing layer — it receives telemetry from applications and exports to backends:

yaml
# otel-collector-config.yaml
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel-collector
  namespace: observability
spec:
  mode: daemonset   # one Collector per node; use mode: deployment for gateway mode
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
      # Also receive Prometheus metrics (for legacy scrapers)
      prometheus:
        config:
          scrape_configs:
            - job_name: 'otel-collector'
              static_configs:
                - targets: ['0.0.0.0:8888']

    processors:
      # Add Kubernetes metadata to all telemetry
      k8sattributes:
        auth_type: "serviceAccount"
        passthrough: false
        extract:
          metadata:
            - k8s.namespace.name
            - k8s.pod.name
            - k8s.node.name
            - k8s.deployment.name
          labels:
            - tag_name: app
              key: app
              from: pod
          annotations:
            - tag_name: version
              key: app.kubernetes.io/version
              from: pod

      # Memory limiter — prevent OOM on burst
      memory_limiter:
        check_interval: 1s
        limit_percentage: 75
        spike_limit_percentage: 25

      # Batch spans before exporting (improves throughput)
      batch:
        send_batch_size: 10000
        timeout: 10s

      # Tail-based sampling: sample 100% of errors, 10% of successful traces
      tail_sampling:
        decision_wait: 10s
        num_traces: 100000
        expected_new_traces_per_sec: 1000
        policies:
          - name: errors
            type: status_code
            status_code: {status_codes: [ERROR]}
          - name: slow-traces
            type: latency
            latency: {threshold_ms: 1000}  # 100% of traces >1s
          - name: 10pct-baseline
            type: probabilistic
            probabilistic: {sampling_percentage: 10}

    exporters:
      # Traces to Grafana Tempo
      otlp/tempo:
        endpoint: http://tempo.observability.svc.cluster.local:4317
        tls:
          insecure: true

      # Metrics to Prometheus (via remote write)
      prometheusremotewrite:
        endpoint: http://kube-prometheus-stack-prometheus.monitoring.svc.cluster.local:9090/api/v1/write

      # Logs to Loki
      loki:
        endpoint: http://loki.observability.svc.cluster.local:3100/loki/api/v1/push

    service:
      pipelines:
        traces:
          receivers: [otlp]
          # tail_sampling must come before batch — it buffers individual
          # spans until the whole trace can be evaluated
          processors: [memory_limiter, k8sattributes, tail_sampling, batch]
          exporters: [otlp/tempo]
        metrics:
          receivers: [otlp, prometheus]
          processors: [memory_limiter, k8sattributes, batch]
          exporters: [prometheusremotewrite]
        logs:
          receivers: [otlp]
          processors: [memory_limiter, k8sattributes, batch]
          exporters: [loki]

Sampling Strategies

Without sampling, a high-traffic service generates millions of spans per minute. Storage and query costs scale with trace volume — you need sampling.

Head-based sampling: Decision made at trace start. Simple and low-overhead. Configured in the SDK:

go
sdktrace.WithSampler(sdktrace.ParentBased(
    sdktrace.TraceIDRatioBased(0.1),  // 10% of new traces
))

ParentBased means: if the parent span was sampled, continue sampling; if not, don't start sampling mid-trace. This preserves trace completeness across services.
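Under the hood, a ratio sampler derives its decision from the trace ID itself, so every service that computes the same function on the same ID agrees without coordination. A simplified stdlib sketch of the idea (the `sampled` helper is illustrative, and the real TraceIDRatioBased implementation differs in detail):

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// sampled reports whether a 32-hex-char trace ID passes a ratio sampler:
// treat part of the (random) trace ID as a number and sample when it
// falls below ratio * max. Deterministic per trace ID, so all services
// make the same head-sampling decision.
func sampled(traceID string, ratio float64) bool {
	b, err := hex.DecodeString(traceID)
	if err != nil || len(b) != 16 {
		return false
	}
	// Interpret the lower 8 bytes of the 16-byte trace ID as a uint64.
	x := binary.BigEndian.Uint64(b[8:])
	bound := uint64(ratio * float64(^uint64(0)))
	return x < bound
}

func main() {
	// This ID's lower half is large, so it misses the 10% bucket.
	fmt.Println(sampled("4bf92f3577b34da6a3ce929d0e0e4736", 0.10))
}
```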

Tail-based sampling: Decision made after the trace is complete. The Collector buffers spans, waits for the trace to complete, then applies policies. More complex but allows sampling based on outcome (sample 100% of errors regardless of volume):

yaml
# In Collector config (shown above):
tail_sampling:
  policies:
    - name: always-sample-errors
      type: status_code
      status_code: {status_codes: [ERROR]}
    - name: sample-slow
      type: latency
      latency: {threshold_ms: 500}
    - name: baseline
      type: probabilistic
      probabilistic: {sampling_percentage: 5}

The combination: 100% of errors, 100% of slow traces (>500ms), 5% of everything else. This captures all the interesting traces at a fraction of the full volume.
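The arithmetic behind "a fraction of the full volume" is worth making concrete. A back-of-envelope sketch, where the traffic numbers (1M traces/min, 1% errors, 2% slow) are assumed for illustration and overlap between error and slow traces is ignored:

```go
package main

import "fmt"

// retained estimates how many traces per minute the tail-sampling
// policies keep: errors and slow traces at 100%, everything else at
// the baseline percentage. Overlap between the error and slow buckets
// is ignored for simplicity.
func retained(total, errRate, slowRate, baseline float64) float64 {
	errors := total * errRate     // kept at 100%
	slow := total * slowRate      // kept at 100%
	rest := total - errors - slow // only the baseline fraction of these
	return errors + slow + rest*baseline
}

func main() {
	// Assumed: 1M traces/min, 1% errors, 2% slow, 5% baseline.
	kept := retained(1_000_000, 0.01, 0.02, 0.05)
	fmt.Printf("%.0f traces/min retained (%.2f%% of volume)\n", kept, kept/1_000_000*100)
}
```

With those assumptions you keep roughly 78,500 of 1,000,000 traces per minute, under 8% of the raw volume, while still capturing every error and every slow trace.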


Context Propagation

For distributed tracing to work across services, the trace context must be propagated through HTTP/gRPC headers:

Service A creates trace (traceID: abc123, spanID: def456)
  → HTTP call to Service B
  → W3C traceparent header: "00-abc123...-def456-01"
  → Service B creates child span with same traceID
  → HTTP call to Service C
  → Same traceID propagated

With auto-instrumentation, propagation is automatic. For manual HTTP clients in Go:

go
import (
    "net/http"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/propagation"
)

// Inject context into outgoing HTTP request
req, _ := http.NewRequestWithContext(ctx, "GET", url, nil)
otel.GetTextMapPropagator().Inject(ctx, propagation.HeaderCarrier(req.Header))

// Extract context from incoming request
ctx = otel.GetTextMapPropagator().Extract(r.Context(), propagation.HeaderCarrier(r.Header))
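The traceparent header that Inject writes has a fixed shape defined by the W3C Trace Context spec: version, 32-hex-char trace ID, 16-hex-char parent span ID, and flags, joined by dashes. A minimal parsing sketch to make the format concrete (use the SDK's propagators in real code; this skips several spec rules, such as hex validation and future-version handling):

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// traceParent holds the fields of a W3C traceparent header:
// version "-" trace-id "-" parent-id "-" trace-flags.
type traceParent struct {
	Version, TraceID, SpanID, Flags string
}

// parseTraceparent is a simplified sketch: it checks field count and
// lengths, and rejects the all-zero IDs the spec forbids.
func parseTraceparent(h string) (traceParent, error) {
	parts := strings.Split(h, "-")
	if len(parts) != 4 || len(parts[0]) != 2 || len(parts[1]) != 32 ||
		len(parts[2]) != 16 || len(parts[3]) != 2 {
		return traceParent{}, errors.New("malformed traceparent")
	}
	if parts[1] == strings.Repeat("0", 32) || parts[2] == strings.Repeat("0", 16) {
		return traceParent{}, errors.New("all-zero trace or span ID")
	}
	return traceParent{parts[0], parts[1], parts[2], parts[3]}, nil
}

func main() {
	tp, err := parseTraceparent("00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01")
	fmt.Println(tp.TraceID, tp.SpanID, err)
}
```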

Frequently Asked Questions

Should I use the OTel Operator or instrument manually?

OTel Operator for Java/Node/Python — auto-instrumentation is production-quality and gives you 80% of the value with no application code changes. Manual for Go (Go auto-instrumentation uses eBPF and is more complex) and for business-logic-level spans that auto-instrumentation can't capture.

What's the overhead of adding OTel to my application?

With 10% head-based sampling and batched export: typically <1ms per sampled request added to application latency, <50MB additional memory for the SDK, <5% CPU increase from span processing. Not measurable at 10% sampling for most services. At 100% sampling for a high-RPS service (>10,000 req/s), overhead becomes significant — use sampling.

How do I correlate traces with logs?

Inject the trace and span IDs into log records:

go
// Go: extract trace/span IDs from context and add to log fields
span := trace.SpanFromContext(ctx)
spanCtx := span.SpanContext()

logger.Info("processing payment",
    "traceID", spanCtx.TraceID().String(),
    "spanID", spanCtx.SpanID().String(),
    "amount", amount,
)

With these IDs in the log record, Grafana can jump from a Loki log line to the corresponding Tempo trace with one click.

Can I send to multiple backends simultaneously?

Yes, via the Collector's fan-out routing:

yaml
exporters:
  otlp/tempo:
    endpoint: tempo:4317
  otlp/datadog:
    endpoint: api.datadoghq.com:4317

service:
  pipelines:
    traces:
      exporters: [otlp/tempo, otlp/datadog]  # Send to both

This is a migration path — send to your new backend (Tempo) while keeping the old one (Datadog) running until you're confident, then remove the old exporter.


For the Kubernetes observability stack that Tempo and Loki plug into, see Kubernetes Observability: Prometheus, Grafana, and OpenTelemetry. For migrating from vendor-specific agents to OpenTelemetry, see OpenTelemetry Migration: Replacing Vendor Agents. For a deeper look at Collector deployment patterns in Kubernetes (DaemonSet, Deployment, sidecar), see OpenTelemetry Collector: Unified Telemetry Pipeline for Kubernetes. For the OTel Operator and auto-instrumentation without code changes, see OpenTelemetry on Kubernetes: Collector, Auto-Instrumentation, and the Operator.

Building an observability platform on OpenTelemetry? Talk to us at Coding Protocols — we help platform teams design OTel pipelines that scale to hundreds of services without breaking the observability budget.

Related Topics

OpenTelemetry
Observability
Distributed Tracing
Metrics
Logs
Platform Engineering
Kubernetes
Grafana Tempo
Jaeger
