OpenTelemetry Instrumentation: Traces, Metrics, and Logs Without Vendor Lock-In
OpenTelemetry is the standard for telemetry data collection — traces, metrics, and logs through a single SDK and wire protocol. Here's how to instrument applications, configure the Collector pipeline, and route signals to any observability backend without changing application code.

The observability landscape before OpenTelemetry was fragmented. Each vendor had its own SDK, its own agent, its own wire protocol. Migrating from Datadog to Grafana or adding a new backend meant re-instrumenting applications. Application developers made instrumentation decisions based on which backend the ops team happened to use.
OpenTelemetry is the CNCF project that standardised this: one SDK, one wire protocol (OTLP), one Collector. Instrument your application once and route telemetry to any backend — Jaeger, Grafana Tempo, Zipkin, Datadog, Honeycomb — by reconfiguring the Collector, not the application.
This post covers the full OTel stack: auto-instrumentation, manual SDK instrumentation, Collector configuration, sampling, and backend routing.
The OpenTelemetry Stack
Application
↓ (OTLP or SDK)
OTel Collector (receive → process → export)
↓ (OTLP, Jaeger, Prometheus, etc.)
Backends: Grafana Tempo (traces) + Prometheus (metrics) + Loki (logs)
SDK: Language-specific library that captures traces and metrics from application code.
Auto-instrumentation: Instruments libraries (HTTP clients, database drivers, gRPC) automatically without application code changes.
Collector: A standalone agent/gateway that receives telemetry, processes it (sampling, enrichment, filtering), and exports to backends. Runs as a sidecar or DaemonSet in Kubernetes.
OTLP: OpenTelemetry Protocol — the wire format for sending telemetry from SDK to Collector to backend, carried over gRPC (port 4317) or HTTP (port 4318).
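In practice you rarely hard-code the endpoint: every SDK reads the same spec-defined environment variables, so a sketch like this (the Collector hostname is illustrative) configures any of the languages below identically:
export OTEL_SERVICE_NAME=payment-service
export OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
export OTEL_TRACES_SAMPLER=parentbased_traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.1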
Auto-Instrumentation
The fastest way to get traces without code changes: zero-code auto-instrumentation injects the OTel SDK into your application at startup via a Kubernetes operator.
OpenTelemetry Operator
# Install the OTel Operator
kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml

# Or via Helm
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm upgrade --install opentelemetry-operator \
  open-telemetry/opentelemetry-operator \
  --namespace observability \
  --set "manager.collectorImage.repository=otel/opentelemetry-collector-contrib"
Instrumentation CR
The Instrumentation CRD tells the operator how to auto-instrument applications in a namespace:
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: auto-instrumentation
  namespace: production
spec:
  exporter:
    endpoint: http://otel-collector.observability.svc.cluster.local:4317

  propagators:
    - tracecontext  # W3C TraceContext (standard)
    - baggage       # Baggage propagation for cross-service metadata
    - b3            # B3 (Zipkin compatibility)

  sampler:
    type: parentbased_traceidratio
    argument: "0.1"  # 10% sampling

  java:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:latest
    env:
      - name: OTEL_INSTRUMENTATION_JDBC_ENABLED
        value: "true"

  nodejs:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-nodejs:latest

  python:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:latest

  go:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-go:latest
    # Go auto-instrumentation uses eBPF — requires privileged access
Enabling Auto-Instrumentation Per Pod
Add the annotation to the pod template (spec.template.metadata.annotations), not to the Deployment's own metadata. The operator mutates pods at admission time, so it only sees pod-level annotations. A value of "true" selects the single Instrumentation resource in the namespace; the value can also name a specific resource.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  namespace: production
spec:
  template:
    metadata:
      annotations:
        instrumentation.opentelemetry.io/inject-java: "true"
        # or: inject-nodejs, inject-python, inject-go
The operator injects an init container that downloads the OTel agent into a shared volume. The main container starts with the agent attached — all HTTP, database, and gRPC calls are automatically traced.
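To confirm the injection happened, inspect the pod rather than the Deployment. A quick check (the app=api label is an assumption from the example above; the init container name can vary by operator version):
# Expect an init container such as opentelemetry-auto-instrumentation
kubectl get pods -n production -l app=api \
  -o jsonpath='{.items[0].spec.initContainers[*].name}'

# The injected OTEL_* env vars should appear on the main container
kubectl get pods -n production -l app=api \
  -o jsonpath='{.items[0].spec.containers[0].env[*].name}'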
What Gets Auto-Instrumented
For Java (via javaagent):
- Spring MVC, Spring WebFlux, Quarkus, Micronaut HTTP requests
- JDBC, MongoDB, Redis, Elasticsearch, Cassandra
- gRPC client and server
- Kafka producer and consumer
- AWS SDK v1/v2
For Node.js:
- Express, Fastify, NestJS, Koa HTTP routes
- pg, mysql2, mongodb, redis, ioredis database clients
- gRPC
- axios, node-fetch HTTP clients
- Kafka (kafkajs)
For Python:
- Django, Flask, FastAPI, aiohttp
- SQLAlchemy, psycopg2, pymongo, redis
- requests, aiohttp HTTP clients
- gRPC
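Individual auto-instrumentations can be switched off when they produce noise. For the Java agent the convention is OTEL_INSTRUMENTATION_<ID>_ENABLED; a sketch for the Instrumentation CR's java.env block (the instrumentation IDs here are assumptions, check the agent's supported-libraries list for exact names):
env:
  - name: OTEL_INSTRUMENTATION_KAFKA_ENABLED
    value: "false"  # assumed ID: drop Kafka producer/consumer spans
  - name: OTEL_INSTRUMENTATION_JDBC_ENABLED
    value: "false"  # disable JDBC spans if the DB layer is too chatty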
Manual SDK Instrumentation
Auto-instrumentation captures framework-level spans. For business-logic-level tracing (what happened inside a specific handler), you need manual instrumentation:
Go
package main

import (
    "context"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/codes"
    "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
    "go.opentelemetry.io/otel/sdk/resource"
    sdktrace "go.opentelemetry.io/otel/sdk/trace"
    semconv "go.opentelemetry.io/otel/semconv/v1.21.0"
    "go.opentelemetry.io/otel/trace"
)

var tracer trace.Tracer

func initTracer(ctx context.Context) (*sdktrace.TracerProvider, error) {
    exporter, err := otlptracegrpc.New(ctx,
        otlptracegrpc.WithEndpoint("otel-collector:4317"),
        otlptracegrpc.WithInsecure(), // plaintext to the in-cluster Collector; use TLS for anything off-cluster
    )
    if err != nil {
        return nil, err
    }

    res, _ := resource.New(ctx,
        resource.WithAttributes(
            semconv.ServiceName("payment-service"),
            semconv.ServiceVersion("v2.1.0"),
            semconv.DeploymentEnvironment("production"),
        ),
    )

    tp := sdktrace.NewTracerProvider(
        sdktrace.WithBatcher(exporter),
        sdktrace.WithResource(res),
        sdktrace.WithSampler(sdktrace.ParentBased(
            sdktrace.TraceIDRatioBased(0.1), // 10% sampling
        )),
    )
    otel.SetTracerProvider(tp)
    tracer = otel.Tracer("payment-service")
    return tp, nil
}

// Instrument a function with a span
func processPayment(ctx context.Context, amount float64, currency string) error {
    ctx, span := tracer.Start(ctx, "processPayment",
        trace.WithAttributes(
            attribute.Float64("payment.amount", amount),
            attribute.String("payment.currency", currency),
        ),
    )
    defer span.End()

    // Add events to the span
    span.AddEvent("validating payment")

    if err := validatePayment(ctx, amount, currency); err != nil {
        span.RecordError(err)
        span.SetStatus(codes.Error, err.Error())
        return err
    }

    span.AddEvent("charging card")
    if err := chargeCard(ctx, amount, currency); err != nil {
        span.RecordError(err)
        span.SetStatus(codes.Error, err.Error())
        return err
    }

    return nil
}
Node.js (TypeScript)
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-grpc';
import { Resource } from '@opentelemetry/resources';
import { SemanticResourceAttributes } from '@opentelemetry/semantic-conventions';
// SemanticResourceAttributes is deprecated in newer releases of
// @opentelemetry/semantic-conventions; use the ATTR_SERVICE_NAME-style
// exports if your SDK version provides them.
import { trace, SpanStatusCode, context } from '@opentelemetry/api';
import { BatchSpanProcessor } from '@opentelemetry/sdk-trace-base';

// Initialize the SDK at process startup, before application modules load
const sdk = new NodeSDK({
  resource: new Resource({
    [SemanticResourceAttributes.SERVICE_NAME]: 'api-service',
    [SemanticResourceAttributes.SERVICE_VERSION]: process.env.APP_VERSION ?? 'unknown',
    [SemanticResourceAttributes.DEPLOYMENT_ENVIRONMENT]: process.env.NODE_ENV ?? 'development',
  }),
  spanProcessor: new BatchSpanProcessor(
    new OTLPTraceExporter({
      url: 'http://otel-collector:4317',
    })
  ),
});

sdk.start();
process.on('SIGTERM', () => sdk.shutdown());

// Use the tracer in application code
const tracer = trace.getTracer('api-service');

async function processOrder(orderId: string): Promise<void> {
  const span = tracer.startSpan('processOrder', {
    attributes: { 'order.id': orderId },
  });

  const ctx = trace.setSpan(context.active(), span);

  try {
    await context.with(ctx, async () => {
      await validateInventory(orderId); // Child span if validateInventory creates one
      await chargeCustomer(orderId);
      await fulfillOrder(orderId);
    });
    span.setStatus({ code: SpanStatusCode.OK });
  } catch (err) {
    span.recordException(err as Error);
    span.setStatus({ code: SpanStatusCode.ERROR, message: (err as Error).message });
    throw err;
  } finally {
    span.end();
  }
}
OpenTelemetry Collector Configuration
The Collector is the routing layer — it receives telemetry from applications and exports to backends:
# otel-collector.yaml: an OpenTelemetryCollector CR, reconciled by the operator
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel-collector
  namespace: observability
spec:
  mode: daemonset  # one Collector per node; use "deployment" for gateway mode
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
      # Also receive Prometheus metrics (for legacy scrapers)
      prometheus:
        config:
          scrape_configs:
            - job_name: 'otel-collector'
              static_configs:
                - targets: ['0.0.0.0:8888']

    processors:
      # Add Kubernetes metadata to all telemetry
      k8sattributes:
        auth_type: "serviceAccount"
        passthrough: false
        extract:
          metadata:
            - k8s.namespace.name
            - k8s.pod.name
            - k8s.node.name
            - k8s.deployment.name
          labels:
            - tag_name: app
              key: app
              from: pod
          annotations:
            - tag_name: version
              key: app.kubernetes.io/version
              from: pod

      # Memory limiter — prevent OOM on burst
      memory_limiter:
        check_interval: 1s
        limit_percentage: 75
        spike_limit_percentage: 25

      # Batch spans before exporting (improves throughput)
      batch:
        send_batch_size: 10000
        timeout: 10s

      # Tail-based sampling: sample 100% of errors, 10% of successful traces.
      # NOTE: tail sampling needs every span of a trace on the same Collector;
      # in daemonset mode, route traces through a loadbalancing tier first
      # (see "Sampling Strategies" below).
      tail_sampling:
        decision_wait: 10s
        num_traces: 100000
        expected_new_traces_per_sec: 1000
        policies:
          - name: errors
            type: status_code
            status_code: {status_codes: [ERROR]}
          - name: slow-traces
            type: latency
            latency: {threshold_ms: 1000}  # 100% of traces >1s
          - name: 10pct-baseline
            type: probabilistic
            probabilistic: {sampling_percentage: 10}

    exporters:
      # Traces to Grafana Tempo
      otlp/tempo:
        endpoint: http://tempo.observability.svc.cluster.local:4317
        tls:
          insecure: true

      # Metrics to Prometheus (via remote write)
      prometheusremotewrite:
        endpoint: http://kube-prometheus-stack-prometheus.monitoring.svc.cluster.local:9090/api/v1/write

      # Logs to Loki
      loki:
        endpoint: http://loki.observability.svc.cluster.local:3100/loki/api/v1/push

    service:
      pipelines:
        traces:
          receivers: [otlp]
          # tail_sampling must come before batch — it needs individual spans
          # to evaluate complete traces
          processors: [memory_limiter, k8sattributes, tail_sampling, batch]
          exporters: [otlp/tempo]
        metrics:
          receivers: [otlp, prometheus]
          processors: [memory_limiter, k8sattributes, batch]
          exporters: [prometheusremotewrite]
        logs:
          receivers: [otlp]
          processors: [memory_limiter, k8sattributes, batch]
          exporters: [loki]
Sampling Strategies
Without sampling, a high-traffic service generates millions of spans per minute. Storage and query costs scale with trace volume — you need sampling.
Head-based sampling: Decision made at trace start. Simple and low-overhead. Configured in the SDK:
sdktrace.WithSampler(sdktrace.ParentBased(
    sdktrace.TraceIDRatioBased(0.1), // 10% of new traces
))
ParentBased means: if the parent span was sampled, continue sampling; if not, don't start sampling mid-trace. This preserves trace completeness across services.
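ParentBased is composable: each parent case can be given its own delegate sampler. This sketch spells out the defaults using the Go SDK's options:
// Equivalent to the default ParentBased behaviour, written out explicitly.
// The root sampler (TraceIDRatioBased) only applies when a span has no parent.
sampler := sdktrace.ParentBased(
    sdktrace.TraceIDRatioBased(0.1),                             // root spans: 10%
    sdktrace.WithRemoteParentSampled(sdktrace.AlwaysSample()),   // keep sampling if upstream sampled
    sdktrace.WithRemoteParentNotSampled(sdktrace.NeverSample()), // stay dark if upstream didn't
)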
Tail-based sampling: Decision made after the trace is complete. The Collector buffers spans, waits for the trace to complete, then applies policies. More complex but allows sampling based on outcome (sample 100% of errors regardless of volume):
# In Collector config (shown above):
tail_sampling:
  policies:
    - name: always-sample-errors
      type: status_code
      status_code: {status_codes: [ERROR]}
    - name: sample-slow
      type: latency
      latency: {threshold_ms: 500}
    - name: baseline
      type: probabilistic
      probabilistic: {sampling_percentage: 5}
The combination: 100% of errors, 100% of slow traces (>500ms), 5% of everything else. This captures all the interesting traces at a fraction of the full volume.
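One operational caveat: tail sampling only works if every span of a trace reaches the same Collector instance, which a per-node DaemonSet cannot guarantee. The usual fix is a two-tier pipeline where an agent tier routes spans by trace ID to a dedicated sampling tier. A sketch using the contrib loadbalancing exporter (the otel-sampling Service name is an assumption):
# Agent tier: no tail_sampling here; hash spans by trace ID so every
# span of a given trace lands on the same sampling-tier Collector
exporters:
  loadbalancing:
    routing_key: "traceID"
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      k8s:
        service: otel-sampling.observability  # headless Service (assumed name)

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [loadbalancing]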
Context Propagation
For distributed tracing to work across services, the trace context must be propagated through HTTP/gRPC headers:
Service A creates trace (traceID: abc123, spanID: def456)
→ HTTP call to Service B
→ W3C traceparent header: "00-abc123...-def456-01"
→ Service B creates child span with same traceID
→ HTTP call to Service C
→ Same traceID propagated
With auto-instrumentation, propagation is automatic. For manual HTTP clients in Go:
1import "go.opentelemetry.io/otel/propagation"
2
3// Inject context into outgoing HTTP request
4req, _ := http.NewRequestWithContext(ctx, "GET", url, nil)
5otel.GetTextMapPropagator().Inject(ctx, propagation.HeaderCarrier(req.Header))
6
7// Extract context from incoming request
8ctx = otel.GetTextMapPropagator().Extract(r.Context(), propagation.HeaderCarrier(r.Header))Frequently Asked Questions
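For plain net/http services, the contrib otelhttp package handles both directions (inject on the client, extract plus a server span on the handler), so the manual carrier code above is rarely needed. A minimal sketch, where mux is your existing handler:
import (
    "net/http"

    "go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
)

// Client: records a client span and injects the traceparent header on every request
client := &http.Client{Transport: otelhttp.NewTransport(http.DefaultTransport)}

// Server: extracts incoming context and wraps the handler in a server span
http.ListenAndServe(":8080", otelhttp.NewHandler(mux, "api"))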
Frequently Asked Questions
Should I use the OTel Operator or instrument manually?
OTel Operator for Java/Node/Python — auto-instrumentation is production-quality and gives you 80% of the value with no application code changes. Manual for Go (Go auto-instrumentation uses eBPF and is more complex) and for business-logic-level spans that auto-instrumentation can't capture.
What's the overhead of adding OTel to my application?
With 10% head-based sampling and batched export: typically <1ms of added latency per sampled request, <50MB of additional memory for the SDK, and <5% CPU increase from span processing; for most services this is not measurable. At 100% sampling on a high-RPS service (>10,000 req/s), overhead becomes significant — use sampling.
How do I correlate traces with logs?
Inject the trace and span IDs into log records:
// Go: extract trace/span IDs from context and add to log fields
span := trace.SpanFromContext(ctx)
spanCtx := span.SpanContext()

logger.Info("processing payment",
    "traceID", spanCtx.TraceID().String(),
    "spanID", spanCtx.SpanID().String(),
    "amount", amount,
)
With these IDs in the log record, Grafana can jump from a Loki log line to the corresponding Tempo trace with one click.
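On the Grafana side, that jump is configured as a derived field on the Loki datasource. A provisioning sketch (the regex assumes JSON logs with a traceID field, and tempo is the assumed UID of your Tempo datasource):
datasources:
  - name: Loki
    type: loki
    url: http://loki.observability.svc.cluster.local:3100
    jsonData:
      derivedFields:
        - name: TraceID
          matcherRegex: '"traceID":"(\w+)"'
          url: '$${__value.raw}'   # $$ escapes interpolation in provisioning files
          datasourceUid: tempo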
Can I send to multiple backends simultaneously?
Yes, via the Collector's fan-out routing:
exporters:
  otlp/tempo:
    endpoint: tempo:4317
  otlp/datadog:
    endpoint: api.datadoghq.com:4317

service:
  pipelines:
    traces:
      exporters: [otlp/tempo, otlp/datadog]  # Send to both
This is a migration path — send to your new backend (Tempo) while keeping the old one (Datadog) running until you're confident, then remove the old exporter.
For the Kubernetes observability stack that Tempo and Loki plug into, see Kubernetes Observability: Prometheus, Grafana, and OpenTelemetry. For migrating from vendor-specific agents to OpenTelemetry, see OpenTelemetry Migration: Replacing Vendor Agents. For a deeper look at Collector deployment patterns in Kubernetes (DaemonSet, Deployment, sidecar), see OpenTelemetry Collector: Unified Telemetry Pipeline for Kubernetes. For the OTel Operator and auto-instrumentation without code changes, see OpenTelemetry on Kubernetes: Collector, Auto-Instrumentation, and the Operator.
Building an observability platform on OpenTelemetry? Talk to us at Coding Protocols — we help platform teams design OTel pipelines that scale to hundreds of services without breaking the observability budget.


