Distributed Request Tracing with OpenTelemetry
Instrument a Node.js service with OpenTelemetry, send traces to Jaeger or Tempo, and see exactly which service caused a latency spike. No vendor lock-in, portable signals.
Before you begin
- A Node.js application (Express or similar)
- A Kubernetes cluster with kubectl access
- Helm 3 installed
- Basic understanding of microservices
When a request takes 2 seconds and spans five microservices, logs tell you something went wrong. Traces tell you exactly which service, which database call, and which line of code caused the delay.
OpenTelemetry is the vendor-neutral standard for generating traces, metrics, and logs. This tutorial instruments a Node.js service, deploys Jaeger, and shows you how to read a trace.
How Distributed Tracing Works
A trace is a collection of spans. Each span represents one unit of work — an HTTP request, a database query, a function call. Spans form a tree: a root span (the incoming request) has child spans (downstream calls).
Every span carries a trace-id that's the same across all services. When service A calls service B, it passes the trace-id in a header (traceparent). Service B creates a child span under the same trace. After the request, all spans are collected and reassembled into the tree.
HTTP request → Service A [root span, 450ms]
├─ DB query [child, 20ms]
└─ gRPC call → Service B [child, 380ms]
   └─ HTTP call → Service C [child, 350ms] ← this is slow
      └─ DB query [child, 340ms] ← this is the root cause
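The traceparent header that carries the trace-id follows the W3C Trace Context format: version, trace-id, parent span-id, and flags, joined with hyphens. A minimal sketch of parsing it in plain TypeScript (no OTel dependency; the SDK does this for you automatically):

```typescript
// W3C traceparent: version-traceid-spanid-flags
// e.g. "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
interface TraceContext {
  traceId: string;  // 32 hex chars, identical across every service in the trace
  spanId: string;   // 16 hex chars, the caller's span id (our parent)
  sampled: boolean; // low bit of the flags byte
}

function parseTraceparent(header: string): TraceContext | null {
  const m = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!m) return null;
  return {
    traceId: m[2],
    spanId: m[3],
    sampled: (parseInt(m[4], 16) & 0x01) === 1,
  };
}
```

When service B receives this header, it creates its spans with the same trace-id and uses the incoming span-id as their parent, which is how the backend later reassembles the tree.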
Part 1: Deploy Jaeger
helm repo add jaegertracing https://jaegertracing.github.io/helm-charts
helm repo update
# All-in-one Jaeger (development — in-memory storage)
helm install jaeger jaegertracing/jaeger \
  --namespace observability \
  --create-namespace \
  --set allInOne.enabled=true \
  --set collector.enabled=false \
  --set query.enabled=false \
  --set agent.enabled=false \
  --set storage.type=memory

kubectl wait --for=condition=Ready pod -l app.kubernetes.io/name=jaeger \
  -n observability --timeout=60s
For production, use Elasticsearch or Cassandra as the storage backend. The in-memory all-in-one is fine for development and this tutorial.
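For reference, a production install with Elasticsearch storage might look like the values sketch below. The value names are assumptions based on the jaegertracing/jaeger chart and vary between chart versions, so check the chart's values.yaml before using them; the Elasticsearch host is a placeholder:

```yaml
# values-production.yaml (sketch; confirm names against your chart version)
storage:
  type: elasticsearch
  elasticsearch:
    host: elasticsearch-master.observability.svc.cluster.local
    port: 9200
allInOne:
  enabled: false
collector:
  enabled: true
query:
  enabled: true
```

Apply with `helm upgrade jaeger jaegertracing/jaeger -n observability -f values-production.yaml`.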
Access the Jaeger UI:
kubectl port-forward svc/jaeger-query 16686:16686 -n observability
Open http://localhost:16686.
The OTLP endpoint (where your services send traces) is:
http://jaeger-collector.observability.svc.cluster.local:4318 (HTTP)
grpc://jaeger-collector.observability.svc.cluster.local:4317 (gRPC)
Part 2: Instrument a Node.js Service
Step 1: Install Dependencies
npm install \
  @opentelemetry/sdk-node \
  @opentelemetry/auto-instrumentations-node \
  @opentelemetry/exporter-trace-otlp-http \
  @opentelemetry/resources \
  @opentelemetry/semantic-conventions
Step 2: Create the Instrumentation File
Create src/instrumentation.ts — this must load before your application code:
// src/instrumentation.ts
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { Resource } from '@opentelemetry/resources';
import { SEMRESATTRS_SERVICE_NAME, SEMRESATTRS_SERVICE_VERSION } from '@opentelemetry/semantic-conventions';

const exporter = new OTLPTraceExporter({
  url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT ?? 'http://localhost:4318/v1/traces',
});

const sdk = new NodeSDK({
  resource: new Resource({
    [SEMRESATTRS_SERVICE_NAME]: process.env.OTEL_SERVICE_NAME ?? 'my-service',
    [SEMRESATTRS_SERVICE_VERSION]: process.env.APP_VERSION ?? '1.0.0',
  }),
  traceExporter: exporter,
  instrumentations: [
    getNodeAutoInstrumentations({
      // Auto-instruments: http, express, pg, redis, grpc, and more
      '@opentelemetry/instrumentation-fs': { enabled: false }, // noisy, skip
    }),
  ],
});

sdk.start();

// Graceful shutdown
process.on('SIGTERM', () => {
  sdk.shutdown().then(() => process.exit(0));
});
Step 3: Load Instrumentation Before Your App
Update your start command in package.json:
{
  "scripts": {
    "start": "node --require ./dist/instrumentation.js dist/index.js"
  }
}
Or with ts-node:
{
  "scripts": {
    "start": "ts-node --require ./src/instrumentation.ts src/index.ts"
  }
}
Auto-instrumentations hook into Node.js's module system at startup. They patch http, express, pg, ioredis, and other popular libraries automatically — you get traces for database queries and HTTP calls without writing a single span manually.
Step 4: Add Custom Spans for Business Logic
Auto-instrumentation covers infrastructure calls. For your business logic, add manual spans:
import { trace, SpanStatusCode } from '@opentelemetry/api';

const tracer = trace.getTracer('my-service');

async function processOrder(orderId: string): Promise<Order> {
  return tracer.startActiveSpan('processOrder', async (span) => {
    span.setAttribute('order.id', orderId);
    span.setAttribute('order.source', 'api');
    try {
      const order = await db.orders.findById(orderId);
      span.setAttribute('order.status', order.status);
      if (order.status === 'pending') {
        await tracer.startActiveSpan('validatePayment', async (paymentSpan) => {
          try {
            paymentSpan.setAttribute('payment.method', order.paymentMethod);
            await paymentService.validate(order.payment);
          } finally {
            // End the child span even if validation throws, so it isn't lost
            paymentSpan.end();
          }
        });
      }
      span.setStatus({ code: SpanStatusCode.OK });
      return order;
    } catch (error) {
      span.setStatus({ code: SpanStatusCode.ERROR, message: (error as Error).message });
      span.recordException(error as Error);
      throw error;
    } finally {
      span.end();
    }
  });
}
Step 5: Deploy with Environment Variables
In your Kubernetes Deployment:
env:
  - name: OTEL_SERVICE_NAME
    value: "my-service"
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://jaeger-collector.observability.svc.cluster.local:4318/v1/traces"
  - name: OTEL_EXPORTER_OTLP_PROTOCOL
    value: "http/protobuf"
  - name: NODE_ENV
    value: "production"
Part 3: Propagate Traces Across Services
Trace context propagates automatically when you make HTTP calls using Node.js's built-in http module or fetch — the auto-instrumentation injects the traceparent header.
If you're using a custom HTTP client, inject the header manually:
import { context, propagation } from '@opentelemetry/api';

async function callServiceB(data: unknown) {
  const headers: Record<string, string> = {
    'Content-Type': 'application/json',
  };

  // Inject current trace context into the request headers
  propagation.inject(context.active(), headers);

  const response = await fetch('http://service-b/endpoint', {
    method: 'POST',
    headers,
    body: JSON.stringify(data),
  });
  return response.json();
}
Service B must also be instrumented and extract the context on the incoming request — auto-instrumentation handles this automatically on the receiving end.
Part 4: Read a Trace in Jaeger
Open the Jaeger UI at http://localhost:16686.
- Select your service from the Service dropdown
- Click Find Traces
- Click any trace to open the waterfall view
The waterfall view shows:
- Total duration at the top (e.g., 450ms)
- Spans stacked by start time, width proportional to duration
- Each span shows service name, operation name, duration
- Click a span to see its attributes (HTTP status, DB query, error details)
Look for:
- The widest spans (most time spent)
- Gaps between spans (time in transit or waiting)
- Red spans (errors)
- Spans with a db.statement attribute (the actual SQL query that ran)
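The "widest span" hunt can also be automated. A small sketch that walks a span tree and computes each span's self-time (its duration minus time covered by its children), which is usually where the real work or the real wait lives. The `SpanNode` shape is illustrative, not the Jaeger API:

```typescript
interface SpanNode {
  name: string;
  durationMs: number;
  children: SpanNode[];
}

// Self-time = own duration minus the sum of child durations.
// A span with high self-time is doing (or waiting on) the work itself.
function selfTimes(span: SpanNode, out: Map<string, number> = new Map()): Map<string, number> {
  const childTotal = span.children.reduce((sum, c) => sum + c.durationMs, 0);
  out.set(span.name, span.durationMs - childTotal);
  for (const c of span.children) selfTimes(c, out);
  return out;
}

// The trace from the introduction:
const trace: SpanNode = {
  name: 'Service A', durationMs: 450, children: [
    { name: 'DB query (A)', durationMs: 20, children: [] },
    { name: 'gRPC to Service B', durationMs: 380, children: [
      { name: 'HTTP to Service C', durationMs: 350, children: [
        { name: 'DB query (C)', durationMs: 340, children: [] },
      ]},
    ]},
  ],
};

const times = selfTimes(trace);
```

Here the 340ms database query in Service C dominates its own duration, which is exactly the conclusion the waterfall view leads you to visually.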
Part 5: Add Tempo as a Backend (Grafana Stack)
If you're already running the kube-prometheus-stack, replace Jaeger with Grafana Tempo and view traces inside Grafana:
helm repo add grafana https://grafana.github.io/helm-charts
helm install tempo grafana/tempo \
  --namespace observability \
  --set tempo.storage.trace.backend=local

# Update OTEL endpoint to point to Tempo
kubectl set env deployment/my-app \
  OTEL_EXPORTER_OTLP_ENDPOINT=http://tempo.observability.svc.cluster.local:4318/v1/traces
In Grafana, add a Tempo data source (URL: http://tempo:3100). Then open Grafana's Explore view, select the Tempo data source, and paste a trace ID to jump directly to the trace.
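Instead of clicking through the UI, the data source can be provisioned declaratively. A sketch of a Grafana provisioning file for Tempo (field names follow Grafana's datasource provisioning format; the URL assumes the in-cluster service name from the Helm install above):

```yaml
# grafana-datasources.yaml, mounted under /etc/grafana/provisioning/datasources/
apiVersion: 1
datasources:
  - name: Tempo
    type: tempo
    access: proxy
    url: http://tempo.observability.svc.cluster.local:3100
    isDefault: false
```

This keeps the Grafana setup reproducible alongside the rest of your Helm values.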
Sampling Strategy
In production, don't send 100% of traces — it's expensive. Configure head-based sampling in the SDK:
import { TraceIdRatioBasedSampler } from '@opentelemetry/sdk-trace-base';

const sdk = new NodeSDK({
  sampler: new TraceIdRatioBasedSampler(0.1), // Sample 10% of traces
  // ... rest of config
});
Or use tail-based sampling in the OpenTelemetry Collector (keeps traces with errors at 100%):
# otel-collector-config.yaml
processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: errors
        type: status_code
        status_code: { status_codes: [ERROR] }
      - name: slow-requests
        type: latency
        latency: { threshold_ms: 1000 }
      - name: small-percentage
        type: probabilistic
        probabilistic: { sampling_percentage: 5 }
Sample 100% of errors and slow requests, 5% of everything else. This keeps the signal-to-noise ratio high.
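Head sampling stays consistent across services because the decision is a pure function of the trace ID: every SDK that sees the same ID reaches the same yes/no, so a trace is either kept whole or dropped whole. A simplified sketch of the idea behind a ratio-based sampler (the real TraceIdRatioBasedSampler uses the same principle, though not this exact arithmetic):

```typescript
// Decide from the trace ID itself: read the lower 8 hex chars as a 32-bit
// number and compare against ratio * 2^32. Same ID, same decision, everywhere.
function shouldSample(traceId: string, ratio: number): boolean {
  const lower32 = parseInt(traceId.slice(-8), 16);
  return lower32 < ratio * 0x100000000;
}
```

In a multi-service setup you would normally wrap the ratio sampler in the SDK's ParentBasedSampler, so downstream services simply follow the decision already made at the root instead of re-rolling the dice.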
We built Podscape to simplify Kubernetes workflows like this — logs, events, and cluster state in one interface, without switching tools.
Struggling with this in production?
We help teams fix these exact issues. Our engineers have deployed these patterns across production environments at scale.