Distributed Request Tracing with OpenTelemetry
Instrument a Node.js service with OpenTelemetry, send traces to Jaeger or Tempo, and see exactly which service caused a latency spike. No vendor lock-in, portable signals.
Before you begin
- A Node.js application (Express or similar)
- A Kubernetes cluster with kubectl access
- Helm 3 installed
- Basic understanding of microservices
When a request takes 2 seconds and spans five microservices, logs tell you something went wrong. Traces tell you exactly which service, which database call, and which line of code caused the delay.
OpenTelemetry is the vendor-neutral standard for generating traces, metrics, and logs. This tutorial instruments a Node.js service, deploys Jaeger, and shows you how to read a trace.
How Distributed Tracing Works
A trace is a collection of spans. Each span represents one unit of work — an HTTP request, a database query, a function call. Spans form a tree: a root span (the incoming request) has child spans (downstream calls).
Every span carries a trace-id that's the same across all services. When service A calls service B, it passes the trace-id in a header (traceparent). Service B creates a child span under the same trace. After the request, all spans are collected and reassembled into the tree.
HTTP request → Service A [root span, 450ms]
├─ DB query [child, 20ms]
└─ gRPC call → Service B [child, 380ms]
   └─ HTTP call → Service C [child, 350ms] ← this is slow
      └─ DB query [child, 340ms] ← this is the root cause
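The traceparent header that carries the trace-id follows the W3C Trace Context format: version, trace-id, parent span-id, and flags, joined with hyphens. A minimal sketch of parsing it in plain TypeScript (no OTel dependency; the SDK does this for you automatically):

```typescript
// W3C traceparent: version-traceid-spanid-flags
// e.g. "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
interface TraceContext {
  traceId: string;  // 32 hex chars, identical across every service in the trace
  spanId: string;   // 16 hex chars, the caller's span id (our parent)
  sampled: boolean; // low bit of the flags byte
}

function parseTraceparent(header: string): TraceContext | null {
  const m = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!m) return null;
  return {
    traceId: m[2],
    spanId: m[3],
    sampled: (parseInt(m[4], 16) & 0x01) === 1,
  };
}
```

When service B receives this header, it creates its spans with the same trace-id and uses the incoming span-id as their parent, which is how the backend later reassembles the tree.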
Part 1: Deploy Jaeger
helm repo add jaegertracing https://jaegertracing.github.io/helm-charts
helm repo update
# All-in-one Jaeger (development — in-memory storage)
helm install jaeger jaegertracing/jaeger \
  --namespace observability \
  --create-namespace \
  --set allInOne.enabled=true \
  --set collector.enabled=false \
  --set query.enabled=false \
  --set agent.enabled=false \
  --set storage.type=memory

kubectl wait --for=condition=Ready pod -l app.kubernetes.io/name=jaeger \
  -n observability --timeout=60s
For production, use Elasticsearch or Cassandra as the storage backend. The in-memory all-in-one is fine for development and this tutorial.
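For reference, a production install with Elasticsearch storage might look like the values sketch below. The value names are assumptions based on the jaegertracing/jaeger chart and vary between chart versions, so check the chart's values.yaml before using them; the Elasticsearch host is a placeholder:

```yaml
# values-production.yaml (sketch; confirm names against your chart version)
storage:
  type: elasticsearch
  elasticsearch:
    host: elasticsearch-master.observability.svc.cluster.local
    port: 9200
allInOne:
  enabled: false
collector:
  enabled: true
query:
  enabled: true
```

Apply with `helm upgrade jaeger jaegertracing/jaeger -n observability -f values-production.yaml`.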
Access the Jaeger UI:
kubectl port-forward svc/jaeger-query 16686:16686 -n observability
Open http://localhost:16686.
The OTLP endpoint (where your services send traces) is:
http://jaeger-collector.observability.svc.cluster.local:4318 (HTTP)
grpc://jaeger-collector.observability.svc.cluster.local:4317 (gRPC)
Part 2: Instrument a Node.js Service
Step 1: Install Dependencies
npm install \
  @opentelemetry/sdk-node \
  @opentelemetry/auto-instrumentations-node \
  @opentelemetry/exporter-trace-otlp-http \
  @opentelemetry/resources \
  @opentelemetry/semantic-conventions
Step 2: Create the Instrumentation File
Create src/instrumentation.ts — this must load before your application code:
// src/instrumentation.ts
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { Resource } from '@opentelemetry/resources';
import { SEMRESATTRS_SERVICE_NAME, SEMRESATTRS_SERVICE_VERSION } from '@opentelemetry/semantic-conventions';

const exporter = new OTLPTraceExporter({
  url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT ?? 'http://localhost:4318/v1/traces',
});

const sdk = new NodeSDK({
  resource: new Resource({
    [SEMRESATTRS_SERVICE_NAME]: process.env.OTEL_SERVICE_NAME ?? 'my-service',
    [SEMRESATTRS_SERVICE_VERSION]: process.env.APP_VERSION ?? '1.0.0',
  }),
  traceExporter: exporter,
  instrumentations: [
    getNodeAutoInstrumentations({
      // Auto-instruments: http, express, pg, redis, grpc, and more
      '@opentelemetry/instrumentation-fs': { enabled: false }, // noisy, skip
    }),
  ],
});

sdk.start();

// Graceful shutdown
process.on('SIGTERM', () => {
  sdk.shutdown().then(() => process.exit(0));
});
Step 3: Load Instrumentation Before Your App
Update your start command in package.json:
{
  "scripts": {
    "start": "node --require ./dist/instrumentation.js dist/index.js"
  }
}
Or with ts-node:
{
  "scripts": {
    "start": "ts-node --require ./src/instrumentation.ts src/index.ts"
  }
}
Auto-instrumentations hook into Node.js's module system at startup. They patch http, express, pg, ioredis, and other popular libraries automatically — you get traces for database queries and HTTP calls without writing a single span manually.
Step 4: Add Custom Spans for Business Logic
Auto-instrumentation covers infrastructure calls. For your business logic, add manual spans:
import { trace, SpanStatusCode } from '@opentelemetry/api';

const tracer = trace.getTracer('my-service');

async function processOrder(orderId: string): Promise<Order> {
  return tracer.startActiveSpan('processOrder', async (span) => {
    span.setAttribute('order.id', orderId);
    span.setAttribute('order.source', 'api');
    try {
      const order = await db.orders.findById(orderId);
      span.setAttribute('order.status', order.status);
      if (order.status === 'pending') {
        await tracer.startActiveSpan('validatePayment', async (paymentSpan) => {
          try {
            paymentSpan.setAttribute('payment.method', order.paymentMethod);
            await paymentService.validate(order.payment);
          } finally {
            // End the child span even if validation throws, so it isn't lost
            paymentSpan.end();
          }
        });
      }
      span.setStatus({ code: SpanStatusCode.OK });
      return order;
    } catch (error) {
      span.setStatus({ code: SpanStatusCode.ERROR, message: (error as Error).message });
      span.recordException(error as Error);
      throw error;
    } finally {
      span.end();
    }
  });
}
Step 5: Deploy with Environment Variables
In your Kubernetes Deployment:
env:
  - name: OTEL_SERVICE_NAME
    value: "my-service"
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://jaeger-collector.observability.svc.cluster.local:4318/v1/traces"
  - name: OTEL_EXPORTER_OTLP_PROTOCOL
    value: "http/protobuf"
  - name: NODE_ENV
    value: "production"
Part 3: Propagate Traces Across Services
Trace context propagates automatically when you make HTTP calls using Node.js's built-in http module or fetch — the auto-instrumentation injects the traceparent header.
If you're using a custom HTTP client, inject the header manually:
import { context, propagation } from '@opentelemetry/api';

async function callServiceB(data: unknown) {
  const headers: Record<string, string> = {
    'Content-Type': 'application/json',
  };

  // Inject current trace context into the request headers
  propagation.inject(context.active(), headers);

  const response = await fetch('http://service-b/endpoint', {
    method: 'POST',
    headers,
    body: JSON.stringify(data),
  });
  return response.json();
}
Service B must also be instrumented and extract the context on the incoming request — auto-instrumentation handles this automatically on the receiving end.
Part 4: Read a Trace in Jaeger
Open the Jaeger UI at http://localhost:16686.
- Select your service from the Service dropdown
- Click Find Traces
- Click any trace to open the waterfall view
The waterfall view shows:
- Total duration at the top (e.g., 450ms)
- Spans stacked by start time, width proportional to duration
- Each span shows service name, operation name, duration
- Click a span to see its attributes (HTTP status, DB query, error details)
Look for:
- The widest spans (most time spent)
- Gaps between spans (time in transit or waiting)
- Red spans (errors)
- Spans with a db.statement attribute (the actual SQL query that ran)
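The "widest span" hunt can also be automated. A small sketch that walks a span tree and computes each span's self-time (its duration minus time covered by its children), which is usually where the real work or the real wait lives. The `SpanNode` shape is illustrative, not the Jaeger API:

```typescript
interface SpanNode {
  name: string;
  durationMs: number;
  children: SpanNode[];
}

// Self-time = own duration minus the sum of child durations.
// A span with high self-time is doing (or waiting on) the work itself.
function selfTimes(span: SpanNode, out: Map<string, number> = new Map()): Map<string, number> {
  const childTotal = span.children.reduce((sum, c) => sum + c.durationMs, 0);
  out.set(span.name, span.durationMs - childTotal);
  for (const c of span.children) selfTimes(c, out);
  return out;
}

// The trace from the introduction:
const trace: SpanNode = {
  name: 'Service A', durationMs: 450, children: [
    { name: 'DB query (A)', durationMs: 20, children: [] },
    { name: 'gRPC to Service B', durationMs: 380, children: [
      { name: 'HTTP to Service C', durationMs: 350, children: [
        { name: 'DB query (C)', durationMs: 340, children: [] },
      ]},
    ]},
  ],
};

const times = selfTimes(trace);
```

Here the 340ms database query in Service C dominates its own duration, which is exactly the conclusion the waterfall view leads you to visually.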
Part 5: Add Tempo as a Backend (Grafana Stack)
If you're already running the kube-prometheus-stack, replace Jaeger with Grafana Tempo and view traces inside Grafana:
helm repo add grafana https://grafana.github.io/helm-charts
helm install tempo grafana/tempo \
  --namespace observability \
  --set tempo.storage.trace.backend=local

# Update OTEL endpoint to point to Tempo
kubectl set env deployment/my-app \
  OTEL_EXPORTER_OTLP_ENDPOINT=http://tempo.observability.svc.cluster.local:4318/v1/traces
In Grafana, add a Tempo data source (URL: http://tempo:3100). Then open Grafana's Explore view, select the Tempo data source, and paste a trace ID to jump directly to the trace.
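Instead of clicking through the UI, the data source can be provisioned declaratively. A sketch of a Grafana provisioning file for Tempo (field names follow Grafana's datasource provisioning format; the URL assumes the in-cluster service name from the Helm install above):

```yaml
# grafana-datasources.yaml, mounted under /etc/grafana/provisioning/datasources/
apiVersion: 1
datasources:
  - name: Tempo
    type: tempo
    access: proxy
    url: http://tempo.observability.svc.cluster.local:3100
    isDefault: false
```

This keeps the Grafana setup reproducible alongside the rest of your Helm values.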
Sampling Strategy
In production, don't send 100% of traces — it's expensive. Configure head-based sampling in the SDK:
import { TraceIdRatioBasedSampler } from '@opentelemetry/sdk-trace-base';

const sdk = new NodeSDK({
  sampler: new TraceIdRatioBasedSampler(0.1), // Sample 10% of traces
  // ... rest of config
});
Or use tail-based sampling in the OpenTelemetry Collector (keeps traces with errors at 100%):
# otel-collector-config.yaml
processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: errors
        type: status_code
        status_code: { status_codes: [ERROR] }
      - name: slow-requests
        type: latency
        latency: { threshold_ms: 1000 }
      - name: small-percentage
        type: probabilistic
        probabilistic: { sampling_percentage: 5 }
Sample 100% of errors and slow requests, 5% of everything else. This keeps the signal-to-noise ratio high.
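Head sampling stays consistent across services because the decision is a pure function of the trace ID: every SDK that sees the same ID reaches the same yes/no, so a trace is either kept whole or dropped whole. A simplified sketch of the idea behind a ratio-based sampler (the real TraceIdRatioBasedSampler uses the same principle, though not this exact arithmetic):

```typescript
// Decide from the trace ID itself: read the lower 8 hex chars as a 32-bit
// number and compare against ratio * 2^32. Same ID, same decision, everywhere.
function shouldSample(traceId: string, ratio: number): boolean {
  const lower32 = parseInt(traceId.slice(-8), 16);
  return lower32 < ratio * 0x100000000;
}
```

In a multi-service setup you would normally wrap the ratio sampler in the SDK's ParentBasedSampler, so downstream services simply follow the decision already made at the root instead of re-rolling the dice.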
We built Podscape to simplify Kubernetes workflows like this — logs, events, and cluster state in one interface, without switching tools.
Struggling with this in production?
We help teams fix these exact issues. Our engineers have deployed these patterns across production environments at scale.