Monitoring Kubernetes with Prometheus and Grafana
Deploy the kube-prometheus-stack with Helm, understand what it collects out of the box, build a dashboard for your application, and set up your first alert rule — all in under an hour.
Before you begin
- A running Kubernetes cluster
- Helm 3 installed
- kubectl configured
- At least 2 CPU cores and 4Gi of memory free in the cluster
You don't need to configure Prometheus from scratch. The kube-prometheus-stack Helm chart deploys Prometheus, Grafana, Alertmanager, and a set of pre-built dashboards and alert rules that cover the entire Kubernetes stack — nodes, pods, deployments, PVCs, and more.
This tutorial gets you from zero to a working monitoring stack, then shows you how to add your own application metrics.
Step 1: Install the kube-prometheus-stack
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--create-namespace \
--set grafana.adminPassword=changeme \
--set prometheus.prometheusSpec.retention=15d \
--set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=20Gi
This deploys:
- Prometheus (metrics collection and storage)
- Grafana (dashboards and visualisation)
- Alertmanager (alert routing and deduplication)
- kube-state-metrics (exposes Kubernetes object state as metrics)
- node-exporter (exposes host-level metrics: CPU, memory, disk, network)
Wait for everything to start:
kubectl wait --for=condition=Ready pods --all -n monitoring --timeout=180s
kubectl get pods -n monitoring
Step 2: Access Grafana
Forward Grafana's port locally:
kubectl port-forward svc/kube-prometheus-stack-grafana 3000:80 -n monitoring
Open http://localhost:3000. Log in with admin / changeme.
Navigate to Dashboards → Browse. You'll find 30+ pre-built dashboards:
- Kubernetes / Cluster — overall cluster health
- Kubernetes / Nodes — per-node CPU, memory, disk
- Kubernetes / Pods — per-pod resource usage
- Kubernetes / Workloads — deployment/daemonset/statefulset status
Step 3: Access Prometheus
kubectl port-forward svc/kube-prometheus-stack-prometheus 9090:9090 -n monitoring
Open http://localhost:9090. This is Prometheus's built-in query UI.
Try a few queries:
# CPU usage per pod (5-minute average)
rate(container_cpu_usage_seconds_total{namespace="production"}[5m])
# Memory usage per pod
container_memory_working_set_bytes{namespace="production"}
# Number of ready replicas per deployment
kube_deployment_status_replicas_ready{namespace="production"}
# Pod restart count
increase(kube_pod_container_status_restarts_total[1h])
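These metrics also combine well. Dividing real usage by requested resources shows how close pods run to their allocations; a sketch using the request metric from kube-state-metrics (v2+ metric name), assuming the same production namespace:

```
# CPU usage as a fraction of CPU requests, per pod
sum(rate(container_cpu_usage_seconds_total{namespace="production"}[5m])) by (pod)
  /
sum(kube_pod_container_resource_requests{namespace="production", resource="cpu"}) by (pod)
```

A value near 1 means the pod is consuming everything it asked for; consistently low values suggest over-provisioned requests.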
Step 4: Instrument Your Application
To expose custom metrics from your application, use a Prometheus client library.
Node.js (prom-client):
const express = require('express');
const client = require('prom-client');

const app = express();
const register = new client.Registry();

// Counter: total HTTP requests
const httpRequestsTotal = new client.Counter({
  name: 'http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'status_code'],
  registers: [register]
});

// Histogram: request duration
const httpRequestDuration = new client.Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.01, 0.05, 0.1, 0.3, 0.5, 1, 2, 5],
  registers: [register]
});

// Instrument all routes: register this middleware before your route handlers
app.use((req, res, next) => {
  const end = httpRequestDuration.startTimer({
    method: req.method,
    route: req.path
  });
  res.on('finish', () => {
    httpRequestsTotal.inc({ method: req.method, status_code: res.statusCode });
    end({ status_code: res.statusCode });
  });
  next();
});

// Expose the metrics endpoint
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});
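Once the app is running, a GET to /metrics returns plain-text Prometheus exposition format. With the counter above, the response includes lines like these (values are illustrative):

```
# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",status_code="200"} 42
http_requests_total{method="POST",status_code="500"} 3
```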
Go (prometheus/client_golang):
package main

import (
    "log"
    "net/http"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
    httpRequestsTotal = promauto.NewCounterVec(prometheus.CounterOpts{
        Name: "http_requests_total",
        Help: "Total number of HTTP requests",
    }, []string{"method", "status_code"})

    httpRequestDuration = promauto.NewHistogramVec(prometheus.HistogramOpts{
        Name:    "http_request_duration_seconds",
        Help:    "HTTP request duration in seconds",
        Buckets: []float64{0.01, 0.05, 0.1, 0.3, 0.5, 1, 2, 5},
    }, []string{"method", "route"})
)

func main() {
    // Expose the metrics endpoint; instrument your handlers with the
    // vectors above, e.g. httpRequestsTotal.WithLabelValues("GET", "200").Inc().
    // :8080 is an example port.
    http.Handle("/metrics", promhttp.Handler())
    log.Fatal(http.ListenAndServe(":8080", nil))
}
Step 5: Tell Prometheus to Scrape Your App
Create a ServiceMonitor — a CRD that kube-prometheus-stack uses to configure scraping:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: production
  labels:
    release: kube-prometheus-stack  # must match the Helm release label
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
    - port: http
      path: /metrics
      interval: 15s
kubectl apply -f servicemonitor.yaml
Your application's Service must have a port named http (or whatever you specify in endpoints.port). Verify Prometheus is scraping it at http://localhost:9090 → Status → Targets.
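For reference, a matching Service might look like this (the app label and port numbers are assumptions to line up with the ServiceMonitor above):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: production
  labels:
    app: my-app              # matched by the ServiceMonitor's selector
spec:
  selector:
    app: my-app
  ports:
    - name: http             # must match endpoints.port in the ServiceMonitor
      port: 8080
      targetPort: 8080
```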
Step 6: Build a Grafana Dashboard for Your App
In Grafana, click the + icon → Dashboard → Add visualization.
Panel 1: Request rate
sum(rate(http_requests_total{namespace="production"}[2m])) by (status_code)
Set visualization type: Time series. Set legend to {{status_code}}.
Panel 2: P95 latency
histogram_quantile(0.95,
sum(rate(http_request_duration_seconds_bucket{namespace="production"}[5m])) by (le, route)
)
Panel 3: Error rate (5xx)
sum(rate(http_requests_total{namespace="production",status_code=~"5.."}[2m]))
/
sum(rate(http_requests_total{namespace="production"}[2m]))
Set threshold to 0.01 (1% error rate = red).
Save the dashboard. Click Share → Export → save the JSON to your repo to version it.
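If you'd rather provision the dashboard declaratively, the chart's Grafana sidecar (enabled by default) loads any ConfigMap labelled grafana_dashboard. A sketch, assuming the chart's default sidecar settings, with your exported JSON inlined:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-app-dashboard
  namespace: monitoring
  labels:
    grafana_dashboard: "1"  # picked up by the Grafana dashboard sidecar
data:
  my-app-dashboard.json: |
    { ...paste the exported dashboard JSON here... }
```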
Step 7: Create an Alert Rule
Alert when error rate exceeds 1% for 5 minutes:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-app-alerts
  namespace: production
  labels:
    release: kube-prometheus-stack
spec:
  groups:
    - name: my-app
      interval: 30s
      rules:
        - alert: HighErrorRate
          expr: |
            sum(rate(http_requests_total{namespace="production",status_code=~"5.."}[5m]))
            /
            sum(rate(http_requests_total{namespace="production"}[5m]))
            > 0.01
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "High 5xx error rate on my-app"
            description: "Error rate is {{ $value | humanizePercentage }}. Investigate logs."
        - alert: PodCrashLooping
          expr: |
            increase(kube_pod_container_status_restarts_total{namespace="production"}[1h]) > 3
          for: 0m
          labels:
            severity: critical
          annotations:
            summary: "Pod {{ $labels.pod }} is crash looping"
kubectl apply -f prometheus-rules.yaml
Check the rule at http://localhost:9090 → Alerts. It should appear in the INACTIVE state (not firing yet).
Step 8: Configure Alertmanager
By default, Alertmanager doesn't route alerts anywhere. Configure Slack notifications:
apiVersion: v1
kind: Secret
metadata:
  name: alertmanager-kube-prometheus-stack-alertmanager
  namespace: monitoring
stringData:
  alertmanager.yaml: |
    global:
      slack_api_url: "https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK"
    route:
      group_by: [alertname, namespace]
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 4h
      receiver: slack-alerts
      routes:
        - match:
            severity: critical
          receiver: slack-critical
    receivers:
      - name: slack-alerts
        slack_configs:
          - channel: "#alerts"
            title: "{{ .CommonAnnotations.summary }}"
            text: "{{ range .Alerts }}{{ .Annotations.description }}\n{{ end }}"
      - name: slack-critical
        slack_configs:
          - channel: "#oncall"
            title: "CRITICAL: {{ .CommonAnnotations.summary }}"
The secret name must match the Alertmanager StatefulSet created by the chart (alertmanager-<release>-alertmanager). Note that the operator generates this secret from the chart's alertmanager.config value, so a later helm upgrade can overwrite manual edits; for a long-lived setup, put this config in your values file instead.
kubectl apply -f alertmanager-config.yaml
# Restart Alertmanager to pick it up
kubectl rollout restart statefulset/alertmanager-kube-prometheus-stack-alertmanager -n monitoring
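To confirm the new config loaded, port-forward Alertmanager (the service name assumes the release name from Step 1) and inspect its UI:

```
kubectl port-forward svc/kube-prometheus-stack-alertmanager 9093:9093 -n monitoring
```

Open http://localhost:9093 → Status and verify the routing tree matches what you applied.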
Persistent Storage in Production
The storageSpec in Step 1 creates a PersistentVolumeClaim for Prometheus. For Grafana, add:
helm upgrade kube-prometheus-stack prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--set grafana.persistence.enabled=true \
--set grafana.persistence.size=5Gi \
--reuse-values
Without persistent storage, your dashboards and alert history disappear on pod restart.
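As the --set flags accumulate, it's easier to keep them in a values file. A sketch consolidating the settings used in this tutorial (adjust sizes and the password to taste):

```yaml
# values.yaml
grafana:
  adminPassword: changeme
  persistence:
    enabled: true
    size: 5Gi
prometheus:
  prometheusSpec:
    retention: 15d
    storageSpec:
      volumeClaimTemplate:
        spec:
          resources:
            requests:
              storage: 20Gi
```

Then apply it with: helm upgrade --install kube-prometheus-stack prometheus-community/kube-prometheus-stack -n monitoring -f values.yaml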
We built Podscape to simplify Kubernetes workflows like this — logs, events, and cluster state in one interface, without switching tools.
Struggling with this in production?
We help teams fix these exact issues. Our engineers have deployed these patterns across production environments at scale.