Observability

Install Prometheus and Grafana on Kubernetes with kube-prometheus-stack

Intermediate · 25 min to complete · 14 min read

Deploy the full kube-prometheus-stack — Prometheus, Alertmanager, and Grafana — with a single Helm install. Access pre-built cluster dashboards, write a custom alert rule, and configure Alertmanager to route alerts to Slack.

Before you begin

  • A running Kubernetes cluster
  • kubectl configured with cluster-admin access
  • Helm 3 installed
  • A Slack workspace (optional — for alert routing)

Tags: Prometheus, Grafana, Alertmanager, Kubernetes, Observability, Helm, Monitoring

Running a Kubernetes cluster without metrics is flying blind. You need to know when nodes are under memory pressure, when pods are crash-looping, when deployment rollouts stall, and when your API latency spikes before users notice. kube-prometheus-stack gives you all of that in a single Helm install: Prometheus scrapes every Kubernetes component, Grafana visualises it with pre-built dashboards, and Alertmanager routes firing alerts wherever you need them.

Step 1: Add the Prometheus Community Helm repository

bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

Step 2: Create the monitoring namespace

bash
kubectl create namespace monitoring

Step 3: Install kube-prometheus-stack

This installs Prometheus Operator, Prometheus, Alertmanager, Grafana, kube-state-metrics, node-exporter, and 20+ pre-built dashboards:

bash
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --set grafana.adminPassword=your-secure-password \
  --set prometheus.prometheusSpec.retention=15d \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi

The retention=15d setting keeps 15 days of metrics on disk. In production, avoid passing the Grafana password via --set: it ends up in your shell history and in the Helm release Secret in plaintext. Use --values grafana-values.yaml or --set grafana.admin.existingSecret=my-grafana-secret instead. For longer retention, consider Thanos or VictoriaMetrics as a remote storage backend.
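As a sketch, a grafana-values.yaml using an existing Secret might look like this (the Secret name and key names are illustrative — you create that Secret yourself beforehand):

```yaml
# grafana-values.yaml — pass with: helm install ... --values grafana-values.yaml
grafana:
  admin:
    existingSecret: my-grafana-secret   # a Secret you create in the monitoring namespace
    userKey: admin-user                 # key holding the admin username
    passwordKey: admin-password         # key holding the admin password
```

This keeps the credential out of your shell history and out of the Helm release Secret.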

Step 4: Wait for all pods to start

bash
kubectl get pods -n monitoring --watch

Expected output after 2–3 minutes:

NAME                                                     READY   STATUS
alertmanager-kube-prometheus-stack-alertmanager-0        2/2     Running
kube-prometheus-stack-grafana-7d9f8c6b5-xk4p2            2/2     Running
kube-prometheus-stack-kube-state-metrics-6b9d-r4n7l      1/1     Running
kube-prometheus-stack-operator-5c9b9b9-k2j9m             1/1     Running
kube-prometheus-stack-prometheus-node-exporter-4wqxt     1/1     Running
prometheus-kube-prometheus-stack-prometheus-0            2/2     Running

Step 5: Access Grafana

Port-forward Grafana to your local machine:

bash
kubectl port-forward svc/kube-prometheus-stack-grafana 3000:80 -n monitoring

Open http://localhost:3000 and log in with:

  • Username: admin
  • Password: the password you set in step 3
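If you lose the password, you can read it back from the Secret the chart created. The Secret and key names below follow the chart defaults for a release named kube-prometheus-stack:

```shell
# Decode the Grafana admin password from the chart-managed Secret.
kubectl get secret kube-prometheus-stack-grafana -n monitoring \
  -o jsonpath='{.data.admin-password}' | base64 -d; echo
```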

Step 6: Explore the pre-built dashboards

Navigate to Dashboards → Browse. You'll find dashboards for:

  • Kubernetes / Compute Resources / Cluster — CPU/memory usage across the cluster
  • Kubernetes / Compute Resources / Namespace (Pods) — per-pod resource usage
  • Kubernetes / Networking / Cluster — network traffic by namespace
  • Node Exporter / Full — detailed per-node OS metrics
  • Kubernetes / Persistent Volumes — PVC usage and capacity

The most useful one to check first is Kubernetes / Compute Resources / Cluster — it shows you immediately if any namespace is consuming a disproportionate share of cluster resources.

Step 7: Access Prometheus directly

Port-forward Prometheus to explore raw metrics and test PromQL queries:

bash
kubectl port-forward svc/kube-prometheus-stack-prometheus 9090:9090 -n monitoring

Open http://localhost:9090 and try a query:

promql
sum(rate(container_cpu_usage_seconds_total{namespace!="kube-system",container!=""}[5m])) by (namespace)

This shows CPU usage rate per namespace over the last 5 minutes — useful for identifying which teams are consuming the most compute. The container!="" filter excludes pause containers, which expose the same metric at the pod level and would otherwise cause double-counting.
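The same query can be run over Prometheus's HTTP API, which is handy for scripting. This assumes the port-forward above is still active on localhost:9090:

```shell
# GET /api/v1/query; -G plus --data-urlencode lets curl URL-encode the expression.
curl -sG 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=sum(rate(container_cpu_usage_seconds_total{namespace!="kube-system",container!=""}[5m])) by (namespace)'
```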

Step 8: Create a custom alert rule

PrometheusRule resources let you add alert rules without editing any ConfigMaps. The release: kube-prometheus-stack label tells the Prometheus Operator to pick up this rule:

bash
cat <<'EOF' | kubectl apply -f -
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: custom-alerts
  namespace: monitoring
  labels:
    release: kube-prometheus-stack  # must match your helm install release name
spec:
  groups:
  - name: pod.rules
    rules:
    - alert: PodCrashLooping
      expr: increase(kube_pod_container_status_restarts_total[15m]) > 1
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Pod {{ $labels.pod }} is crash-looping"
        description: "Pod {{ $labels.pod }} in namespace {{ $labels.namespace }} has restarted more than once in the last 15 minutes"
    - alert: NodeHighMemoryPressure
      expr: (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) < 0.1
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Node {{ $labels.instance }} has less than 10% memory available"
        description: "Available memory on {{ $labels.instance }} has been below 10% for 5 minutes"
EOF

Verify Prometheus picked up the rule:

bash
# Check in the Prometheus UI at http://localhost:9090/rules
# Or via kubectl:
kubectl get prometheusrule custom-alerts -n monitoring
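You can also confirm the rules loaded over Prometheus's API (assumes the port-forward from step 7 is still running on localhost:9090):

```shell
# List every loaded rule name; PodCrashLooping and NodeHighMemoryPressure
# should appear alongside the chart's built-in rules.
curl -s http://localhost:9090/api/v1/rules | grep -o '"name":"[^"]*"' | sort -u
```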

Step 9: Configure Alertmanager for Slack notifications

Create a Slack incoming webhook in your workspace (Your workspace → Apps → Incoming Webhooks), then update the Alertmanager configuration:

bash
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: alertmanager-kube-prometheus-stack-alertmanager
  namespace: monitoring
stringData:
  alertmanager.yaml: |
    global:
      resolve_timeout: 5m
    route:
      receiver: slack-critical
      group_by: ['alertname', 'namespace']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 4h
      routes:
      - match:
          severity: critical
        receiver: slack-critical
      - match:
          severity: warning
        receiver: slack-warnings
    receivers:
    - name: slack-critical
      slack_configs:
      - api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'
        channel: '#platform-alerts'
        title: ':red_circle: {{ .GroupLabels.alertname }}'
        text: '{{ range .Alerts }}{{ .Annotations.description }}{{ "\n" }}{{ end }}'
        send_resolved: true
    - name: slack-warnings
      slack_configs:
      - api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'
        channel: '#platform-warnings'
        title: ':warning: {{ .GroupLabels.alertname }}'
        text: '{{ range .Alerts }}{{ .Annotations.description }}{{ "\n" }}{{ end }}'
        send_resolved: true
EOF

Set api_url per receiver rather than in global: global.slack_api_url was deprecated in Alertmanager 0.22 and produces warnings in current versions.

Alertmanager automatically reloads its configuration when the Secret is updated.

Step 10: Verify Alertmanager received the config

Port-forward Alertmanager:

bash
kubectl port-forward svc/kube-prometheus-stack-alertmanager 9093:9093 -n monitoring

Open http://localhost:9093 and check the Status page to confirm your receivers are configured correctly.
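To exercise the Slack route end to end, you can push a synthetic alert through Alertmanager's v2 API. This is a sketch: it assumes the port-forward above is still running and that your webhook URL is live; the alert name and labels are made up for the test:

```shell
# Write a synthetic critical alert and POST it to the v2 alerts endpoint.
# startsAt is optional and defaults to now; the alert resolves itself
# after resolve_timeout (5m in the config above).
cat <<'EOF' > /tmp/test-alert.json
[
  {
    "labels": {
      "alertname": "RoutingTest",
      "severity": "critical",
      "namespace": "monitoring"
    },
    "annotations": {
      "description": "Synthetic alert to verify Slack routing"
    }
  }
]
EOF
curl -s -XPOST -H 'Content-Type: application/json' \
  --data @/tmp/test-alert.json http://localhost:9093/api/v2/alerts
```

If the routing works, a message should land in #platform-alerts within group_wait (30 seconds).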

What you built

Your cluster has full metrics coverage: Prometheus scrapes every node, pod, and Kubernetes component every 30 seconds. Grafana visualises 15 days of history with pre-built dashboards you can use immediately. PrometheusRule resources let any team add alert rules without touching the central configuration. Alertmanager routes firing alerts to the right Slack channels with deduplication and grouping so you're not flooded with repeated notifications.

We built Podscape to simplify Kubernetes workflows like this — logs, events, and cluster state in one interface, without switching tools.

Struggling with this in production?

We help teams fix these exact issues. Our engineers have deployed these patterns across production environments at scale.