Install Prometheus and Grafana on Kubernetes with kube-prometheus-stack
Deploy the full kube-prometheus-stack — Prometheus, Alertmanager, and Grafana — with a single Helm install. Access pre-built cluster dashboards, write a custom alert rule, and configure Alertmanager to route alerts to Slack.
Before you begin
- A running Kubernetes cluster
- kubectl configured with cluster-admin access
- Helm 3 installed
- A Slack workspace (optional — for alert routing)
Running a Kubernetes cluster without metrics is flying blind. You need to know when nodes are under memory pressure, when pods are crash-looping, when deployment rollouts stall, and when your API latency spikes before users notice. kube-prometheus-stack gives you all of that in a single Helm install: Prometheus scrapes every Kubernetes component, Grafana visualises it with pre-built dashboards, and Alertmanager routes firing alerts wherever you need them.
Step 1: Add the Prometheus Community Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
Step 2: Create the monitoring namespace
kubectl create namespace monitoring
Step 3: Install kube-prometheus-stack
This installs Prometheus Operator, Prometheus, Alertmanager, Grafana, kube-state-metrics, node-exporter, and 20+ pre-built dashboards:
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--set grafana.adminPassword=your-secure-password \
--set prometheus.prometheusSpec.retention=15d \
--set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi
The retention=15d setting keeps 15 days of metrics. In production, avoid passing the Grafana password via --set: it ends up in your shell history and in the Helm release Secret in plaintext. Use --values grafana-values.yaml or --set grafana.admin.existingSecret=my-grafana-secret instead. For longer retention, consider Thanos or VictoriaMetrics as a remote storage backend.
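A minimal sketch of the Secret-based approach (my-grafana-secret is a name you choose; the admin-user and admin-password key names follow the Grafana chart defaults, so verify them against your chart version):
kubectl create secret generic my-grafana-secret -n monitoring \
  --from-literal=admin-user=admin \
  --from-literal=admin-password='your-secure-password'
Then reference it from a values file instead of --set:
# grafana-values.yaml
grafana:
  admin:
    existingSecret: my-grafana-secret
    userKey: admin-user
    passwordKey: admin-password
Pass the file with --values grafana-values.yaml on the helm install command above.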
Step 4: Wait for all pods to start
kubectl get pods -n monitoring --watch
Expected output after 2–3 minutes:
NAME                                                    READY   STATUS
alertmanager-kube-prometheus-stack-alertmanager-0       2/2     Running
kube-prometheus-stack-grafana-7d9f8c6b5-xk4p2           2/2     Running
kube-prometheus-stack-kube-state-metrics-6b9d-r4n7l     1/1     Running
kube-prometheus-stack-operator-5c9b9b9-k2j9m            1/1     Running
kube-prometheus-stack-prometheus-node-exporter-4wqxt    1/1     Running
prometheus-kube-prometheus-stack-prometheus-0           2/2     Running
Step 5: Access Grafana
Port-forward Grafana to your local machine:
kubectl port-forward svc/kube-prometheus-stack-grafana 3000:80 -n monitoring
Open http://localhost:3000 and log in with:
- Username: admin
- Password: the password you set in step 3
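If you lose track of the password, you can read it back from the Secret the Grafana chart creates. The secret name below assumes the release name from step 3; adjust it if you installed under a different name:
kubectl get secret kube-prometheus-stack-grafana -n monitoring \
  -o jsonpath='{.data.admin-password}' | base64 -d; echo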
Step 6: Explore the pre-built dashboards
Navigate to Dashboards → Browse. You'll find dashboards for:
- Kubernetes / Compute Resources / Cluster — CPU/memory usage across the cluster
- Kubernetes / Compute Resources / Namespace (Pods) — per-pod resource usage
- Kubernetes / Networking / Cluster — network traffic by namespace
- Node Exporter / Full — detailed per-node OS metrics
- Kubernetes / Persistent Volumes — PVC usage and capacity
The most useful one to check first is Kubernetes / Compute Resources / Cluster — it shows you immediately if any namespace is consuming a disproportionate share of cluster resources.
Step 7: Access Prometheus directly
Port-forward Prometheus to explore raw metrics and test PromQL queries:
kubectl port-forward svc/kube-prometheus-stack-prometheus 9090:9090 -n monitoring
Open http://localhost:9090 and try a query:
sum(rate(container_cpu_usage_seconds_total{namespace!="kube-system",container!=""}[5m])) by (namespace)
This shows the CPU usage rate per namespace over the last 5 minutes, which is useful for identifying which teams are consuming the most compute. The container!="" filter excludes the pod-level aggregate series (cAdvisor exposes the same metric once per pod cgroup with an empty container label), which would otherwise cause double-counting.
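For the memory-side view, a similar query (a companion example, not part of the walkthrough above) sums working-set memory per namespace using the same container!="" filter:
sum(container_memory_working_set_bytes{namespace!="kube-system",container!=""}) by (namespace)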
Step 8: Create a custom alert rule
PrometheusRule resources let you add alert rules without editing any ConfigMaps. The release: kube-prometheus-stack label tells the Prometheus Operator to pick up this rule:
# 'EOF' is quoted so the shell does not expand the {{ $labels }} template variables
cat <<'EOF' | kubectl apply -f -
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: custom-alerts
  namespace: monitoring
  labels:
    release: kube-prometheus-stack  # must match your helm install release name
spec:
  groups:
  - name: pod.rules
    rules:
    - alert: PodCrashLooping
      expr: increase(kube_pod_container_status_restarts_total[15m]) > 1
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Pod {{ $labels.pod }} is crash-looping"
        description: "Pod {{ $labels.pod }} in namespace {{ $labels.namespace }} has restarted more than once in the last 15 minutes"
    - alert: NodeHighMemoryPressure
      expr: (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) < 0.1
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Node {{ $labels.instance }} has less than 10% memory available"
        description: "Available memory on {{ $labels.instance }} has been below 10% for 5 minutes"
EOF
Verify Prometheus picked up the rule:
# Check in the Prometheus UI at http://localhost:9090/rules
# Or via kubectl:
kubectl get prometheusrule custom-alerts -n monitoring
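You can also confirm the rule loaded through the Prometheus HTTP API while the port-forward from step 7 is still running; /api/v1/rules is the standard rules endpoint, and grep here is just a quick filter:
curl -s http://localhost:9090/api/v1/rules | grep -Eo 'PodCrashLooping|NodeHighMemoryPressure'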
Step 9: Configure Alertmanager for Slack notifications
Create a Slack incoming webhook in your workspace (Your workspace → Apps → Incoming Webhooks), then update the Alertmanager configuration:
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: alertmanager-kube-prometheus-stack-alertmanager
  namespace: monitoring
stringData:
  alertmanager.yaml: |
    global:
      resolve_timeout: 5m
    route:
      receiver: slack-critical
      group_by: ['alertname', 'namespace']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 4h
      routes:
      - match:
          severity: critical
        receiver: slack-critical
      - match:
          severity: warning
        receiver: slack-warnings
    receivers:
    - name: slack-critical
      slack_configs:
      - api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'
        channel: '#platform-alerts'
        title: ':red_circle: {{ .GroupLabels.alertname }}'
        text: '{{ range .Alerts }}{{ .Annotations.description }}{{ "\n" }}{{ end }}'
        send_resolved: true
    - name: slack-warnings
      slack_configs:
      - api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'
        channel: '#platform-warnings'
        title: ':warning: {{ .GroupLabels.alertname }}'
        text: '{{ range .Alerts }}{{ .Annotations.description }}{{ "\n" }}{{ end }}'
        send_resolved: true
EOF
Set api_url per receiver rather than in global: global.slack_api_url was deprecated in Alertmanager 0.22 and produces warnings in current versions.
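If you want to validate the routing config before (or after) applying it, the amtool CLI that ships with Alertmanager can lint it locally; this sketch assumes you saved the embedded alertmanager.yaml block above to a local file:
amtool check-config alertmanager.yaml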
Alertmanager automatically reloads its configuration when the Secret is updated.
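To confirm the reload happened, check the sidecar's logs; the container name config-reloader is the prometheus-operator default, so adjust it if your version names it differently:
kubectl logs -n monitoring alertmanager-kube-prometheus-stack-alertmanager-0 -c config-reloader --tail=20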
Step 10: Verify Alertmanager received the config
Port-forward Alertmanager:
kubectl port-forward svc/kube-prometheus-stack-alertmanager 9093:9093 -n monitoring
Open http://localhost:9093 and check the Status page to confirm your receivers are configured correctly.
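To exercise the Slack route end to end, you can push a synthetic alert at the Alertmanager v2 API while the port-forward is running; TestAlert is an arbitrary name used only for this check, not something the stack creates:
curl -XPOST http://localhost:9093/api/v2/alerts \
  -H 'Content-Type: application/json' \
  -d '[{"labels":{"alertname":"TestAlert","severity":"critical","namespace":"monitoring"},"annotations":{"description":"Synthetic alert to verify Slack routing"}}]'
It should appear in #platform-alerts after the group_wait window (30s in the config above), and a resolved notification follows once the alert times out, since send_resolved is enabled.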
What you built
Your cluster has full metrics coverage: Prometheus scrapes every node, pod, and Kubernetes component every 30 seconds. Grafana visualises 15 days of history with pre-built dashboards you can use immediately. PrometheusRule resources let any team add alert rules without touching the central configuration. Alertmanager routes firing alerts to the right Slack channels with deduplication and grouping so you're not flooded with repeated notifications.
We built Podscape to simplify Kubernetes workflows like this — logs, events, and cluster state in one interface, without switching tools.
Struggling with this in production?
We help teams fix these exact issues. Our engineers have deployed these patterns across production environments at scale.