Prometheus Operator: ServiceMonitor, AlertManager, and Production Monitoring
The Prometheus Operator introduces a declarative way to manage Prometheus, Alertmanager, and their scrape configurations as Kubernetes custom resources. Instead of editing Prometheus config files, you create ServiceMonitor and PodMonitor objects — and the operator updates the Prometheus configuration automatically.

Managing Prometheus in Kubernetes without the operator means editing a ConfigMap and reloading Prometheus every time you add a scrape target. At scale, with dozens of teams adding applications, this becomes a shared config file that everyone touches: merge conflicts, broken YAML, and manual intervention from the platform team every time a new service needs monitoring.
The Prometheus Operator (part of the prometheus-operator GitHub organisation, bundled in kube-prometheus-stack) solves this with custom resources: teams add a ServiceMonitor to their namespace, and the operator updates Prometheus's scrape config automatically. No central config file, no shared coordination required.
Architecture
The Prometheus Operator watches for six main custom resources:
| Resource | Purpose |
|---|---|
| Prometheus | Deploys and configures a Prometheus instance |
| ServiceMonitor | Declares how to scrape metrics from a Kubernetes Service |
| PodMonitor | Declares how to scrape metrics directly from Pods |
| PrometheusRule | Defines alerting and recording rules |
| Alertmanager | Deploys and configures Alertmanager |
| AlertmanagerConfig | Namespace-scoped routing and receiver configuration |
The operator renders all discovered ServiceMonitor and PodMonitor objects into a Prometheus scrape configuration, reloads Prometheus via the /-/reload HTTP endpoint, and manages the lifecycle of Prometheus and Alertmanager StatefulSets.
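The selectors that tie these resources together live on the Prometheus resource itself. As a minimal sketch (the name, namespace, and label values here are illustrative, not required), a Prometheus object that picks up every ServiceMonitor carrying a `team` label might look like:

```yaml
# Hypothetical minimal Prometheus resource. Field names come from the
# monitoring.coreos.com/v1 API; the name and selector values are examples.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: platform
  namespace: monitoring
spec:
  replicas: 1
  serviceAccountName: prometheus
  serviceMonitorNamespaceSelector: {}    # look in all namespaces
  serviceMonitorSelector:
    matchExpressions:
      - {key: team, operator: Exists}    # only ServiceMonitors with a team label
  resources:
    requests:
      memory: 400Mi
```

In practice most installations let kube-prometheus-stack render this object from Helm values rather than writing it by hand, but the selector fields shown here are what the chart ultimately sets.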
Installation via kube-prometheus-stack
kube-prometheus-stack is the standard Helm chart that installs the complete monitoring stack: Prometheus Operator, Prometheus, Alertmanager, Grafana, node-exporter, kube-state-metrics, and a set of pre-built dashboards and alerts for Kubernetes infrastructure.
```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --values prometheus-values.yaml
```

```yaml
# prometheus-values.yaml
prometheus:
  prometheusSpec:
    retention: 30d
    retentionSize: "40GB"
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: gp3
          accessModes: [ReadWriteOnce]
          resources:
            requests:
              storage: 50Gi

    # By default the chart only selects monitors labelled with the Helm
    # release; disable that so the empty selectors below really mean "all"
    serviceMonitorSelectorNilUsesHelmValues: false
    podMonitorSelectorNilUsesHelmValues: false
    ruleSelectorNilUsesHelmValues: false

    # Watch ServiceMonitors in ALL namespaces (not just the monitoring namespace)
    serviceMonitorNamespaceSelector: {}  # {} = all namespaces
    serviceMonitorSelector: {}           # {} = all ServiceMonitors
    podMonitorNamespaceSelector: {}
    podMonitorSelector: {}
    ruleNamespaceSelector: {}
    ruleSelector: {}

    resources:
      requests:
        cpu: 200m
        memory: 2Gi
      limits:
        cpu: "2"
        memory: 8Gi

alertmanager:
  alertmanagerSpec:
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: gp3
          resources:
            requests:
              storage: 10Gi

grafana:
  enabled: true
  adminPassword: "${GRAFANA_ADMIN_PASSWORD}"  # Use External Secrets for production
  persistence:
    enabled: true
    storageClassName: gp3
    size: 10Gi
  sidecar:
    dashboards:
      enabled: true  # Auto-import dashboards from ConfigMaps with label
    datasources:
      enabled: true

# kube-state-metrics: exports Kubernetes object metrics
kube-state-metrics:
  metricLabelsAllowlist:
    - pods=[app,version,team]  # Which pod labels to include as metric labels

# node-exporter: exports node-level metrics (CPU, memory, disk, network)
nodeExporter:
  enabled: true
```
ServiceMonitor
ServiceMonitor tells Prometheus how to scrape a Kubernetes Service endpoint:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: payments-api
  namespace: production
  labels:
    team: payments  # Label used by Prometheus selector (if configured)
spec:
  selector:
    matchLabels:
      app: payments-api  # Select Services with this label in the matched namespaces
  namespaceSelector:
    matchNames:
      - production  # Only match Services in the production namespace
  endpoints:
    - port: metrics  # Named port in the Service spec (or use targetPort)
      path: /metrics
      interval: 30s  # Scrape every 30 seconds
      scrapeTimeout: 10s
      scheme: http
      # TLS config for HTTPS metrics endpoints:
      # tlsConfig:
      #   caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      #   insecureSkipVerify: false
      relabelings:
        # Add namespace label to every metric
        - sourceLabels: [__meta_kubernetes_namespace]
          targetLabel: namespace
        # Add pod name label
        - sourceLabels: [__meta_kubernetes_pod_name]
          targetLabel: pod
      metricRelabelings:
        # Drop high-cardinality metrics that consume too much storage
        - sourceLabels: [__name__]
          regex: "go_gc_duration_seconds.*"
          action: drop
```
The corresponding Service must have the named metrics port:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: payments-api
  namespace: production
  labels:
    app: payments-api  # Matches ServiceMonitor selector
spec:
  selector:
    app: payments-api
  ports:
    - name: http
      port: 80
      targetPort: 8080
    - name: metrics  # Named port referenced in ServiceMonitor
      port: 9090
      targetPort: 9090
```
PodMonitor
PodMonitor scrapes pods directly, bypassing the Service layer — useful for pods that don't have a Service but expose metrics, or for scraping each pod individually (rather than load-balanced via a Service):
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: batch-workers
  namespace: production
spec:
  selector:
    matchLabels:
      app: batch-worker
  namespaceSelector:
    any: true  # Monitor this pod type in all namespaces
  podMetricsEndpoints:
    - port: metrics
      path: /metrics
      interval: 60s  # Scrape less frequently for batch workloads
      relabelings:
        - sourceLabels: [__meta_kubernetes_pod_label_job_type]
          targetLabel: job_type
```
PrometheusRule
PrometheusRule defines alerting rules and recording rules. Unlike ConfigMap-based rules, these are validated by the operator and loaded into Prometheus automatically:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: payments-api-alerts
  namespace: production
  labels:
    prometheus: kube-prometheus-stack
    role: alert-rules
spec:
  groups:
    - name: payments-api.rules
      interval: 30s  # Rule evaluation interval
      rules:
        # Recording rule: pre-compute expensive query
        - record: job:payments_request_duration_seconds:p99
          expr: histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket{job="payments-api"}[5m])) by (le))

        # Alert: high error rate
        - alert: PaymentsApiHighErrorRate
          expr: |
            (
              sum(rate(http_requests_total{job="payments-api", status=~"5.."}[5m]))
              /
              sum(rate(http_requests_total{job="payments-api"}[5m]))
            ) > 0.01
          for: 5m
          labels:
            severity: critical
            team: payments
          annotations:
            summary: "Payments API error rate > 1%"
            description: "Current error rate: {{ $value | humanizePercentage }}"
            runbook: "https://runbooks.example.com/payments-api/high-error-rate"

        # Alert: P99 latency SLO breach
        - alert: PaymentsApiLatencySLOBreach
          expr: job:payments_request_duration_seconds:p99 > 0.5
          for: 2m
          labels:
            severity: warning
            team: payments
          annotations:
            summary: "Payments API P99 latency exceeds 500ms"

        # Alert: pod not ready
        - alert: PaymentsApiPodNotReady
          expr: |
            kube_pod_status_ready{namespace="production", pod=~"payments-api-.*", condition="true"} == 0
          for: 5m
          labels:
            severity: critical
```
Alertmanager Configuration
Global Alertmanager Configuration (Platform Level)
```yaml
# Alertmanager main config — managed as a Kubernetes Secret by the operator
# kube-prometheus-stack populates this from the alertmanager.config value
alertmanager:
  config:
    global:
      resolve_timeout: 5m
      slack_api_url: "https://hooks.slack.com/services/XXXX"

    route:
      group_by: [alertname, namespace, severity]
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      receiver: default
      routes:
        - matchers:
            - severity=critical
          receiver: pagerduty-critical
          continue: true  # Keep evaluating later routes so another route can also fire
        - matchers:
            - severity=warning
          receiver: slack-warnings
          group_wait: 2m

    receivers:
      - name: default
        slack_configs:
          - channel: "#platform-alerts"
            text: "{{ range .Alerts }}{{ .Annotations.summary }}\n{{ end }}"

      - name: pagerduty-critical
        pagerduty_configs:
          - routing_key: "xxxxx"
            description: "{{ .CommonAnnotations.summary }}"
            severity: "{{ .CommonLabels.severity }}"

      - name: slack-warnings
        slack_configs:
          - channel: "#platform-warnings"
            title: "[{{ .Status | toUpper }}] {{ .CommonLabels.alertname }}"
```
AlertmanagerConfig (Team Level)
AlertmanagerConfig lets teams configure their own routing rules within a namespace, without requiring platform team involvement:
```yaml
apiVersion: monitoring.coreos.com/v1alpha1  # AlertmanagerConfig is still v1alpha1
kind: AlertmanagerConfig
metadata:
  name: payments-team-config
  namespace: production
spec:
  route:
    receiver: payments-slack
    matchers:
      - name: team
        value: payments  # Only route alerts with team=payments label
    groupBy: [alertname]
    groupWait: 1m
    repeatInterval: 4h
  receivers:
    - name: payments-slack
      slackConfigs:
        - apiURL:
            name: payments-slack-webhook
            key: url
          channel: "#payments-alerts"
          title: "{{ .CommonLabels.alertname }}"
          text: "{{ range .Alerts }}{{ .Annotations.description }}{{ end }}"
```
The AlertmanagerConfig is automatically incorporated into Alertmanager's routing tree by the operator, prefixed with a namespace matcher to prevent cross-namespace routing conflicts.
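To picture what that namespace scoping means, here is a sketch of the sub-route the operator would splice into the root routing tree. This is illustrative, not literal operator output; in particular the prefixed receiver name follows the operator's namespace/name/receiver convention but should be treated as an assumption:

```yaml
# Sketch of the generated sub-route (assumed shape, for illustration only)
route:
  routes:
    - matchers:
        - namespace="production"   # added by the operator automatically
        - team="payments"          # from the AlertmanagerConfig above
      receiver: production/payments-team-config/payments-slack
```

The practical consequence: a team's AlertmanagerConfig can only ever match alerts originating from its own namespace, so two teams using the same `team` label cannot steal each other's alerts.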
Multi-Namespace Monitoring
By default, the Prometheus Operator watches only the monitoring namespace for ServiceMonitor resources. For a multi-team cluster where each team manages its own ServiceMonitors, set serviceMonitorNamespaceSelector: {} in the Prometheus spec.
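If watching every namespace is too broad, the same selector can be narrowed to labelled namespaces. A sketch, assuming a `monitoring: enabled` namespace-label convention (the label name is a choice, not an operator default):

```yaml
prometheus:
  prometheusSpec:
    # Only watch namespaces that have explicitly opted in via a label
    serviceMonitorNamespaceSelector:
      matchLabels:
        monitoring: enabled   # teams label their namespace to opt in
```

This gives the platform team a single switch per namespace instead of an all-or-nothing cluster-wide setting.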
To give teams permission to create ServiceMonitors in their namespaces:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: team-service-monitor
rules:
  - apiGroups: ["monitoring.coreos.com"]
    resources: [servicemonitors, podmonitors, prometheusrules]
    verbs: [get, list, watch, create, update, patch, delete]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-service-monitor
  namespace: production
subjects:
  - kind: Group
    name: payments-engineers
    apiGroup: rbac.authorization.k8s.io  # Required for Group subjects
roleRef:
  kind: ClusterRole
  name: team-service-monitor
  apiGroup: rbac.authorization.k8s.io
```
Frequently Asked Questions
How do I add a custom Grafana dashboard to kube-prometheus-stack?
Create a ConfigMap with the dashboard JSON and the label grafana_dashboard: "1":
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: payments-dashboard
  namespace: monitoring
  labels:
    grafana_dashboard: "1"  # Grafana sidecar picks this up automatically
data:
  payments-dashboard.json: |
    { ... Grafana dashboard JSON ... }
```
The Grafana sidecar (grafana.sidecar.dashboards.enabled=true in the Helm values) watches for ConfigMaps with this label and imports them. Any team can add a dashboard to the cluster Grafana without touching the platform team's configuration.
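The datasources sidecar works the same way with the grafana_datasource label. As a sketch, a ConfigMap that would register an extra Loki datasource (the datasource name and URL are illustrative):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: loki-datasource
  namespace: monitoring
  labels:
    grafana_datasource: "1"  # picked up by the datasources sidecar
data:
  loki.yaml: |
    apiVersion: 1
    datasources:
      - name: Loki
        type: loki
        url: http://loki.logging.svc:3100   # assumed in-cluster Loki address
```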
What's the difference between metricRelabelings and relabelings?
relabelings run during target discovery (before any scrape happens), using metadata from the discovered target (pod labels, service labels, etc.). They can rename labels or filter out targets entirely — if a target is dropped by relabelings, it is never scraped. metricRelabelings run after scraping, on the individual metric samples — useful for dropping expensive metrics or renaming metric labels post-collection. Use relabelings for target-level filtering; metricRelabelings for per-metric filtering.
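Target-level filtering with relabelings can, for example, keep only pods that opt in via a pod label. A sketch of a ServiceMonitor endpoints fragment, where the `scrape` pod label is an assumed convention rather than anything Prometheus defines:

```yaml
endpoints:
  - port: metrics
    relabelings:
      # keep: drop every target whose pod lacks scrape=enabled. Dropped
      # targets are never contacted, so they incur no scrape cost at all.
      - sourceLabels: [__meta_kubernetes_pod_label_scrape]
        regex: enabled
        action: keep
```

The same rule under metricRelabelings would be too late: the scrape would already have happened, and only the resulting samples would be filtered.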
How do I handle Prometheus high availability?
Run two Prometheus replicas with replicas: 2 in the Prometheus spec. Both scrape all targets independently. For query deduplication (so queries don't return the same series twice), put Thanos or Grafana Mimir in front — they deduplicate series from multiple Prometheus instances and provide long-term storage. For small clusters, two Prometheus replicas without Thanos are sufficient; for large clusters or multi-cluster observability, Thanos or Mimir is the production-grade approach.
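In kube-prometheus-stack values, the HA pair plus a Thanos sidecar amounts to roughly the following sketch. The Secret name and key are assumptions, and the objectStorageConfig shape follows the Prometheus CRD's Thanos section; verify against the operator version you run:

```yaml
# Sketch: HA pair with Thanos sidecars (names are illustrative assumptions)
prometheus:
  prometheusSpec:
    replicas: 2                  # both replicas scrape all targets
    thanos:                      # a non-empty thanos spec injects the sidecar
      objectStorageConfig:
        name: thanos-objstore    # assumed Secret holding the bucket config
        key: objstore.yml
```

A Thanos Querier then fans out to both sidecars and deduplicates series using the replica external label the operator attaches to each instance.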
For Kubernetes-native metrics that kube-prometheus-stack exposes (SLOs, error budgets), see SLOs, Error Budgets, and Burn Rate Alerts. For distributed tracing that correlates with Prometheus metrics, see OpenTelemetry Instrumentation Guide. For the logging side of the observability stack, see Kubernetes Logging with Fluent Bit and Loki. For the complete end-to-end production monitoring setup with persistent storage, remote write, and recording rules, see Prometheus and Grafana on Kubernetes: Production Monitoring Stack.
Setting up production Prometheus monitoring for a multi-team Kubernetes platform? Talk to us at Coding Protocols — we help platform teams design monitoring architectures that scale with team growth without becoming a maintenance burden.


