Kubernetes Logging with Fluent Bit and Grafana Loki
Kubernetes generates three kinds of logs: container stdout/stderr, Kubernetes API audit logs, and system component logs. Getting them to a centralised, queryable store requires a collection agent on every node and a storage backend designed for log aggregation. Fluent Bit and Grafana Loki are the lightweight, cloud-native answer.

Elasticsearch was the default log store for Kubernetes for years — powerful, but operationally expensive. Loki (Grafana Labs, CNCF) takes a different approach: it indexes only log labels (not the full log content), dramatically reducing storage and compute costs at the price of slower full-text search. For most Kubernetes use cases — finding logs for a specific pod, filtering by namespace, correlating logs with traces — Loki's label-based model is the right trade-off.
Fluent Bit (CNCF) is the collection agent: a small DaemonSet that reads container log files from each node and forwards them to Loki (or to Elasticsearch, CloudWatch, S3, or any other output). It comes from the same Fluent project as Fluentd but is far lighter: a single small C binary that typically runs in roughly 50MB of memory on a busy node, a fraction of what a comparable Fluentd deployment needs.
Kubernetes Log Architecture
Container stdout/stderr is written to /var/log/pods/<namespace>_<pod>_<uid>/<container>/ on each node. The kubelet creates symlinks at /var/log/containers/<pod>_<namespace>_<container>-<containerID>.log pointing to these files. Fluent Bit reads the symlinks at /var/log/containers/*.log as its input source.
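You can check this layout directly on a node. A minimal sketch using an ephemeral debug container (kubectl debug mounts the node's root filesystem at /host; <node-name> is a placeholder):
# Open a shell on a node (any node running workloads)
kubectl debug node/<node-name> -it --image=busybox -- sh

# Inside the debug pod, the node filesystem is mounted under /host
ls -l /host/var/log/containers/ | head    # kubelet-created symlinks: <pod>_<namespace>_<container>-<id>.log
ls /host/var/log/pods/ | head             # the per-pod directories the symlinks point to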
Container (stdout/stderr)
↓
Node filesystem: /var/log/containers/*.log
↓
Fluent Bit (DaemonSet, one per node)
↓ tail input, kubernetes filter (enrich with pod metadata)
Loki (stores with labels: namespace, pod, container, node, app)
↓
Grafana (LogQL queries, dashboards, alerting)
Each log line is enriched by Fluent Bit's kubernetes filter with pod metadata from the Kubernetes API: pod name, namespace, container name, pod labels, and node name. This enrichment is what makes Kubernetes log exploration ergonomic — you can filter by app=payments-api without any log-format knowledge.
Installation
Loki (Distributed Mode via Helm)
For production, use Loki in distributed mode (separate read/write/backend microservices) rather than the monolithic single binary:
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

helm install loki grafana/loki \
  --namespace monitoring \
  --create-namespace \
  --values loki-values.yaml

# loki-values.yaml — distributed mode with S3 backend
loki:
  auth_enabled: false   # Disable multi-tenancy for single-cluster deployments;
                        # enable and use the X-Scope-OrgID header for multi-tenant setups

  storage:
    type: s3
    s3:
      region: us-east-1
      bucketnames: my-org-loki-logs
      s3ForcePathStyle: false   # Use virtual-hosted-style (path style is deprecated)

  limits_config:
    retention_period: 744h   # 31 days (deletion also requires retention_enabled: true in the compactor config)
    max_streams_per_user: 100000
    ingestion_rate_mb: 50
    ingestion_burst_size_mb: 100

deploymentMode: Distributed

ingester:
  replicas: 3

querier:
  replicas: 2

queryFrontend:
  replicas: 2

distributor:
  replicas: 2

compactor:
  replicas: 1
  persistence:
    enabled: true
    storageClass: gp3
    size: 20Gi

# ServiceAccount with IRSA for S3 access
serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789:role/loki-s3-role

For development or small clusters, the loki-stack chart installs a single-binary Loki alongside Promtail (simpler but not production-grade):
# promtail.enabled=false because Fluent Bit handles collection in this guide
helm install loki-stack grafana/loki-stack \
  --namespace monitoring \
  --set promtail.enabled=false \
  --set loki.persistence.enabled=true \
  --set loki.persistence.storageClassName=gp3 \
  --set loki.persistence.size=50Gi
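Whichever install path you choose, confirm Loki is healthy before pointing Fluent Bit at it. A quick check, assuming the release names above (the loki-gateway Service name and labels come from current grafana/loki chart defaults and may differ in older chart versions):
# All Loki pods should reach Running/Ready
kubectl get pods -n monitoring -l app.kubernetes.io/name=loki

# The gateway Service is the endpoint Fluent Bit pushes to (loki-gateway.monitoring.svc.cluster.local:80)
kubectl get svc loki-gateway -n monitoring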
Fluent Bit
helm repo add fluent https://fluent.github.io/helm-charts
helm repo update
helm install fluent-bit fluent/fluent-bit \
--namespace monitoring \
  --values fluent-bit-values.yaml
Fluent Bit Configuration
# fluent-bit-values.yaml
config:
  inputs: |
    [INPUT]
        Name              tail
        Path              /var/log/containers/*.log
        Exclude_Path      /var/log/containers/*_kube-system_*.log   # Exclude kube-system (tune as needed)
        Refresh_Interval  5
        Mem_Buf_Limit     50MB
        Skip_Long_Lines   On
        Tag               kube.*
        multiline.parser  cri,docker    # Support both CRI (containerd) and Docker log formats
        storage.type      filesystem    # Use the disk buffer configured in [SERVICE] below

  filters: |
    # Enrich log records with Kubernetes metadata (pod name, namespace, labels, etc.)
    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        Kube_Tag_Prefix     kube.var.log.containers.
        Merge_Log           On              # Merge JSON-formatted log messages into structured fields
        Merge_Log_Key       log_processed
        K8S-Logging.Parser  On              # Read parser annotation from pod spec
        K8S-Logging.Exclude On              # Allow pods to exclude themselves from logging

    # Drop health check and readiness probe logs (high volume, low value)
    [FILTER]
        Name     grep
        Match    kube.*
        Exclude  log .*kube-probe.*
        Exclude  log .*health.*

    # Add cluster name label for multi-cluster Loki tenancy
    [FILTER]
        Name     record_modifier
        Match    kube.*
        Record   cluster production-us-east-1

  outputs: |
    [OUTPUT]
        Name         loki
        Match        kube.*
        Host         loki-gateway.monitoring.svc.cluster.local
        Port         80
        Labels       job=fluentbit, cluster=$cluster, namespace=$kubernetes.namespace_name, pod=$kubernetes.pod_name, container=$kubernetes.container_name, node=$kubernetes.host, app=$kubernetes.labels.app
        Remove_keys  kubernetes,stream,time   # Remove fields already carried as Loki labels
        Line_Format  json
        Retry_Limit  False                    # Retry indefinitely; with disk buffering this minimises log loss

  service: |
    [SERVICE]
        Flush                 1
        Daemon                Off
        Log_Level             warn
        Parsers_File          /fluent-bit/etc/parsers.conf
        HTTP_Server           On       # Prometheus metrics at :2020
        HTTP_Listen           0.0.0.0
        HTTP_Port             2020
        Health_Check          On       # Exposes /api/v1/health
        storage.path          /var/log/flb-storage/   # Disk buffering for backpressure
        storage.sync          normal
        storage.checksum      Off
        storage.max_chunks_up 128

# DaemonSet — one Fluent Bit pod per node
daemonSetVolumes:
  - name: varlog
    hostPath:
      path: /var/log
  - name: flb-storage
    hostPath:
      path: /var/log/flb-storage

daemonSetVolumeMounts:
  - name: varlog
    mountPath: /var/log
  - name: flb-storage
    mountPath: /var/log/flb-storage

resources:
  requests:
    cpu: 50m
    memory: 64Mi
  limits:
    cpu: 200m
    memory: 256Mi

tolerations:
  - key: node-role.kubernetes.io/master
    effect: NoSchedule
  - operator: Exists   # Run on all nodes, including tainted ones
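Once the chart is installed, confirm there is a healthy collector on every node and that records are flowing to the Loki output. A sketch assuming the fluent-bit release name and the HTTP server and health check enabled in the SERVICE section above:
# One Fluent Bit pod per node, all Ready
kubectl get daemonset fluent-bit -n monitoring
kubectl get pods -n monitoring -l app.kubernetes.io/name=fluent-bit -o wide

# Health and Prometheus metrics from one collector
kubectl port-forward -n monitoring ds/fluent-bit 2020:2020 &
curl -s http://localhost:2020/api/v1/health
curl -s http://localhost:2020/api/v1/metrics/prometheus | grep -E 'fluentbit_output_(proc_records|errors|retries)_total'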
Loki Label Design
Loki is not Elasticsearch — don't try to index every log field as a label. Labels are used to partition log streams; high-cardinality labels (pod name, trace ID, request ID) create millions of streams and degrade performance.
Good labels (low cardinality, always useful for filtering):
- namespace — filter by team or environment
- app or service — filter by application
- cluster — for multi-cluster setups
- container — distinguish sidecar logs
Bad labels (high cardinality — keep these in the log line, not in labels; see the query sketch below):
- pod — hundreds of pod names per application (the config above includes it for convenience; drop it at scale)
- trace_id — every request is unique
- request_id
- user_id
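In practice that means you narrow a query with low-cardinality labels first and then line-filter for the high-cardinality value. A sketch with logcli pointed at Loki (the trace ID is illustrative):
# Find one trace's log lines without a trace_id label:
# select the stream by namespace/app, then filter the line content
logcli query --since=1h '{namespace="production", app="payments-api"} |= "trace_id=4bf92f3577b34da6"'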
For structured log fields you want to search (but not label), parse them at query time with LogQL's logfmt or json stages; Loki 3.x's experimental bloom filters can further accelerate these searches over unindexed content.
LogQL Querying
LogQL is Loki's query language. It combines label selectors (like PromQL) with log filtering operations:
# All logs from the payments-api app in the production namespace
{namespace="production", app="payments-api"}

# Filter by log level (string match)
{namespace="production", app="payments-api"} |= "ERROR"

# Structured log parsing (logfmt)
{namespace="production", app="payments-api"}
  | logfmt
  | level="error"
  | duration > 1s

# JSON log parsing
{namespace="production", app="payments-api"}
  | json
  | status_code >= 500

# Count errors per minute (metric query)
sum(rate({namespace="production"} |= "ERROR" [1m])) by (app)

# Recent error logs for a specific pod (use the limit parameter in the API or Grafana's line limit)
{pod="payments-api-xxxxx-yyyyy"} |= "ERROR"
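These queries run unchanged in Grafana's Explore view; for ad-hoc use from a terminal, logcli (shipped alongside Loki) speaks the same LogQL. A minimal sketch, assuming the loki-gateway Service from the install section:
# Point logcli at Loki through a local port-forward
kubectl port-forward -n monitoring svc/loki-gateway 3100:80 &
export LOKI_ADDR=http://localhost:3100

# Log query and metric query from the CLI
logcli query --since=1h --limit=100 '{namespace="production", app="payments-api"} |= "ERROR"'
logcli query --since=1h 'sum(rate({namespace="production"} |= "ERROR" [5m])) by (app)'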
Alerting on Logs
Warning: The AlertingRule CRD requires the Loki Operator (a separate Kubernetes operator). It is NOT part of the standard grafana/loki Helm chart installed earlier in this guide. For Helm-based Loki deployments, configure ruler alerting via the ruler: section in your loki-values.yaml instead.
Loki supports alerting rules similar to Prometheus:
# Loki rule group — alert on application errors
# The Loki Operator uses loki.grafana.com/v1 AlertingRule (not PrometheusRule)
apiVersion: loki.grafana.com/v1
kind: AlertingRule
metadata:
  name: application-log-alerts
  namespace: monitoring
spec:
  groups:
    - name: application-errors
      interval: 1m
      rules:
        - alert: HighErrorRate
          expr: |
            sum(rate({namespace="production"} |= "ERROR" [5m])) by (app) > 1
          for: 2m
          labels:
            severity: warning
          annotations:
            summary: "High error log rate for {{ $labels.app }}"

        - alert: OOMKillDetected
          expr: |
            sum(count_over_time({namespace="production"} |= "OOMKilled" [5m])) by (pod) > 0
          for: 0m
          labels:
            severity: critical
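If you do run the Loki Operator, the rule is applied like any other manifest and validated by the operator; the resource name below assumes the loki.grafana.com/v1 CRD shown above:
kubectl apply -f application-log-alerts.yaml
kubectl get alertingrules.loki.grafana.com -n monitoring
kubectl describe alertingrule application-log-alerts -n monitoring   # check conditions for validation errors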
Frequently Asked Questions
Fluent Bit vs Promtail — which should I use?
Promtail is Loki-native (same team, tight integration, automatic pod discovery), but it's Loki-only — it can't forward to Elasticsearch or other backends. Fluent Bit supports multiple outputs and has lower memory usage. For Loki-only deployments, either works well. For multi-destination logging (Loki for recent logs, S3 for long-term archive, CloudWatch for AWS-native tooling), Fluent Bit's multi-output support is preferable.
How do I parse multi-line logs (stack traces, JSON objects)?
# Fluent Bit multiline parsing for Java stack traces
[INPUT]
Name tail
Path /var/log/containers/*_production_*.log
Tag kube.*
  multiline.parser java,cri
For custom multiline patterns, define a custom parser:
[MULTILINE_PARSER]
name custom-go-panic
type regex
flush_timeout 1000
rule "start_state" "/(goroutine \d+)/gm" "go_state"
rule "go_state" "/^(\s+)/gm" "go_state"How do I control log volume costs?
- Exclude noisy logs at collection time — use the Fluent Bit grep filter to drop health checks, probe logs, and debug-level logs in production
- Aggregate rather than stream — for high-volume structured logs, use Fluent Bit's rewrite_tag and throttle filters
- Reduce retention — Loki supports per-stream retention (shorter for debug, longer for errors)
- Use S3 instead of block storage — Loki's S3 backend is significantly cheaper than SSD-backed block storage for log data
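Before tuning any of these, measure where the volume actually comes from. One way to do it, assuming the labels configured earlier in this guide (bytes_over_time is standard LogQL; the cluster value is the one set by the record_modifier filter):
# Bytes ingested per app over the last hour; the noisiest streams are the first
# candidates for grep filters or shorter retention
logcli query --since=1h 'sum by (app) (bytes_over_time({cluster="production-us-east-1"}[1h]))'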
For OpenTelemetry-based trace correlation with logs, see OpenTelemetry Instrumentation Guide. For the Prometheus-based metrics side of the observability stack, see SLOs, Error Budgets, and Burn Rate Alerts.
Setting up a production logging stack for a Kubernetes cluster? Talk to us at Coding Protocols — we help platform teams design log collection pipelines that balance observability with cost and operational simplicity.


