Observability
14 min read · May 4, 2026

Kubernetes Logging with Fluent Bit and Grafana Loki

Kubernetes generates three kinds of logs: container stdout/stderr, Kubernetes API audit logs, and system component logs. Getting them to a centralised, queryable store requires a collection agent on every node and a storage backend designed for log aggregation. Fluent Bit and Grafana Loki are the lightweight, cloud-native answer.

Coding Protocols Team
Platform Engineering

Elasticsearch was the default log store for Kubernetes for years — powerful, but operationally expensive. Loki (from Grafana Labs) takes a different approach: it indexes only log labels (not the full log content), dramatically reducing storage and compute costs at the price of slower full-text search. For most Kubernetes use cases — finding logs for a specific pod, filtering by namespace, correlating logs with traces — Loki's label-based model is the right trade-off.

Fluent Bit is the collection agent: a small DaemonSet that reads container log files from each node and forwards them to Loki (or to Elasticsearch, CloudWatch, S3, or any other output). A CNCF sub-project under the Fluentd umbrella rather than a successor to it, Fluent Bit has a far smaller footprint: roughly 650KB of memory at idle versus Fluentd's ~40MB, and typically around 50MB per pod in production once buffers are counted.


Kubernetes Log Architecture

Container stdout/stderr is written to /var/log/pods/<namespace>_<pod>_<uid>/<container>/ on each node. The kubelet creates symlinks at /var/log/containers/<pod>_<namespace>_<container>-<containerID>.log pointing to these files. Fluent Bit reads the symlinks at /var/log/containers/*.log as its input source.
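As a hypothetical illustration (pod, container, and hash values invented), the two paths for one container relate like this:

/var/log/pods/production_payments-api-7d4b9-x2x1k_<uid>/server/0.log      ← actual file, managed by the kubelet
/var/log/containers/payments-api-7d4b9-x2x1k_production_server-<id>.log   ← symlink that Fluent Bit tails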

Container (stdout/stderr)
    ↓
Node filesystem: /var/log/containers/*.log
    ↓
Fluent Bit (DaemonSet, one per node)
    ↓ tail input, kubernetes filter (enrich with pod metadata)
Loki (stores with labels: namespace, pod, container, node, app)
    ↓
Grafana (LogQL queries, dashboards, alerting)

Each log line is enriched by Fluent Bit's kubernetes filter with pod metadata from the Kubernetes API: pod name, namespace, container name, pod labels, and node name. This enrichment is what makes Kubernetes log exploration ergonomic — you can filter by app=payments-api without any log-format knowledge.
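To make that concrete, here's a sketch of a single record after the kubernetes filter runs (values are hypothetical; the exact shape depends on the filter's merge settings):

json
{
  "log": "level=error msg=\"payment declined\" order_id=9912",
  "stream": "stderr",
  "kubernetes": {
    "pod_name": "payments-api-7d4b9-x2x1k",
    "namespace_name": "production",
    "container_name": "server",
    "host": "ip-10-0-3-41.ec2.internal",
    "labels": {
      "app": "payments-api"
    }
  }
}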


Installation

Loki (Distributed Mode via Helm)

For production, use Loki in distributed mode (separate microservices for the distributor, ingester, querier, query frontend, and compactor) rather than the monolithic single binary:

bash
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

helm install loki grafana/loki \
  --namespace monitoring \
  --create-namespace \
  --values loki-values.yaml
yaml
# loki-values.yaml — distributed mode with S3 backend
loki:
  auth_enabled: false    # Disable multi-tenancy for single-cluster deployments
                         # Enable and use X-Scope-OrgID header for multi-tenant

  storage:
    type: s3
    s3:
      region: us-east-1
      bucketnames: my-org-loki-logs
      s3ForcePathStyle: false   # Use virtual-hosted-style (path style is deprecated)

  limits_config:
    retention_period: 744h    # 31 days
    max_streams_per_user: 100000
    ingestion_rate_mb: 50
    ingestion_burst_size_mb: 100

deploymentMode: Distributed

ingester:
  replicas: 3

querier:
  replicas: 2

queryFrontend:
  replicas: 2

distributor:
  replicas: 2

compactor:
  replicas: 1
  persistence:
    enabled: true
    storageClass: gp3
    size: 20Gi

# ServiceAccount with IRSA for S3 access
serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789:role/loki-s3-role

For development or small clusters, the loki-stack chart installs a single-binary Loki alongside Promtail. It's simpler, but not production-grade, and upstream has deprecated it in favour of the main loki chart:

bash
# promtail.enabled=false — we use Fluent Bit as the collector instead
helm install loki-stack grafana/loki-stack \
  --namespace monitoring \
  --set promtail.enabled=false \
  --set loki.persistence.enabled=true \
  --set loki.persistence.storageClassName=gp3 \
  --set loki.persistence.size=50Gi

Fluent Bit

bash
helm repo add fluent https://fluent.github.io/helm-charts
helm repo update

helm install fluent-bit fluent/fluent-bit \
  --namespace monitoring \
  --values fluent-bit-values.yaml

Fluent Bit Configuration

yaml
# fluent-bit-values.yaml
config:
  inputs: |
    [INPUT]
        Name              tail
        Path              /var/log/containers/*.log
        Exclude_Path      /var/log/containers/*_kube-system_*.log   # Exclude kube-system (tune as needed)
        Refresh_Interval  5
        Mem_Buf_Limit     50MB
        Skip_Long_Lines   On
        Tag               kube.*
        multiline.parser  cri,docker   # Support both CRI (containerd) and Docker log formats
        storage.type      filesystem   # Enable disk buffering (required for the no-log-loss retry below)

  filters: |
    # Enrich log records with Kubernetes metadata (pod name, namespace, labels, etc.)
    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        Kube_Tag_Prefix     kube.var.log.containers.
        Merge_Log           On       # Merge JSON-formatted log messages into structured fields
        Merge_Log_Key       log_processed
        K8S-Logging.Parser  On       # Read parser annotation from pod spec
        K8S-Logging.Exclude On       # Allow pods to exclude themselves from logging

    # Drop health check and readiness probe logs (high volume, low value)
    [FILTER]
        Name  grep
        Match kube.*
        Exclude log .*kube-probe.*
        Exclude log .*health.*

    # Add cluster name label for multi-cluster Loki tenancy
    [FILTER]
        Name  record_modifier
        Match kube.*
        Record cluster production-us-east-1

  outputs: |
    [OUTPUT]
        Name              loki
        Match             kube.*
        Host              loki-gateway.monitoring.svc.cluster.local
        Port              80
        # cluster is the top-level key added by the record_modifier filter above, hence $cluster
        Labels            job=fluentbit, cluster=$cluster, namespace=$kubernetes['namespace_name'], pod=$kubernetes['pod_name'], container=$kubernetes['container_name'], node=$kubernetes['host'], app=$kubernetes['labels']['app']
        Remove_keys       kubernetes,stream,time,cluster   # Remove keys already carried as Loki labels
        Line_Format       json
        Retry_Limit       False   # Retry indefinitely; with the filesystem buffering above, no log loss

  service: |
    [SERVICE]
        Flush         1
        Daemon        Off
        Log_Level     warn
        Parsers_File  /fluent-bit/etc/parsers.conf
        HTTP_Server   On      # Exposes /api/v1/health and Prometheus metrics at :2020
        HTTP_Listen   0.0.0.0
        HTTP_Port     2020
        storage.path  /var/log/flb-storage/   # Disk buffering for backpressure
        storage.sync  normal
        storage.checksum Off
        storage.max_chunks_up 128

# DaemonSet — one Fluent Bit pod per node
daemonSetVolumes:
  - name: varlog
    hostPath:
      path: /var/log
  - name: flb-storage
    hostPath:
      path: /var/log/flb-storage

daemonSetVolumeMounts:
  - name: varlog
    mountPath: /var/log
  - name: flb-storage
    mountPath: /var/log/flb-storage

resources:
  requests:
    cpu: 50m
    memory: 64Mi
  limits:
    cpu: 200m
    memory: 256Mi

tolerations:
  - operator: Exists     # Blanket toleration: run on every node, including control-plane and otherwise-tainted nodes
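The K8S-Logging.Parser and K8S-Logging.Exclude options above are driven by annotations on the pods themselves. A sketch (the two annotations are shown together for brevity; in practice a pod would use one or the other):

yaml
# Pod annotations read by the Fluent Bit kubernetes filter
apiVersion: v1
kind: Pod
metadata:
  name: payments-api
  annotations:
    fluentbit.io/parser: json      # Parse this pod's logs with the named parser (K8S-Logging.Parser)
    fluentbit.io/exclude: "true"   # Skip collecting this pod's logs entirely (K8S-Logging.Exclude)
spec:
  containers:
    - name: server
      image: payments-api:1.0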

Loki Label Design

Loki is not Elasticsearch — don't try to index every log field as a label. Labels are used to partition log streams; high-cardinality labels (pod name, trace ID, request ID) create millions of streams and degrade performance.

Good labels (low cardinality, always useful for filtering):

  • namespace — filter by team or environment
  • app or service — filter by application
  • cluster — for multi-cluster setups
  • container — distinguish sidecar logs

Bad labels (high cardinality — put these in the log line, not labels):

  • pod — hundreds of pod names per application as deployments churn (the config above includes it for per-pod drill-down; it's the first label to drop if stream counts climb)
  • trace_id — every request is unique
  • request_id
  • user_id

For structured log fields you want to search (but not label), parse them at query time with LogQL's | logfmt or | json stages; Loki 3.x's (still experimental) bloom filters can make these unindexed field searches significantly faster.


LogQL Querying

LogQL is Loki's query language. It combines label selectors (like PromQL) with log filtering operations:

logql
# All logs from payments-api in the production namespace
{namespace="production", app="payments-api"}

# Filter by log level (string match)
{namespace="production", app="payments-api"} |= "ERROR"

# Structured log parsing (logfmt)
{namespace="production", app="payments-api"}
  | logfmt
  | level="error"
  | duration > 1s

# JSON log parsing
{namespace="production", app="payments-api"}
  | json
  | status_code >= 500

# Count errors per minute (metric query)
sum(rate({namespace="production"} |= "ERROR" [1m])) by (app)

# Recent error logs for a specific pod (use limit parameter in API or Grafana line limit)
{pod="payments-api-xxxxx-yyyyy"} |= "ERROR"

Alerting on Logs

Warning: The AlertingRule CRD requires the Loki Operator (a separate Kubernetes operator). It is NOT part of the standard grafana/loki Helm chart installed earlier in this guide. For Helm-based Loki deployments, configure ruler alerting via the ruler: section in your loki-values.yaml instead; a minimal ruler sketch follows the CRD example below.

Loki supports alerting rules similar to Prometheus:

yaml
# Loki rule group — alert on application errors
# Loki Operator uses loki.grafana.com/v1 AlertingRule (not PrometheusRule)
apiVersion: loki.grafana.com/v1
kind: AlertingRule
metadata:
  name: application-log-alerts
  namespace: monitoring
spec:
  groups:
    - name: application-errors
      interval: 1m
      rules:
        - alert: HighErrorRate
          expr: |
            sum(rate({namespace="production"} |= "ERROR" [5m])) by (app) > 1
          for: 2m
          labels:
            severity: warning
          annotations:
            summary: "High error log rate for {{ $labels.app }}"

        - alert: OOMKillDetected
          expr: |
            sum(count_over_time({namespace="production"} |= "OOMKilled" [5m])) by (pod) > 0
          for: 0m
          labels:
            severity: critical
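For Helm-based deployments (per the warning above), the equivalent lives in Loki's ruler configuration. A minimal sketch, assuming rules are loaded from local storage and Alertmanager runs in the monitoring namespace (the rulerConfig key and service URL are assumptions to verify against your chart version):

yaml
# loki-values.yaml excerpt: ruler configuration (sketch; verify key layout for your chart version)
loki:
  rulerConfig:
    alertmanager_url: http://alertmanager.monitoring.svc.cluster.local:9093
    enable_api: true
    rule_path: /tmp/loki/rules-temp
    storage:
      type: local
      local:
        directory: /var/loki/rules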

Frequently Asked Questions

Fluent Bit vs Promtail — which should I use?

Promtail is Loki-native (same team, tight integration, automatic pod discovery), but it's Loki-only — it can't forward to Elasticsearch or other backends. It has also been deprecated by Grafana in favour of Grafana Alloy, which weighs against it for new deployments. Fluent Bit supports multiple outputs and has a lower memory footprint. For Loki-only deployments, either works well. For multi-destination logging (Loki for recent logs, S3 for long-term archive, CloudWatch for AWS-native tooling), Fluent Bit's multi-output support is preferable; a sketch follows.
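Two Fluent Bit outputs can match the same tag, shipping every record to both Loki and S3 (bucket name and IAM setup are assumptions):

yaml
  outputs: |
    # Recent logs -> Loki (queried via Grafana)
    [OUTPUT]
        Name    loki
        Match   kube.*
        Host    loki-gateway.monitoring.svc.cluster.local
        Port    80

    # Same stream -> S3 for long-term archive
    [OUTPUT]
        Name              s3
        Match             kube.*
        bucket            my-org-log-archive
        region            us-east-1
        total_file_size   50M
        upload_timeout    10m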

How do I parse multi-line logs (stack traces, JSON objects)?

yaml
# Fluent Bit multiline parsing for Java stack traces
[INPUT]
    Name              tail
    Path              /var/log/containers/*_production_*.log
    Tag               kube.*
    multiline.parser  java,cri

For custom multiline patterns, define a custom parser:

yaml
[MULTILINE_PARSER]
    name          custom-go-panic
    type          regex
    flush_timeout 1000
    rule "start_state" "/(goroutine \d+)/gm" "go_state"
    rule "go_state"    "/^(\s+)/gm" "go_state"

How do I control log volume costs?

  1. Exclude noisy logs at collection time — Fluent Bit grep filter to drop health checks, probe logs, and debug-level logs in production
  2. Aggregate rather than stream — for high-volume structured logs, use Fluent Bit's rewrite_tag and throttle filters (a throttle sketch follows this list)
  3. Reduce retention — Loki supports per-stream retention, shorter for debug and longer for errors (see the retention sketch below)
  4. Use S3 instead of block storage — Loki's S3 backend is significantly cheaper than SSD-backed block storage for log data
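A sketch of the throttle filter mentioned in point 2 (the rate values are illustrative, not recommendations):

yaml
# Fluent Bit throttle filter: caps record throughput for matching tags
[FILTER]
    Name      throttle
    Match     kube.*
    Rate      800     # Average records allowed per interval
    Window    5       # Sliding window size (in intervals) used to compute the average
    Interval  1s

And a sketch of per-stream retention from point 3, assuming the compactor is running with retention enabled (the selectors and periods are examples, and level must be a stream label for these selectors to match):

yaml
# loki-values.yaml excerpt: per-stream retention overrides
loki:
  limits_config:
    retention_period: 744h          # Default: 31 days
    retention_stream:
      - selector: '{level="debug"}'
        priority: 1
        period: 24h                 # Keep debug logs for one day
      - selector: '{level="error"}'
        priority: 2
        period: 2160h               # Keep error logs for 90 days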

For OpenTelemetry-based trace correlation with logs, see OpenTelemetry Instrumentation Guide. For the Prometheus-based metrics side of the observability stack, see SLOs, Error Budgets, and Burn Rate Alerts.

Setting up a production logging stack for a Kubernetes cluster? Talk to us at Coding Protocols — we help platform teams design log collection pipelines that balance observability with cost and operational simplicity.

Related Topics

Kubernetes
Logging
Fluent Bit
Loki
Grafana
Observability
Platform Engineering
CNCF
