eBPF Observability: Tetragon, Hubble, and Pixie in Production
Three eBPF-based observability tools that operate below the application layer: Tetragon (kernel-level security events — process execution, network syscalls, file access), Hubble (Cilium network flow observability at L4/L7), and Pixie (auto-instrumented application performance without code changes). They solve different problems and are often used together.

Metrics and logs give you an application-layer view of what happened. They can tell you that a request returned a 500, that memory usage spiked at 14:32, or that the error log captured an exception stack trace. What they cannot tell you is which process inside a container made an outbound TCP connection to 185.234.218.95 on port 4444 at 14:31 — one minute before that spike. They cannot tell you whether your payments container executed /bin/bash after the last deployment, or why a specific PostgreSQL query started taking 800ms without adding any instrumentation to your application code.
eBPF changes this. By attaching probes directly to Linux kernel functions — execve, tcp_connect, do_sys_open — eBPF-based tools see what the kernel sees: every process execution, every file open, every network syscall, every memory allocation. No code changes, no sidecars, no userspace agents proxying traffic. The kernel is the agent.
The three tools I use cover distinct layers: Tetragon handles kernel-level security event visibility and enforcement (think Falco but eBPF-native, with the option to kill a process mid-syscall). Hubble handles Cilium network flow visibility — which pod connected to which service, on which port, with what HTTP status, and which policy allowed or dropped it. Pixie handles application performance — HTTP request latencies, slow database queries, and CPU flame graphs — auto-instrumented at the kernel level without touching your application code.
Running all three is not overhead. It is coverage at different layers of the stack. Understanding the boundary between them is what prevents you from shipping three tools that all alert on the same thing.
Tetragon: Kernel-Level Security Observability
Tetragon is an open source project maintained under the Cilium umbrella (itself a CNCF project). It uses eBPF kprobes to intercept kernel function calls — process execution via security_bprm_check, network connections via tcp_connect, file opens via security_file_open, and Linux capability checks. Events are emitted as structured JSON on stdout. If you configure enforcement, Tetragon can send SIGKILL to a process before a syscall completes — not after the connection has been established, but during it.
That distinction matters for security. Traditional alerting catches what already happened. Tetragon can prevent it.
Installing Tetragon
Tetragon ships as a Helm chart under the Cilium project. It does not require Cilium as your CNI — it is a standalone DaemonSet that uses eBPF directly. It works on any CNI.
helm repo add cilium https://helm.cilium.io
helm repo update

helm install tetragon cilium/tetragon \
  --namespace kube-system \
  --version 1.2.0 \
  --set tetragon.enablePolicyEnforcement=true

After install, verify the DaemonSet is running and the eBPF programs are loaded:
kubectl -n kube-system rollout status daemonset/tetragon
kubectl -n kube-system exec ds/tetragon -- tetra status

TracingPolicy: The Core Tetragon Resource
TracingPolicy is a cluster-scoped CRD that defines which kernel functions to hook and what to do when they fire. Every policy consists of kprobes (kernel function probes), argument extraction, selectors (filters), and actions.
Here is a policy that detects unexpected outbound connections — processes making TCP connections to ports other than 80 or 443:
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: detect-unexpected-outbound
spec:
  kprobes:
    - call: "tcp_connect"
      syscall: false
      return: false
      args:
        - index: 0
          type: "sock"
      selectors:
        - matchArgs:
            - index: 0
              operator: "NotDPort"
              values:
                - "443"
                - "80"
          matchActions:
            - action: Sigkill

The Sigkill action sends SIGKILL to the process during the tcp_connect kernel call — before the connection is established. Change it to Post to only emit an event without enforcement. I recommend starting with Post in any new environment and switching to Sigkill once you have confirmed the policy does not fire on legitimate traffic.
A second policy I run on every production cluster: detect when curl, wget, or any other download tool is executed inside a container. This catches C2 beacon callbacks and lateral movement attempts:
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: detect-download-tools
spec:
  kprobes:
    - call: "security_bprm_check"
      syscall: false
      return: false
      args:
        - index: 0
          type: "linux_binprm"
      selectors:
        - matchBinaries:
            - operator: "In"
              values:
                - "/usr/bin/curl"
                - "/usr/bin/wget"
                - "/bin/curl"
                - "/usr/bin/python3"
                - "/usr/bin/python"
          matchActions:
            - action: Post

The matchBinaries selector filters on the binary being executed. Note that this matches the full path — if your container ships curl at a non-standard path, you need to add it. You can discover binary paths at runtime from Tetragon's process_exec events before writing the enforcement policy.
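A quick way to build that path list is to aggregate the binary field from live process_exec events. A minimal sketch using the tetra CLI and jq (the jq field paths follow Tetragon's JSON event schema; host processes without pod metadata show an empty namespace column):

# Stream events, keep only process_exec, and count executed binaries per namespace.
# Stop the stream (Ctrl-C) after a representative window, e.g. one deploy cycle.
kubectl exec -n kube-system ds/tetragon -- tetra getevents -o json \
  | jq -r 'select(.process_exec != null)
           | [.process_exec.process.pod.namespace, .process_exec.process.binary]
           | @tsv' \
  | sort | uniq -c | sort -rn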
Reading Tetragon Events
Install the tetra CLI on your local machine (or on a jump host):
curl -Lo tetra \
  https://github.com/cilium/tetragon/releases/latest/download/tetra-linux-amd64
chmod +x tetra
sudo mv tetra /usr/local/bin/

Stream events directly from the Tetragon agent:
kubectl exec -n kube-system ds/tetragon -- tetra getevents -o compact

The compact output is readable in a terminal:
🚀 process default/payments-api-6d9f7b-xxx /bin/sh -c "ls /etc/secrets"
📂 open default/payments-api-6d9f7b-xxx /etc/secrets/db-password
🔌 connect default/payments-api-6d9f7b-xxx tcp 10.0.1.4:42312 -> 185.234.218.95:4444
For production, you want structured JSON. Switch to --output json and pipe to your log shipper.
Exporting Tetragon Events for Persistence
Tetragon writes events to stdout. Configure which event types to export in the Helm values:
tetragon:
  export:
    stdout:
      enabledFields:
        - process_exec
        - process_exit
        - process_kprobe

Tetragon does not persist events internally. You need your log shipper to capture them from the container's stdout. I cover the FluentBit pipeline in the event pipeline section below.
Hubble: Network Flow Observability for Cilium Clusters
Hubble is built into Cilium. If you are running Cilium as your CNI, Hubble requires no additional DaemonSet, no additional agent, and no additional network overhead. It is a ring buffer inside the Cilium agent that captures flow records. When you enable the Hubble Relay, those per-node ring buffers become queryable from a central endpoint.
Enabling Hubble
If Cilium is already installed, enable Hubble with a Helm upgrade:
helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --reuse-values \
  --set hubble.enabled=true \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true \
  --set hubble.metrics.enabled="{dns,drop,tcp,flow,port-distribution,icmp,http}"

The hubble.metrics.enabled list controls which Prometheus metrics Hubble exposes. Start with this set — it covers the most actionable signals (drops, DNS queries, HTTP status codes). HTTP metrics only contain status code and method; recording full URL paths requires an L7-aware policy (a CiliumNetworkPolicy or CiliumClusterwideNetworkPolicy with HTTP rules) so that the traffic is routed through Cilium's L7 proxy.
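For reference, a minimal sketch of such an L7 policy: a CiliumNetworkPolicy that sends HTTP traffic on port 8080 through the proxy so Hubble records method and path. The namespace, labels, and port are placeholders for your own workload, and the empty http rule matches every request without restricting anything:

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: payments-api-l7-visibility
  namespace: payments
spec:
  endpointSelector:
    matchLabels:
      app: payments-api
  ingress:
    - toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http:
              - {}

Keep in mind that attaching any ingress rule puts the selected pods into default-deny for ingress, so in practice you fold the http rule into the allow policy you already have rather than shipping it standalone.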
Verify Hubble came up:
kubectl -n kube-system rollout status deployment/hubble-relay
kubectl -n kube-system rollout status deployment/hubble-ui

Hubble CLI
Install the hubble CLI:
curl -L --fail --remote-name-all \
https://github.com/cilium/hubble/releases/latest/download/hubble-linux-amd64.tar.gz
tar xzvf hubble-linux-amd64.tar.gz
sudo mv hubble /usr/local/bin/

Port-forward to the Hubble Relay (the aggregated endpoint that spans all nodes):
kubectl port-forward -n kube-system svc/hubble-relay 4245:80 &
export HUBBLE_SERVER=localhost:4245

Now query flows:
hubble observe --namespace payments --follow

hubble observe --namespace payments --verdict DROPPED --follow

hubble observe --namespace payments --protocol http --follow

hubble observe \
  --from-pod payments/payments-api-xxx \
  --to-pod payments/postgres-0 \
  --follow

The --verdict DROPPED filter is the one I use most. After a policy change, running this for 5 minutes tells me immediately whether I have misconfigured egress rules. The output includes the policy name that caused the drop, so you can go directly to the specific CiliumNetworkPolicy rather than guessing.
Practical Hubble Use Cases
Service dependency mapping: Before you write network policies, you need a ground-truth service graph. Hubble gives you one:
hubble observe \
--namespace production \
--output json \
--last 50000 \
| jq -r '[.source.namespace + "/" + .source.pod_name, .destination.namespace + "/" + .destination.pod_name] | @tsv' \
| sort | uniq

Run this over a full 24-hour window and you have a complete picture of which pods communicate — including the ones that only communicate during nightly batch jobs, which are always missed in hand-crafted dependency diagrams.
Debugging policy drops: When a service starts returning connection timeouts after a policy rollout, the first thing I do is check Hubble before I look at the policy YAML:
hubble observe \
--namespace production \
--verdict DROPPED \
--output json \
| jq '{src: .source.pod_name, dst: .destination.pod_name, dst_port: .destination.port, policy: .drop_reason_desc}'

The drop_reason_desc field tells you exactly which policy rule matched. This cuts policy debugging from 30 minutes of YAML reading to 2 minutes of flow inspection.
Hubble Metrics and Prometheus Alerting
With hubble.metrics.enabled, Cilium agent pods expose metrics at :9965/metrics. Key signals:
- hubble_drop_total{reason,direction,namespace} — dropped packets by reason and namespace
- hubble_flows_processed_total{node_name} — total flow volume per node
- hubble_http_requests_total{method,protocol,reporter,status_code} — HTTP metrics (requires L7 policy)
- hubble_dns_queries_total{qtypes,rcode} — DNS query volume and failure codes
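Before wiring alerts, it is worth seeing which drop reasons dominate. A quick PromQL query for that, assuming the label names listed above:

# Top five drop reasons per namespace over the last hour
topk(5, sum by (namespace, reason) (rate(hubble_drop_total[1h])))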
A useful alert for production namespaces:
groups:
  - name: hubble.rules
    rules:
      - alert: CiliumPolicyDropsSpike
        expr: |
          sum by (namespace) (
            rate(hubble_drop_total{namespace="production"}[5m])
          ) > 10
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Cilium dropping >10 flows/s in namespace {{ $labels.namespace }}"
          description: "Check Hubble UI or run: hubble observe --namespace {{ $labels.namespace }} --verdict DROPPED"

      - alert: HubbleDNSNXDOMAINSpike
        expr: |
          sum(rate(hubble_dns_queries_total{rcode="Non-Existent Domain"}[5m])) > 20
        for: 3m
        labels:
          severity: warning
        annotations:
          summary: "High rate of DNS NXDOMAIN responses — possible misconfigured DNS or service discovery failure"

Pixie: Auto-Instrumented Application Performance
Pixie (CNCF sandbox, maintained by New Relic since the 2021 acquisition) uses eBPF uprobes and kprobes to capture application-level telemetry without modifying application code. The mechanisms differ by protocol: for HTTP/1.1 it intercepts socket reads and writes; for PostgreSQL it parses the wire protocol; for TLS it uses uprobes on OpenSSL's SSL_read and SSL_write functions in the process's address space (not the kernel TLS path).
Each node runs a vizier-pem pod (the eBPF data collector). Data is stored in per-node memory — 60 minutes of retention by default, no external storage backend required. You query it via PxL, Pixie's pandas-like scripting language, either from the Pixie UI or the px CLI.
Installing Pixie
bash -c "$(curl -fsSL https://withpixie.ai/install.sh)"
px deploy

The px deploy command walks through connecting to a Pixie Cloud organization (managed or self-hosted). The cluster-side component is entirely in-cluster; Pixie Cloud only handles authentication and the query interface. If you are running air-gapped, self-hosted Pixie Cloud is the path — but it is significantly more operational overhead than the managed offering.
After deploy, verify the Pixie components:
kubectl -n pl get pods

You want kelvin, cloud-conn, metadata, and at least one vizier-pem per node all in Running state.
PxL Scripts: Built-In and Custom
Pixie ships a library of pre-written scripts for common debugging tasks. Run them directly from the CLI:
px run px/http_data_filtered -- --namespace=production
px run px/service_stats
px run px/sql_query_stats

The px/http_data_filtered script shows per-request HTTP data for a namespace: URL, method, status code, latency, pod name. You get this with zero instrumentation in your application. The px/sql_query_stats script does the same for PostgreSQL, MySQL, and Redis.
For custom analysis, PxL is Python-like:
import px

df = px.DataFrame(table='pgsql_events', start_time='-10m')

df = df[df.latency_ns > 100 * 1000 * 1000]

df.latency_ms = df.latency_ns / 1000 / 1000
df.pod = df.ctx['pod']

df = df.groupby(['pod', 'req_body']).agg(
    count=('req_body', px.count),
    p50_ms=('latency_ms', px.p50),
    p99_ms=('latency_ms', px.p99),
)
df = df.sort('p99_ms', ascending=False)

px.display(df, 'slow_postgres_queries')

This script finds PostgreSQL queries with P99 latency above 100ms in the last 10 minutes, grouped by pod and query text. You can run this during an incident in 30 seconds without adding a single line of instrumentation to your application.
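To run a custom script like this from the CLI instead of the Pixie UI, save it to a local file and pass it to px run (the file name is illustrative, and it is worth confirming the flag against your px version):

px run -f slow_postgres_queries.pxl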
Pixie Limitations You Need to Know
Pixie is a debugging and investigation tool. It is not a replacement for OpenTelemetry-based distributed tracing.
First, the retention ceiling. Sixty minutes of in-memory data means you cannot answer "what was the query latency last Tuesday at 14:00?" You can only answer questions about the recent past. For anything requiring longer retention, you need to export Pixie data via the Pixie Plugin API to a backend like Grafana Cloud or Honeycomb — which adds operational complexity.
Second, TLS inspection is protocol-path-dependent. Pixie's TLS visibility uses uprobes on SSL_read/SSL_write in OpenSSL and BoringSSL. If your application uses a Go TLS implementation (the standard crypto/tls package), Pixie cannot see inside the encrypted traffic because Go does not use OpenSSL. This is a hard limitation — not a configuration issue.
Third, there is no distributed trace context propagation. Pixie sees individual request/response pairs at the pod boundary, not end-to-end traces. If you need to understand a request's path across 8 microservices, you need OpenTelemetry with a trace backend. Pixie tells you what is happening at each hop; it does not connect the hops into a trace.
Use Pixie for: rapid incident debugging (what queries is this pod running right now?), initial performance profiling before you write OTel instrumentation, and network dependency discovery. Use OpenTelemetry for production-grade distributed tracing and long-term performance analysis.
How the Three Tools Fit Together
The tools operate at different layers, have different prerequisites, and answer different questions. Overlap is minimal:
| Tool | Layer | Primary Use Case | Sidecar Required | Requires Cilium |
|---|---|---|---|---|
| Tetragon | Syscall / kernel function | Security events: process exec, file access, network enforcement | No | No |
| Hubble | Network L4 / L7 | Flow visibility, policy debugging, network metrics | No | Yes |
| Pixie | Application protocol | HTTP/DB performance, auto-instrumented request tracing | No | No |
If you are on Cilium, enable Hubble first. It costs almost nothing — a ring buffer inside the already-running Cilium agent. The signal you get (policy drops, flow volume, DNS failures) is immediately actionable and requires no additional infrastructure.
Add Tetragon for security observability. Even with Post-only actions (no enforcement), the ability to see which binaries are being executed inside your containers, which files are being opened, and which unexpected outbound connections are being made is a material improvement over log-based detection. Security teams that previously needed kernel module-based agents or eBPF Falco can get equivalent visibility with a Helm install.
Add Pixie for debugging cycles. Particularly useful during the development and initial production rollout of a new service, when you do not yet have full OpenTelemetry instrumentation in place. Pixie fills the gap: you can see HTTP error rates and slow queries from day one, without modifying application code.
The three tools are additive, not competing. Running all three on a production cluster is common — Tetragon as a security DaemonSet, Hubble as part of Cilium, and Pixie as an on-demand debugging tool (you can install and uninstall Pixie between incidents if you want to avoid the steady-state memory footprint).
Event Pipeline: Shipping Tetragon Events to Your SIEM
Tetragon events exist only in the container's stdout buffer until something reads them. In production, you need those events in your SIEM or log aggregation system — for incident investigation, compliance audit trails, and detection rules that fire on historical patterns.
The simplest approach is FluentBit running as a DaemonSet (which you likely already have), configured to tail Tetragon container logs and forward to Elasticsearch or Loki.
[INPUT]
    Name              tail
    Path              /var/log/containers/tetragon-*.log
    Parser            docker
    Tag               tetragon.*
    Mem_Buf_Limit     5MB
    Skip_Long_Lines   On

[FILTER]
    Name    grep
    Match   tetragon.*
    Regex   log process_exec|process_kprobe|process_exit

[FILTER]
    Name          parser
    Match         tetragon.*
    Key_Name      log
    Parser        json
    Reserve_Data  On

[OUTPUT]
    Name             es
    Match            tetragon.*
    Host             elasticsearch.logging.svc.cluster.local
    Port             9200
    Index            tetragon-events
    Type             _doc
    Logstash_Format  On
    Logstash_Prefix  tetragon
    Retry_Limit      5

For Loki, replace the [OUTPUT] block:
[OUTPUT]
    Name        loki
    Match       tetragon.*
    Host        loki.logging.svc.cluster.local
    Port        3100
    Labels      job=tetragon,node=$NODE_NAME
    Label_Keys  $process_exec.process.pod_name,$process_exec.process.namespace

The [FILTER] grep step drops log lines that are not Tetragon events — for example, Tetragon's own startup logs and health check output. The process_exec|process_kprobe|process_exit regex matches only the event type fields present in Tetragon's structured JSON.
If you are using Vector instead of FluentBit, the equivalent pipeline is a kubernetes_logs source scoped to the Tetragon pods, a remap transform to parse the nested JSON, a filter transform to keep only event lines, and an elasticsearch or loki sink.
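A rough sketch of that pipeline in Vector's YAML config follows. The label selector, field handling, and Loki address are assumptions to adapt to your cluster, not a drop-in config:

sources:
  tetragon_logs:
    type: kubernetes_logs
    # Assumes the default Helm chart labels on the Tetragon DaemonSet pods
    extra_label_selector: "app.kubernetes.io/name=tetragon"

transforms:
  tetragon_parse:
    type: remap
    inputs: [tetragon_logs]
    source: |
      # The container log line is the JSON event; drop anything that does not parse.
      parsed, err = parse_json(string!(.message))
      if err != null { abort }
      . = parsed

  tetragon_events_only:
    type: filter
    inputs: [tetragon_parse]
    condition: "exists(.process_exec) || exists(.process_kprobe) || exists(.process_exit)"

sinks:
  loki:
    type: loki
    inputs: [tetragon_events_only]
    endpoint: http://loki.logging.svc.cluster.local:3100
    encoding:
      codec: json
    labels:
      job: tetragon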
One operational note: Tetragon events can be high-volume on busy nodes. A cluster running many short-lived Jobs or CronJobs will generate a process_exec event for every container entrypoint invocation. Apply the matchNamespaces selector in your TracingPolicy to limit event scope to namespaces that matter, or use the matchLabels selector to target specific workloads.
Frequently Asked Questions
Does Tetragon require Cilium as the cluster CNI?
No. Tetragon is an independent DaemonSet that attaches eBPF programs directly to kernel functions using kprobes and tracepoints. It does not depend on Cilium's eBPF maps or datapath. You can run Tetragon on clusters using Flannel, Calico, AWS VPC CNI, or any other CNI. The Cilium project co-maintains Tetragon, but the operational dependency is one-way: Hubble requires Cilium, Tetragon does not.
Does Hubble work with Cilium in CNI chaining mode on EKS?
Yes. EKS with Cilium in chaining mode — where AWS VPC CNI handles IPAM and Cilium runs on top for network policy — still runs the full Cilium agent, and Hubble is a feature of the Cilium agent. Enable it with hubble.enabled=true in your Cilium Helm values. The flow data will be present for all traffic that passes through Cilium's datapath, which in chaining mode covers network policy enforcement and service load-balancing, but not VPC-level routing. You will see inter-pod flows; you will not see flows that bypass Cilium at the AWS network level.
How does Pixie compare to OpenTelemetry auto-instrumentation?
Pixie requires zero configuration for supported protocols: HTTP/1.1, gRPC, PostgreSQL, MySQL, Redis, Kafka, DNS. You install Pixie and immediately see request-level data. OTel auto-instrumentation requires a language-specific agent (the Java agent JAR, the Python opentelemetry-instrument wrapper, the Node.js @opentelemetry/auto-instrumentations-node package) and configuration of an OTEL Collector endpoint. OTel gives you distributed traces with context propagation across service boundaries, long-term retention in a trace backend, and richer span metadata. Pixie gives you 60-minute local retention, no context propagation, and zero application changes. The practical answer: use Pixie for rapid investigation during incidents and service rollouts; use OTel for production-grade distributed tracing and long-term performance baselines.
Can Tetragon replace Falco?
They solve similar problems with different tradeoffs. Falco uses either a kernel module or its own eBPF probe to intercept syscalls and evaluate rules against them. Falco has a large community-maintained rules library (Falco Rules v2.x) and mature integrations with SIEM platforms. Tetragon uses eBPF kprobes and adds one capability Falco does not have: in-kernel enforcement via SIGKILL before a syscall completes. If your threat model requires prevention (not just detection), Tetragon is the right tool. If your threat model is detection-only and you want to start with a rich rules library without writing TracingPolicy YAML, Falco is the lower-friction path. They can coexist on the same cluster — there is no conflict between the two DaemonSets.
What kernel version do these tools require?
Tetragon requires Linux kernel 5.4+ for basic functionality; some TracingPolicy features (like BTF-based type-aware argument extraction) require kernel 5.8+. Hubble is a Cilium component — Cilium 1.14+ requires kernel 4.19.57+ (the minimum for eBPF socket-based load balancing). Pixie requires kernel 4.14+ for basic eBPF support, but TLS visibility via uprobes requires 4.17+. In practice, most managed Kubernetes offerings (EKS 1.29+, GKE 1.28+, AKS 1.28+) ship kernels in the 5.15–6.x range, so kernel version is rarely a blocker on managed clusters.
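To confirm what your nodes are actually running, the kernel version is reported in each node's status:

kubectl get nodes -o custom-columns=NODE:.metadata.name,KERNEL:.status.nodeInfo.kernelVersion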
Further Reading
For the eBPF datapath mechanics that make Hubble possible — how Cilium replaces kube-proxy, implements network policy at the kernel level, and connects eBPF maps to Kubernetes objects — see Cilium eBPF Kubernetes Networking. That post covers the foundational layer that Hubble is built on.
For Hubble's role in Cilium's advanced networking features — WireGuard transparent encryption, FQDN-based egress policies, and Hubble metrics integration with Grafana — see Cilium Advanced Networking and Observability.
For the broader eBPF platform engineering context — eBPF program types, the BPF CO-RE portability model, and how Tetragon fits into a runtime security architecture alongside Cilium — see eBPF Platform Engineering with Cilium and Tetragon.
For the Prometheus, Grafana, and OpenTelemetry stack that eBPF-based tools complement (rather than replace), see Kubernetes Observability with Prometheus, Grafana, and OpenTelemetry. Hubble metrics feed into the same Prometheus stack; Pixie can export to the same Grafana instance via the Pixie Plugin.
For CiliumNetworkPolicy and CiliumClusterwideNetworkPolicy patterns — including egress lockdown, namespace isolation, and L7 HTTP rules — see Kubernetes Network Policy Patterns. Hubble's flow visibility is most useful when you have non-trivial policies in place and need to understand which policy is dropping which flow.
Building observability for a Kubernetes platform and unsure which layer eBPF tools cover? Talk to us at Coding Protocols — we help platform teams integrate Tetragon, Hubble, and Pixie into their observability stack alongside Prometheus and OpenTelemetry.


