Part of Linux Foundations · Step 3 of 3

General

Linux Commands for Advanced Engineers: Debugging, systemd & Kernel Internals

Advanced · 90 min to complete · 25 min read

Go deep — strace, lsof, tcpdump, systemd units, cgroup and namespace primitives, kernel parameter tuning, and shell scripting patterns for production-grade Linux engineering.

Before you begin

Solid intermediate Linux skills (pipes, processes, SSH, networking)
Basic shell scripting (loops, variables, conditionals)
A Linux system — not macOS (several tools here are Linux-only)

Linux · systemd · Performance Tuning · Debugging · Containers · Kernel · cgroups · Namespaces


Intermediate Linux gets you productive. Advanced Linux gets you dangerous — in the good way. This tutorial covers the tools that let you see exactly what a process is doing at the system level, configure Linux as a service runtime, tune kernel parameters for production workloads, and understand the primitives that containers are built on.

This is the knowledge that separates engineers who can fix a hung Kubernetes node at 3am from engineers who can't.

1. Syscall Tracing — `strace`

Every interaction a process has with the kernel (reading files, opening sockets, allocating memory) is a system call. strace shows you all of them in real time.

bash

strace ls                                  # Trace syscalls made by ls
strace -p 1234                             # Attach to running process by PID
strace -e openat,read,write ls             # Filter to specific syscalls
strace -c ls                               # Summary: count and time per syscall
strace -f -p 1234                          # Follow forked children too
strace -o /tmp/trace.txt -p 1234           # Write output to file

What to look for

openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0...", 832) = 832

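Failed calls return -1 followed by the errno name and a description, for example = -1 ENOENT (No such file or directory). Common patterns: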

ENOENT (No such file or directory) — missing config file, broken library path
EACCES (Permission denied) — file permission issue
EAGAIN / EWOULDBLOCK — non-blocking I/O waiting for data
Lots of futex calls — thread synchronisation (can indicate lock contention)

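Practical use: a process exits immediately with exit code 1 and no output or logs. Running strace -f -e trace=file <cmd> often shows exactly which file it is trying and failing to read. The file class covers every syscall that takes a file name, including openat and the stat family, which modern glibc issues as newfstatat or statx rather than stat.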
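On a busy process the trace can run to thousands of lines. The helper below is a minimal sketch that tallies failed syscalls by errno name. It assumes strace's default output format, where a failed call ends in = -1 ERRNO (description). The name strace_errors is our own, not part of strace.

```bash
# strace_errors: read strace output on stdin and count failed syscalls by errno.
# Assumes the default strace format, where failures end in "= -1 ENOENT (...)".
strace_errors() {
  # Keep only the "= -1 ERRNO" tail of failed calls, tally by errno ($3),
  # then print "COUNT ERRNO" with the most frequent errno first.
  grep -oE '= -1 E[A-Z0-9]+' \
    | awk '{ count[$3]++ } END { for (e in count) print count[e], e }' \
    | sort -rn
}
```

For example, capture with strace -f -o /tmp/trace.txt <cmd> and then run strace_errors < /tmp/trace.txt. A cluster of ENOENT results usually points straight at a missing config file or library path.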

2. Open File Descriptors — `lsof`

lsof (list open files) shows everything a process has open: files, sockets, pipes, devices.

bash

lsof -p 1234                               # All open files for PID 1234
lsof -u ajeet                              # All files opened by user ajeet
lsof -i :8080                              # What process is using port 8080
lsof -i TCP                                # All TCP connections
lsof -i TCP:8080-9000                      # Range of ports
lsof +D /var/log/                          # All processes with files open in /var/log/
lsof /var/log/app.log                      # Which processes have app.log open
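lsof builds much of its view from /proc. When lsof isn't installed, which is common in minimal container images, you can read a process's descriptor table directly. The sketch below does that; list_fds is a hypothetical helper name. Inspecting another user's process requires root.

```bash
# list_fds [PID]: print "FD TARGET" for each open descriptor of PID (default: this shell)
# by resolving the symlinks in /proc/<pid>/fd. Sockets appear as socket:[inode].
list_fds() {
  local pid=${1:-$$} fd
  for fd in /proc/"$pid"/fd/*; do
    # Skip an unexpanded pattern (no such PID) and descriptors that closed mid-loop.
    [[ -L $fd ]] || continue
    printf '%s %s\n' "${fd##*/}" "$(readlink "$fd")"
  done
}
```

For example, list_fds 1234 | grep socket shows only the sockets, and list_fds 1234 | wc -l gives a count to compare against the process's open-file limit.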

`fuser` — simpler port/file queries

bash

fuser 8080/tcp                             # PID using TCP port 8080
fuser -k 8080/tcp                          # Kill (SIGKILL) the process using port 8080
fuser -vm /mnt/disk                        # Every process using the filesystem mounted at /mnt/disk

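fuser -vm /mnt/disk is the first thing to run when umount fails with "target is busy" (older versions say "device is busy"). Without -m, fuser only reports processes using the mount-point directory itself.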

3. Network Packet Capture — `tcpdump`

tcpdump captures raw network packets. Essential for debugging TLS issues, unexpected traffic, and misbehaving services.

bash

tcpdump -i eth0                            # Capture all traffic on eth0
tcpdump -i any port 80                     # HTTP traffic on any interface
tcpdump -i eth0 host 10.0.0.5              # Traffic to/from a specific host
tcpdump -i eth0 'tcp port 443 and host 10.0.0.5'
tcpdump -i eth0 -w /tmp/capture.pcap       # Write to file for Wireshark
tcpdump -i eth0 -c 100                     # Capture only 100 packets
tcpdump -i eth0 -nn                        # Don't resolve IPs or ports to names

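-nn skips DNS and port-name lookups, so output stays fast and shows raw IPs and port numbers, which is what you want on a busy production host. -w writes the raw packets to a file you can open in Wireshark for deep inspection. Capturing requires root or equivalent capabilities.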

IP routing and interfaces

bash

ip addr                                    # All interfaces and their IPs
ip addr show eth0                          # Specific interface
ip route                                   # Routing table
ip route get 8.8.8.8                       # Which route would traffic to 8.8.8.8 take?
ip link set eth0 up                        # Bring interface up
ip neigh                                   # ARP table
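In scripts you usually want a single field from ip route get, such as the outgoing interface or the source address the kernel would choose. The parser below is a sketch that assumes iproute2's usual "... via GW dev IFACE src ADDR ..." output. route_field is a hypothetical name.

```bash
# route_field FIELD: read `ip route get` output on stdin and print the word after FIELD
# ("dev" -> outgoing interface, "src" -> source address, "via" -> gateway).
route_field() {
  awk -v key="$1" '{
    for (i = 1; i < NF; i++)
      if ($i == key) { print $(i + 1); exit }
  }'
}
```

Usage: ip route get 8.8.8.8 | route_field dev prints the interface that traffic to 8.8.8.8 would leave through. The function prints nothing for a directly connected destination that has no via gateway.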

4. Disk and Storage

bash

lsblk                                      # Block devices tree
lsblk -f                                   # Include filesystem types and UUIDs
fdisk -l                                   # Partition tables
blkid                                      # UUIDs and filesystem types

Mount and unmount

bash

mount /dev/sdb1 /mnt/data                  # Mount device
mount -t nfs 10.0.0.5:/exports /mnt/nfs    # Mount NFS share
umount /mnt/data                           # Unmount (fails if in use — use fuser first)
mount | grep sdb                           # See current mounts
cat /proc/mounts                           # All current mounts (including virtual)

Inodes and hard/soft links

bash

ls -i file.txt                             # Show inode number
stat file.txt                              # Full file metadata including inode
df -i                                      # Inode usage (can fill up before disk space does)

ln source.txt hardlink.txt                 # Hard link (same inode, same data)
ln -s /abs/path/to/source symlink.txt      # Symbolic link (pointer to path)

Hard links are two directory entries pointing to the same inode. Deleting one name doesn't free the data until every hard link is removed and no process still has the file open. Symlinks store a path, so if the target moves or is deleted, the symlink breaks.
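You can watch this happen with GNU stat, where %i is the inode number and %h is the hard-link count. The sketch below works in a throwaway directory.

```bash
# Throwaway directory for the demo
dir=$(mktemp -d)

echo "data" > "$dir/source.txt"
ln "$dir/source.txt" "$dir/hardlink.txt"     # hard link: a second name for the same inode
ln -s "$dir/source.txt" "$dir/symlink.txt"   # symlink: a file that stores a path

# Both names report the same inode and a link count of 2 (GNU stat format)
stat -c '%n inode=%i links=%h' "$dir/source.txt" "$dir/hardlink.txt"

rm "$dir/source.txt"                  # drop one name; the inode still has one link
cat "$dir/hardlink.txt"               # still prints "data"
cat "$dir/symlink.txt" 2>/dev/null || echo "symlink.txt is dangling"
```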

"Inode exhaustion" (df -i shows 100%) is a real production failure mode — many small files (npm node_modules, log files) can exhaust inodes before disk space.

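When df -i reports a filesystem full, the next question is which directory holds all those entries. The sketch below counts entries under each immediate subdirectory. inode_hogs is a hypothetical helper, and newer GNU coreutils also offer du --inodes for the same job.

```bash
# inode_hogs DIR [N]: print the N immediate subdirectories of DIR containing the most
# entries (files, directories, symlinks: each uses an inode), largest first.
# -xdev keeps find on DIR's filesystem; hidden top-level directories are skipped.
inode_hogs() {
  local root=$1 n=${2:-10} sub
  [[ -d $root ]] || { echo "usage: inode_hogs DIR [N]" >&2; return 2; }
  for sub in "$root"/*/; do
    [[ -d $sub ]] || continue
    printf '%s %s\n' "$(find "$sub" -xdev 2>/dev/null | wc -l)" "${sub%/}"
  done | sort -rn | head -n "$n"
}
```

Run it as root on the full filesystem's mount point, for example inode_hogs /var 5. Then repeat it on the largest result until you reach the directory full of small files.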
5. User and Group Management

bash

useradd -m -s /bin/bash deploy             # Create user with home dir and bash shell
useradd -r -s /usr/sbin/nologin appuser    # System user, no login shell
usermod -aG docker ajeet                   # Add ajeet to docker group (applies at next login)
usermod -aG sudo ajeet                     # Grant sudo access (the group is "wheel" on RHEL/Fedora)
id ajeet                                   # Show UIDs, GIDs, groups
passwd ajeet                               # Set password
userdel -r olduser                         # Delete user and home directory
groupadd appgroup                          # Create a group
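Group changes made with usermod -aG only reach new login sessions, so a provisioning script is better off checking membership than assuming it. Here is a sketch built on id -nG, which prints the names of every group a user belongs to according to the user database. in_group is a hypothetical helper name.

```bash
# in_group USER GROUP: succeed if USER is a member of GROUP per the user/group database.
# This reflects the database, not the credentials of shells the user already has open.
in_group() {
  local user=$1 group=$2 g
  for g in $(id -nG "$user" 2>/dev/null); do
    [[ $g == "$group" ]] && return 0
  done
  return 1
}
```

For example, in_group deploy docker || echo "deploy is not in the docker group" fails early instead of letting a later docker command fail with a permission error.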

`sudo` and sudoers

bash

sudo command                               # Run command as root
sudo -u postgres psql                      # Run as a different user
sudo -i                                    # Interactive root shell
sudo !!                                    # Re-run last command with sudo

Edit sudoers safely with visudo (validates syntax before saving):

bash

visudo

Common sudoers entries:

# Allow ajeet to run all commands without password
ajeet ALL=(ALL) NOPASSWD: ALL

# Allow deploy user to restart nginx only
deploy ALL=(ALL) NOPASSWD: /usr/bin/systemctl restart nginx

6. File Permissions Deep Dive

The standard permission string rwxr-xr-- breaks down as:

rwx — owner (read, write, execute)
r-x — group (read, no write, execute)
r-- — others (read only)

Octal notation

bash

chmod 755 script.sh        # rwxr-xr-x (owner full, group/others read+exec)
chmod 644 config.yaml      # rw-r--r-- (owner read+write, others read)
chmod 600 ~/.ssh/id_rsa    # rw------- (private key: owner only)
chmod 700 ~/.ssh           # rwx------ (SSH dir: owner only)
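The octal digits are plain arithmetic: within each triplet, r=4, w=2 and x=1 are summed. The sketch below makes that explicit. perm_to_octal is a hypothetical helper and deliberately ignores the special bits. In practice, GNU stat -c %a FILE prints the octal mode directly.

```bash
# perm_to_octal: convert a 9-character permission string (e.g. rwxr-xr--) to octal.
# Each triplet sums r=4, w=2, x=1; special bits (s, S, t, T) are rejected.
perm_to_octal() {
  local perms=$1 out="" i digit
  [[ $perms =~ ^[r-][w-][x-][r-][w-][x-][r-][w-][x-]$ ]] || { echo "bad mode: $perms" >&2; return 1; }
  for i in 0 3 6; do
    digit=0
    [[ ${perms:i:1} == r ]] && digit=$(( digit + 4 ))
    [[ ${perms:i+1:1} == w ]] && digit=$(( digit + 2 ))
    [[ ${perms:i+2:1} == x ]] && digit=$(( digit + 1 ))
    out+=$digit
  done
  echo "$out"
}
```

For example, perm_to_octal rwxr-xr-- prints 754, matching the breakdown at the start of this section.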

Special bits

bash

chmod u+s /usr/bin/passwd  # Setuid — runs as file owner regardless of caller
chmod g+s /shared/dir      # Setgid — new files inherit group of directory
chmod +t /tmp              # Sticky bit — only owner can delete their files

Check special bits with ls -l — an s in owner execute position = setuid, s in group execute = setgid, t in others execute = sticky.

bash

ls -la /usr/bin/passwd
# -rwsr-xr-x  1 root root  passwd
# ↑ 's' in owner execute = setuid (allows any user to change their own password)
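Setuid and setgid binaries run with their owner's or group's privileges, so auditing them is a routine hardening step. In GNU find, -perm -4000 matches the setuid bit and -perm -2000 matches setgid. The sketch below wraps the search in a hypothetical find_setuid helper. Scanning / needs root to see everything.

```bash
# find_setuid [DIR]: list regular files under DIR (default /) with setuid or setgid set,
# staying on DIR's filesystem (-xdev), printing symbolic mode, owner and path.
find_setuid() {
  find "${1:-/}" -xdev -type f \( -perm -4000 -o -perm -2000 \) \
    -printf '%M %u %p\n' 2>/dev/null
}
```

Compare the output against a known-good baseline from the same image. An unexpected setuid-root file is worth investigating.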

7. `systemd` — Managing Services

systemd is the init system and service manager on nearly all modern Linux distributions (Debian, Ubuntu, RHEL, Fedora, Amazon Linux 2+).

Essential commands

bash

systemctl status nginx                     # Service status
systemctl start nginx                      # Start
systemctl stop nginx                       # Stop
systemctl restart nginx                    # Restart
systemctl reload nginx                     # Reload config without restart (if supported)
systemctl enable nginx                     # Start on boot
systemctl disable nginx                    # Don't start on boot
systemctl list-units --type=service        # List loaded services (add --all to include inactive ones)
systemctl list-units --failed              # Show failed services

Logs with `journalctl`

bash

journalctl -u nginx                        # All logs for nginx
journalctl -u nginx -f                     # Follow live
journalctl -u nginx --since "1 hour ago"
journalctl -u nginx --since "2026-06-01 10:00" --until "2026-06-01 11:00"
journalctl -p err -u nginx                 # Error-level and above only
journalctl --disk-usage                    # How much space logs are using
journalctl --vacuum-size=500M              # Trim logs to 500MB

Writing a service unit

Unit files you write yourself go in /etc/systemd/system/. Package-provided units live under /usr/lib/systemd/system/. Here's a minimal one:

ini

# /etc/systemd/system/myapp.service
[Unit]
Description=My Application
# network-online.target waits until the network is configured;
# network.target only means the network management stack has started
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=deploy
WorkingDirectory=/opt/myapp
ExecStart=/opt/myapp/bin/server --port=8080
Restart=on-failure
RestartSec=5
StandardOutput=journal
StandardError=journal
Environment=NODE_ENV=production
EnvironmentFile=/opt/myapp/.env

[Install]
WantedBy=multi-user.target

bash

systemctl daemon-reload                    # Required after editing unit files
systemctl enable --now myapp               # Enable and start immediately
journalctl -u myapp -f                     # Watch logs

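Key Restart= values: no (the default; never restart), on-failure (restart after a non-zero exit code, an unclean signal, a timeout, or a watchdog timeout), always (restart however the process exits, except after an explicit systemctl stop).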

8. Performance Tuning

`ulimit` — per-process limits

bash

ulimit -n                                  # Soft open-file limit (often still 1024)
ulimit -Hn                                 # Hard limit: the ceiling for the soft limit without root
ulimit -n 65536                            # Raise limit for current shell session (cannot exceed the hard limit without root)
ulimit -a                                  # Show all limits

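Services started by systemd don't inherit your shell's ulimit settings, and they ignore /etc/security/limits.conf, which only applies to PAM login sessions. Set the limits in the unit file instead: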

ini

[Service]
LimitNOFILE=65536
LimitNPROC=4096
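After editing the unit, you can confirm the limit a running process actually received by reading /proc/<pid>/limits. The sketch below does this; nofile_limit is a hypothetical helper name.

```bash
# nofile_limit [PID]: print the soft and hard "Max open files" limits of a process.
# /proc/<pid>/limits columns: limit name, soft limit, hard limit, units.
nofile_limit() {
  awk '/^Max open files/ { print $4, $5 }' "/proc/${1:-$$}/limits"
}
```

Usage: nofile_limit "$(systemctl show -p MainPID --value myapp)" should print 65536 65536 once LimitNOFILE=65536 is in effect and the service has been restarted.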

`sysctl` — kernel parameters

bash

sysctl -a                                  # All kernel parameters
sysctl net.core.somaxconn                  # Read a parameter
sysctl -w net.core.somaxconn=65535         # Write (temporary, until reboot)

To persist across reboots, write to /etc/sysctl.conf or /etc/sysctl.d/99-custom.conf:

bash

echo "net.core.somaxconn = 65535" >> /etc/sysctl.d/99-custom.conf
sysctl -p /etc/sysctl.d/99-custom.conf     # Apply immediately
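sysctl is a front end to /proc/sys. Each dotted name maps to a file path with the dots turned into slashes, which helps in minimal images that lack the sysctl binary. The sketch below relies on that mapping; sysctl_path and sysctl_get are hypothetical names. It does not handle name components that themselves contain dots, such as VLAN interfaces like eth0.100.

```bash
# sysctl_path NAME: map a dotted sysctl name to its /proc/sys file.
sysctl_path() {
  printf '/proc/sys/%s\n' "${1//.//}"
}

# sysctl_get NAME: read a kernel parameter without the sysctl binary.
sysctl_get() {
  local path
  path=$(sysctl_path "$1")
  [[ -r $path ]] || { echo "unknown or unreadable: $1" >&2; return 1; }
  cat "$path"
}
```

For example, sysctl_get net.core.somaxconn reads /proc/sys/net/core/somaxconn. Writing to the same file as root has the same temporary effect as sysctl -w.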

Common production tuning parameters

# TCP connection backlog (important for high-traffic servers)
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535

# System-wide ceiling on open file handles
# (per-process limits still come from ulimit -n / LimitNOFILE)
fs.file-max = 2097152

# tcp_tw_reuse: reuse TIME_WAIT sockets for new outbound connections
net.ipv4.tcp_tw_reuse = 1
# tcp_fin_timeout: seconds an orphaned socket stays in FIN_WAIT_2 (TIME_WAIT itself is fixed at 60s)
net.ipv4.tcp_fin_timeout = 15

# VM memory overcommit: always allow (Redis recommends this so background saves don't fail)
vm.overcommit_memory = 1

# Reduce swappiness to near-zero (Kubernetes nodes also require swapoff -a to fully disable swap)
vm.swappiness = 1

`perf` — CPU profiling

bash

perf stat ls                               # CPU counter summary for a command
perf top                                   # Live CPU usage by function
perf record -p 1234 -g sleep 30           # 30-second profile of process
perf report                                # Analyse recorded data

9. Container Primitives — Namespaces and cgroups

Containers are not magic. They're Linux namespaces (isolation) and cgroups (resource limits) combined.

Namespaces

bash

# List all namespaces for a process
ls -la /proc/1234/ns/

# Run a shell in new PID and mount namespaces with a fresh /proc (needs root)
unshare --pid --fork --mount-proc bash

# Enter the network and PID namespaces of a running container/process (needs root)
nsenter -t 1234 --net --pid bash

# See which network namespace a process is in
ls -la /proc/1234/ns/net

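nsenter is invaluable for debugging containers. Run as root on the node, it puts you inside the network and PID namespaces of a running pod while keeping the host's filesystem. Host tools like ip, ss, and tcpdump therefore work even when the container image ships none of them, and you don't need docker exec or kubectl exec.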
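Each /proc/<pid>/ns/<type> link reads like net:[4026531840], and two processes share a namespace exactly when those targets match. The sketch below relies on this to answer "is this process really isolated?" same_ns is a hypothetical helper. Comparing against another user's process, such as PID 1, needs root.

```bash
# same_ns TYPE PID1 PID2: succeed (0) if both processes share the TYPE namespace
# (net, pid, mnt, uts, ipc, user, cgroup); 1 if they differ; 2 if a link can't be read.
same_ns() {
  local type=$1 a b
  a=$(readlink "/proc/$2/ns/$type") || return 2
  b=$(readlink "/proc/$3/ns/$type") || return 2
  [[ $a == "$b" ]]
}
```

For example, same_ns net 1 1234 || echo "1234 has its own network namespace" tells you whether PID 1234 runs in the host's network namespace.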

cgroup v2

bash

# Check if you're on cgroup v2
mount | grep cgroup2
cat /sys/fs/cgroup/cgroup.controllers

# See which cgroup a process belongs to (v2 shows a single "0::/path" line)
cat /proc/1234/cgroup

# Memory limit for a cgroup ("max" means unlimited)
cat /sys/fs/cgroup/system.slice/myapp.service/memory.max

# CPU quota as "QUOTA PERIOD" in microseconds: "100000 100000" = 1 CPU,
# "200000 100000" = 2 CPUs, "max 100000" = no limit
cat /sys/fs/cgroup/system.slice/myapp.service/cpu.max

When a container is OOMKilled, it's the cgroup memory limit enforced by the kernel. When you set resources.limits.cpu in Kubernetes, it's a cgroup CPU quota. Understanding cgroups means understanding why Kubernetes resource limits work the way they do.
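The mapping from a Kubernetes CPU limit to cpu.max is simple arithmetic. With the default 100 ms (100000 µs) CFS period, the quota is millicores × period / 1000, so limits.cpu: 500m becomes 50000 100000. The sketch below covers both directions under that default-period assumption. The function names are hypothetical, and the kubelet's own code also enforces a minimum quota.

```bash
# millicores_to_cpu_max MILLICORES [PERIOD_US]: cpu.max line for a CPU limit.
millicores_to_cpu_max() {
  local milli=$1 period=${2:-100000}
  echo "$(( milli * period / 1000 )) $period"
}

# cpu_max_to_cpus "QUOTA PERIOD": CPUs' worth of runtime the cgroup gets per period.
cpu_max_to_cpus() {
  local quota period
  read -r quota period <<<"$1"
  if [[ $quota == max ]]; then
    echo unlimited
  else
    awk -v q="$quota" -v p="$period" 'BEGIN { printf "%.2f\n", q / p }'
  fi
}
```

For example, cpu_max_to_cpus "$(cat /sys/fs/cgroup/system.slice/myapp.service/cpu.max)" turns the raw file into a CPU count you can compare with the pod spec.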

10. Shell Scripting for Production

Defensive defaults

Always start scripts with:

bash

#!/usr/bin/env bash
set -euo pipefail

set -e: exit immediately if a command fails (except inside if/while conditions, && / || chains, and commands negated with !)
set -u: treat expanding an unset variable as an error
set -o pipefail: a pipe fails if any command in it fails (without this, false | true succeeds)
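Here is a quick way to see what pipefail and -u change. Each case runs in a child bash, so the options don't leak into your current shell.

```bash
# pipefail: the pipeline's status comes from the failing command, not just the last one.
without=$(bash -c 'false | true; echo $?')
with=$(bash -c 'set -o pipefail; false | true; echo $?')
echo "false | true: exit $without without pipefail, $with with pipefail"

# set -u: expanding an unset variable aborts the script with a non-zero status.
status=0
bash -c 'set -u; unset name; echo "hello $name"' 2>/dev/null || status=$?
echo "set -u run exited with status $status"
```

The first line reports exit 0 without pipefail and 1 with it. Under set -u, use ${name:-default} for variables that may legitimately be unset.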

Functions and error handling

bash

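#!/usr/bin/env bash
set -euo pipefail

LOCK=/tmp/deploy.lock

log() {
  echo "[$(date '+%Y-%m-%dT%H:%M:%S')] $*"
}

die() {
  log "ERROR: $*" >&2
  exit 1
}

cleanup() {
  log "Cleaning up..."
  rm -f "$LOCK"
}

# Take the lock atomically: with noclobber, '>' fails if the file already exists.
# A separate [[ -f ]] check followed by touch leaves a race between the two steps.
if ! ( set -o noclobber; echo "$$" > "$LOCK" ) 2>/dev/null; then
  die "Deploy already in progress (lock: $LOCK)"
fi

# Set the trap only after we own the lock, so a second run never deletes the first run's lock.
trap cleanup EXIT   # runs cleanup() on exit, even on error

log "Starting deploy..."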

Looping over files

bash

# Without nullglob, an unmatched *.yaml is passed through literally as "*.yaml"
shopt -s nullglob
for file in *.yaml; do
  echo "Processing $file"
  kubectl apply -f "$file"
done

# Loop with array
services=("api" "worker" "scheduler")
for svc in "${services[@]}"; do
  kubectl rollout restart deployment/"$svc"
done

Checking exit codes

bash

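if kubectl get pod "$POD" &>/dev/null; then
  echo "Pod exists"
else
  echo "Pod not found"
fi

# Retry pattern: exit non-zero if every attempt fails, instead of carrying on silently
for i in {1..5}; do
  if curl -sf https://api.internal/health; then
    break
  fi
  if (( i == 5 )); then
    echo "Health check failed after $i attempts" >&2
    exit 1
  fi
  echo "Attempt $i failed, retrying..."
  sleep 5
done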
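If you retry in several places, a small reusable function keeps scripts readable. The sketch below adds exponential backoff. retry is a hypothetical helper, not a standard command, and the doubling delay is one common choice among several.

```bash
# retry ATTEMPTS DELAY COMMAND [ARGS...]
# Run COMMAND until it succeeds or ATTEMPTS are used up, doubling DELAY (seconds)
# after each failure. Returns 0 on success, otherwise COMMAND's last exit status.
retry() {
  local attempts=$1 delay=$2 n=1 status
  shift 2
  while true; do
    "$@" && return 0
    status=$?
    if (( n >= attempts )); then
      echo "retry: '$*' failed after $n attempt(s)" >&2
      return "$status"
    fi
    echo "retry: attempt $n failed (exit $status), retrying in ${delay}s" >&2
    sleep "$delay"
    n=$(( n + 1 ))
    delay=$(( delay * 2 ))
  done
}
```

Usage: retry 5 2 curl -sf https://api.internal/health || exit 1 waits 2, 4, 8 and 16 seconds between attempts. It then gives up with curl's exit status.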

Here documents

bash

kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  DB_HOST: "${DB_HOST}"
  APP_ENV: production
EOF
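The delimiter's quoting controls expansion. With an unquoted EOF, as above, the shell substitutes ${DB_HOST} before kubectl sees the text. Quoting the delimiter ('EOF') passes the body through literally. The demonstration below uses a made-up DB_HOST value.

```bash
DB_HOST=db.internal

# Unquoted delimiter: $VARS and $(commands) are expanded before the text is used.
expanded=$(cat <<EOF
host: ${DB_HOST}
EOF
)

# Quoted delimiter: the body is passed through literally (useful for embedded scripts).
literal=$(cat <<'EOF'
host: ${DB_HOST}
EOF
)

printf '%s\n%s\n' "$expanded" "$literal"
```

Under set -u, an unset DB_HOST in an unquoted here-document aborts the script, which is usually what you want before applying config.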

Where to Go From Here

This tutorial completes the Linux Foundations learning path:

Beginner — Filesystem, basic file ops, grep, find
Intermediate — Pipes, text processing, processes, SSH, cron
Advanced (this tutorial) — strace, systemd, cgroups, kernel tuning, production scripting

With these foundations you're ready for Stage 2 of the Platform Engineering Roadmap, containers and Docker, where the namespaces and cgroups you just learned about become the building blocks of everything you'll run in production.


