
Monitoring & Observability

Monitoring and observability are critical disciplines in Linux system administration. A modern system generates continuous signals through processes, services, disk activity, and network traffic. Monitoring provides visibility into key metrics such as CPU utilization, memory consumption, I/O throughput, and network latency. Observability extends this by enabling administrators to correlate metrics, logs, and traces to understand the root cause of anomalies.

In practice, monitoring answers “What is happening?” while observability answers “Why is it happening?”. Together, they form the foundation for performance optimization, reliability engineering, and proactive incident response in Linux environments.


Monitoring vs Observability

  • Monitoring → Watching specific metrics (CPU, memory, disk, network).
  • Observability → Understanding why something happens by correlating logs, metrics, and traces.
Monitoring is like watching the castle gates. Observability is like reading the kingdom’s chronicles and whispers to understand deeper causes.

Core Monitoring Tools

Hackers must learn the commands to watch the kingdom’s health:

Command       Magical metaphor
top / htop    Real-time potion bubbling (process activity)
vmstat        Balance of CPU, memory, and I/O (kingdom’s energy flow)
sar           Historical performance records (oracle’s memory)
iostat        Disk I/O monitoring (vault activity)
free -m       Memory usage in MiB (treasury reserves)
uptime        How long the kingdom has been awake (also shows load averages)
  • htop → Crystal ball showing live bubbling cauldrons
  • sar → Oracle’s diary of past events
  • iostat → Vault guardians reporting treasure movement
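Under the hood, tools such as top, vmstat and free read kernel counters from /proc. The Python sketch below assumes the standard Linux /proc/stat and /proc/meminfo layouts (MemAvailable needs kernel 3.14 or later) and shows why a CPU percentage comes from the difference between two samples rather than a single reading. It approximates each tool's arithmetic; it does not reproduce any one tool exactly.

```python
# Minimal sketch: derive CPU busy % and "used" memory from /proc text,
# roughly the way top, vmstat and free do. The parsers take strings so they
# can be tried anywhere; live_snapshot() needs a Linux host.
import time
from typing import Tuple


def parse_cpu_line(stat_text: str) -> Tuple[int, int]:
    """Return (idle_jiffies, total_jiffies) from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        if line.startswith("cpu "):
            # Columns: user nice system idle iowait irq softirq steal [guest guest_nice]
            fields = [int(x) for x in line.split()[1:9]]
            # Treat iowait as idle: the CPU was waiting, not executing work
            idle = fields[3] + (fields[4] if len(fields) > 4 else 0)
            # guest/guest_nice are already included in user/nice, so they are left out
            return idle, sum(fields)
    raise ValueError("no aggregate 'cpu' line found")


def cpu_busy_percent(before: str, after: str) -> float:
    """Share of non-idle CPU time between two /proc/stat snapshots, in percent."""
    idle0, total0 = parse_cpu_line(before)
    idle1, total1 = parse_cpu_line(after)
    elapsed = total1 - total0
    if elapsed <= 0:
        return 0.0
    return 100.0 * (1 - (idle1 - idle0) / elapsed)


def mem_used_mib(meminfo_text: str) -> float:
    """Approximate used memory as MemTotal - MemAvailable, converted from kB to MiB."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            values[key.strip()] = int(rest.split()[0])
    return (values["MemTotal"] - values["MemAvailable"]) / 1024


def live_snapshot(interval: float = 1.0) -> Tuple[float, float]:
    """Linux only: sample /proc/stat twice, `interval` seconds apart, plus /proc/meminfo."""
    with open("/proc/stat") as f:
        before = f.read()
    time.sleep(interval)
    with open("/proc/stat") as f:
        after = f.read()
    with open("/proc/meminfo") as f:
        meminfo = f.read()
    return cpu_busy_percent(before, after), mem_used_mib(meminfo)
```

Calling live_snapshot() on a Linux host returns a (cpu_percent, used_mib) pair; the one-second gap plays the same role as the interval argument in vmstat 2.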

Observability Tools

Beyond metrics, observability requires deeper insight:

  • Logs: Chronicles of events (/var/log/syslog on Debian/Ubuntu, /var/log/messages on RHEL-family systems, or the systemd journal via journalctl).
  • Metrics: Quantitative measures (CPU %, memory usage, request counts).
  • Traces: Path of a request through multiple services (like following a messenger across kingdoms).
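To make the messenger metaphor concrete: a trace is a tree of spans that share a trace ID, each recording which service handled part of the request and for how long. The sketch below uses hypothetical field names (loosely modelled on common tracing data models, not any specific tool's schema) and assumes child spans do not overlap, to show how a trace helps answer “why was this request slow?”.

```python
# Minimal sketch of a distributed trace built from hypothetical span records.
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None marks the root span, where the request entered
    service: str
    start_ms: float
    end_ms: float

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


def _children(spans: List[Span], trace_id: str) -> Dict[Optional[str], List[Span]]:
    """Group the spans of one trace by their parent span ID."""
    tree: Dict[Optional[str], List[Span]] = {}
    for s in spans:
        if s.trace_id == trace_id:
            tree.setdefault(s.parent_id, []).append(s)
    return tree


def request_path(spans: List[Span], trace_id: str) -> List[str]:
    """Depth-first walk from the root: the messenger's route, hop by hop."""
    tree = _children(spans, trace_id)
    path: List[str] = []

    def visit(parent: Optional[str]) -> None:
        for s in sorted(tree.get(parent, []), key=lambda sp: sp.start_ms):
            path.append(f"{s.service} ({s.duration_ms:.0f} ms)")
            visit(s.span_id)

    visit(None)
    return path


def bottleneck(spans: List[Span], trace_id: str) -> str:
    """Service with the largest self time (own duration minus direct children).

    Assumes child spans run sequentially, so their durations do not overlap.
    """
    tree = _children(spans, trace_id)
    self_time = {
        s.span_id: (s.service, s.duration_ms - sum(c.duration_ms for c in tree.get(s.span_id, [])))
        for group in tree.values()
        for s in group
    }
    return max(self_time.values(), key=lambda pair: pair[1])[0]
```

Metrics might only show that the gateway was slow; the span tree shows that most of that time was actually spent in a downstream service.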

Modern Tools:

  • Nagios → Guardian that alerts when thresholds are crossed.
  • Prometheus → Collector that scrapes metrics from targets and stores them as time series for querying and alerting.
  • Grafana → Artist that paints metrics into visual dashboards.
  • ELK Stack (Elasticsearch, Logstash, Kibana) → Chronicles turned into searchable, visual insights.
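Nagios-style checks are small programs that follow a simple contract: print one status line and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The sketch below assumes that convention and checks the 1-minute load average; the 4.0/8.0 thresholds are hypothetical, and the comparison is a simplification of the full Nagios threshold-range syntax.

```python
# Minimal sketch of a Nagios-style check plugin for the 1-minute load average.
# Exit codes follow the plugin convention: 0/1/2/3 = OK/WARNING/CRITICAL/UNKNOWN.
import os
from typing import Tuple

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_load(load1: float, warn: float, crit: float) -> Tuple[int, str]:
    """Compare the 1-minute load average against warn/crit thresholds."""
    # Simplification: alert once the value reaches a threshold (no range syntax)
    if load1 >= crit:
        code = CRITICAL
    elif load1 >= warn:
        code = WARNING
    else:
        code = OK
    # Text before "|" is for humans; perfdata after it is label=value;warn;crit;min
    line = f"LOAD {STATUS_NAMES[code]} - load1={load1:.2f} | load1={load1:.2f};{warn};{crit};0"
    return code, line


def run_check(warn: float = 4.0, crit: float = 8.0) -> int:
    """Print the status line and return the exit code (hypothetical default thresholds)."""
    try:
        load1 = os.getloadavg()[0]
    except OSError:
        print("LOAD UNKNOWN - load average not available")
        return UNKNOWN
    code, line = check_load(load1, warn, crit)
    print(line)
    return code
```

A real plugin script would finish with sys.exit(run_check()) so that Nagios can read the exit code and raise the alarm.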

Real-World Applications

  • System Health: Detect sustained CPU spikes before they degrade or crash services
  • Security: Spot repeated failed login attempts in logs
  • Performance: Identify disk I/O bottlenecks (high I/O wait, saturated devices)
  • Cloud: Use Prometheus + Grafana to visualize container metrics in Kubernetes
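The security use case can be sketched in a few lines of Python. This version assumes the common OpenSSH log wording (“Failed password for … from <ip> port …”) as it appears in /var/log/auth.log on Debian/Ubuntu or /var/log/secure on RHEL-family systems; other failure messages (public-key or PAM errors, for example) are not matched, and the threshold of 3 is hypothetical.

```python
# Minimal sketch: count failed SSH password attempts per source IP in sshd
# log lines and report sources at or above a (hypothetical) threshold.
import re
from collections import Counter
from typing import Dict, Iterable

FAILED_PASSWORD = re.compile(
    r"sshd\[\d+\]: Failed password for (?:invalid user )?(?P<user>\S+) from (?P<ip>\S+) port \d+"
)


def suspicious_ips(lines: Iterable[str], threshold: int = 3) -> Dict[str, int]:
    """Return {ip: failures} for sources with at least `threshold` failed passwords."""
    failures: Counter = Counter()
    for line in lines:
        match = FAILED_PASSWORD.search(line)
        if match:
            failures[match.group("ip")] += 1
    return {ip: n for ip, n in failures.items() if n >= threshold}
```

On a live system you could pass an open log file or the output of journalctl to suspicious_ips(); reading these logs usually requires root.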

Hackers Hint:

  • Prometheus is the scribe collecting data scrolls.
  • Grafana is the artist painting visions on the crystal ball.
  • ELK Stack is the library of chronicles, searchable by scholars.

Practical Exercises

  1. Run htop and observe top processes
  2. Use vmstat 2 to monitor system activity every 2 seconds
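  3. Install sysstat, enable its periodic data collection, and run sar -u to view historical CPU usage
  4. Enable ufw logging (sudo ufw logging on) to record blocked connections, and check /var/log/auth.log or journalctl for failed SSH login attempts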
  5. Install Prometheus + Grafana and create a dashboard showing CPU and memory usage
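For exercise 2, vmstat output can also be fed to a script. This sketch assumes the procps vmstat layout, where a column-name line starting with r precedes the numeric samples, and parses by header name so that extra columns in newer versions do not shift the fields. The 20% I/O-wait limit is hypothetical.

```python
# Minimal sketch: turn `vmstat <interval> <count>` output into dicts keyed by
# vmstat's own column names, then flag samples with high I/O wait ("wa").
import subprocess
from typing import Dict, List


def parse_vmstat(text: str) -> List[Dict[str, int]]:
    header: List[str] = []
    rows: List[Dict[str, int]] = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "r":  # column-name line: r b swpd free buff cache ...
            header = tokens
        elif header and all(t.isdigit() for t in tokens):
            rows.append(dict(zip(header, map(int, tokens))))
        # Anything else (e.g. the "procs ---memory---" banner) is skipped
    return rows


def sample_vmstat(interval: int = 2, count: int = 5) -> List[Dict[str, int]]:
    """Run vmstat (needs procps). The first report is averaged since boot, so drop it."""
    out = subprocess.run(
        ["vmstat", str(interval), str(count)], capture_output=True, text=True, check=True
    ).stdout
    return parse_vmstat(out)[1:]


def high_iowait(rows: List[Dict[str, int]], limit: int = 20) -> List[int]:
    """Indices of samples whose 'wa' (I/O wait %) reaches the limit."""
    return [i for i, row in enumerate(rows) if row.get("wa", 0) >= limit]
```

sample_vmstat(2, 5) mirrors running vmstat 2 5 and discards the since-boot first line, so only the live samples are checked.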

Hackers Quest - Mini Project

Create an Oracle’s Crystal Ball Dashboard:

  • Install Prometheus and Grafana
  • Collect CPU, memory, and disk metrics
  • Build a dashboard that visualizes system health
  • Document findings: “What patterns did the crystal ball reveal about the kingdom’s heartbeat?”
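For host metrics, the Prometheus Node Exporter is the usual collector. To show what Prometheus actually scrapes, the sketch below serves a couple of gauges in the Prometheus text exposition format using only the standard library. The quest_* metric names and port 9101 are hypothetical, and in practice the official prometheus_client library would normally handle this formatting.

```python
# Minimal sketch of a custom exporter serving the Prometheus text exposition
# format. Metric names (quest_*) and port 9101 are hypothetical.
import os
import shutil
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict, List, Tuple

Sample = Tuple[Dict[str, str], float]


def _escape(value: str) -> str:
    # Label values must escape backslash, double quote and newline
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name: str, help_text: str, samples: List[Sample]) -> str:
    """Render one gauge family: HELP and TYPE lines, then one line per sample."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        pairs = ",".join(f'{k}="{_escape(v)}"' for k, v in sorted(labels.items()))
        lines.append(f"{name}{{{pairs}}} {value}" if pairs else f"{name} {value}")
    return "\n".join(lines) + "\n"


def collect(mountpoints: Tuple[str, ...] = ("/",)) -> str:
    """Gather disk usage and load average (os.getloadavg is Unix only)."""
    disk = [({"mountpoint": m}, shutil.disk_usage(m).used) for m in mountpoints]
    load = [({}, os.getloadavg()[0])]
    return (
        render_gauge("quest_disk_used_bytes", "Bytes used on the filesystem.", disk)
        + render_gauge("quest_load1", "1-minute load average.", load)
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = collect().encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 9101) -> None:
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

After starting serve(), a Prometheus scrape job targeting the host on port 9101 would collect these gauges (/metrics is Prometheus's default metrics path), and Grafana can then chart them as panels on the crystal ball dashboard.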

Hackers Notebook

Monitoring and observability are not optional; they are core operational practices. Without monitoring, administrators cannot detect resource saturation or service failures in time. Without observability, troubleshooting becomes guesswork, slowing down incident resolution.



Updated on Dec 28, 2025