# Monitoring

## Overview

The lab uses a self-hosted monitoring stack to track CPU, GPU, memory, disk, network, and per-process resource usage across all lab servers. Metrics are visualised in Grafana, which is available at [https://grafana.lab.pyarelal.xyz](https://grafana.lab.pyarelal.xyz). Log in with your lab account via the **Sign in with Kanidm** button.

## What is monitored

- CPU usage (by type: user, system, iowait, etc.)
- RAM usage (used, cached, buffers)
- Network traffic (sent and received)
- Disk I/O (read and write)
- GPU utilisation, memory, temperature, and power draw (on GPU-equipped hosts)
- Top processes by CPU and memory

## Monitored hosts

<table id="bkmrk-hostgpu-monitoringor"><thead><tr><th>Host</th><th>GPU monitoring</th></tr></thead><tbody><tr><td>orca</td><td>Yes (NVIDIA)</td></tr><tr><td>kraken</td><td>Yes (NVIDIA)</td></tr><tr><td>leviathan</td><td>Yes (NVIDIA)</td></tr><tr><td>starfish</td><td>No</td></tr><tr><td>eel</td><td>No</td></tr></tbody></table>

## Using the dashboard

After logging in, open the **Infrastructure Overview** dashboard. Use the **Host** dropdown at the top to switch between servers. The time range selector in the top right controls how far back the graphs go.
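
If you want to share a view with a particular host and time range already selected, Grafana accepts these as URL query parameters (`var-<name>` for a dashboard variable, `from` and `to` for the time range). The sketch below appends them to a dashboard URL copied from your browser. It is illustrative only: the variable name `host` and the `EXAMPLE_UID` path are assumptions, not values taken from the lab's dashboard, so check the real ones in your address bar.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def dashboard_link(dashboard_url, host, time_from="now-6h", time_to="now",
                   host_var="host"):
    """Return dashboard_url with the host variable and time range preset.

    host_var is the internal name of the dashboard variable behind the
    Host dropdown. "host" is an assumption; the real name may differ.
    """
    parts = urlsplit(dashboard_url)
    # Keep any query parameters already present, then set ours.
    query = dict(parse_qsl(parts.query))
    query[f"var-{host_var}"] = host
    query["from"] = time_from
    query["to"] = time_to
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical dashboard path: replace it with the URL from your browser.
url = "https://grafana.lab.pyarelal.xyz/d/EXAMPLE_UID/infrastructure-overview"
print(dashboard_link(url, "orca", time_from="now-24h"))
```

Relative ranges such as `now-24h` follow the same syntax as the time range selector, so a link built this way opens on the last 24 hours for the chosen host.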

The dashboard is divided into three sections:

- **System** — CPU, RAM, network, and disk panels visible for all hosts
- **GPU** — GPU panels, populated only for GPU-equipped hosts
- **Processes** — top 10 processes by CPU and memory usage
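
As a quick reference, the relationship between hosts and populated sections can be written down as a small lookup. This is a minimal sketch that only restates the **Monitored hosts** table and the section list above; the names `GPU_MONITORED` and `populated_sections` are made up for illustration and are not part of the monitoring stack.

```python
# GPU monitoring status per host, copied from the "Monitored hosts" table.
GPU_MONITORED = {
    "orca": True,
    "kraken": True,
    "leviathan": True,
    "starfish": False,
    "eel": False,
}


def populated_sections(host):
    """Return the dashboard sections expected to show data for a host."""
    if host not in GPU_MONITORED:
        raise KeyError(f"{host!r} is not a monitored host")
    sections = ["System", "Processes"]
    if GPU_MONITORED[host]:
        # The GPU section sits between System and Processes.
        sections.insert(1, "GPU")
    return sections


print(populated_sections("orca"))      # ['System', 'GPU', 'Processes']
print(populated_sections("starfish"))  # ['System', 'Processes']
```

An empty **GPU** section on `starfish` or `eel` is therefore expected and does not indicate a fault.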