There are several tools available for monitoring the cluster. All of them can be reached by visiting https://palma.uni-muenster.de in your browser.

Grafana

Grafana is a general-purpose web frontend for visualizing metrics. We use it to visualize data collected in a Prometheus TSDB (time series database). The graphs/panels are grouped into dashboards, which we created ourselves to give an overview of overall cluster usage, individual nodes, GPUs, and various Slurm metrics. Users can also log in and create their own dashboards if they want to. If you have any suggestions on how to improve the visualization of a metric, feel free to contact us! Metrics are currently stored for 90 days.
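A time-series panel backed by a Prometheus data source typically issues range queries against the Prometheus HTTP API (`/api/v1/query_range` with `query`, `start`, `end` and `step` parameters). The following is a minimal sketch, assuming a hypothetical host `prometheus.example:9090` and a hypothetical metric name `node_load1`. It builds such a query URL and clamps the start time to the 90-day retention stated above, because older samples are no longer stored. Whether users can reach the PALMA Prometheus instance directly is not stated on this page, so treat this purely as an illustration.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

# Retention period as stated on this page; samples older than this are gone.
RETENTION = timedelta(days=90)


def build_range_query(base_url, promql, start, end, now, step_s=300):
    """Build a Prometheus /api/v1/query_range URL.

    All datetimes must be timezone-aware so .timestamp() is unambiguous.
    The start time is clamped to (now - RETENTION), since a query further
    back than the retention window cannot return data for the older part.
    """
    earliest = now - RETENTION
    if start < earliest:
        start = earliest
    if start >= end:
        raise ValueError("requested range lies entirely outside the retention window")
    params = {
        "query": promql,
        "start": start.timestamp(),  # Unix timestamps (seconds) are accepted
        "end": end.timestamp(),
        "step": f"{step_s}s",        # resolution of the returned series
    }
    return f"{base_url.rstrip('/')}/api/v1/query_range?" + urlencode(params)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Hypothetical host and metric; a 200-day request is clamped to 90 days.
    print(build_range_query("http://prometheus.example:9090", "node_load1",
                            now - timedelta(days=200), now, now))
```

Clamping on the client side only makes the limit explicit; Prometheus itself would simply return no samples for the part of the range beyond retention.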

https://palma.uni-muenster.de/grafana

XDMoD

XDMoD collects its data directly from the Slurm database. Here you can find statistics on overall cluster usage, as well as on individual groups or users, covering the last couple of years. Metrics include the number of jobs, CPU hours, wait times, and more.
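To illustrate what metrics like these mean, here is a minimal sketch that derives them from per-job records. It assumes the common definitions CPU hours = allocated CPUs × elapsed run time, and wait time = start time − submit time. The `JobRecord` type and the `summarize` helper are hypothetical and not part of XDMoD, whose exact accounting rules may differ.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class JobRecord:
    """Hypothetical minimal job record, similar to what Slurm accounting stores."""
    submit: datetime
    start: datetime
    end: datetime
    ncpus: int


def summarize(jobs):
    """Return job count, total CPU hours and mean wait time (hours)."""
    n = len(jobs)
    # CPU hours: allocated CPUs times elapsed run time in hours.
    cpu_hours = sum(j.ncpus * (j.end - j.start).total_seconds() / 3600 for j in jobs)
    # Wait time: time spent in the queue between submission and start.
    waits = [(j.start - j.submit).total_seconds() / 3600 for j in jobs]
    avg_wait = sum(waits) / n if n else 0.0
    return {"jobs": n, "cpu_hours": cpu_hours, "avg_wait_hours": avg_wait}


if __name__ == "__main__":
    jobs = [
        # 4 CPUs for 2 h after waiting 1 h -> 8 CPU hours
        JobRecord(datetime(2024, 1, 1, 8, 0), datetime(2024, 1, 1, 9, 0),
                  datetime(2024, 1, 1, 11, 0), 4),
        # 16 CPUs for 1 h after waiting 0.5 h -> 16 CPU hours
        JobRecord(datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 30),
                  datetime(2024, 1, 1, 11, 30), 16),
    ]
    print(summarize(jobs))  # {'jobs': 2, 'cpu_hours': 24.0, 'avg_wait_hours': 0.75}
```

The example shows why a short job on many cores can account for more CPU hours than a longer job on a few cores.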

https://palma.uni-muenster.de/xdmod
