Emergency SLURM update wiped all running jobs

May 9, 2022 at 12:00 AM UTC

Compute nodes

Resolved after 11h 0m of downtime. May 9, 2022 at 11:00 AM UTC

SLURM security update

The authors of the SLURM scheduler announced a severe security vulnerability and, at the same time, dropped support for older SLURM versions. This forced us to upgrade immediately from SLURM version 19, which we had been running, to the supported version 21. Unfortunately, the SLURM state was corrupted during the update, and all running jobs were lost. We apologize for the inconvenience.

The system is expected to run normally with the new SLURM version. If you notice any anomalies, please contact us at support@computecanada.ca, mentioning Grex in the subject line.

Last updated: May 9, 2022 at 3:43 PM UTC