No issues detected
Compute: Grex compute nodes and batch jobs
Compute nodes Operational
Access: Grex login nodes and external network
Login nodes Operational
Network Operational
Storage: Grex filesystems
Software: ComputeCanada and local Grex software stacks
Incidents

2023 (4)

December 18, 2023 at 9:00 AM UTC

Planned Grex outage for SLURM and minor OS update

Resolved after 33h 0m of downtime
September 12, 2023 at 6:00 PM UTC

Two planned Grex outages for HPCC transformer work

Resolved after 171h 0m of downtime
February 22, 2023 at 2:30 PM UTC

Planned Grex outage for HPCC work and storage update

Resolved after 108h 30m of downtime
January 26, 2023 at 2:50 PM UTC

Electrical failure, loss of power to management rack

Resolved after 3h 10m of downtime

2022 (11)

December 1, 2022 at 11:30 AM UTC

Upcoming cooling outage in HPCC

UPDATED NOTICE about the outage on Dec 7-8, 2022 (updated on Dec 1 at 3:00 PM): Unplanned outage: Dec 7/22 at 5:30 PM until Dec 8/22 at 5:00 AM. Physical Plant scheduled, on short notice, maintenance of …
October 20, 2022 at 9:30 AM UTC

Datacentre plumbing work on October 20, 2022

Resolved in under a minute
October 13, 2022 at 8:30 AM UTC

Upcoming datacentre work on Oct 13

Update: The outage planned for Oct 13 was rescheduled to October 20, 2022. IMPORTANT NOTICE: MAINTENANCE OF POWER AND COOLING SYSTEMS - October 13, …
September 9, 2022 at 11:45 AM UTC

Planned network outage of the login nodes

Resolved in under a minute
September 2, 2022 at 8:30 AM UTC

Hardware failure on one of the login nodes

Resolved in under a minute
August 16, 2022 at 9:00 AM UTC

CANARIE outage affects Grex external network and Legacy login nodes

Resolved after 5h 0m of downtime
June 23, 2022 at 7:00 PM UTC

Grex has a problem with external network and login nodes

Resolved after 14h 30m of downtime
May 31, 2022 at 1:30 AM UTC

Brief power outage on May 31, 2022

Resolved after 30m of downtime
May 13, 2022 at 7:30 PM UTC

One of the /global/scratch filesystem servers crashed, FS runs in degraded mode

Resolved after 2h 30m of downtime
May 9, 2022 at 12:00 AM UTC

Emergency SLURM update wiped all running jobs

Resolved after 11h 0m of downtime
April 24, 2022 at 11:30 AM UTC

Brief power outages on April 23 and 24, 2022

Resolved after 1h 30m of downtime

2021 (5)

October 23, 2021 at 3:30 PM UTC

Brief power outages on October 23, 2021

Resolved after 9h 30m of downtime
August 31, 2021 at 8:00 AM UTC

Planned power outage Aug 31 to Sept 1

Resolved after 28h 0m of downtime
July 28, 2021 at 6:05 PM UTC

Grex update outage, new Status webpage

NOTICE: Grex is online for production. After the last outage, Grex ran in production in test mode for a week, starting June 9, 2021. We did not receive any reports about the new hardware and software. …
July 27, 2021 at 10:00 AM UTC

Rebooting the nodes

Resolved after 243h 30m of downtime
June 9, 2021 at 11:35 AM UTC

WestGrid network failure

Resolved after 1176h 35m of downtime

Undated (1)

Unplanned power outage in HPCC datacentre

Date and duration unavailable (the entry carries a placeholder timestamp of January 1, 0001)