UPDATE:
After a morning of stability, we can confirm that the issues we experienced with the Unix home directories and Silver Storage have been resolved. Some users also experienced problems with the Unix home exports to Windows and Mac, but these have been fixed as well.
In the short term, we will carry out a full health check of our file servers with the supplier. This is expected to take place next week.
In the long term, we will be reviewing the current data storage strategy and looking into more reliable and robust options.
By the end of this week we will hold a review meeting about this incident, after which we will send out more technical information.
Thank you all for your patience and apologies again for the disruption.
The intermittent issues affecting Unix home directories and Silver Storage are still ongoing. We are still investigating the problem with the supplier, who has made fixing it their priority.
All Academic Computing (ACT) services are affected by this.
The issue stems from the Operating System (AOS) causing system crashes, specifically on the server cluster nodes.
This is not directly related to file server activity; however, the frequency of the system crashes does seem to be related to the general workload. Over the weekend, when the workload was lighter, the crashes were less frequent and affected fewer nodes than they do today.
To help decrease the load on the file server, we have disabled access to the Unix home directories from Windows. This also disables access from Windows to the personal and group web page files and to some research volumes.
We do not have a timeline for the fix yet, but please check this blog and the IT Status Page for further updates.
Apologies again for the disruption. We are doing our best to resolve this as soon as possible.