Summary of UK-LON1 compute node crashes
Incident Report for UpCloud
Resolved
This incident has been resolved.
Posted 3 months ago. Jun 27, 2019 - 12:00 UTC
Monitoring
All cloud servers in UK-LON1 have now been migrated to more stable platforms. We had to restart a few individual servers, as they encountered unexpected errors during the maintenance. We apologize for the inconvenience.
If your server is still experiencing issues, please first check its status using the console in the Control Panel and, if necessary, shut the server down and start it again. If the problem persists, please contact our support.
Posted 3 months ago. Jun 27, 2019 - 00:28 UTC
Update
About 67% of the maintenance is done.
Posted 3 months ago. Jun 26, 2019 - 22:50 UTC
Update
About 67% of the maintenance is done. If any of your servers in UK-LON1 are not behaving as expected, it is safe to issue a full shutdown/restart request from our control panel. This will not affect our work, and in most cases a shutdown/restart can fix any issues that might be related to this maintenance.
Posted 3 months ago. Jun 26, 2019 - 15:29 UTC
Identified
Since June 13th, we have had multiple incidents of compute node crashes in our London data centre. We would like to give some insight into these, as we owe our users an explanation for this unusual situation.

In May this year, a new Intel CPU vulnerability was discovered (https://upcloud.com/blog/mds-vulnerabilities/), which prompted us to swiftly run security updates throughout our infrastructure to protect our users' data. We have had to do this in the past for similar Intel CPU vulnerabilities such as Meltdown (https://upcloud.com/blog/intel-cpu-vulnerability-meltdown/), and did so with great success.

Around the same time, we also rolled out new versions of our infrastructure software, in preparation for new features and products that we are currently working on. Unfortunately, as we discovered, the combination of certain hardware in the London data centre, our new software features, and the Intel CPU vulnerability mitigations caused instability over time in specific situations.

UK-LON1 has been the only data centre to experience this instability, and we have been working tirelessly to identify the root cause and administer a fix. We are confident that we have now identified the root cause, and have already begun rolling out updates. These updates require no action on our users' part and will not cause any disturbance to your services at UpCloud.

The rollout is estimated to conclude within the next 36 hours, by the end of Thursday, 27th of June, 2019.

We will post any future updates here, so please subscribe or check back.

We are truly sorry for the loss of service our users have endured.

Antti Vilpponen
CEO
Posted 3 months ago. Jun 26, 2019 - 11:49 UTC
This incident affected: UK-LON1: Virtualization Hosts.