- Total outage time: ~2.5 hours
- All users were unable to access the CyberSmart web platform due to a third-party component failure.
- All customer and application HTTP requests to the platform resulted in 502 errors.
- A third-party hosting/services company (Amazon Web Services), with which we host a number of key infrastructure components, experienced an outage.
Timeline (GMT)
- 16:33: Issue began
- 16:50: Staff were notified of the issue
- 19:00: Issue resolved (by external service provider)
- 19:03: CyberSmart platform back online
Root Cause
Amazon Web Services (AWS) had issues with several of its platform infrastructure services, including degraded performance of EBS volumes within the “EU-WEST-2” Region. EBS is a key part of the RDS component CyberSmart uses for data storage.
Resolution and recovery
N/A
Corrective and Preventative Measures
We have planned a workstream to improve failover within CyberSmart, including the use of PaaS services distributed across different geographical regions. This will allow automatic corrective measures to keep our services online when a given region has issues.
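
As an illustration only, the idea behind region-level failover can be sketched as choosing the first region, in priority order, that passes a health check. This is a minimal sketch under stated assumptions, not CyberSmart's actual implementation: the function name `pick_healthy_region`, the `is_healthy` callback, and the fallback region `eu-west-1` are all hypothetical.

```python
from typing import Callable, Optional, Sequence


def pick_healthy_region(
    regions: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first region, in priority order, that passes its health check."""
    for region in regions:
        try:
            if is_healthy(region):
                return region
        except Exception:
            # A health check that raises is treated as an unhealthy region.
            continue
    # No region is currently healthy.
    return None


# Hypothetical priority list: primary region first, then a fallback.
REGIONS = ["eu-west-2", "eu-west-1"]

# Simulate this incident: the primary region is degraded.
degraded = {"eu-west-2"}
active = pick_healthy_region(REGIONS, lambda region: region not in degraded)
print(active)
```

In a real deployment the health check would probe the services in each region, and traffic would be redirected by whatever routing layer is in use; both are outside the scope of this sketch.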
Posted Aug 07, 2019 - 11:01 BST
This incident affected: CyberSmart Platform (CyberSmart Apps, CyberSmart Dashboard).