- Total outage time: 4h 38m
- All users were unable to access the CyberSmart web dashboard. The CyberSmart applications continued to run, but their requests to the platform were not handled.
- The web dashboard was down for all users from 13:48 until 18:26.
- All customer requests to the platform resulted in 502 errors.
- A new feature took far longer than expected to complete its requests, driving our REST API’s CPU usage to critically high levels.
Timeline (GMT)
- 13:48: Issue began
- 13:49: Staff were notified of the issue
- 16:44: Problem found
- 18:23: Fix pushed to production
- 18:26: Service restored
Root Cause
- Our API server handles requests from deployed applications on desktop devices and on mobile devices (currently in testing). Every 15 minutes, each application sends an HTTP POST update containing any configuration changes. Our new feature, which checks the applications on a user’s local computer for vulnerabilities, performs a lookup against our database for any new vulnerabilities. This query was taking up to 30 seconds to return, across more than 100k requests. Because the REST server kept accepting and trying to process new requests while earlier ones were still waiting on the database, its CPU load rose sharply and it began returning 502 errors.
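To give a sense of scale, the sketch below applies Little’s law (average requests in flight = arrival rate × time each request spends in the system) to the figures above. It is a rough, hypothetical estimate, not measured data: it assumes the roughly 100,000 requests arrive evenly spread across a single 15-minute reporting interval, and that each request is held for the full 30-second worst-case lookup. The 0.1-second "healthy" lookup time is an illustrative assumption for comparison only.

```python
# Hypothetical back-of-envelope estimate using Little's law: L = lambda * W.
# ASSUMPTIONS (not incident data): ~100k requests arrive evenly across one
# 15-minute interval; every request waits the full worst-case lookup time.

def arrival_rate(requests: int, interval_s: float) -> float:
    """Average arrival rate in requests per second."""
    return requests / interval_s


def requests_in_flight(rate_per_s: float, latency_s: float) -> float:
    """Little's law: average number of requests concurrently in the system."""
    return rate_per_s * latency_s


REQUESTS_PER_INTERVAL = 100_000  # "over 100k requests", treated as a lower bound
INTERVAL_S = 15 * 60             # applications report every 15 minutes
SLOW_LOOKUP_S = 30.0             # worst-case lookup time during the incident
FAST_LOOKUP_S = 0.1              # hypothetical healthy lookup time, for contrast

rate = arrival_rate(REQUESTS_PER_INTERVAL, INTERVAL_S)
slow = requests_in_flight(rate, SLOW_LOOKUP_S)
fast = requests_in_flight(rate, FAST_LOOKUP_S)

print(f"arrival rate: {rate:.1f} req/s")
print(f"in flight at 30 s per lookup: {slow:.0f}")
print(f"in flight at 0.1 s per lookup: {fast:.1f}")
```

Under these assumptions, about 111 requests arrive per second, so 30-second lookups leave roughly 3,333 requests in flight at once, compared with about 11 at 0.1 seconds. Little's law only holds in a steady state, so if the server cannot keep up the real backlog keeps growing beyond this figure.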
Resolution and recovery
- We have disabled the new feature that was causing the downtime, and the web dashboard is back online and working normally. Because the feature was not yet visible to users, disabling it causes no loss of service.
Corrective and Preventative Measures
- We plan to evaluate and refactor these REST API endpoints for efficiency, focusing on the database lookup. We aim to have the feature back online and visible in our next public release.
- Over the coming weeks, we are dedicating time to upgrading our monitoring systems so that they alert us to where future problems occur and aid us in debugging and testing, allowing us to resolve incidents more quickly.
Posted May 03, 2019 - 14:00 BST
This incident affected: CyberSmart Platform (CyberSmart Apps, CyberSmart Dashboard).