Overview
Between 20:30 and 20:46 UTC on 12 January 2022, WordPress VIP experienced a partial service disruption due to a code change that impacted how HTTP requests are routed within the WordPress VIP infrastructure. As a result, the majority of uncached requests for affected sites were served 503 responses during this time.
Chronology of Events
Date | UTC Time | Update |
12 Jan. 2022 | 20:15 | Code change release causes an internal API to generate incorrect configuration data. |
12 Jan. 2022 | 20:30 | As part of normal operations, VIP routing configurations dynamically update using data from an internal API. The data is incorrect because of the 20:15 code release. |
12 Jan. 2022 | 20:30:42 | First failed request is recorded in the logs, and internal alerts are received. |
12 Jan. 2022 | 20:32 | VIP begins investigation. |
12 Jan. 2022 | 20:40 | VIP identifies the problem. |
12 Jan. 2022 | 20:43 | VIP reverts the offending code and reloads routing configurations. |
12 Jan. 2022 | 20:46:08 | The last failed request resulting from this issue is recorded in our logs. Incident is resolved. |
12 Jan. 2022 | 20:50 | VIP Lobby updated, post-outage process begins. |
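The intervals implied by the chronology can be worked out directly from its timestamps. The short Python sketch below does that arithmetic; the variable names are illustrative only, and the 20:15 entry is assumed to mean 20:15:00 because the chronology gives no seconds for it.

```python
from datetime import datetime, timezone

# Timestamps copied from the chronology above (all UTC).
# Assumption: the 20:15 release entry is treated as 20:15:00.
bad_release = datetime(2022, 1, 12, 20, 15, 0, tzinfo=timezone.utc)
first_failure = datetime(2022, 1, 12, 20, 30, 42, tzinfo=timezone.utc)
last_failure = datetime(2022, 1, 12, 20, 46, 8, tzinfo=timezone.utc)

# Time during which failed requests were recorded.
impact_window = last_failure - first_failure

# Time between the release and the first failed request.
latent_period = first_failure - bad_release

print(impact_window)  # 0:15:26
print(latent_period)  # 0:15:42
```

The failures therefore span roughly 15 and a half minutes, and the incorrect data existed for about as long again before the routing configurations picked it up.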
What Happened
A code release caused an internal API to generate incorrect data. Our systems use this data to determine how HTTP requests are routed within the WordPress VIP infrastructure. With incorrect data, our systems could not forward traffic to the correct destination, so uncached requests on affected sites received HTTP 503 responses.
Remediation
The issue was addressed by reverting the code change that led to incorrect routing configurations and deploying the correct configurations.
Future Prevention
The code release process is being reviewed to add procedural safeguards. Automated checks are also being investigated to reduce the chance of a similar problem in the future.
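As an illustration of what one such automated check could look like, the Python sketch below validates a candidate routing configuration before it is applied and keeps the last known good configuration if validation fails. This is not WordPress VIP's implementation: the `Route` shape, the `host:port` upstream format, the `max_shrink` threshold, and the function names are all hypothetical, chosen only to show the general idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """Hypothetical routing entry: a site hostname and its backend."""
    hostname: str
    upstream: str  # assumed format: "host:port"


def validate_routes(routes, previous_count, max_shrink=0.5):
    """Return a list of problems; an empty list means no problem was found."""
    problems = []
    if not routes:
        problems.append("new configuration contains no routes")
    for route in routes:
        host, sep, port = route.upstream.rpartition(":")
        if not route.hostname:
            problems.append(f"empty hostname for upstream {route.upstream!r}")
        if not sep or not host or not port.isdigit():
            problems.append(f"{route.hostname}: malformed upstream {route.upstream!r}")
    # A sudden large drop in route count suggests bad source data.
    if previous_count and len(routes) < previous_count * (1 - max_shrink):
        problems.append(
            f"route count fell from {previous_count} to {len(routes)}"
        )
    return problems


def choose_config(candidate, last_known_good):
    """Apply the candidate only if it validates; otherwise keep the old one."""
    problems = validate_routes(candidate, len(last_known_good))
    if problems:
        return last_known_good, problems
    return candidate, []


# Usage with made-up data: the second candidate has an empty upstream.
current = [Route("a.example", "10.0.0.1:8080"), Route("b.example", "10.0.0.2:8080")]
healthy = current + [Route("c.example", "10.0.0.3:8080")]
broken = [Route("a.example", "")]

applied_ok, problems_ok = choose_config(healthy, current)
applied_bad, problems_bad = choose_config(broken, current)
```

In this sketch a rejected candidate leaves the existing routes in place, so the routing layer would keep serving traffic with stale but valid data while the reported problems are investigated.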