Intermittent slowness and downtime [RESOLVED]

This is an update to our previous post about some short-lived issues we had earlier today, which have since been resolved.

A summary of events can be found on our status page. The only additional note is that shortly after the switch failure, around 10:45AM Central Time, we rerouted all VIP traffic, both front-end and dashboard, to our other two data centers, which were mostly unaffected by this problem. VIP traffic was not returned to San Antonio until the new switch had been confirmed working and load tested, at around 12:30PM.

Network issues (resolved)

Yesterday, at approximately 5:50PM Central Time, the provider that handles connectivity in our Atlanta and Dallas hosting facilities mistakenly made a manual routing change that affected connectivity to hundreds of servers in those facilities. As a result, we put WordPress.com in read-only mode for about 12 minutes while we assessed and resolved the situation. Most visitor traffic was not affected by this outage. By 6:20PM, everything had returned to fully operational read-write status.

Network maintenance, Oct. 3, 2010 0000 – 0600 CDT (non-disruptive)

Sorry for the short notice. Our upstream network partner in our Dallas hosting facility will be performing maintenance this evening, beginning in about 4 hours (midnight Central Time). The purpose of this maintenance is to add network capacity and an additional upstream bandwidth provider, which will increase redundancy and diversification. No downtime is expected as a result of this maintenance, but we will be on hand in case anything does not go exactly as planned.

UPDATE: This maintenance was completed successfully.

Emergency Power Maintenance – Sept 25, 10:00 PM Pacific time

To fix some faulty power hardware that failed yesterday evening, our datacenter provider is performing emergency maintenance on power equipment in our Dallas facility this evening, between 10PM PT on Sept 25th and 4AM PT on Sept 26th. This maintenance requires that all servers connected to the faulty power equipment be powered down before it begins, to ensure the safety of the electricians working on the equipment. Unfortunately, many mission-critical WordPress.com servers will be affected. We are currently making the necessary preparations to minimize the impact, but once the maintenance begins, depending on how the power transfer proceeds, there may be a partial service interruption. We have additional staff on call to deal with any issues that may arise. We expect this maintenance to take no longer than 30 minutes to complete. Although the window is 10PM – 4AM, we hope to be finished by 11PM Pacific time.

Once the maintenance is complete, we will update this post with additional details. Sorry for the short notice, but unfortunately it was unavoidable in this case.

UPDATE: The maintenance has been completed successfully. Between 11:00PM and 11:30PM, WordPress.com was in a global read-only mode, so sites were viewable but no new content could be posted. By 11:33PM Pacific time, everything was fully operational.