Nov 13
Tier 1 Data Center Requirements
Maybe it's just me, but I expect any Tier 1 data center that I or my clients use to be able to handle a power outage, and I mean a full power outage. As you may or may not know, Rackspace experienced a severe outage yesterday. A truck collided with the main power transformer outside their building, severing them from the grid. According to their statement, they switched over to their backup power feed, but the power company was turning that feed on and off to assist the first responders trying to rescue people from the accident scene - totally understandable. What's not understandable is why Rackspace didn't have enough generator capacity to run its chillers as well as the machines, or, if they did, why they didn't use it. Anyone who runs equipment rooms knows they must have proper cooling, or the machines in that room will self-destruct in very short order. So it stands to reason that generator backup does no good when the power is cut if you can't run your chillers off that generator system as well. You'll just be running your machines literally to death.
As a follow-up, Rackspace posted an incident report to its users yesterday. The report included this tidbit about the cooling failures:
===
The DFW facility has two separate utility feeds and the engineers decided to start moving from generators to the secondary utility feed. Each time Rackspace alternates between utility and generator power, the chillers require us to follow a shut down and restart procedure. This procedure normally takes approximately 30 minutes to complete. We started the transition to the secondary utility feed and initiated the restart process on the chillers.
Unfortunately, at this point, our utility provider shut down the secondary feed that was powering the data center, without notifying Rackspace. This was an emergency action taken by the utility in order to allow safe removal of the accident vehicle and protect the emergency responders. This unexpected power cut required DFW to switch back to generator power and reinitiate the chiller start up procedure. The repeated cycling of the chillers resulted in increasing temperatures within the data center.
===
So, Rackspace does indeed have enough generator power to cover their HVAC units. Now, I understand that the large chillers in use at these facilities have to go through long shutdown and startup procedures. You can't just turn these things on and off without damaging them. The same is true of home HVAC units, and even small portable ones. So, perhaps what needs to be re-evaluated is Rackspace's *processes*. In a case like this, a sound process would seem to be: while one of your utility feeds is down, leave the chillers on generator power. That way, if your secondary feed goes out as well, you avoid the repeated chiller cycling we witnessed here.
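To make the difference concrete, here is a rough sketch of the two policies in Python. It is only an illustration under my own assumptions: the function names, the `require_both` flag, and the event timeline are made up for this example and are not taken from Rackspace's report. The only number borrowed from the report is the roughly 30-minute chiller restart.

```python
# Approximate chiller restart time quoted in the incident report.
CHILLER_RESTART_MINUTES = 30


def next_source(current, primary_up, secondary_up, require_both):
    """Pick the power source after a feed status change (hypothetical policy).

    require_both=True means: once on generator, stay there until BOTH
    utility feeds are healthy again.
    """
    if not (primary_up or secondary_up):
        return "generator"  # no utility power at all
    if current == "generator" and require_both and not (primary_up and secondary_up):
        return "generator"  # one feed is still down, so stay put
    return "utility"


def count_switches(timeline, require_both, start="generator"):
    """Count utility/generator transitions; each one forces a chiller restart."""
    current, switches = start, 0
    for primary_up, secondary_up in timeline:
        new = next_source(current, primary_up, secondary_up, require_both)
        if new != current:
            switches += 1
            current = new
    return switches


# Hypothetical sequence of (primary_up, secondary_up) states, starting on
# generator power right after the primary feed is lost.
timeline = [
    (False, True),   # secondary feed available
    (False, False),  # utility cuts the secondary feed
    (False, True),   # secondary feed restored
    (True, True),    # primary feed repaired
]

for label, require_both in [("switch back eagerly", False), ("wait for both feeds", True)]:
    n = count_switches(timeline, require_both)
    print(f"{label}: {n} switch(es), about {n * CHILLER_RESTART_MINUTES} min of chiller restarts")
```

With this made-up timeline, switching back eagerly costs three chiller restarts, while waiting for both feeds costs one. The real trade-off also depends on how much generator fuel is on hand, which this sketch ignores.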