Wednesday, April 29, 2009

Human Side of Higher Data Center Temperatures

With all the talk of hotter data center temperatures, one item that has often been overlooked is what happens to the poor soul tasked with going in and servicing equipment in that data center. Imagine having to work in a facility at 40°C (104°F) for several hours at a time--and that's at the equipment input. The exhaust temperature on the back side of the rack could easily be 55°C (131°F).

One approach is to adopt a "fail in place" model where technicians never go into a production facility, but even Google has technicians adding and replacing individual servers in their containerized data centers.

Other approaches to consider:
  • Localized spot cooling. A very small air conditioner could take the edge off the area in front of a rack.
  • Perform service operations at night or when it's reasonably cool.

This last suggestion may seem too simplistic at first, but it's actually quite practical. In a facility with sufficient redundancy to ensure high availability, server replacement should be able to wait up to 24 hours. Operating a data center at consistently high temperatures will end up increasing power consumption in the IT equipment, so it only makes sense to use higher temperatures in a data center when economizers are being used to eliminate or substantially reduce HVAC CapEx and OpEx costs.

If a data center is using economizers, the temperature in the data center should drop when the outside temperature drops. Even in relatively warm areas during the summer months, there are substantial periods each day when the temperature falls to levels at which technicians can work comfortably.
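
As a rough illustration (my own sketch, assuming the supply air roughly tracks outside-air temperature when economizers are active, and picking an arbitrary 30°C comfort threshold), a few lines of Python can pick out the workable service hours from a day of hourly readings:

```python
def service_windows(hourly_temps_c, max_workable_c=30.0):
    """Return the hours of the day when the (economizer-driven) supply air
    is cool enough for technicians to work comfortably."""
    return [hour for hour, temp in enumerate(hourly_temps_c) if temp <= max_workable_c]

# Hypothetical hourly readings for a warm summer day, in degrees C:
temps = [26, 25, 24, 23, 23, 24, 26, 29, 32, 35, 37, 39,
         40, 41, 41, 40, 38, 36, 33, 31, 29, 28, 27, 26]
print(service_windows(temps))   # overnight and late-evening hours qualify
```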

--kb

Monday, April 13, 2009

NEBS vs. the Hottest Place on Earth

As mentioned in Higher Temperatures for Data Center and Processors for Higher Temps, various groups are pushing for higher and higher ambient temperatures in data centers. At Google's Efficient Data Center Summit last week, Amazon's James Hamilton brought up an interesting point in his slides and blog about ambient temperatures:
the hottest place on earth over recorded history was Al Aziziyah Libya in 1922 where 136F (58C) was indicated

James went on to note during his talk that telecommunications equipment designed to the NEBS (Network Equipment Building System) standards routinely has to handle temperatures up to 40°C.

Actually, the story is better than that. NEBS-GR-63 (the key NEBS specification dealing with environmental conditions for equipment in telecommunications central offices) requires equipment to handle 40°C long-term ambient temperatures, but telecommunications equipment certified at the shelf (chassis) level needs to be able to operate at 55°C ambient for up to 96 hours at a time and up to 360 hours per year [the 360 hours is for reliability calculations]. This means that much of the NEBS-rated equipment for data centers can operate at temperatures that are only 3°C lower than the highest natural temperature ever recorded on Earth, as noted by James.
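
For concreteness, here's a minimal sketch (my own illustration, not part of the specification) of checking a year's worth of 55°C excursions against the short-term limits described above:

```python
def within_nebs_short_term_limits(excursion_hours, max_single_h=96, max_annual_h=360):
    """Check a list of high-temperature (e.g., 55 C ambient) excursion durations
    against the short-term limits described above: no single excursion longer
    than 96 hours, and no more than 360 hours total per year."""
    return (all(h <= max_single_h for h in excursion_hours)
            and sum(excursion_hours) <= max_annual_h)

print(within_nebs_short_term_limits([72, 48, 90, 60]))   # True: all <= 96 h, 270 h total
print(within_nebs_short_term_limits([100, 50]))          # False: one excursion exceeds 96 h
```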

Given the common engineering penchant for providing some guardband beyond the official specifications, even a 58°C ambient is not out of the question. This means that NEBS-rated equipment could be a good candidate for data centers operating at high temperatures.

But can you get decent performance from NEBS-rated servers? Yes! For example, vendors such as Radisys, Kontron, and Emerson have announced blade servers with Intel's new Xeon 5500 (aka "Nehalem") processors, and their blade servers are commonly NEBS-certified to operate at 55°C. This allows the latest server technology to operate in the most demanding environments.

--kb

Thursday, April 9, 2009

More on Google's Battery-backed Servers

As noted in Evaluating Google's Battery-backed Server Approach, there are a number of benefits to Google's recently-disclosed practice of putting VRLA batteries on every server, but there are quite a few drawbacks as well.

One of the drawbacks not discussed in the prior post is a set of issues related to power transients and harmonics. In a conventional data center, there are multiple levels of power transformation and isolation between the individual server and the grid. Power usually comes in at high or medium voltage to a transformer and comes out as low voltage (<600V) before going to a UPS and a PDU.

In an effort to improve efficiency and reduce capital costs, facility managers are looking at removing some of these isolation layers. This is fine to a certain extent. After all, there are a lot of small businesses that run one or two servers on their own, and there aren't major problems with them. In those cases, however, there are usually relatively few computers hooked together on the same side of the electrical transformer that provides power to the building. This transformer provides isolation from building to building (or zone to zone in some installations).

When you scale up to a large data center, however, you get thousands and thousands of servers in the same building. If you remove those extra layers of isolation, the burden of providing that isolation falls to the power supplies in the individual servers. If servers use traditional AC power supplies, issues like phase balancing and power factor correction across all the separate power supplies become more of an interdependent problem.

The issues can be helped or hurt depending on what's nearby. Servers without isolation near an aluminum smelter, sawmill, subway, or steel mill may see wide fluctuations in their power quality which can result in unexplained errors.
I've seen cases with marginal power feeds where individual racks of servers seem to work fine, but the aggregate load when all servers are operating causes enough of a voltage sag that some servers occasionally don't work right. Let me tell you, those are a real pain to diagnose.

On the other hand, if you're somebody like Google or Microsoft who can locate data centers in places like The Dalles, Oregon or Quincy, Washington that are just a stone's throw from major hydroelectric dams or other sources of power, perhaps you can rely on nice clean power all the time.

External power factors may be the least of a data center manager's problems, however. The big concern with eliminating the intermediate isolation is that transients and other power line problems from one power supply can affect the operation of adjacent systems, and this can build up to significant levels if fault isolation and filtering is not supported.

Another issue that bedevils data center managers is the issue with phase balancing. In most AC-powered systems, power is delivered via three phases or legs (A, B, and C phases), each 120° out of phase with each other. At some point (usually the PDU), a neutral conductor is synthesized so that single-phase currents can run from one of these legs to neutral. In a properly balanced system, there will be equal loading on the A leg, the B leg, and the C leg. If the phases are not properly balanced, there are several bad things that can occur, including the following:
  • The neutral point will shift towards the heaviest load, lowering the voltage to the equipment on that line, resulting in premature equipment failure and undervoltage-related errors
  • An imbalanced load may cause excess current to flow over specific conductors and overheat
  • Breakers or other overcurrent mechanisms may trip

Phase imbalance can occur when network administrators do not follow a rigorous process of rotating successive servers across the three phases. Additionally, shifting workloads can cause some servers to be more heavily utilized than others--and phase balancing is almost certainly not a factor considered when allocating applications to specific servers. An even more pernicious issue can arise with systems employing redundant power supplies, such as blade servers: in an attempt to maximize efficiency, management software may shut down certain power supplies to maximize the load on the remaining supplies--all without considering the impact on phase balancing when the load is no longer shared equally among all power supplies.

Data centers that employ conventional PDUs don't generally have these issues (or have them at lesser severity), since the PDUs and their transformers are usually designed to handle significant phase imbalances without creating problems.
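
To make the imbalance concrete, here's a small sketch (my own illustration, assuming ideal 120° phase spacing and equal power factor on every leg) of how unevenly loaded phases put current on the synthesized neutral:

```python
import cmath
import math

def neutral_current(i_a, i_b, i_c):
    """Approximate neutral current (amps) from per-phase load currents,
    treating each phase as a phasor spaced 120 degrees apart."""
    phasors = [
        i_a * cmath.exp(1j * math.radians(0)),
        i_b * cmath.exp(1j * math.radians(-120)),
        i_c * cmath.exp(1j * math.radians(120)),
    ]
    return abs(sum(phasors))

print(round(neutral_current(100, 100, 100), 1))   # balanced: ~0 A on the neutral
print(round(neutral_current(160, 80, 60), 1))     # imbalanced: ~91.7 A on the neutral
```

Currents like that second case are exactly what end up overheating conductors or tripping breakers.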

Additional considerations with the Google battery-backed server approach:

  • Acid risks from thousands of individual small batteries (e.g., cracked cases on thinner-walled batteries)
  • Shorting risks from batteries that can deliver thousands of amps of current for a short period
  • More items to monitor, or higher risks of silent failures (albeit with smaller failure domains) when you most need the batteries

This is a complex issue. I'm not convinced that Google has determined the optimal solution, but kudos to them for finally being willing to publicly discuss some of what they consider to be best practices. Collectively, we can learn bits and pieces from different sources that could end up delivering more efficient services.

--kb

Saturday, April 4, 2009

Evaluating Google's Battery-backed Server Approach

As noted previously, Google has disclosed that they put batteries on every server (see this picture of a Google rack), essentially powering their servers the way laptops have traditionally been powered. Laptops need batteries because they have to be mobile, which is not generally a consideration for servers.
Are batteries in servers a good idea?

There are some definite advantages in Google's approach:
  1. No need to pay for UPS systems (saves CapEx dollars)
  2. Eliminates two conversion stages found in a traditional AC double-conversion UPS
  3. Reduces dedicated floor space/real estate commonly devoted to UPS/battery rooms
  4. Localizes fault domains for a failed server to just one server
  5. Scales linearly with the number of servers deployed

All of these add up to a solution that works just as well for one server as it does for one thousand servers. Coupled with Google's efforts to increase energy efficiency through founding and supporting the Climate Savers Computing Initiative (CSCI) and its target of 92% power supply efficiency, this solution appears to be very efficient.

However, there are some down sides to Google's approach:

  1. A lot of batteries to wire up and monitor
  2. Increased air impedance from blocking airflow
  3. Lower battery reliability with increased ambient temperatures
  4. Higher environmental impact due to increased battery materials
  5. Individual server supplies are exposed to a higher level of power transients and harmonics
  6. Potential phase imbalances and stranded power in data centers

Issue #1 is self-evident. Issue #2 can be seen in this picture from Green Data Center Blog: the physical mass of the batteries blocks a good portion of the air space in front of the server, which increases airflow resistance and in turn requires more fan power to move the same amount of air.

Issues #3 and #4 are somewhat related. Google, Microsoft, and other leading internet companies have advocated moving the ambient temperatures of data centers to higher temperatures, with some advising 35°C, 40°C, or even occasionally 50°C ambient temperatures. There are clear savings to be had here, but it may run counter to the battery approach used by Google. Assuming the Google batteries are conventional lead-acid batteries, a common rule is that the useful life of batteries drops by ~50% for every 10°C above 25°C ambient temperatures. Thus, a 4-year battery would only be good for ~2 years in a 35°C environment. In comparison, conventional UPS batteries are often rated for 10, 15, or 20 years. When consolidated in a UPS battery cabinet, the batteries can be protected from the higher ambient temperatures through localized cooling (batteries dissipate almost no heat) for increased life.
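
The arithmetic behind that rule of thumb is simple; here's a small sketch (assuming the usual 25°C rating point for VRLA batteries):

```python
def battery_life_years(rated_life_years, ambient_c, rated_temp_c=25.0):
    """Rule-of-thumb VRLA battery life: service life roughly halves
    for every 10 degrees C above the rated (typically 25 C) ambient."""
    return rated_life_years * 0.5 ** max(0.0, (ambient_c - rated_temp_c) / 10.0)

# A nominal 4-year battery at various data center ambient temperatures:
for temp_c in (25, 35, 40, 50):
    print(f"{temp_c} C -> ~{battery_life_years(4, temp_c):.1f} years")
# 25 C -> ~4.0, 35 C -> ~2.0, 40 C -> ~1.4, 50 C -> ~0.7 years
```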

Using lots of little batteries the way Google does results in more material usage than using fewer, larger batteries. Couple that with reduced battery life at higher temperatures, and the result is not as good as it first seems. According to http://www.batterycouncil.org/LeadAcidBatteries/BatteryRecycling/tabid/71/Default.aspx, more than 97% of the lead from lead-acid batteries is recycled, but the same source states that only 60-80% of the lead and plastic in new batteries is recycled material. Looking at that last statistic a different way, 20-40% of the material in a new lead-acid battery is new, not recycled. Thus, even if Google recycles 100% of its batteries, using lots of new batteries still consumes a lot of new material.

I'll address issues #5 and #6 in a future post.

--kb

Friday, April 3, 2009

Google's Server Power Supplies

This past Wednesday, Google finally provided a peek into their data centers. Green Data Center Blog has a great roundup of various articles related to this workshop, including pictures from Google's container data centers.

One of the more interesting aspects revealed Wednesday was the fact that Google has batteries attached to each of their servers.

At first, this seems rather odd. Google's explanation for this is that they use this arrangement as a 99.9% efficient replacement for UPS (Uninterruptible Power Supply) systems. Wow...99.9% efficient!

This is definitely a different approach from what most data centers do today, and it seems really far out there--until you break it down into its component parts. A simplified block diagram boils down to an AC feed going into a power supply that outputs ~12Vdc, with a small battery attached to that 12Vdc output before it reaches the server board.

Broken down this way, the arrangement really starts to look like a laptop. The Google server power system apparently operates just like a laptop:
  • External power supply provides ~12Vdc
  • Battery is included with every computer
  • When the external power supply fails, the battery provides power until the generator starts or power is switched to a different source

Graceful shutdown in power outages may or may not be an issue for Google's applications (likely not an issue).

Google certainly thinks they've got a winner with this approach, and goodness knows they've had experience deploying it at scale. In a future posting, I'll look at some of the pros and cons of this approach.
--kb

Wednesday, April 1, 2009

Deciphering Intel Code Names

There's been a lot of industry buzz lately about Intel's recent release of the Nehalem-EP processor, with many references to how Nehalem is x% better than a previous platform like Bensley, Harpertown, or Clovertown.

Okay, but how can you find out what each one of these code names refers to? Well, it turns out that Intel has a web site that allows you to enter code names for released products and then look up the relevant information. Go to http://ark.intel.com/ and enter the code name (or official name) of a current Intel product, and chances are it will be listed.

One particularly useful feature of this site is the System Design capability. For example, if you enter a processor/chipset power budget and other criteria, the site will list all matching combinations. Try it out!

--kb

Tuesday, March 31, 2009

Suggestion for Energy Star Measurement of Blade Power Consumption

The US EPA is developing an Energy Star for Servers specification. Based on information in the latest draft of the specification, it looks like the EPA may be backing away from including blade servers in the first release ("Tier 1") of the server Energy Star specification. Given the increasing prevalence of blade servers in data centers, this would be unfortunate.

Ideally, there would be a standardized benchmark like SPECpower_ssj2008 that would be able to measure power consumption on a per-blade basis, but the current benchmark doesn't have provisions to handle chassis.

As an alternative, here are suggestions for how the EPA could measure power consumption for Energy Star (until a chassis-friendly industry specification is developed by an industry group like SPEC):
  • Apply Energy Star to blades, not to chassis. Chassis are ineligible to meet Energy Star, but the blades that go in them can be Energy Star certified.
  • Configure a chassis with the minimal amount of chassis management modules and external modules required for operation, but include all supported power supplies for a given chassis and all the fan/cooling modules typically used (don't remove redundant fans or power supplies).
  • Run a sample workload on all servers to keep them minimally active. Install the same server configuration in all server slots.

Measure total power consumption across all power feeds to the chassis under two conditions, with the following calculations (a short sketch in code follows the list):

  1. Condition 1: Determine power consumption P1 with all N server blade slots installed.
  2. Condition 2: Remove servers so that N/2 (round up) servers are evenly distributed in the chassis; call that number N'. Determine power consumption P2 at this level.
  3. P3 = P1 / N. This is the weighted average power per server blade in a full chassis.
  4. P4 = P2 / N'. This is the weighted average power per server blade in a half-full chassis.
  5. P5 = (P3 + P4) / 2. This is the weighted average power per server blade.
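
Here is a minimal sketch of that calculation, with made-up chassis numbers purely for illustration; the 20W-per-redundant-supply allowance from the Notes below is included as a second helper:

```python
import math

def blade_energy_star_power(p1_watts, p2_watts, n_slots):
    """Per-blade power figures from the two measurement conditions above.

    p1_watts: total chassis power with all N blade slots populated
    p2_watts: total chassis power with ceil(N/2) blades evenly distributed
    n_slots:  total number of blade slots (N)
    """
    n_half = math.ceil(n_slots / 2)    # N' = N/2, rounded up
    p3 = p1_watts / n_slots            # average W/blade, full chassis
    p4 = p2_watts / n_half             # average W/blade, half-full chassis
    p5 = (p3 + p4) / 2                 # blended per-blade figure
    return p3, p4, p5

def redundant_psu_allowance(redundant_supplies, n_blades, watts_per_supply=20):
    """Per-blade allowance for redundant power supplies (see Notes below)."""
    return redundant_supplies * watts_per_supply / n_blades

# Hypothetical 10-blade chassis: 2,600 W fully populated, 1,450 W half-populated,
# with a 2+2 power supply configuration (2 redundant supplies).
p3, p4, p5 = blade_energy_star_power(2600, 1450, 10)
print(round(p3), round(p4), round(p5))       # 260 290 275 W per blade
print(redundant_psu_allowance(2, 10))        # 4.0 W per blade allowance
```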

Notes:

  • This accounts for chassis overhead, including fans, power supplies, management modules, and network connectivity. There is a slight penalty to blades here, since rack-mount servers don't include any allocation for network switch power, but this represents the minimum configuration needed to use those blades. Additionally, many vendors have low-energy networking elements (e.g., pass-through blades) that minimize this impact.
  • If the chassis contains power supplies to convert input voltages to a different voltage supplied on the backplane, the power supplies used in the chassis must meet the power supply qualification requirements outlined elsewhere in the Energy Star for Servers specification.
  • If a chassis contains redundant power supplies, the server blades are eligible for an allowance of 20W per redundant power supply, divided by the number of servers. For example, if a chassis has 2+2 power supplies (2 redundant power supplies and 2 minimum power supplies for a fully loaded chassis) and 10 blades, then each server would get a 4W/server allowance (2 * 20W / 10 servers).

With all the notes above, this may look complicated, but it's actually a fairly simple configuration that provides a close analog to how standalone rack-mount servers are tested. It could be used in the initial version ("Tier 1") of the Energy Star for Servers specification if the EPA wanted to use it.

--kb

Thursday, March 12, 2009

Eliminating the UPS Efficiency Penalty with -48Vdc: Part II

In Eliminating the UPS Efficiency Penalty with -48Vdc, there is a discussion of how non-redundant AC and DC configurations can have nearly equivalent efficiency in facilities without a UPS. However, when redundancy is figured in, the advantages of DC power become more pronounced.

Let's start by looking at the power supply unit (PSU) component by itself. Based on the information in the quantitative analysis by The Green Grid, high-efficiency AC and DC power supplies look like this when compared to each other:

[Graph: efficiency vs. load for high-efficiency AC and -48Vdc power supplies, per The Green Grid analysis]

The graph shifts to the right when redundant power supplies are considered. Since there are numerous different voltage converters in a server (modern servers often have in excess of 25 voltage rails used internally), it's really impractical to try to duplicate every voltage converter in a server--at least if you want it for a reasonable price. However, servers with redundant power supplies provide three principal benefits:

  1. Connectivity to separate primary power sources (e.g., different utility feeds)
  2. Protection against failures in upstream power equipment (e.g., a failed PDU)
  3. Protection against cabling problems or service mistakes (e.g., accidentally unplugging the wrong server)

In an AC system, fully separate power supplies are required to support redundant feeds, since each power feed might be slightly out of phase with the other by the time the power signal gets to the server (relative phasing can shift in different parts of the data center based on relative cable lengths). If a server has two power supplies equally sharing the load, as is commonly done, then each power supply runs at less than 50% of its rated load, which pushes it into a lower-efficiency region of its load curve.

In contrast, a DC system has no phasing issues to deal with. Therefore, DC-based equipment has two main options: fully duplicated power supplies (like AC) or using a technique called diode OR'ing (or FET OR'ing) to safely combine power from two separate DC sources as inputs to a single power supply. [Since there are numerous downstream power converters that are not redundant, there's no need for the power supply itself to be redundant--it just needs to be fed from multiple inputs.] Many DC power supplies do this today, as the approach is commonly used in highly reliable -48Vdc telecommunications systems. The result is a wider gap between the net AC power supply efficiency and the DC power supply efficiency:

[Graph: net efficiency vs. load for redundant, load-sharing AC power supplies compared to OR'ed -48Vdc power supplies]

Taking this a step further, look at the typical operating point for servers vs. their power supply ratings. For example, look at the various published reports for SPECpower_ssj2008: you'll notice there are numerous cases where the power supply shipped with the system is 2-4 times the maximum power draw of the system. If the power supply in a system is 2x the necessary power, then the system would normally operate in the left half of the graph immediately above. If the average power is considerably less than the maximum power draw, then the system could spend the bulk of its time operating at the 25% load level or less in the graph above.
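
A quick sketch of that load math (with hypothetical wattages chosen only for illustration) shows how quickly the operating point slides down the load axis, especially once two supplies share the load:

```python
def psu_load_fraction(avg_draw_watts, psu_rating_watts, sharing_supplies=1):
    """Fraction of a power supply's rating actually in use, assuming the
    load is shared equally across the supplies that are powered on."""
    return avg_draw_watts / sharing_supplies / psu_rating_watts

# Hypothetical server averaging 250 W with a 650 W supply (roughly the
# 2-4x oversizing seen in many SPECpower_ssj2008 disclosures):
print(f"{psu_load_fraction(250, 650):.0%}")                        # ~38% of rating
# The same server with two supplies sharing the load equally:
print(f"{psu_load_fraction(250, 650, sharing_supplies=2):.0%}")    # ~19% each
```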

At these lower loads, the efficiency benefits of -48Vdc systems become more apparent, even when there's no UPS in the picture. If an installation uses UPSes, the efficiency gap widens further in favor of -48Vdc.

Wednesday, March 4, 2009

Eliminating the UPS Efficiency Penalty with -48Vdc

The Green Grid recently released Quantitative Efficiency Analysis Of Power Distribution Configurations For Data Centers, which shows how different power chains from 480Vac down to 12Vdc stack up in terms of efficiency. This showed -48Vdc to have the highest efficiency for systems at 60% of capacity and below--in an idealized world.

This is true when a UPS is required--but what happens if a UPS isn't needed?

Say what? Who would ever want to deploy servers without UPS backup?

There are certain circumstances where a UPS is not needed:
  • Services with sufficient geo-redundancy that a power failure at any one site doesn't have appreciable impact on the overall service availability
  • Lower-priority services for which an infrequent service outage would be acceptable

In situations like this, how does a -48Vdc system stack up? Let's look at the data in the report from The Green Grid mentioned above:
  • The best AC power supplies going from 240Vac down to 12Vdc peak out at around 93% efficiency [Figure 31].
  • The best DC rectifiers (with batteries) going from 240Vac down to -48Vdc peak out at around 96.5% efficiency [Figure 29].
  • The best DC power supplies going from -48Vdc down to 12Vdc peak out at almost 95% efficiency [Figure 31].

Taken together, the 96.5% rectifier efficiency multiplied by the 95% power supply efficiency works out to ~91.7% overall efficiency, slightly less than the 93% efficiency of a pure AC-to-12Vdc power supply solution.
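
The chain arithmetic is just a product of the stage efficiencies; a trivial sketch using the figures quoted above:

```python
def chain_efficiency(*stage_efficiencies):
    """Overall efficiency of a power chain is the product of its stages."""
    total = 1.0
    for eff in stage_efficiencies:
        total *= eff
    return total

ac_psu  = 0.93    # best AC power supply, 240Vac -> 12Vdc
dc_rect = 0.965   # best rectifier (with batteries) -> -48Vdc
dc_psu  = 0.95    # best DC power supply, -48Vdc -> 12Vdc

print(f"AC path: {chain_efficiency(ac_psu):.1%}")            # 93.0%
print(f"DC path: {chain_efficiency(dc_rect, dc_psu):.1%}")   # ~91.7%
```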


However, this is using rectifiers with tightly regulated -48Vdc outputs designed to work with batteries and wide-ranging inputs. That's a mismatch! It's understandable why this has traditionally been done (for applications needing battery backup), but it's overkill for applications that don't need battery backup.

Since most -48Vdc power supplies can handle input voltages from -42Vdc to -56Vdc (or a wider range), think what could happen with a DC rectifier with a loosely regulated output well within this range. If a DC rectifier were allowed to vary its output voltage between -44Vdc and -54Vdc, the net efficiency of the -48Vdc system could meet or beat the approach with a straight AC power supply.

Without battery backup, a -48Vdc system could match an AC system; even with full-time battery backup, the -48Vdc system is within ~1.5% of the AC system without battery backup.

Next: the story gets even better when redundancy is considered...

Sunday, March 1, 2009

Sealed Containers: Reality or Myth?

One of the interesting debates for those looking at containerized data centers is whether or not containerized data centers need to be serviceable in the field. Different products on the market today take different approaches:
  • The Sun Modular Datacenter (nee "Blackbox") provides front and rear access to each rack by mounting the racks sideways and using a special tool to slide racks into the center aisle for servicing.
  • The Rackable ICE Cube provides front access to servers, but the setup doesn't lend itself to rear access to the servers.
  • HP's Performance-Optimized Datacenter (POD) takes an alternative approach: there's a wide service aisle on the front, but you need to go outside the container to get to the back side of the racks via external doors.

Some industry notables have advocated even more drastic service changes: James Hamilton (formerly with Microsoft, now with Amazon) was one of the early proponents of containerized data centers, and he has suggested that containerized data centers could be sealed, without the need for end-users to service the hardware. The theory is that it's cheaper to leave the failed servers in the rack, up until the point that so many servers have failed that the entire container is shipped back to the vendor for replacement.

How reasonable is this?

Prior to the advent of containers, fully-configured racks (cabinets) were the largest unit of integration typically used in data centers, and these remain the highest level of integrated product used in most data centers today. How many data centers seal these integrated cabinets and never open the door to the cabinet throughout the life of the equipment in that cabinet? This is perhaps the best indicator as to whether a sealed container really matches existing practices.

We looked at the "fail in place" model at the company where I work, but it was difficult for managers to accept that it was okay to leave some number of failed servers in a rack. As long as the cost of fixing the hardware is cheaper than the cost of buying a new server (or the equipment is under warranty), most finance people and managers want to see every server in a rack functional.

What do you think? Do you see people keeping cabinets sealed in data centers today? Does fail in place make sense to you?