Thursday, September 3, 2009

Cisco & Sun Servers Spar for Best Humidity Support

One of the clear trends in data centers is to improve data center efficiency by making greater use of HVAC economizer modes. For air-side (dry-side or outside air) economizers, one of the keys to broad adoption is to be able to use outside air across as broad a relative humidity (RH) range as possible.

Among the major blade vendors, Cisco appeared to have taken the lead by supporting the broadest operating humidity range, but Sun has recently matched Cisco:

(All humidity ranges are non-condensing. All data is from vendor web sites as of 9/3/09).

Ever-widening ranges for supported humidity make the use of dry-side economizers more feasible. If vendors were able to support 0-100% relative humidity, data center operators wouldn't need to worry about humidification/de-humidification controls. Eliminating such controls and systems could lower capital costs, reduce operating costs, lower the carbon footprint of facilities, and lower their water footprint as well.

--kb

Thursday, July 16, 2009

Adding a Geographic Element to PUE Calculations

The PUE metric has become one of the most significant metrics for measuring the gross efficiency of a data center. As data center operators boast of PUE numbers that approach the optimal rating of 1.0, it's often difficult to separate out environmental or regional factors.

Is a PUE of 1.5 in Phoenix better or worse than a PUE of 1.4 in Seattle?

It depends. In absolute numbers, the lower PUE provides an indicator of the most efficient facility. However, achieving a PUE of 1.5 in Phoenix is much more difficult than an equivalent or slightly lower number in Seattle because Phoenix is so much hotter and requires more air conditioning. Moving data centers to cooler locations helps the PUE rating, but sometimes data centers need to be located in a specific city or region. How can you compare PUE values in regions with different environmental conditions?

One possible approach is to add a geographic compensating factor:

gPUE = G * PUE

The geographic compensating factor G would be determined by The Green Grid or another trusted body based on compiled weather data. Ideally, it could be calculated empirically through a formula using data maintained by the U.S. Department of Energy (the Temperature and Climate Data Around the World post below describes that data and a free tool to visualize it).
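
As a rough illustration of how such a factor might be applied, here's a sketch in Python. The G values below are purely hypothetical placeholders, not published figures, and the convention of G < 1 for hotter climates is just one possible choice:

  # Hypothetical geographic compensating factors; a real table would come
  # from The Green Grid or another trusted body using compiled weather data.
  G_FACTORS = {"Phoenix": 0.90, "Seattle": 1.05}   # placeholder values only

  def gpue(city: str, pue: float) -> float:
      """Geographically compensated PUE: gPUE = G * PUE."""
      return G_FACTORS[city] * pue

  print(round(gpue("Phoenix", 1.5), 2))   # 1.35
  print(round(gpue("Seattle", 1.4), 2))   # 1.47 -- the Phoenix facility compares favorably after adjustment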

This approach would allow somebody to measure the technical innovation of a given facility while providing an adjustment to account for geographic disparities in temperature, wind, solar loading, etc. It's not a perfect solution (since some cooling optimizations might not work in cooler or hotter climates), but it provides some measure of equalization to facilitate more equitable comparisons between PUE claims in different locations.

--kb

Monday, June 15, 2009

Making Ice to Lower PUE and TCO

At night, demand on the grid is lower, energy costs tend to be lower, and temperatures are also lower. These three factors make night an attractive time to charge thermal storage. This allows facility managers to time-shift HVAC-related energy use, reducing peak demands on the grid and lowering energy costs.

Although some facility managers have developed their own methods for time-shifting HVAC energy requirements, Ice Energy may be the first vendor to market a product specifically designed to do this. The Ice Bear* distributed energy storage system provides up to 5 tons of cooling load during peak hours.

It's good to see innovative products like this coming to market.

--kb

Wednesday, May 27, 2009

Truckin' Down the Information Superhighway

Last week, I was talking with a friend from Sun who is involved with Sun's containerized data centers. He mentioned that since they helped the Internet Archive put 3.2 petabytes of storage in a shipping container, they figured they could put the container on a truck, take 7 days to ship the container across country, and still average >40 Gbps over that 7 day period!

Coincidentally, two days later Amazon introduced Amazon Web Services Import/Export with a blog post that starts off with the following colorful quote attributed to Andy Tanenbaum:
Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.

Amazon Web Services Import/Export allows people to send USB or eSATA hard drives/media to Amazon for data sets that are impractical to send over available communications links.

It turns out that the bulk version of sneakernet may be the most expeditious way to move data. The more things change, the more things stay the same.

--kb
Note: Revised title on 5/29/09.

Monday, May 11, 2009

How a Good Metric Could Drive Bad Behaviors

The PUE (Power Usage Effectiveness) metric from The Green Grid has become a widely referenced benchmark in the data center community, and justifiably so. However, there can be a dark side to following this metric blindly.


Introduction

PUE is defined as follows:

PUE = Total Facility Power/IT Equipment Power

Using the PUE metric, a facility manager can judge what ratio of power is lost in "overhead" (infrastructure) to operate the facility. A PUE of 1.6 to 2.0 is typical, but facility managers are striving to approach a PUE of 1.0, the idealized state.
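
A quick worked example (the facility numbers here are made up purely for illustration):

  it_power_kw = 500.0          # hypothetical IT equipment load
  total_facility_kw = 800.0    # hypothetical total utility draw (IT + cooling + UPS losses + lighting)

  pue = total_facility_kw / it_power_kw
  print(pue)   # 1.6 -- i.e., 0.6 W of overhead for every 1 W delivered to IT equipment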

Companies wanting to drive more sustainable practices may incent facility managers to improve facility PUE levels. However, if this is done without regard to overall energy or other resource consumption, it could drive inefficient behaviors.



Issue #1: Dissimilar Infrastructure Power Scaling

If a facility manager tracks PUE over a variety of workloads, they will see how the data center's infrastructure power consumption tracks with the IT load. Ideally, the infrastructure overhead (HVAC system, UPS system, etc.) will scale linearly with the consumption of the servers and other gear in the data center, but this is rarely the case.



In many cases, the fixed overhead for power and cooling systems will become a higher percentage of overall power consumption as the IT load diminishes. In other cases, there will be significant step functions in overall power consumption as large infrastructure items such as chillers, CRACs, or other equipment are turned on or off (as depicted in the graph to the left).

In such situations, reducing the IT power consumption could increase the PUE even if it reduces the overall energy consumption of the data center. People will often act in the direction towards which they are incented (i.e., what improves their paycheck). Managers incented to improve PUE without any clear tie-in to overall energy consumption might be reluctant to shut off unused servers or aggressively implement power-saving features on their IT infrastructure if doing so increased their PUE--even if it would reduce overall facility power consumption.
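
A small numeric sketch (all figures hypothetical) shows how this can happen when a big chunk of the infrastructure power is fixed overhead:

  fixed_overhead_kw = 300.0    # assumed chiller/CRAC/UPS overhead that doesn't scale with IT load

  def pue_and_total(it_kw):
      total_kw = it_kw + fixed_overhead_kw
      return total_kw / it_kw, total_kw

  print(pue_and_total(500.0))  # (1.6, 800.0) -- before shutting off unused servers
  print(pue_and_total(400.0))  # (1.75, 700.0) -- PUE worsens even though the facility draws 100 kW less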

Ensuring overall energy consumption is part of the incentive package (not just PUE) is critical to driving the desired behaviors.

[Part of this needs to be linked with overall productivity of the data center so that increased use of the data centers is encouraged while still incenting improved efficiency. I'll write about this in an upcoming post.]



Issue #2: Shifting Infrastructure Loads to IT

Another issue to watch is the desire to classify some infrastructure-like services as IT loads in order to improve reported PUE. Examples include moving UPS systems into IT racks or putting large air-mover devices into equipment cabinets and trying to classify them as IT loads. This is "gaming" the system and should be actively discouraged.

The Green Grid is aware of this issue and is adding more guidelines to help people improve the accuracy and consistency of their PUE reporting.



Issue #3: Improving Infrastructure Efficiency at the Expense of IT

The third issue to watch is a move towards facility or equipment practices that reduce the infrastructure power consumption but increase the IT power consumption. The adoption of higher operating temperatures for data centers warrants particular scrutiny here.

I've noted previously that there are significant gains possible by raising data center temperatures and making greater use of dry-side or wet-side economizers. However, it's important to compare the energy savings on the infrastructure side with the energy costs on the IT side. At higher temperatures, leakage currents in silicon increase and fans inside servers need to run faster to move more air through each server.

Increase the IT consumption while lowering the infrastructure consumption and you get a two-fer: the PUE numerator goes down and the PUE denominator goes up, lowering the overall PUE. However, if the net power consumption doesn't go down, it usually** doesn't make sense to increase the ambient temperature. Once again, looking at overall power consumption in addition to PUE is important in incenting the proper behaviors.
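
To illustrate that caveat with hypothetical numbers, PUE can improve even when the facility's total draw goes up:

  # Before raising the ambient temperature (assumed values)
  it_kw, infra_kw = 500.0, 250.0
  print(round((it_kw + infra_kw) / it_kw, 2))    # PUE 1.5 on a 750 kW total

  # After: less chiller work, but higher fan speeds and leakage in the servers (assumed shifts)
  it_kw, infra_kw = 540.0, 220.0
  print(round((it_kw + infra_kw) / it_kw, 2))    # PUE ~1.41 on a 760 kW total -- PUE improved, net power rose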

--kb


**Note: For greenfield (new) data centers or substantial datacenter retrofits, raising the allowed data center temperature may eliminate or substantially reduce the CapEx (capital expenditure) cost for that data center even if the direct energy costs are slightly higher. For example, if a data center doesn't need to purchase a chiller unit, that could shave millions of dollars off the construction cost for a facility. In such cases, more complicated parameters will be needed to evaluate the benefits of raising the ambient temperature in the facility; these likely will include a net present value analysis for the CapEx savings vs. OpEx (operating expense) costs, consideration of real estate savings, etc. The real win is when both CapEx costs are avoided AND OpEx costs are lower.

Wednesday, May 6, 2009

SSD Potential Power Savings Writ Large

eWeek has an interesting article about an iSuppli report suggesting that if all data centers moved 10% of their hard disks to SSDs (solid-state disks), it would save more power worldwide than the total 2006 power consumption of the African country of Gambia. This equates to more than 166,000 megawatt-hours.

However, since the analysis assumed an SSD averaged 7W, SSDs that use less than 2W could save more than 3x that amount. Adding in HVAC and power infrastructure savings, the savings could be even higher.

--kb

Saturday, May 2, 2009

Building Codes and Roof Anchors

One of the biggest impediments to installing rooftop solar panels or wind energy devices such as helical windmills is the cost and risk associated with properly anchoring them to a building's roof. Extra effort is required to ensure retrofit roof penetrations do not leak; in many cases, installing the roof anchors takes the majority of the installation time. The noise from cutting through the roof can also significantly disrupt occupants on the top floor or two below the roof.

If "rooftop renewables" are designed into a building during initial construction, the cost is substantially lower. However, it may not be feasible to install these rooftop renewables when the building is first built (due to limited capital or other reasons).

A middle ground is to provide rooftop anchors during initial construction, regardless of whether or not rooftop renewables are installed with initial construction. That way, solar or wind devices could be installed at a later date much more easily and with no need to breach the roof seal.

All new data centers should be designed for the later installation of rooftop renewables, even if they aren't part of the initial build-out.

Taking this a step further, I advocate the following: building codes should be revised to **REQUIRE** all new commercial buildings with a roof area greater than 1000 square feet to install roof anchors every x (20?) feet, with a TBD load rating for each anchor. (These anchors should also be required when major roof renovations are initiated as well.) Requiring these anchors will facilitate the broader adoption of rooftop renewables in data centers and other commercial buildings.

I hope others will adopt this cause; together we can effect real changes,
--kb

Wednesday, April 29, 2009

Human Side of Higher Data Center Temperatures

With all the talk of hotter data center temperatures, one item that has often been overlooked is what happens to the poor soul tasked with going in and servicing equipment in that data center. Imagine having to work in a facility at 40°C (104°F) for several hours at a time--and that's at the equipment input. The exhaust temperature on the back side of the rack could easily be 55°C (131°F).

One approach is to adopt a "fail in place" model where technicians never go into a production facility, but even Google has technicians adding and replacing individual servers in their containerized data centers.

Other approaches to consider:
  • Localized spot cooling. A very small air conditioner could take the edge off the area in front of a rack.
  • Perform service operations at night or when it's reasonably cool.

This last suggestion may seem too simplistic at first, but it's actually quite practical. In a facility with sufficient redundancy to ensure high availability, server replacement should be able to wait up to 24 hours. Operating a data center at consistently high temperatures will end up increasing power consumption in the IT equipment, so it only makes sense to run a data center hotter when using economizers to eliminate or substantially reduce HVAC CapEx and OpEx costs.

If a data center is using economizers, the temperature in the data center should drop when the outside temperature drops. Even in relatively warm areas during summer months, there are substantial times each day where the temperature drops to reasonable levels in which technicians can comfortably work.

--kb

Monday, April 13, 2009

NEBS vs. the Hottest Place on Earth

As mentioned in Higher Temperatures for Data Centers and Processors for Higher Temps, various groups are pushing for higher and higher ambient temperatures in data centers. At Google's Efficient Data Center Summit last week, Amazon's James Hamilton brought up an interesting point in his slides and blog about ambient temperatures:
the hottest place on earth over recorded history was Al Aziziyah Libya in 1922 where 136F (58C) was indicated

James went on to note during his talk that telecommunications equipment designed to the NEBS (Network Equipment Building System) standards routinely has to handle temperatures up to 40°C.

Actually, the story is better than that. NEBS-GR-63 (the key NEBS specification dealing with environmental conditions for equipment in telecommunications central offices) requires equipment to handle 40°C long-term ambient temperatures, but telecommunications equipment certified at the shelf (chassis) level needs to be able to operate at 55°C ambient for up to 96 hours at a time and up to 360 hours per year [the 360 hours is for reliability calculations]. This means that much of the NEBS-rated equipment for data centers can operate at temperatures that are only 3°C lower than the highest natural temperature ever recorded on Earth, as noted by James.

Given the common engineering penchant to provide some guardband on products vs. the official specifications, even a 58°C ambient is not out of the question. This makes NEBS-rated equipment a good candidate for data centers operating at high temperatures.

But can you get decent performance in NEBS-rated servers? Yes! For example, vendors such as Radisys, Kontron, and Emerson have announced blade servers with Intel's new Xeon 5500 (aka "Nehalem") processors, and these blade servers are commonly NEBS-certified to operate at 55°C. This allows the latest server technology to operate in the most demanding environments.

--kb

Thursday, April 9, 2009

More on Google's Battery-backed Servers

As noted in Evaluating Google's Battery-backed Server Approach, there are a number of benefits to Google's recently-disclosed practice of putting VRLA batteries on every server, but there are quite a few drawbacks as well.

One of the drawbacks not discussed in the prior post is a set of issues related to power transients and harmonics. In a conventional data center, there are multiple levels of power transformation and isolation between the individual server and the grid. Power usually comes in at high or medium voltage to a transformer and comes out as low voltage (<600V) before going to a UPS and a PDU.

In an effort to improve efficiency and reduce capital costs, facility managers are looking at removing some of these isolation layers. This is fine to a certain extent. After all, there are a lot of small businesses that run one or two servers on their own, and there aren't major problems with them. In those cases, however, there are usually relatively few computers hooked together on the same side of the electrical transformer that provides power to the building. This transformer provides isolation from building to building (or zone to zone in some installations).

When you scale up into a large data center, however, you get thousands and thousands of servers in the same building. If you remove those extra layers of isolation, the burden for providing that isolation falls to the power supplies in the individual servers. If servers use traditional AC power supplies, issues like phase balancing and power factor correction across all the separate power supplies become more interdependent.

The issues can be helped or hurt depending on what's nearby. Servers without isolation near an aluminum smelter, sawmill, subway, or steel mill may see wide fluctuations in their power quality which can result in unexplained errors.
I've seen cases with marginal power feeds where individual racks of servers seem to work fine, but the aggregate load when all servers are operating causes enough of a voltage sag that some servers occasionally don't work right. Let me tell you, those are a real pain to diagnose.

On the other hand, if you're somebody like Google or Microsoft who can locate data centers in places like The Dalles, Oregon or Quincy, Washington that are just a stone's throw from major hydroelectric dams or other sources of power, perhaps you can rely on nice clean power all the time.

External power factors may be the least of a data center manager's problems, however. The big concern with eliminating the intermediate isolation is that transients and other power line problems from one power supply can affect the operation of adjacent systems, and this can build up to significant levels if fault isolation and filtering is not supported.

Another issue that bedevils data center managers is phase balancing. In most AC-powered systems, power is delivered via three phases or legs (A, B, and C), each 120° out of phase with the others. At some point (usually the PDU), a neutral conductor is synthesized so that single-phase currents can run from one of these legs to neutral. In a properly balanced system, there will be equal loading on the A leg, the B leg, and the C leg. If the phases are not properly balanced, several bad things can occur, including the following:
  • The neutral point will shift towards the heaviest load, lowering the voltage to the equipment on that line, resulting in premature equipment failure and undervoltage-related errors
  • An imbalanced load may cause excess current to flow over specific conductors, causing them to overheat
  • Breakers or other overcurrent mechanisms may trip

Phase imbalance can occur when network administrators do not follow a rigorous process of distributing servers evenly across the three phases. Additionally, shifting workloads could cause some servers to be more heavily utilized than others--and phase balancing is almost certainly not a factor considered in allocating applications to specific servers. An even more pernicious issue can arise with systems employing redundant power supplies, such as blade servers: in an attempt to maximize efficiency, management software may shut down certain power supplies to maximize load on the remaining power supplies--all without considering the impact on phase balancing when the load is not equally shared among all power supplies.

Data centers that employ conventional PDUs don't generally have these issues (or have them at lesser severity), since the PDUs and their transformers are usually designed to handle significant phase imbalances without creating problems.
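
To make the imbalance concern concrete, here's a small sketch that sums the three phase currents as phasors. It assumes purely resistive loads and ideal 120° spacing; real power supplies with poor power factor or harmonic content behave worse than this:

  import cmath, math

  def neutral_current(i_a, i_b, i_c):
      """Magnitude of the neutral current for three line-to-neutral loads (amps)."""
      total = (i_a * cmath.exp(1j * math.radians(0)) +
               i_b * cmath.exp(1j * math.radians(-120)) +
               i_c * cmath.exp(1j * math.radians(120)))
      return abs(total)

  print(round(neutral_current(100, 100, 100), 1))  # 0.0 A -- balanced phases cancel on the neutral
  print(round(neutral_current(140, 100, 60), 1))   # ~69.3 A -- imbalance shows up as neutral current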

Additional considerations with the Google battery-backed server approach:

  • Acid risks from thousands of individual tiny batteries (i.e., cracked cases in thinner-walled batteries)
  • Shorting risks from batteries that can deliver thousands of amps of current for a short period
  • More items to monitor, or higher risks of silent failures (albeit with smaller failure domains) when you most need the batteries

This is a complex issue. I'm not convinced that Google has determined the optimal solution, but kudos to them for finally being willing to publicly discuss some of what they consider to be best practices. Collectively, we can learn bits and pieces from different sources that could end up delivering more efficient services.

--kb

Saturday, April 4, 2009

Evaluating Google's Battery-backed Server Approach

As noted previously, Google has disclosed that they put batteries on every server (see this picture of a Google rack), essentially powering their servers the way laptops have traditionally been powered. Laptops need batteries because they need to be mobile, which is not generally a consideration for servers.
Are batteries in servers a good idea?

There are some definite advantages in Google's approach:
  1. No need to pay for UPS systems (saves CapEx dollars)
  2. Eliminates two conversion stages found in a traditional AC double-conversion UPS
  3. Reduces dedicated floor space/real estate commonly devoted to UPS/battery rooms
  4. Localizes fault domains for a failed server to just one server
  5. Scales linearly with the number of servers deployed

All of these add up to a solution that works just as well for one server as it does for one thousand servers. Coupled with Google's founding of and support for the Climate Savers Computing Initiative (CSCI) and its target of 92% power supply efficiency, this solution appears to be very efficient.

However, there are some down sides to Google's approach:

  1. A lot of batteries to wire up and monitor
  2. Increased air impedance from blocking airflow
  3. Lower battery reliability with increased ambient temperatures
  4. Higher environmental impact due to increased battery materials
  5. Individual server supplies are exposed to a higher level of power transients and harmonics
  6. Potential phase imbalances and stranded power in data centers

Issue #1 is self-evident. Issue #2 can be seen in this picture from Green Data Center Blog; the physical mass of the batteries blocks a good portion of the air space in front of the server, which increases the airflow resistance and in turn requires more fan power to move the same amount of air.

Issues #3 and #4 are somewhat related. Google, Microsoft, and other leading internet companies have advocated moving the ambient temperatures of data centers to higher temperatures, with some advising 35°C, 40°C, or even occasionally 50°C ambient temperatures. There are clear savings to be had here, but it may run counter to the battery approach used by Google. Assuming the Google batteries are conventional lead-acid batteries, a common rule is that the useful life of batteries drops by ~50% for every 10°C above 25°C ambient temperatures. Thus, a 4-year battery would only be good for ~2 years in a 35°C environment. In comparison, conventional UPS batteries are often rated for 10, 15, or 20 years. When consolidated in a UPS battery cabinet, the batteries can be protected from the higher ambient temperatures through localized cooling (batteries dissipate almost no heat) for increased life.
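
The derating rule of thumb can be expressed directly. This is just a sketch of that approximation; actual battery life also depends on chemistry, charge management, and discharge duty cycle:

  def vrla_life_years(rated_life_years, ambient_c, rated_ambient_c=25.0):
      """Approximate VRLA battery life using the ~50% loss per 10°C rule of thumb."""
      return rated_life_years * 0.5 ** ((ambient_c - rated_ambient_c) / 10.0)

  print(vrla_life_years(4, 25))   # 4.0 years at the rated 25°C
  print(vrla_life_years(4, 35))   # 2.0 years at 35°C
  print(vrla_life_years(4, 45))   # 1.0 year at 45°C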

Using lots of little batteries, as Google does, results in more materials usage than using larger batteries. Couple that with reduced battery life at higher temperatures, and the result is not as good as it first seems. According to http://www.batterycouncil.org/LeadAcidBatteries/BatteryRecycling/tabid/71/Default.aspx, more than 97% of the lead from lead-acid batteries is recycled, but the same source states that only 60-80% of the lead and plastic in new batteries is recycled material. Looking at that last stat a different way, 20-40% of lead-acid battery materials are not recycled. Thus, even if Google performs 100% battery recycling, using lots of new batteries still results in the use of a lot of new materials.

I'll address issues #5 and #6 in a future post.

--kb

Friday, April 3, 2009

Google's Server Power Supplies

This past Wednesday, Google finally provided a peek into their data centers. Green Data Center Blog has a great roundup of various articles related to this workshop, including pictures from Google's container data centers.

One of the more interesting aspects revealed Wednesday was the fact that Google has batteries attached to each of their servers.

At first, this seems rather odd. Google's explanation for this is that they use this arrangement as a 99.9% efficient replacement for UPS (Uninterruptible Power Supply) systems. Wow...99.9% efficient!

This is definitely a different approach from what most data centers do today, and it seems really far out there--until you break it down into its component parts. A simplified block diagram looks like the following:
Broken down this way, the arrangement really starts to look like a laptop. The Google server power system apparently operates just like a laptop:
  • External power supply provides ~12Vdc
  • Battery is included with every computer
  • When the external power supply fails, the battery provides power until the generator starts or power is switched to a different source

Graceful shutdown in power outages may or may not be an issue for Google's applications (likely not an issue).

Google certainly thinks they've got a winner with this approach, and goodness knows they've had experience deploying this at scale. In a future posting, I'll look at some of the pros and cons of this approach.
--kb

Wednesday, April 1, 2009

Deciphering Intel Code Names

There's been a lot of industry buzz lately about Intel's recent release of the Nehalem-EP processor, with many references to how Nehalem is x% better than a previous platform like Bensley, Harpertown, or Clovertown.

Okay, but how can you find out what each one of these code names refers to? Well, it turns out that Intel has a web site that allows you to enter code names for released products and then look up the relevant information. Go to http://ark.intel.com/ and enter the code name (or official name) of a current Intel product, and chances are it will be listed.

One particularly useful feature of this site is the System Design capability. For example, if you enter a processor/chipset power budget and other criteria, the site will list all matching combinations. Try it out!

--kb

Tuesday, March 31, 2009

Suggestion for Energy Star Measurement of Blade Power Consumption

The US EPA is developing an Energy Star for Servers specification. Based on information in the latest draft of the specification, it looks like the EPA may be backing away from including blade servers in the first release ("Tier 1") of the server Energy Star specification. Given the increasing prevalence of blade servers in data centers, this would be unfortunate.

Ideally, there would be a standardized benchmark like SPECpower_ssj2008 that would be able to measure power consumption on a per-blade basis, but the current benchmark doesn't have provisions to handle chassis.

As an alternative, here are suggestions for how the EPA could measure power consumption for Energy Star (until a chassis-friendly industry specification is developed by an industry group like SPEC):
  • Apply Energy Star to blades, not to chassis. Chassis are ineligible to meet Energy Star, but the blades that go in them can be Energy Star certified.
  • Configure a chassis with the minimum number of chassis management modules and external modules required for operation, but include all supported power supplies for a given chassis and all the fan/cooling modules typically used (don't remove redundant fans or power supplies).
  • Run a sample workload on all servers to keep them minimally active. Install the same server configuration in all server slots.

Measure total power consumption to all power feeds in the chassis under two conditions and with the following calculations:

  1. Condition 1: Determine power consumption P1 with all N server blade slots installed.
  2. Condition 2: Remove servers so that N/2 (round up) servers are evenly distributed in the chassis; call that number N'. Determine power consumption P2 at this level.
  3. P3 = P1 / N. This is the weighted average power per server blade in a full chassis.
  4. P4 = P2 / N'. This is the weighted average power per server blade in a half-full chassis.
  5. P5 = (P3 + P4) / 2. This is the weighted average power per server blade.
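
A short sketch of the proposed calculation (the measured wattages and slot count below are placeholders, not real test data):

  N = 10                 # blade slots, all populated for Condition 1
  P1 = 3200.0            # watts measured with a full chassis (hypothetical)
  N_half = -(-N // 2)    # N/2 rounded up = 5, evenly distributed for Condition 2
  P2 = 1750.0            # watts measured with the half-full chassis (hypothetical)

  P3 = P1 / N            # 320.0 W per blade in a full chassis
  P4 = P2 / N_half       # 350.0 W per blade in a half-full chassis
  P5 = (P3 + P4) / 2     # 335.0 W -- weighted average power per server blade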

Notes:

  • This accounts for chassis overhead, including fans, power supplies, management modules, and network connectivity. There is a slight penalty to blades here since rack-mount servers don't include any allocation for network switch power, but this represents the minimum configuration needed to use those blades. Additionally, many vendors have low-energy networking elements (e.g., pass-through blades) that minimize this impact.
  • If the chassis contains power supplies to convert input voltages to a different voltage supplied on the backplane, the power supplies used in the chassis must meet the power supply qualification requirements outlined elsewhere in the Energy Star for Servers specification.
  • If a chassis contains redundant power supplies, the server blades are eligible for an allowance of 20W per redundant power supply, divided by the number of servers. For example, if a chassis has 2+2 power supplies (2 redundant power supplies and 2 minimum power supplies for a fully loaded chassis) and 10 blades, then each server would get a 4W/server allowance (2 * 20W / 10 servers).

With all the notes above, this may look complicated, but it's actually a fairly simple configuration that provides a close analog to how standalone rack-mount servers are tested. This could be used in the initial version ("Tier 1") of the Energy Star for Servers specification if the EPA wanted to.

--kb

Thursday, March 12, 2009

Eliminating the UPS Efficiency Penalty with -48Vdc: Part II

In Eliminating the UPS Efficiency Penalty with -48Vdc, there is a discussion of how a non-redundant AC and DC configuration can have nearly equivalent efficiency in facilities without a UPS. However, when redundancy is figured in, the advantages of DC power become more pronounced.

Let's start by looking at the power supply unit (PSU) component by itself. Based on the information in the quantitative analysis by The Green Grid, high-efficiency AC and DC power supplies look like this when compared to each other:



The graph shifts to the right when redundant power supplies are considered. Since there are numerous different voltage converters in a server (modern servers often have in excess of 25 voltage rails used internally), it's really impractical to try to duplicate every voltage converter in a server--at least if you want it for a reasonable price. However, servers with redundant power supplies provide three principal benefits:

  1. Connectivity to separate primary power sources (i.e., different utility feeds)
  2. Protection against failure in upstream power equipment (i.e., failure in a PDU)
  3. Protection against cabling problems or service mistakes (e.g., accidentally unplugging the wrong server)
In an AC system, separate power supplies are required to provide redundant feeds, since each power feed might be slightly out of phase with the other by the time the power signal gets to the server (relative phasing can shift in different parts of the data center based on relative cable lengths). If a server has two power supplies equally sharing the load, as is commonly done, then each power supply operates at less than 50% of its rated load, where efficiency is typically lower.

In contrast, a DC system has no phasing issues to deal with. Therefore, DC-based equipment has two main options: full duplicate power supplies (like AC) or using a technique called diode OR'ing (or FET OR'ing) to safely combine power from two separate DC sources as inputs to a single power supply. [Since there are numerous downstream power converters that are not redundant, there's no need for the power supply itself to be redundant--it just needs to be fed from multiple inputs.] Many DC power supplies do this today, as this approach is commonly used in highly reliable -48Vdc telecommunications systems. The result is a wider gap between the net AC power supply efficiency and the DC power supply efficiency:

Taking this a step further, look at the typical operating point for servers vs. their power supply ratings. For example, look at the various published reports for SPECpower_ssj2008: you'll notice there are numerous cases where the power supply shipped with the system is 2-4 times the maximum power draw of the system. If the power supply in a system is 2x the necessary power, then the system would normally operate in the left half of the graph immediately above. If the average power is considerably less than the maximum power draw, then the system could spend the bulk of its time operating at the 25% load level or less in the graph above.

At these lower loads, the efficiency benefits of -48Vdc systems become more apparent, even when there's no UPS in the picture. If an installation uses UPSes, the efficiency gap widens further in favor of -48Vdc.

Wednesday, March 4, 2009

Eliminating the UPS Efficiency Penalty with -48Vdc

The Green Grid recently released Quantitative Efficiency Analysis Of Power Distribution Configurations For Data Centers, which shows how different power chains from 480Vac down to 12Vdc stack up in terms of efficiency. This showed -48Vdc to have the highest efficiency for systems at 60% of capacity and below--in an idealized world.

This is true when a UPS is required--but what happens if a UPS isn't needed?


Say what? Who would ever want to deploy servers without UPS backup?

There are certain circumstances where a UPS is not needed:


  • Services with sufficient geo-redundancy that a power failure at any one site doesn't have appreciable impact on the overall service availability

  • Lower-priority services for which an infrequent service outage would be acceptable

In situations like this, how does a -48Vdc system stack up? Let's look at the data in the report from The Green Grid mentioned above:



  • The best AC power supplies to go from 240Vac down to 12Vdc peak out at around 93% efficiency [Figure 31].

  • The best DC rectifiers (with batteries) to go from 240Vdc down to -48Vdc peak out around 96.5% efficiency [Figure 29].

  • The best DC power supplies to go from -48Vdc down to 12Vdc peak out at almost 95% efficiency [Figure 31].

Taken together, the 96.5% rectifier efficiency x the 95% power supply efficiency equate to ~91.7% efficiency, slightly less than the 93% efficiency of a pure AC to 12Vdc power supply solution.
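
A quick check of the arithmetic, using the peak efficiencies cited from the figures above (the variable names are mine):

  rectifier_eff = 0.965      # best 240Vac -> -48Vdc rectifier (with batteries), Figure 29
  dc_psu_eff = 0.95          # best -48Vdc -> 12Vdc power supply, Figure 31
  ac_psu_eff = 0.93          # best 240Vac -> 12Vdc AC power supply, Figure 31

  dc_chain_eff = rectifier_eff * dc_psu_eff
  print(round(dc_chain_eff, 3))                # ~0.917
  print(round(ac_psu_eff - dc_chain_eff, 3))   # ~0.013 -- the AC chain leads by roughly 1.3 points here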


However, this is using rectifiers with tightly regulated -48Vdc outputs, designed to work with batteries and wide-ranging inputs. This is a mismatch! It's understandable why this has traditionally been done (for applications needing battery backup), but it's overkill for applications not needing battery backup.


Since most -48Vdc power supplies can handle input voltages from -42Vdc to -56Vdc (or a wider range), think what could happen with a DC rectifier with a loosely regulated output well within this range. If a DC rectifier was allowed to vary its output voltage between -44Vdc and -54Vdc, the net efficiency of the -48Vdc system could meet or beat the approach with a straight AC power supply.


Without battery backup, a -48Vdc system could match an AC system; even with full-time battery backup, the -48Vdc system is within ~1.5% of the AC system without battery backup.


Next: the story gets even better when redundancy is considered...

Sunday, March 1, 2009

Sealed Containers: Reality or Myth?

One of the interesting debates for those looking at containerized data centers is whether or not containerized data centers need to be serviceable in the field. Different products on the market today take different approaches:
  • The Sun Modular Datacenter (nee "Blackbox") provides front and rear access to each rack by mounting the racks sideways and using a special tool to slide racks into the center aisle for servicing.
  • The Rackable ICE Cube provides front access to servers, but the setup doesn't lend itself to rear access to the servers.
  • HP's Performance-Optimized Datacenter (POD) takes an alternative approach: there's a wide service aisle on the front, but you need to go outside the container to get to the back side of the racks via external doors.

Some industry notables have advocated even more drastic service changes: James Hamilton (formerly with Microsoft, now with Amazon) was one of the early proponents of containerized data centers, and he has suggested that containerized data centers could be sealed, without the need for end-users to service the hardware. The theory is that it's cheaper to leave the failed servers in the rack, up until the point that so many servers have failed that the entire container is shipped back to the vendor for replacement.

How reasonable is this?

Prior to the advent of containers, fully-configured racks (cabinets) were the largest unit of integration typically used in data centers, and these remain the highest level of integrated product used in most data centers today. How many data centers seal these integrated cabinets and never open the door to the cabinet throughout the life of the equipment in that cabinet? This is perhaps the best indicator as to whether a sealed container really matches existing practices.

We had looked at the "fail in place" model in the company where I work, but it was difficult for managers to accept that it was okay to leave some number of failed servers in a rack. As long as fixing the hardware is cheaper than buying a new server (or the equipment is under warranty), most finance people and managers want to see all the servers in a rack functional.

What do you think? Do you see people keeping cabinets sealed in data centers today? Does fail in place make sense to you?

Tuesday, February 17, 2009

Server Cost Adders for Higher-temp Operation

Numerous industry notables, including Microsoft's Christian Belady, have been advocating the operation of data centers with higher ambient temperatures. Reducing or eliminating cooling plant could yield considerable savings. But what does it take to build servers designed to operate at these higher temperatures?

As mentioned in a previous post, telecommunications equipment is typically designed to meet the NEBS standards (55°C maximum ambient). Cost adders for NEBS equipment include the following:
  • Higher temperature integrated circuits (ICs). Commercial-grade ICs are generally rated to 70°C; higher ambient temperatures could force the use of extended temp components.
  • Heat sink costs. Higher temperatures often drive more expensive heat sink materials (i.e., copper rather than aluminum) and more use of heat sinks on components that don't need them at lower temperatures. For example, some servers need heat spreaders on DIMMs to be rated to operate at higher temperatures.
  • Corrosive gases tolerance. Telecommunications equipment generally needs to pass tests to ensure reliability in the presence of corrosive gases, including high sulfur-content air. Before dismissing this requirement, consider the case of air-side economizers: if you're bringing in outside air, do you need to worry about contaminants in the air, such as diesel exhaust from nearby trucks or from diesel generators?
  • Wider humidity range. Most NEBS equipment is designed for a wider range of allowable humidity exposure than most data center equipment. The broader use of economizers might make a wider humidity range desirable for data centers.
  • Flame tests. NEBS flame tests may be overkill for most data center equipment, in part because most data centers have sprinklers or other fire suppression controls (unlike telecom central offices, which do not have sprinklers).
  • Shake and vibe tests. NEBS equipment generally is tested to meet seismic Zone 4 earthquake tests. These tests could just as well apply to data center equipment, but it is something beyond what most data center equipment is validated against.
  • Materials selection. The use of V0-rated plastics and HF-1 or better foams in data center equipment is not necessarily a cost adder if designed in up front, but it can add appreciable expense if retrofits have to be made after-the-fact.
  • Air filters. Data center equipment generally doesn't need air filters, so these can be eliminated.
  • Long life. This actually encompasses two aspects: extended availability of certain components and long-life reliability. Telecom products often require the availability of the same components for 5-7 years, much longer than typical data center products. Similarly, telecom products often are designed to meet usable lifetimes that are much longer than most data center refresh cycles.

Which of these attributes are needed for equipment in data centers with higher temperatures? What other attributes are needed for higher temps?

--kb

Sunday, February 15, 2009

Containerized Data Centers in Buildings

Much of the focus with containerized data centers has been on mega-facilities that can house dozens of shipping containers.

Another use case where containerized data centers could make sense is in retrofitting buildings, though it may be somewhat counter-intuitive.

Building a state-of-the art facility can take a long time, but clearing out an open space and then lifting in a container could be a much faster approach to getting an optimized facility installed in a building than trying to get it built in place. Furthermore, it's possible to replace a set of equipment the same way.

This could be used with a single container, or there could be multiple containers placed together on the same floor of an office building.

--kb

Friday, February 13, 2009

Processors for Higher Temps

Higher Temperatures for Data Centers talks about emerging environmental standards that could well lead to increasing ambient temperatures in facilities. All other things being equal, higher ambient temperatures will lead to higher component temperatures.

In many cases, the maximum processor case temperature (Tcase) is the limiting factor for how high the ambient temperature can be raised. The Tcase limit is established by the semiconductor vendor as the maximum case temperature that the chip can experience and still meet the vendor's reliability goals.

This can put a crimp in plans to use outside air for cooling. In most locations likely to host data centers, there are occasionally warm days that would increase the inlet temperature to the servers to the point that the processor Tcase would exceed the vendor's specified ratings.

The telecommunications market has had this issue for years. NEBS-rated equipment for central offices generally has to operate at 40°C ambient temperature, but it must also operate at 55°C for short periods (up to 96 hours at a time and up to 360 hours per year).

To address the needs of the NEBS market, Intel provides some of their processors with dual Tcase ratings: one long-term Tcase rating and a second short-term Tcase rating that is 15°C higher for up to 360 hours per year.

These processors with dual Tcase ratings may be a good fit for systems in data centers that use air-side economizers.

Wednesday, February 11, 2009

Higher Temperatures for Data Centers

As mentioned in Using Outside Air for Data Centers, there is increasing interest in using outside air to cool data centers. At times, this will mean higher ambient temperatures inside those data centers.

A couple years ago, the TEM-led SCOPE Alliance undertook the task of identifying the relevant environmental requirements for data centers and telecommunications from several different standards groups (Telcordia, ETSI, TIA, etc.) and came up with the Environmental Profile: Central Offices and Network Data Centers, eventually published last May. It was a lot of work, but I think we developed the most comprehensive overlay of those standards that has been published to date.

Last August, ASHRAE issued its 2008 ASHRAE Environmental Guidelines for Datacom Equipment, which changed the recommended setpoints in data centers from 20-25°C to 18-27°C. This may not seem like much, but it's a step in the right direction towards better allowances for dry-side (air) economizers.

Last November, Telcordia released GR-3160, NEBS(TM) Requirements for Telecommunications Data Center Equipment and Spaces. [I had provided information to Telcordia throughout the two years this document was in process.] Though those outside the telecom world may not be familiar with Telcordia (or its predecessor, Bellcore), the Telcordia NEBS standards have become the de facto standards for how telecommunications facilities and equipment are designed in the U.S. and many places around the world. GR-3160 is effectively Telcordia's first major foray into data center standards, a clear recognition that telecommunications carriers like AT&T, Verizon, and Qwest have a large number of data centers and a growing need to ensure maximum availability in those facilities. Among the interesting elements of GR-3160 is Telcordia's expectation for allowable maximum ambient temperatures at the air inlet to the rack-mounted equipment:
  • 30°C long-term max ambient
  • 40°C short-term max ambient (up to 360 hours per year)

These two upper limits provide a reasonable target that equipment manufacturers can design to, while providing enough freedom that facility designers can make maximum use of economizers.

--kb

Sunday, February 8, 2009

Using T-states with Phantom Servers

One of the more obvious ways to reduce power consumption in data centers is to shut off servers that are no longer needed (often referred to as phantom servers). Unfortunately, it's often difficult to identify what servers are no longer needed. Though most well-run organizations have well thought-out procedures for determining when and how to commission servers, few organizations have comparable policies for decommissioning servers.
  • Who is responsible for older servers?
  • What services are dependent on that server?
  • What would happen if that server went away?

If your organization has a good way for determining what servers are no longer needed, congratulations--you're one of the few. For those less sure of what servers are needed, what options are available?

Unplugging Servers

Some consultants advocate simply unplugging servers whose provenance can't be determined; those whose services are dependent on those servers will soon complain, and you can then find out from the complaining party how long that server is needed.

Unfortunately, this approach induces a failure that can be rather drastic for your data center customers. In many cases, the service outage may not be acceptable; even worse, it may not be possible to properly recover from a server that is unexpectedly shut down.

Monitoring Activity

Another approach is to simply look at processor utilization and/or network utilization. This will tell you whether a server is busy--but a busy server is not necessarily a server performing a needed service. For example, a server could be disconnected from a company's search functions but still be actively crawling a company's intranet for faster internal searches. In other cases, a system could be furiously trying to connect to other services that have been decommissioned, so the system will look busy even while it is effectively unused.

Throttling Back Phantom Candidates

Processor Power Management in Linux discusses how T-states can be controlled from Linux. On most modern systems, T-states provide a way to throttle back performance in 12.5% increments by gating processor clocks between instructions.
  • Rather than shutting a server down completely, throttling the server by 50% provides a "softer" way to test whether that server is needed. This way, the server stays up and continues to provide its services--albeit at a reduced performance level.
  • If there are no complaints after a week or two at the lower performance level, drop the server performance by an additional 75% (i.e., to T7 state, 87.5% throttled).

If there are no complaints when the server is 87.5% throttled, there's a good chance the server is no longer needed. Shutting the server down at this stage can be done with higher confidence (and lower chance of reversal) than just unplugging the server without this analysis.

--kb

P.S. If there are still concerns about shutting down the server at this stage, one further approach is to migrate that server to a virtual machine. However, this last step may not be needed if the performance throttling provides high enough confidence that the throttled server is no longer needed.

Friday, February 6, 2009

Re-calibrating Efficiencies vs. Utilization

The Green Grid released a Quantitative Efficiency Analysis Of Power Distribution Configurations For Data Centers document late last year, outlining the best case conversion efficiencies of various power distribution options from 480Vac, 3-phase input down to a 12Vdc intermediate voltage rail. As a member of the power task force that created this document, I can attest to the many hours spent analyzing various data sources and finding a way to present the information as accurately as possible. What we ended up with was a document showing best-case efficiencies of all components in this conversion chain, using the best data we could publish from either vendors or third-party sources.

As good as this document is, there are some obvious next steps to be done to further refine the analysis and make it more relevant to real-world conditions. Probably the biggest area that needs to be addressed is better information on load levels. When we look at a graph of efficiency vs. percentage load, it's all too easy to look at the upper part of this graph and discard the lower loads as meaningless. Unfortunately, it's in these lower load levels that most data centers tend to live. Let's look at an example:
  • Suppose a data center (or a portion of a data center) is anticipated to ultimately need 800kW of actual peak capacity of critical load backed up by a UPS (uninterruptible power supply).
  • That same data center will be filled up incrementally over the next three years, with the electrical load doubling each year while it is being filled.
  • For high reliability, the UPS system is deployed in a 1+1 configuration, with the outputs routed to a common bus bar that feeds redundant PDUs (Power Distribution Units); the PDUs provide redundant feeds to chassis power supplies, which provide redundant power to the blades in the chassis.
  • To meet electrical code and company policies, 20% headroom is required in the power system to handle any unanticipated loads or planning/projection oversights.

The 20% headroom on an 800kW critical load drives the requirement for a 1MW UPS. [If the next largest size UPS is 1.2MW, the UPS would start out with an extra 200kW of unusable capacity; for the purposes of this comparison, however, let's assume that there's a perfect match available with a 1MW UPS.] The 1+1 redundancy requirement means that two 1MW UPS units will be required; under normal circumstances, these will share the load between them. Thus, an 800kW load will at most drive a 40% loading of the UPS systems.

But wait, there's more! That 40% load is when all the servers, network gear, and other critical loads are simultaneously operating at their maximum power draw. This rarely, if ever, happens (though special events can cause it to happen, so you have to be able to handle those events without losing power). Under normal circumstances, the daily peak load could be 70% of that value or less. If traffic in that data center follows significant daily cycles (i.e., high demand during the work day and minimal traffic at night), power consumption in the data center could be less than 50% of the potential peak.

Utilizing 50% of that 40% maximum UPS load puts the overall load in the range of 20% of rated capacity. Given the projected population of the data center over time, the data center manager could find themselves with power infrastructure operating at 5, 10, or 20% of rated capacity on average. All of a sudden those esoteric numbers at the left edge of the efficiency vs. load graphs start to become much more important.
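
Working the same numbers in code (the 50% average-demand factor is the illustrative assumption from the text, not a measurement):

  critical_load_kw = 800.0                 # anticipated peak critical load
  ups_rating_kw = critical_load_kw / 0.80  # 20% headroom: load limited to 80% of rating -> 1,000 kW (1 MW)
  ups_units = 2                            # 1+1 redundancy, normally sharing the load

  max_ups_loading = critical_load_kw / (ups_rating_kw * ups_units)   # 0.40
  avg_demand_factor = 0.50                 # assumed average draw vs. nameplate peak

  typical_ups_loading = max_ups_loading * avg_demand_factor          # 0.20
  # Each UPS spends most of its life at ~20% of rated capacity -- or less in the
  # early years while the data center is still being filled.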

My advice to facility planners is to ignore the right-hand 2/3 of the efficiency graphs. Even in a well-designed operation with redundant systems, you're likely to spend the bulk of your time at load levels less than 30% of the rated infrastructure capacity. Focus on the numbers on the lower end of the load curve.

--kb

Tuesday, February 3, 2009

Using Outside Air for Data Centers

Conventional wisdom holds that it takes one Watt of cooling to remove every Watt of ICT equipment inside the data center. Today's data centers can do a bit better than that, but cooling costs remain a considerable OpEx cost for facilities.

One approach that's generating increased interest is the use of dry-side economizers, which bring in outside air to cool the data center. Using outside air saves the power that is normally used by compressors and chiller plants to cool facilities; even bigger gains may be achieved by avoiding CapEx (Capital Expense) costs by eliminating the purchase of chiller plants entirely or at least reducing CapEx costs by installing smaller cooling plants.

At first blush, this approach may seem to only be of marginal value. However, higher density data centers (such as those with blade servers) may have a relatively large temperature increase between inlet and exhaust temperature. Even if the desired inlet temperature is only 75°F, a facility with a 50°F temperature rise would have an exhaust temperature of 125°F--most ambient temperatures are well below this temperature. Bringing in outside air could take less energy than cooling the recycled air--humidity considerations notwithstanding.

To look at the impact of using outside air to cool data center equipment, several data center operators have performed small-scale tests to see how data center equipment is impacted by outside air:

Air economizers look promising, based on these results.

--kb

Monday, February 2, 2009

Processor Power Management in Windows

In Processor Power Management in Linux, I discussed Linux support for C-state, P-state, and T-state usage in Linux.

In Windows, there is also a comprehensive mechanism for power management. To most end-users, this configurability is limited to what is shown in the Power applet in the Control Panel. However, there is additional configurability in Windows Vista, Windows Server 2008, and, to a lesser extent, Windows XP that Microsoft has documented at http://download.microsoft.com/download/0/0/b/00bba048-35e6-4e5b-a3dc-36da83cbb0d1/ProcPowerMgmt.docx. This is designed primarily for OEMs or systems integrators to use to define power policies tailored to a specific platform, but savvy tech users may find ways to use the information to their advantage.

--kb

Sunday, February 1, 2009

Processor Power Management in Linux

At a high level, processor power management with modern processors involves managing three different types of processor states:
  • Processor power states (C-states)
  • Processor performance states (P-states)
  • Throttling states (T-states)

C0 is higher performance than C1, P0 is higher performance than P1, and T0 is higher performance than T1. Different processors offer different granularities for each of these capabilities. Some of the newest processors offer per-core C-state support, per-socket P-state support, and per-thread T-states. Remarkably, current Linux kernels have built-in support for each of these capabilities.

C-states:
The Linux idle process automatically makes extensive use of the various C-states. For example, Intel's "Nehalem" processors support C0, C1, C3, and C6 states, and the idle process uses these states as appropriate.
If you want to set the maximum C-state in Linux, just put processor.max_cstate=N on the kernel command line (in grub, just hit the "e" or "a" keys). Bear in mind that N is the ACPI C-state, not the processor one, so ACPI C3 might be HW C6, etc. When in doubt, the powertop utility will show you which is which. Powertop is available from http://www.lesswatts.org/projects/powertop/.

P-states:
P-states essentially refer to different frequencies supported by a given processor. As a general rule, higher frequency processors offer more P-states than lower frequency processors. In Linux, the cpufreq module allows control of the P-states:

  • cd /sys/devices/system/cpu
  • ls -L
  • cd cpux/cpufreq
  • cat scaling_available_frequencies
  • echo -n xxxxx > scaling_max_freq
  • echo -n yyyyy > scaling_min_freq

Where:

  • x is the appropriate CPU number from the prior command (though it may only be the first one that actually matters)
  • xxxxx and yyyyy are the desired frequencies from the list of scaling_available_frequencies defined above; set these to be the same to peg the processor to a single frequency/P-state

Automatic P-state Control

Linux has different performance governors available to set P-state policies. Among the most interesting is the ondemand governor, which provides automatic adjustment of P-states. With the on-demand governor, there are additional tunable parameters that can adjust the performance of the governor--see http://software.intel.com/en-us/articles/enhanced-intel-speedstepr-technology-and-demand-based-switching-on-linux for details.

T-states
T-states (throttling states) essentially stop clocks to the processor between instructions to approximate the desired duty cycles. They were originally developed to adjust processor performance in response to thermal conditions, but they can also have an impact on power. For processors supporting T-states, there are usually 8 T-states (T0 through T7), corresponding to 12.5% decrements in duty cycle.

Only manual T-state control is available today in Linux:

  • cd /proc/acpi/processor/
  • ls -L CPU*
  • cd CPUx
  • echo -n y > throttling
  • cat throttling

Where:

  • x is the appropriate CPU number from the prior command (though it may only be the first one that actually matters)--note the upper case
  • y is a value from 0 to 7, corresponding to T0 [not throttled] through T7 [87.5% throttled]

Here are a few scripts to set T7 for all processors in the system, check the status of T-states, and then switch all the processors back to T0:

  • for ii in `ls /proc/acpi/processor/CPU*/throttling`; do echo -n 7 > $ii; done
  • for ii in `ls /proc/acpi/processor/CPU*/throttling`; do echo $ii; cat $ii; done
  • for ii in `ls /proc/acpi/processor/CPU*/throttling`; do echo -n 0 > $ii; done

For more information, see http://acpi.sourceforge.net/documentation/processor.html.

Statistics in Linux
The PowerTop utility (http://www.lesswatts.org/projects/powertop/) provides information on P-state and C-state usage in a given system. Additional information available from Linux:

  • C-state transition info:
    cat /proc/acpi/processor/CPU*/power
  • P-state transition info
    cat /sys/devices/system/cpu/cpu*/cpufreq/stats/total_trans
    cat /sys/devices/system/cpu/cpu*/cpufreq/stats/time_in_state

Enjoy!
--kb

Saturday, January 31, 2009

Temperature and Climate Data Around the World

One of the issues that affects the efficiency of data centers is the climate in which the data center is located. A well-run data center in Phoenix could draw more power than a poorly run data center in Toronto. When looking at measures of data center efficiency, it's important to consider these factors.

There's a nifty utility available from UCLA called Climate Consultant (see http://www.aud.ucla.edu/energy-design-tools/tools/Climate4.exe or http://www2.aud.ucla.edu/energy-design-tools/). This software allows you to draw psychrometric charts, plot average/min/max temperatures, etc. for any location for which you have the proper type of weather file. For example, the graphic at the right shows a psychrometric chart for Hillsboro, Oregon.



Not only is this utility free, you can get weather data for many locations for free. The U.S. Department of Energy (DoE) maintains weather data for over a thousand locations around the world, including most airports at http://apps1.eere.energy.gov/buildings/energyplus/cfm/weather_data.cfm. For example, http://apps1.eere.energy.gov/buildings/energyplus/cfm/weather_data3.cfm/region=4_north_and_central_america_wmo_region_4/country=1_usa/cname=USA#OR provides data for several sites in Oregon. Just save the .EPW file to your local hard disk, and you're good to go.


Enjoy,
--kb

P.S. Thanks to Pasi Vaananen for the pointer to this tool.

Welcome to My Blog!

This is the inaugural posting to my blog. I hope this will be an opportunity to share with my friends, colleagues, and acquaintances some ideas I've had running through my mind relative to technology, energy efficiency, and some ideas of how to do things better. Please let me know what you think about this and if you find the information useful--or if you need more explanation on a topic.

For those I've never met, I hope this blog provides the vehicle for a creative exchange of ideas. You'll find that I'm passionate about some ideas, and I pledge to do my best to help you see why I believe so strongly in some of these potential solutions.

A problem without a solution is just an issue that hasn't been thought about long enough. I have confidence that collectively we can come up with better ways of doing things, and we may find some breakthrough approaches along the way.

Buckle up, it'll be a fun ride!

Kevin