Monday, May 11, 2009

How a Good Metric Could Drive Bad Behaviors

The PUE (Power Usage Effectiveness) metric from The Green Grid has become a widely referenced benchmark in the data center community, and justifiably so. However, there can be a dark side to following this metric blindly.


PUE is defined as follows:

PUE = Total Facility Power / IT Equipment Power

Using the PUE metric, a facility manager can judge how much power goes to "overhead" (infrastructure) relative to the IT load the facility supports. A PUE of 1.6 to 2.0 is typical, but facility managers are striving to approach a PUE of 1.0, the idealized state in which all facility power reaches the IT equipment.
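To make the ratio concrete, here is a minimal sketch of the calculation. The function names and the 1,800 kW / 1,000 kW figures are my own illustrative assumptions, not measurements from any real facility.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """PUE = total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


def overhead_share(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Fraction of total facility power that never reaches IT equipment."""
    return (total_facility_kw - it_equipment_kw) / total_facility_kw


# Hypothetical facility: 1,800 kW at the utility meter, 1,000 kW at the IT gear.
print(f"PUE: {pue(1800.0, 1000.0):.2f}")
print(f"Overhead share: {overhead_share(1800.0, 1000.0):.1%}")
```

With these assumed numbers the facility sits in the middle of the typical range, with a bit under half of its power going to infrastructure.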

Companies that want to drive more sustainable practices may incent facility managers to improve facility PUE levels. However, if this is done without regard to overall energy or other resource consumption, it could drive inefficient behaviors.

Issue #1: Dissimilar Infrastructure Power Scaling

If a facility manager tracks PUE over a variety of workloads, they will see how the data center's infrastructure power consumption tracks with the IT load. Ideally, the infrastructure overhead (HVAC system, UPS system, etc.) will scale linearly with the consumption of the servers and other gear in the data center, but this is rarely the case.

In many cases, the fixed overhead for power and cooling systems will become a higher percentage of overall power consumption as the IT load diminishes. In other cases, there will be significant step functions in overall power consumption as large infrastructure items such as chillers, CRACs, or other equipment are turned on or off (as depicted in the graph to the left).

In such situations, reducing the IT power consumption could increase the PUE even if it reduces the overall energy consumption of the data center. People will often act in the direction towards which they are incented (i.e., what improves their paycheck). Managers incented to improve PUE without any clear tie-in to overall energy consumption might be reluctant to shut off unused servers or aggressively implement power saving features on their IT infrastructure if it increased their PUE--even if doing so would reduce overall facility power consumption.
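A small sketch shows how this can happen. It assumes a deliberately simple overhead model, a fixed component plus a component proportional to IT load; the 300 kW and 0.5 values are hypothetical, and a real facility would also show the step functions described above.

```python
def infrastructure_kw(it_kw: float, fixed_kw: float = 300.0,
                      proportional: float = 0.5) -> float:
    """Assumed overhead model: a fixed floor plus a share that tracks IT load."""
    return fixed_kw + proportional * it_kw


def facility_snapshot(it_kw: float) -> tuple[float, float]:
    """Return (total facility kW, PUE) for a given IT load."""
    total_kw = it_kw + infrastructure_kw(it_kw)
    return total_kw, total_kw / it_kw


# Compare the facility before and after shutting off 300 kW of unused servers.
for it_kw in (1000.0, 700.0):
    total_kw, ratio = facility_snapshot(it_kw)
    print(f"IT load {it_kw:.0f} kW -> total {total_kw:.0f} kW, PUE {ratio:.2f}")
```

Under these assumptions, cutting the IT load from 1,000 kW to 700 kW drops total consumption from 1,800 kW to 1,350 kW, yet the PUE climbs from 1.80 to about 1.93. A manager graded only on PUE would be penalized for saving 450 kW.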

Ensuring overall energy consumption is part of the incentive package (not just PUE) is critical to driving the desired behaviors.

[Part of this needs to be linked with overall productivity of the data center so that increased use of the data centers is encouraged while still incenting improved efficiency. I'll write about this in an upcoming post.]

Issue #2: Shifting Infrastructure Loads to IT

Another issue to watch is a desire to classify some infrastructure-like services as IT loads in order to improve the reported PUE. Examples of this include moving UPS systems into IT racks or putting large air-mover devices into equipment cabinets and trying to classify them as IT loads. This is "gaming" the system and should be actively discouraged.
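Here is a rough illustration of why the reclassification is tempting. The numbers are hypothetical: a facility with 1,000 kW of IT load and 800 kW of infrastructure, of which 100 kW is UPS loss that gets relabeled as IT load.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """PUE = total facility power / IT equipment power."""
    return total_facility_kw / it_equipment_kw


# Hypothetical facility figures, in kW.
it_kw = 1000.0
infrastructure_kw = 800.0   # includes the UPS losses below
ups_loss_kw = 100.0         # the portion someone wants to relabel as "IT"

total_kw = it_kw + infrastructure_kw          # the utility meter does not change
honest_pue = pue(total_kw, it_kw)
gamed_pue = pue(total_kw, it_kw + ups_loss_kw)

print(f"Total facility power: {total_kw:.0f} kW in both cases")
print(f"PUE with UPS counted as infrastructure: {honest_pue:.2f}")
print(f"PUE with UPS relabeled as IT load:      {gamed_pue:.2f}")
```

The reported PUE falls from 1.80 to roughly 1.64 without a single watt being saved, which is exactly why the classification rules matter.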

The Green Grid is aware of this issue and is adding more guidelines to help people improve the accuracy and consistency of their PUE reporting.

Issue #3: Improving Infrastructure Efficiency at the Expense of IT

The third issue to watch is a move towards facility or equipment practices that reduce the infrastructure power consumption but increase the IT power consumption. In particular, the adoption of higher operating temperatures for data centers warrants particular scrutiny.

I've noted previously that there are significant gains possible by raising data center temperatures and making greater use of dry-side or wet-side economizers. However, it's important to compare the energy savings on the infrastructure side with the energy costs on the IT side. At higher temperatures, leakage currents in silicon increase and fans inside servers need to run faster to move more air through each server.

Because total facility power is the sum of IT and infrastructure power, PUE can be written as 1 + (infrastructure power / IT power). Increase the IT consumption and lower the infrastructure consumption and you get a two-fer: the overhead goes down and the PUE denominator goes up, lowering the overall PUE. However, if the net power consumption doesn't go down, it usually** doesn't make sense to increase the ambient temperature. Once again, looking at overall power consumption in addition to PUE is important in incenting the proper behaviors.
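A quick before-and-after comparison illustrates the trap. All figures are assumptions chosen for illustration: raising the setpoint is taken to save 30 kW of cooling while adding 40 kW of server fan and leakage power. Actual values depend heavily on the equipment and climate.

```python
def summarize(it_kw: float, infrastructure_kw: float) -> tuple[float, float]:
    """Return (total facility kW, PUE) for the given IT and infrastructure loads."""
    total_kw = it_kw + infrastructure_kw
    return total_kw, total_kw / it_kw


# Hypothetical facility before raising the temperature setpoint.
before_total, before_pue = summarize(it_kw=1000.0, infrastructure_kw=800.0)

# Assumed effect of a warmer setpoint: less cooling, more fan and leakage power.
after_total, after_pue = summarize(it_kw=1040.0, infrastructure_kw=770.0)

print(f"Before: total {before_total:.0f} kW, PUE {before_pue:.2f}")
print(f"After:  total {after_total:.0f} kW, PUE {after_pue:.2f}")
print(f"Net change in facility power: {after_total - before_total:+.0f} kW")
```

In this scenario the PUE improves from 1.80 to about 1.74 while the facility draws 10 kW more than before. Tracking both numbers side by side is what exposes the problem.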


**Note: For greenfield (new) data centers or substantial data center retrofits, raising the allowed data center temperature may eliminate or substantially reduce the CapEx (capital expenditure) for that data center even if the direct energy costs are slightly higher. For example, if a data center doesn't need to purchase a chiller unit, that could shave millions of dollars off the construction cost for a facility. In such cases, more complicated parameters will be needed to evaluate the benefits of raising the ambient temperature in the facility; these likely will include a net present value analysis for the CapEx savings vs. OpEx (operating expense) costs, consideration of real estate savings, etc. The real win is when both CapEx costs are avoided AND OpEx costs are lower.
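As a rough sketch of the net present value comparison mentioned in the note, the snippet below weighs an up-front CapEx saving against a stream of higher annual energy costs. Every input is a made-up placeholder (a $2M avoided chiller, $100K per year in extra energy, an 8% discount rate, a 10-year horizon), and the model ignores real estate, maintenance, and energy price changes.

```python
def npv_of_skipping_chiller(capex_saved: float, extra_opex_per_year: float,
                            discount_rate: float, years: int) -> float:
    """Up-front CapEx saving minus the discounted cost of higher yearly OpEx.

    Assumes the extra OpEx is paid at the end of each year for `years` years.
    """
    discounted_opex = sum(
        extra_opex_per_year / (1.0 + discount_rate) ** year
        for year in range(1, years + 1)
    )
    return capex_saved - discounted_opex


# Hypothetical inputs for illustration only.
result = npv_of_skipping_chiller(
    capex_saved=2_000_000.0,
    extra_opex_per_year=100_000.0,
    discount_rate=0.08,
    years=10,
)
print(f"NPV of avoiding the chiller: ${result:,.0f}")
```

A positive result means the avoided construction cost outweighs the added energy spend over the chosen horizon; a negative one means the higher operating cost wins out.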
