Tuesday, February 17, 2009

Server Cost Adders for Higher-temp Operation

Numerous industry notables, including Microsoft's Christian Belady, have been advocating operating data centers at higher ambient temperatures. Reducing or eliminating the cooling plant could yield considerable savings. But what does it take to build servers designed to operate at these higher temperatures?

As mentioned in a previous post, telecommunications equipment is typically designed to meet the NEBS standards (55°C maximum ambient). Cost adders for NEBS equipment include the following:
  • Higher temperature integrated circuits (ICs). Commercial-grade ICs are generally rated to 70°C; higher ambient temperatures could force the use of extended temp components.
  • Heat sink costs. Higher temperatures often drive more expensive heat sink materials (e.g., copper rather than aluminum) and more use of heat sinks on components that don't need them at lower temperatures. For example, some servers need heat spreaders on their DIMMs before those DIMMs can be rated for higher-temperature operation.
  • Corrosive gases tolerance. Telecommunications equipment generally needs to pass tests to ensure reliability in the presence of corrosive gases, including high sulfur-content air. Before dismissing this requirement, consider the case of air-side economizers: if you're bringing in outside air, do you need to worry about contaminants in the air, such as diesel exhaust from nearby trucks or from diesel generators?
  • Wider humidity range. Most NEBS equipment is designed for a wider range of allowable humidity exposure than most data center equipment. The broader use of economizers might make a wider humidity range desirable for data centers.
  • Flame tests. NEBS flame tests may be overkill for most data center equipment, in part because most data centers have sprinklers or other fire suppression controls (unlike telecom central offices, which do not have sprinklers).
  • Shake and vibe tests. NEBS equipment is generally tested to meet seismic Zone 4 earthquake requirements. These tests could just as well apply to data center equipment, but they go beyond what most data center equipment is validated against.
  • Materials selection. The use of V0-rated plastics and HF-1 or better foams in data center equipment is not necessarily a cost adder if designed in up front, but it can add appreciable expense if retrofits have to be made after-the-fact.
  • Air filters. NEBS equipment generally requires air filters; data center equipment generally doesn't need them, so they can be eliminated.
  • Long life. This actually encompasses two aspects: extended availability of certain components and long-life reliability. Telecom products often require the availability of the same components for 5-7 years, much longer than typical data center products. Similarly, telecom products often are designed to meet usable lifetimes that are much longer than most data center refresh cycles.

Which of these attributes are needed for equipment in data centers with higher temperatures? What other attributes are needed for higher temps?


Sunday, February 15, 2009

Containerized Data Centers in Buildings

Much of the focus with containerized data centers has been on mega-facilities that can house dozens of shipping containers.

Another use case where containerized data centers could make sense is in retrofitting buildings, though it may be somewhat counter-intuitive.

Building a state-of-the-art facility can take a long time. Clearing out an open space and then lifting in a container could be a much faster way to get an optimized facility installed in a building than trying to build it in place. Furthermore, a set of equipment can be replaced the same way.

This could be used with a single container, or there could be multiple containers placed together on the same floor of an office building.


Friday, February 13, 2009

Processors for Higher Temps

Higher Temperatures for Data Centers talks about emerging environmental standards that could well lead to increasing ambient temperatures in facilities. All other things being equal, higher ambient temperatures will lead to higher component temperatures.

In many cases, the maximum processor case temperature (Tcase) is the limiting factor for how high the ambient temperature can be raised. The Tcase limit is established by the semiconductor vendor as the maximum case temperature that the chip can experience and still meet the vendor's reliability goals.

This can put a crimp in plans to use outside air for cooling. In most likely data center locations, there are occasional warm days that would raise the server inlet temperature to the point that the processor Tcase would exceed the vendor's specified rating.
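The relationship behind this limit is the basic thermal equation Tcase = Tambient + P × θca, where θca is the heat sink's case-to-ambient thermal resistance. A quick sketch of the arithmetic, using purely hypothetical numbers (a 72°C Tcase limit, a 100 W processor, and a 0.30°C/W heat sink — none of these come from any vendor datasheet):

```shell
# Hypothetical numbers only: 72C Tcase limit, 100 W processor, and a
# 0.30 C/W case-to-ambient thermal resistance (theta_ca).
# Maximum inlet temperature = Tcase_max - P * theta_ca.
tcase_max=72
power_w=100
theta_ca_hundredths=30                          # 0.30 C/W, scaled by 100 for integer math
ambient_max=$(( tcase_max - power_w * theta_ca_hundredths / 100 ))
echo "Max allowable ambient: ${ambient_max}C"   # prints 42C with these numbers
```

With these assumed values, a 40°C day would already leave only 2°C of margin — which is exactly why the short-term ratings discussed below matter.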

The telecommunications market has had this issue for years.
NEBS-rated equipment for central offices generally has to operate at a 40°C ambient temperature, but it also needs to operate at 55°C for short periods (up to 96 hours at a time and up to 360 hours per year).

To address the needs of the NEBS market, Intel offers some of its processors with dual Tcase ratings: a long-term Tcase rating and a short-term Tcase rating that is 15°C higher, allowed for up to 360 hours per year.

These processors with dual Tcase ratings may be a good fit for systems in data centers that use air-side economizers.

Wednesday, February 11, 2009

Higher Temperatures for Data Centers

As mentioned in Using Outside Air for Data Centers, there is increasing interest in using outside air to cool data centers. At times, this will mean higher ambient temperatures inside those data centers.

A couple of years ago, the TEM (telecom equipment manufacturer)-led SCOPE Alliance undertook the task of identifying the relevant environmental requirements for data centers and telecommunications from several different standards groups (Telcordia, ETSI, TIA, etc.) and came up with the Environmental Profile: Central Offices and Network Data Centers, eventually published last May. It was a lot of work, but I think we developed the most comprehensive overlay of those standards published to date.

Last August, ASHRAE issued its 2008 ASHRAE Environmental Guidelines for Datacom Equipment, which changed the recommended setpoints in data centers from 20-25°C to 18-27°C. This may not seem like much, but it's a step in the right direction towards better allowances for dry-side (air) economizers.

Last November, Telcordia released GR-3160, NEBS(TM) Requirements for Telecommunications Data Center Equipment and Spaces. [I provided information for this to Telcordia over the two years this document was in process.] Though those outside the telecom world may not be familiar with Telcordia (or its predecessor, Bellcore), the Telcordia NEBS standards have become the de facto standards for how telecommunications facilities and equipment are designed in the U.S. and many places around the world. GR-3160 is effectively Telcordia's first major foray into data center standards, a clear recognition that telecommunications carriers like AT&T, Verizon, and Qwest have a large number of data centers and a growing need to ensure maximum availability in those facilities. Among the interesting elements of GR-3160 are Telcordia's expectations for allowable maximum ambient temperatures at the air inlet to rack-mounted equipment:
  • 30°C long-term max ambient
  • 40°C short-term max ambient (up to 360 hours per year)

These two upper limits provide a reasonable target that equipment manufacturers can design to, while providing enough freedom that facility designers can make maximum use of economizers.
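A facility planner could sanity-check a candidate site against these two limits using an hourly inlet-temperature trace. A minimal sketch (the file name and the readings below are made-up toy data, purely for illustration):

```shell
# Check an hourly inlet-temperature trace (one Celsius reading per line)
# against the GR-3160 limits above: nothing over 40C, and no more than
# 360 hours per year over 30C. Toy data stands in for a real trace.
printf '%s\n' 28 29 31 33 29 27 > inlet_temps.txt
awk '$1 > 40 { over40++ }
     $1 > 30 { over30++ }
     END { printf "hours over 30C: %d, readings over 40C: %d\n", over30 + 0, over40 + 0 }' inlet_temps.txt
```

A real trace would have 8,760 hourly readings; the site passes if the first count is 360 or less and the second is zero.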


Sunday, February 8, 2009

Using T-states with Phantom Servers

One of the more obvious ways to reduce power consumption in data centers is to shut off servers that are no longer needed (often referred to as phantom servers). Unfortunately, it's often difficult to identify what servers are no longer needed. Though most well-run organizations have well thought-out procedures for determining when and how to commission servers, few organizations have comparable policies for decommissioning servers.
  • Who is responsible for older servers?
  • What services are dependent on that server?
  • What would happen if that server went away?

If your organization has a good way for determining what servers are no longer needed, congratulations--you're one of the few. For those less sure of what servers are needed, what options are available?

Unplugging Servers

Some consultants advocate simply unplugging servers whose provenance can't be determined; those whose services are dependent on those servers will soon complain, and you can then find out from the complaining party how long that server is needed.

Unfortunately, this approach induces a failure that can be rather drastic for your data center customers. In many cases, the service outage may not be acceptable; even worse, it may not be possible to properly recover from a server that is unexpectedly shut down.

Monitoring Activity

Another approach is to simply look at processor utilization and/or network utilization. This will tell you whether a server is busy--but a busy server is not necessarily a server performing a needed service. For example, a server could be disconnected from a company's search functions but still be actively crawling a company's intranet for faster internal searches. In other cases, a system could be furiously trying to connect to other services that have been decommissioned, so the system will look busy even while it is effectively unused.

Throttling Back Phantom Candidates

Processor Power Management in Linux discusses how T-states can be controlled from Linux. On most modern systems, T-states provide a way to throttle back performance in 12.5% increments by gating processor clocks between instructions.
  • Rather than shutting a server down completely, throttling the server by 50% provides a "softer" way to test whether that server is needed. This way, the server stays up and continues to provide its services--albeit at a reduced performance level.
  • If there are no complaints after a week or two at the lower performance level, cut the remaining performance by a further 75% (i.e., to the T7 state, 87.5% throttled overall).

If there are no complaints when the server is 87.5% throttled, there's a good chance the server is no longer needed. Shutting the server down at this stage can be done with higher confidence (and lower chance of reversal) than just unplugging the server without this analysis.
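The first throttling step can be scripted against the /proc interface covered in Processor Power Management in Linux. A sketch, assuming a kernel that exposes /proc/acpi/processor (the duty-cycle arithmetic is simply 12.5% per T-state step):

```shell
# Soft-decommission a suspected phantom server: throttle to T4 (50% duty
# cycle) rather than pulling the plug. Assumes the /proc/acpi/processor
# T-state interface; on kernels without it, the loop simply does nothing.
tstate=4
duty_tenths=$(( 1000 - tstate * 125 ))     # duty cycle in tenths of a percent
echo "Requesting T${tstate} ($(( duty_tenths / 10 )).$(( duty_tenths % 10 ))% duty cycle)"
for f in /proc/acpi/processor/CPU*/throttling; do
    [ -e "$f" ] || continue                # interface not present on this kernel
    echo -n "$tstate" > "$f"
done
```

Re-run with tstate=7 for the second stage; writing to /proc requires root.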


P.S. If there are still concerns about shutting down the server at this stage, one further approach is to migrate that server to a virtual machine. However, this last step may not be needed if the performance throttling provides high enough confidence that the throttled server is no longer needed.

Friday, February 6, 2009

Re-calibrating Efficiencies vs. Utilization

The Green Grid released a Quantitative Efficiency Analysis Of Power Distribution Configurations For Data Centers document late last year, outlining the best case conversion efficiencies of various power distribution options from 480Vac, 3-phase input down to a 12Vdc intermediate voltage rail. As a member of the power task force that created this document, I can attest to the many hours spent analyzing various data sources and finding a way to present the information as accurately as possible. What we ended up with was a document showing best-case efficiencies of all components in this conversion chain, using the best data we could publish from either vendors or third-party sources.

As good as this document is, there are some obvious next steps to be done to further refine the analysis and make it more relevant to real-world conditions. Probably the biggest area that needs to be addressed is better information on load levels. When we look at a graph of efficiency vs. percentage load, it's all too easy to look at the upper part of this graph and discard the lower loads as meaningless. Unfortunately, it's in these lower load levels that most data centers tend to live. Let's look at an example:
  • Suppose a data center (or a portion of a data center) is anticipated to ultimately need 800kW of actual peak capacity of critical load backed up by a UPS (uninterruptible power supply).
  • That same data center will be filled up incrementally over the next three years, with the electrical load doubling each year while it is being filled.
  • For high reliability, the UPS system is deployed in a 1+1 configuration, with the outputs routed to a common bus bar that feeds redundant PDUs (Power Distribution Units); the PDUs provide redundant feeds to chassis power supplies, which provide redundant power to the blades in the chassis.
  • To meet electrical code and company policies, 20% headroom is required in the power system to handle any unanticipated loads or planning/projection oversights.

The 20% headroom on an 800kW critical load drives the requirement for a 1MW UPS. [If the next largest size UPS is 1.2MW, the UPS would start out with an extra 200kW of unusable capacity; for the purposes of this comparison, however, let's assume that there's a perfect match available with a 1MW UPS.] The 1+1 redundancy requirement means that two 1MW UPS units will be required; under normal circumstances, these will share the load between them. Thus, an 800kW load will at most drive a 40% loading of the UPS systems.

But wait, there's more! That 40% load is when all the servers, network gear, and other critical loads are simultaneously operating at their maximum power draw. This rarely, if ever, happens (though special events can cause it to happen, so you have to be able to handle those events without losing power). Under normal circumstances, the daily peak load could be 70% of that value or less. If traffic in that data center follows significant daily cycles (i.e., high demand during the work day and minimal traffic at night), power consumption in the data center could be less than 50% of the potential peak.

Utilizing 50% of that 40% maximum UPS load puts the overall load in the range of 20% of rated capacity. Given the projected population of the data center over time, the data center manager could find themselves with power infrastructure operating at 5, 10, or 20% of rated capacity on average. All of a sudden those esoteric numbers at the left edge of the efficiency vs. load graphs become much more important.
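The chain of derating in this example reduces to a quick back-of-the-envelope calculation (all figures come from the example above, including the simplifying assumption of a perfectly matched 1 MW UPS):

```shell
# 800 kW critical load, perfectly matched 1 MW UPS units in a 1+1
# (load-sharing) configuration, typical draw at 50% of absolute peak.
critical_kw=800
ups_kw=1000                                    # assumed perfect match, per the example
ups_count=2                                    # 1+1 redundancy; outputs share the load
peak_load_pct=$(( critical_kw * 100 / (ups_kw * ups_count) ))
typical_load_pct=$(( peak_load_pct * 50 / 100 ))
echo "UPS loading: ${peak_load_pct}% at absolute peak, ~${typical_load_pct}% typical"
# prints: UPS loading: 40% at absolute peak, ~20% typical
```

And that 20% figure is before accounting for the years when the facility is only partially populated.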

My advice to facility planners is to ignore the right-hand 2/3 of the efficiency graphs. Even in a well-designed operation with redundant systems, you're likely to spend the bulk of your time at load levels less than 30% of the rated infrastructure capacity. Focus on the numbers on the lower end of the load curve.


Tuesday, February 3, 2009

Using Outside Air for Data Centers

Conventional wisdom holds that it takes one Watt of cooling to remove every Watt of ICT equipment power inside the data center. Today's data centers can do a bit better than that, but cooling remains a considerable OpEx item for facilities.

One approach that's generating increased interest is the use of dry-side economizers, which bring in outside air to cool the data center. Using outside air saves the power that is normally used by compressors and chiller plants to cool facilities; even bigger gains may be achieved by avoiding CapEx (Capital Expense) costs by eliminating the purchase of chiller plants entirely or at least reducing CapEx costs by installing smaller cooling plants.

At first blush, this approach may seem of only marginal value. However, higher-density data centers (such as those with blade servers) may have a relatively large temperature increase between inlet and exhaust. Even if the desired inlet temperature is only 75°F, a facility with a 50°F temperature rise would have an exhaust temperature of 125°F--most ambient temperatures are well below this. Bringing in outside air could take less energy than cooling the recycled air--humidity considerations notwithstanding.

To gauge the impact of using outside air to cool data center equipment, several data center operators have performed small-scale tests of how their equipment holds up on outside air.

Air economizers look promising, based on these results.


Monday, February 2, 2009

Processor Power Management in Windows

In Processor Power Management in Linux, I discussed Linux support for C-state, P-state, and T-state usage in Linux.

In Windows, there is also a comprehensive mechanism for power management. For most end-users, configurability is limited to what is shown in the Power applet in the Control Panel. However, there is additional configurability in Windows Vista, Windows Server 2008, and, to a lesser extent, Windows XP, which Microsoft has documented at http://download.microsoft.com/download/0/0/b/00bba048-35e6-4e5b-a3dc-36da83cbb0d1/ProcPowerMgmt.docx. This is designed primarily for OEMs and systems integrators to define power policies tailored to a specific platform, but savvy tech users may find ways to use the information to their advantage.


Sunday, February 1, 2009

Processor Power Management in Linux

At a high level, processor power management with modern processors involves managing three different types of processor states:
  • Processor power states (C-states)
  • Processor performance states (P-states)
  • Throttling states (T-states)

C0 is higher performance than C1, P0 is higher performance than P1, and T0 is higher performance than T1. Different processors offer different granularities for each of these capabilities. Some of the newest processors offer per-core C-state support, per-socket P-state support, and per-thread T-states. Remarkably, current Linux kernels have built-in support for each of these capabilities.

The Linux idle process automatically makes extensive use of the various C-states. For example, Intel's "Nehalem" processors support C0, C1, C3, and C6 states, and the idle process uses these states as appropriate.
If you want to cap the maximum C-state in Linux, put processor.max_cstate=<n> on the kernel command line (in GRUB, hit the "e" or "a" key to edit the boot entry). Bear in mind that <n> is the ACPI C-state, not the processor one, so ACPI C3 might be hardware C6, etc. When in doubt, the PowerTop utility will show you which is which. PowerTop is available from http://www.lesswatts.org/projects/powertop/.

P-states essentially refer to different frequencies supported by a given processor. As a general rule, higher frequency processors offer more P-states than lower frequency processors. In Linux, the cpufreq module allows control of the P-states:

  • cd /sys/devices/system/cpu
  • ls -L
  • cd cpux/cpufreq
  • cat scaling_available_frequencies
  • echo -n xxxxx > scaling_max_freq
  • echo -n yyyyy > scaling_min_freq


  • x is the appropriate CPU number from the prior command (though it may only be the first one that actually matters)
  • xxxxx and yyyyy are the desired frequencies from the list in scaling_available_frequencies above; set these to the same value to peg the processor to a single frequency/P-state
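Putting those steps together, a small script can peg every core to its lowest frequency. This is only a sketch: it assumes the cpufreq sysfs interface above is present, and that scaling_available_frequencies lists frequencies highest-first (the common layout), so the last field is the minimum.

```shell
# Peg all cores to their lowest available frequency via cpufreq sysfs.
# Assumes frequencies are listed highest-first; cores without cpufreq
# support are skipped. Writing the sysfs files requires root.
count=0
for cpu in /sys/devices/system/cpu/cpu[0-9]*; do
    freqs="$cpu/cpufreq/scaling_available_frequencies"
    [ -r "$freqs" ] || continue
    fmin=$(awk '{ print $NF }' "$freqs")        # last (lowest) frequency in kHz
    echo -n "$fmin" 2>/dev/null > "$cpu/cpufreq/scaling_max_freq" || continue
    echo -n "$fmin" 2>/dev/null > "$cpu/cpufreq/scaling_min_freq"
    count=$(( count + 1 ))
done
echo "Pegged $count CPU(s) to their minimum frequency"
```

Setting scaling_max_freq and scaling_min_freq back to the extremes of scaling_available_frequencies restores the full P-state range.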

Automatic P-state Control

Linux has different performance governors available to set P-state policies. Among the most interesting is the ondemand governor, which provides automatic adjustment of P-states. With the on-demand governor, there are additional tunable parameters that can adjust the performance of the governor--see http://software.intel.com/en-us/articles/enhanced-intel-speedstepr-technology-and-demand-based-switching-on-linux for details.

T-states (throttling states) essentially stop clocks to the processor between instructions to approximate the desired duty cycles. They were originally developed to adjust processor performance in response to thermal conditions, but this can also have an impact on power as well. For processors supporting T-states, there are usually 8 T-states (T0 through T7), corresponding to 12.5% reductions in duty cycle.

Only manual T-state control is available today in Linux:

  • cd /proc/acpi/processor/
  • ls -L CPU*
  • cd CPUx
  • echo -n y > throttling
  • cat throttling


  • x is the appropriate CPU number from the prior command (though it may only be the first one that actually matters)--note the upper case
  • y is a value from 0 to 7, corresponding to T0 [not throttled] through T7 [87.5% throttled]

Here are a few one-liners to set T7 for all processors in the system, check the status of T-states, and then switch all the processors back to T0:

  • for ii in `ls /proc/acpi/processor/CPU*/throttling`; do echo -n 7 > $ii; done
  • for ii in `ls /proc/acpi/processor/CPU*/throttling`; do echo $ii; cat $ii; done
  • for ii in `ls /proc/acpi/processor/CPU*/throttling`; do echo -n 0 > $ii; done

For more information, see http://acpi.sourceforge.net/documentation/processor.html.

Statistics in Linux
The PowerTop utility (http://www.lesswatts.org/projects/powertop/) provides information on P-state and C-state usage in a given system. Additional information is available from Linux:

  • C-state transition info:
    cat /proc/acpi/processor/CPU*/power
  • P-state transition info:
    cat /sys/devices/system/cpu/cpu*/cpufreq/stats/total_trans
    cat /sys/devices/system/cpu/cpu*/cpufreq/stats/time_in_state
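A quick way to turn the raw time_in_state counters into something readable is to convert them to percentages. A sketch, assuming the cpufreq stats interface above (each line is "<frequency_kHz> <ticks>"), with a fallback for kernels that don't expose it:

```shell
# Summarize P-state residency for cpu0 as percentages of total time.
# time_in_state lines are "<freq_khz> <ticks>"; kernels without cpufreq
# stats get a fallback message instead.
f=/sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state
if [ -r "$f" ]; then
    summary=$(awk '{ total += $2; ticks[$1] = $2 }
                   END { if (total > 0)
                             for (freq in ticks)
                                 printf "%d kHz: %.1f%%\n", freq, 100 * ticks[freq] / total }' "$f")
else
    summary="cpufreq stats not available on this kernel"
fi
[ -n "$summary" ] || summary="no residency data recorded yet"
echo "$summary"
```

A system that spends nearly all its time at the lowest listed frequency is a good candidate for the ondemand governor discussed above.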