Monday, May 16, 2011
There are even guidelines for lower humidity levels if certain procedures are followed.
P.S. Thanks to Pasi Vaananen for the heads-up.
Sunday, March 7, 2010
Some vendors may ship servers that draw only 200W at 100% utilization with a power supply that can provide 1200W. Traditional methods of evaluating power supplies measure them across their full rated capacity. If a power supply is sized appropriately, this makes sense. However, a power supply rated far above what the system will ever draw in real life should be de-rated.
For example, a 2000W power supply might have good efficiency at 50% and 100% of load, but power efficiency tends to drop off at lower load levels, particularly those below 25% of maximum load. Take that same 2000W power supply and put it in a server drawing a maximum of 200W, and the power supply would always be operating below 10% load. The normal power supply rating levels are of little value if the realistic power draw is much lower than the rated power draw.
To be fair, the tested configurations of servers don't always represent the highest possible loading: adding extra memory, additional hard drives, and extra PCI Express cards can increase a servers power draw. But having no upper limit leaves too much wiggle room and jeopardizes the integrity of the Energy Star rating method.
One possible solution to work around this is as follows:
1. Measure the server power consumption under an acceptable benchmark such as SPECpower_ssj2008. Record 2x the maximum power draw (i.e., at 100% load in the benchmark).
2. Look at the rated output power for the power supply or power supplies needed to operate the server in that configuration [ignore redundant power supplies used for reliability purposes]. Record the sum of all the non-redundant power supply output power ratings.
3. If the answer in Step 2 is less than or equal to the value from Step 1, no adjustment is needed. Skip Steps 4 and 5.
4. If the answer in Step 2 is more than the value in Step 1, plot the efficiency rating of the non-redundant power supplies. Extrapolate the efficiency of the power supply (or supplies) at the value recorded in Step 1, then at 50% and 25% of that value. Do the same for any other load levels normally required, expressing them as a ratio of the value from Step 1.
5. Evaluate the efficiency of the system based on the load levels and efficiencies determined in Step 4.
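The steps above can be sketched in code. This is a minimal illustration only: the function names and the efficiency-curve numbers are hypothetical, not from any official test procedure, and a simple piecewise-linear efficiency curve is assumed.

```python
# Hypothetical sketch of the proposed de-rating adjustment.
# An efficiency curve is a list of (fraction_of_rated_load, efficiency)
# pairs sorted by load fraction; the numbers below are illustrative.

def interpolate(curve, x):
    """Linearly interpolate (or extrapolate) efficiency at load fraction x."""
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        if x <= x1 or (x1, y1) == curve[-1]:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return curve[-1][1]

def derated_efficiency(measured_max_w, rated_psu_w, efficiency_curve):
    """Evaluate PSU efficiency at load points scaled to 2x the measured
    maximum draw (Step 1) instead of the PSU's rated capacity."""
    reference_w = 2 * measured_max_w            # Step 1: record 2x max draw
    if rated_psu_w <= reference_w:              # Step 3: no adjustment needed
        return None
    return {level: interpolate(efficiency_curve,
                               level * reference_w / rated_psu_w)
            for level in (1.0, 0.5, 0.25)}      # Step 4: 100%, 50%, 25%

# A 2000W supply in a server measured at 200W max (reference = 400W):
curve = [(0.1, 0.75), (0.2, 0.85), (0.5, 0.90), (1.0, 0.87)]
print(derated_efficiency(200, 2000, curve))
```

Step 5 would then rate the system using these adjusted efficiencies rather than the usual percent-of-rated-capacity points.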
This adjustment would correct ratings for power supplies that are oversized for the systems they're tested with, and it would incent server vendors to right-size power supplies to better match the real power range of the systems being rated.
Saturday, February 20, 2010
This showcases two things in particular:
- Software can have a major impact on system efficiency. Even relatively good solutions like PHP can still be improved.
- Metrics that look only at hardware-centric criteria often ignore the benefits of more efficient software.
This second bullet merits further elaboration. Administrators using CPU utilization as an approximation of total server work accomplished would erroneously assume their servers were doing only half as much work with HipHop as before, even though they would be doing the same amount of work with better software, just doing it more efficiently.
Future posts will talk about ways to measure useful work.
Tuesday, February 16, 2010
Monday, February 15, 2010
Most of the technology world's focus regarding energy efficiency has focused on hardware: better processors, better memory, better disks, better power conversion, etc. This is good, but it overlooks the substantial contribution that better software can make towards improving energy efficiency. An automobile driver who drives over the top of a hill may use more energy than someone who drives around the hill; software designed with energy efficiency in mind may use a different algorithm than a brute force approach that seems simpler at first.
The Intel® Energy Checker SDK provides developers and systems integrators a simple API that they can use to measure the amount of "useful work" performed by the system and then correlate the useful work with energy consumption. The useful work is not the number of instructions executed, cycles retired, or the average CPU utilization--that's not why you buy software. For example, you buy e-mail software to do things like send e-mails, so the measures of useful work can be the number of messages sent, the number of kilobytes in those messages sent, the number of messages received, and the number of kilobytes in those messages received. Software developers can choose what measures of useful work they export and how often they choose to export this information.
The SDK includes tools to measure the rate of power usage and to measure/calculate energy consumption over time. The SDK supports several external power meters as well as the ability to read energy consumption directly from power supplies having certain levels of instrumentation.
The software developer can easily aggregate/weight the work done in their application(s) with work done in other instrumented applications and compare that to the energy consumed by the system or systems under test to determine energy efficiency. This is an important step towards making software more energy efficient and may lead towards energy-aware algorithms in leading software packages. In turn, this will help administrators measure the aggregate useful work of their facilities, rather than simply measuring hardware-centric metrics that actually penalize more efficient software.
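The actual SDK exposes a C API; the class and method names in this Python sketch are hypothetical, intended only to illustrate the idea of correlating application-exported useful-work counters with measured energy.

```python
# Illustrative only: names here are hypothetical, not the real SDK's API.

class UsefulWorkMeter:
    """Accumulates application-defined units of useful work
    (e.g. e-mails sent, kilobytes transferred)."""

    def __init__(self):
        self.counters = {}

    def export_counter(self, name, value):
        """Export a useful-work measure, as the application defines it."""
        self.counters[name] = self.counters.get(name, 0) + value

    def efficiency(self, weights, energy_joules):
        """Weighted useful work per joule of energy consumed."""
        work = sum(weights[name] * v for name, v in self.counters.items())
        return work / energy_joules

meter = UsefulWorkMeter()
meter.export_counter("emails_sent", 1200)
meter.export_counter("kb_sent", 48_000)

# Suppose an external power meter reported 600 J over the same interval:
weights = {"emails_sent": 1.0, "kb_sent": 0.01}
print(meter.efficiency(weights, 600.0))  # weighted work units per joule
```

The key point is that the efficiency figure is defined in the application's own units of useful work, not in CPU cycles, so a more efficient implementation scores better rather than worse.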
The SDK is available free of charge (and without royalties) from http://software.intel.com/en-us/articles/intel-energy-checker-sdk/. The SDK supports Windows, Linux, Solaris 10, and MacOS X. Source code for the core API and many utilities is included, though Intel distributes some utilities in binary form only. Check it out!
Thursday, September 3, 2009
Among the major blade vendors, Cisco appeared to have taken the lead by offering support for the broadest operating humidity range, but Sun appears to have matched Cisco recently:
- Cisco's Unified Computing System blades support 5 to 93% RH
- Sun's Sun Fire X2270 blades support "up to" 93% RH
- HP's c-Class BL260 G5 blades support 10 to 85% RH
- IBM's BladeCenter HS22 blades support 8 to 80% RH
- Dell's PowerEdge M600 blades support 8 to 80% RH
(All humidity ranges are non-condensing. All data is from vendor web sites as of 9/3/09).
Ever-widening ranges for supported humidity make the use of dry-side economizers more feasible. If vendors were able to support 0-100% relative humidity, data center operators wouldn't need to worry about humidification/de-humidification controls. Eliminating such controls and systems could lower capital costs, reduce operating costs, lower the carbon footprint of facilities, and lower their water footprint as well.
Thursday, July 16, 2009
Is a PUE of 1.5 in Phoenix better or worse than a PUE of 1.4 in Seattle?
It depends. In absolute numbers, the lower PUE provides an indicator of the most efficient facility. However, achieving a PUE of 1.5 in Phoenix is much more difficult than an equivalent or slightly lower number in Seattle because Phoenix is so much hotter and requires more air conditioning. Moving data centers to cooler locations helps the PUE rating, but sometimes data centers need to be located in a specific city or region. How can you compare PUE values in regions with different environmental conditions?
One possible approach is to add a geographic compensating factor:
gPUE = G * PUE
The geographic compensating factor G would be determined by The Green Grid or other trusted body based on compiled weather data. Ideally, this could be calculated empirically through a formula using data maintained by the U.S. Department of Energy (refer to this blog link for information on that data and a free tool to visually represent that data).
This approach would allow somebody to measure the technical innovation of a given facility while providing an adjustment to account for geographic disparities in temperature, wind, solar loading, etc. It's not a perfect solution (since some cooling optimizations might not work in cooler or hotter climates), but it provides some measure of equalization to facilitate more equitable comparisons between PUE claims in different locations.
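A toy illustration of the idea, with entirely made-up compensating factors (a hot climate would presumably get a G below 1.0 to discount its extra cooling burden; real values would have to come from The Green Grid or compiled weather data):

```python
# Hypothetical factors for illustration only; not from any published source.
G_FACTORS = {"Phoenix": 0.90, "Seattle": 1.05}

def gpue(city, pue):
    """gPUE = G * PUE, using the city's geographic compensating factor."""
    return G_FACTORS[city] * pue

# With these made-up factors, the Phoenix facility compares favorably:
print(gpue("Phoenix", 1.5))  # about 1.35
print(gpue("Seattle", 1.4))  # about 1.47
```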
Monday, June 15, 2009
Although some facility managers have developed their own methods for time-shifting HVAC energy requirements, Ice Energy may be the first vendor to market a product specifically designed to do this. The Ice Bear* distributed energy storage system provides up to 5 tons of cooling load during peak hours.
It's good to see innovative products like this coming to market.
Wednesday, May 27, 2009
Coincidentally, two days later Amazon introduced Amazon Web Services Import/Export with a blog that starts off with the following colorful quote attributed to Andy Tanenbaum:
Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.
Amazon Web Services Import/Export allows people to send USB or eSATA hard drives/media to Amazon for data sets that are impractical to send over available communications links.
It turns out that the bulk version of sneakernet may be the most expeditious way to move data. The more things change, the more things stay the same.
Note: Revised title on 5/29/09.
Monday, May 11, 2009
PUE is defined as follows:

PUE = Total Facility Power / IT Equipment Power
Using the PUE metric, a facility manager can judge what ratio of power is lost in "overhead" (infrastructure) to operate the facility. A PUE of 1.6 to 2.0 is typical, but facility managers are striving to approach a PUE of 1.0, the idealized state.
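In code form, with illustrative numbers:

```python
def pue(total_facility_power_w, it_equipment_power_w):
    """PUE = Total Facility Power / IT Equipment Power."""
    return total_facility_power_w / it_equipment_power_w

# A facility drawing 1.8 MW in total to deliver 1.0 MW to its IT equipment:
print(pue(1_800_000, 1_000_000))  # 1.8 -- within the typical 1.6 to 2.0 range
```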
Companies willing to drive more sustainable practices may incent facility managers to improve facility PUE levels. However, if this is done without context towards the overall energy or other resource consumption, it could drive inefficient behaviors.
Issue #1: Dissimilar Infrastructure Power Scaling
If a facility manager tracks PUE over a variety of workloads, they will see how the data center's infrastructure power consumption tracks with the IT load. Ideally, the infrastructure overhead (HVAC system, UPS system, etc.) will scale linearly with the consumption of the servers and other gear in the data center, but this is rarely the case.
In many cases, the fixed overhead for power and cooling systems will become a higher percentage of overall power consumption as the IT load diminishes. In other cases, there will be significant step functions in overall power consumption as large infrastructure items such as chillers, CRACs, or other equipment are turned on or off (as depicted in the graph to the left).
In such situations, reducing the IT power consumption could increase the PUE even if it reduces the overall energy consumption of the data center. People will often act in the direction towards which they are incented (i.e., what improves their paycheck). Managers incented to improve PUE without any clear tie-in to overall energy consumption might be reluctant to shut off unused servers or aggressively implement power saving features on their IT infrastructure if it increased their PUE--even if doing so would reduce overall facility power consumption.
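The arithmetic behind this perverse incentive is easy to sketch. All the numbers here are assumptions for illustration: a 400 kW fixed infrastructure overhead plus variable overhead at 30% of the IT load.

```python
def facility_totals(it_w, fixed_overhead_w=400_000, variable_ratio=0.3):
    """Return (total facility power, PUE) for a given IT load, assuming a
    fixed infrastructure overhead plus overhead proportional to IT load."""
    total_w = it_w + fixed_overhead_w + variable_ratio * it_w
    return total_w, total_w / it_w

total_before, pue_before = facility_totals(1_000_000)  # ~1.70 MW, PUE ~1.70
total_after, pue_after = facility_totals(600_000)      # shut off unused servers
# Total power falls to ~1.18 MW, yet PUE worsens to roughly 1.97 --
# a manager incented only on PUE would resist the change.
```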
Ensuring overall energy consumption is part of the incentive package (not just PUE) is critical to driving the desired behaviors.
[Part of this needs to be linked with overall productivity of the data center so that increased use of the data centers is encouraged while still incenting improved efficiency. I'll write about this in an upcoming post.]
Issue #2: Shifting Infrastructure Loads to IT
Another issue to watch is a desire to classify some infrastructure-like services as IT loads in order to improve PUE efficiencies. Examples of this include moving UPS systems into IT racks or putting large air-mover devices into equipment cabinets and trying to classify them as IT loads. This is "gaming" the system and should be actively discouraged.
The Green Grid is aware of this issue and is adding more guidelines to help people improve the accuracy and consistency of their PUE reporting.
Issue #3: Improving Infrastructure Efficiency at the Expense of IT
The third issue to watch is a move towards facility or equipment practices that reduce the infrastructure power consumption but increase the IT power consumption. In particular, the adoption of higher operating temperatures for data centers warrants particular scrutiny.
I've noted previously that there are significant gains possible by raising data center temperatures and making greater use of dry-side or wet-side economizers. However, it's important to compare the energy savings on the infrastructure side with the energy costs on the IT side. At higher temperatures, leakage currents in silicon increase and fans inside servers need to run faster to move more air through each server.
Increase the IT consumption and lower the infrastructure consumption and you get a two-fer: the PUE denominator (IT power) goes up while the numerator (total facility power) can hold steady or fall, lowering the overall PUE. However, if the net power consumption doesn't go down, it usually** doesn't make sense to increase the ambient temperature. Once again, looking at overall power consumption in addition to PUE is important in incenting the proper behaviors.
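A quick numerical sketch of the two-fer (all figures made up for illustration):

```python
# Made-up numbers: raising the ambient temperature cuts cooling
# (infrastructure) power but raises IT power via leakage and faster fans.
it_before, infra_before = 1_000_000, 600_000  # watts
it_after, infra_after = 1_080_000, 560_000

pue_before = (it_before + infra_before) / it_before  # 1.60
pue_after = (it_after + infra_after) / it_after      # ~1.52
delta_total_w = (it_after + infra_after) - (it_before + infra_before)
# PUE improves even though the facility now draws 40 kW more overall.
print(pue_before, pue_after, delta_total_w)
```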
**Note: For greenfield (new) data centers or substantial datacenter retrofits, raising the allowed data center temperature may eliminate or substantially reduce the CapEx (capital expenditure) cost for that data center even if the direct energy costs are slightly higher. For example, if a data center doesn't need to purchase a chiller unit, that could shave millions of dollars off the construction cost for a facility. In such cases, more complicated parameters will be needed to evaluate the benefits of raising the ambient temperature in the facility; these likely will include a net present value analysis for the CapEx savings vs. OpEx (operating expense) costs, consideration of real estate savings, etc. The real win is when both CapEx costs are avoided AND OpEx costs are lower.