Sunday, March 7, 2010

Energy Star for Servers Should Require Right-sized Power Supplies

The EPA is currently developing Tier 2 (second-phase) Energy Star requirements for servers. Although the EPA is probably right to back away from the non-standard definition it was developing to capture "net power loss" for servers, it should still look at adjusting server power supply efficiencies when power supplies are oversized.

Some vendors may ship servers that draw only 200W at 100% utilization with a power supply that can provide 1200W. Traditional ways of evaluating power supplies measure them across their full rated capacity. If a power supply is sized appropriately, this makes sense. However, a power supply rated far above any load the system will see in real life should be de-rated.

For example, a 2000W power supply might have good efficiency at 50% and 100% of load, but power efficiency tends to drop off at lower load levels, particularly those below 25% of maximum load. Take that same 2000W power supply and put it in a server drawing a maximum of 200W, and the power supply would always be operating at or below 10% load. The normal power supply rating levels are of little value if the realistic power draw is much lower than the rated power draw.

To be fair, the tested configurations of servers don't always represent the highest possible loading: adding extra memory, additional hard drives, and extra PCI Express cards can increase a server's power draw. But having no upper limit leaves too much wiggle room and jeopardizes the integrity of the Energy Star rating method.

One possible solution to work around this is as follows:
  1. Measure the server power consumption under an acceptable benchmark such as SPECpower_ssj2008. Record 2x the maximum power draw (i.e., at 100% load in the benchmark).
  2. Look at the rated output power for the power supply or power supplies needed to operate the server in that configuration [ignore redundant power supplies used for reliability purposes]. Record the sum of the power of all the non-redundant power supply output power ratings.
  3. If the answer in Step 2 is less than or equal to the value from Step 1, no adjustment is needed. Skip Steps 4 and 5.
  4. If the answer in Step 2 is more than the value in Step 1, plot the efficiency rating of the non-redundant power supplies. Extrapolate the efficiency of the power supply (power supplies) at the value recorded in Step 1. Extrapolate the efficiency at 50% and 25% of the value shown in Step 1. Do the same for any other power supply levels normally required, but rate them as a ratio of the value shown in Step 1.
  5. Evaluate the efficiency of the system based on the load levels and efficiency determined in Step 4 above.
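The steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not anything the EPA has proposed: the function names, the sample efficiency curve, and the choice of straight-line interpolation (extending the nearest segment when the load falls below the lowest measured point) are all my own assumptions.

```python
from bisect import bisect_left


def efficiency_at(curve, watts):
    """Estimate efficiency at `watts` from measured (output_watts, efficiency) points.

    `curve` must be sorted by output_watts. Linear interpolation is used between
    points; outside the measured range the nearest segment is extended
    (an assumption, other extrapolation methods are possible).
    """
    xs = [w for w, _ in curve]
    # Pick the segment containing `watts`, or the nearest end segment.
    i = min(max(bisect_left(xs, watts), 1), len(curve) - 1)
    (x0, y0), (x1, y1) = curve[i - 1], curve[i]
    return y0 + (y1 - y0) * (watts - x0) / (x1 - x0)


def adjusted_efficiencies(max_draw_watts, rated_watts, curve,
                          fractions=(1.0, 0.5, 0.25)):
    """Return {load fraction: efficiency} following Steps 1 through 4."""
    reference = 2 * max_draw_watts        # Step 1: 2x the measured maximum draw
    if rated_watts <= reference:          # Step 3: supply is not oversized
        reference = rated_watts           # rate against full rated capacity
    # Step 4: evaluate the usual load levels relative to the reference value.
    return {f: efficiency_at(curve, f * reference) for f in fractions}


# Hypothetical 2000W supply measured at 10%, 20%, 50%, and 100% of rated load.
example_curve = [(200, 0.70), (400, 0.80), (1000, 0.90), (2000, 0.88)]

if __name__ == "__main__":
    # A server drawing at most 200W, paired with the 2000W supply.
    for fraction, eff in adjusted_efficiencies(200, 2000, example_curve).items():
        print(f"{fraction:.0%} of reference load: {eff:.1%} efficient")
```

With these made-up numbers the supply would be rated at roughly 80%, 70%, and 65% efficiency instead of the 88%, 90%, and 82% it would earn when measured against its full 2000W capacity.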

This adjustment would correct ratings for power supplies that are oversized for the systems they're being tested with. It would also give server vendors an incentive to right-size power supplies to better match the real power range of the systems they're being rated for.

--kb

Saturday, February 20, 2010

Facebook's HipHop Software Efficiency

Haiping Zhao has a great blog entry showcasing how Facebook has been able to improve the efficiency of their applications through the development of HipHop for PHP. Simply stated, HipHop transforms PHP code into optimized C++ code. According to Zhao, this technology reduced the average CPU utilization on their servers by 50%.

This showcases two things in particular:
  • Software can have a major impact on system efficiency. Even relatively good solutions like PHP can still be improved.
  • Metrics that look only at hardware-centric criteria often ignore the benefits of more efficient software.

This second bullet merits further elaboration. Administrators who treat CPU utilization as an approximation of total server work accomplished would erroneously assume their servers were doing only half as much work with HipHop as they were beforehand. In reality, the servers would be doing the same amount of work, just more efficiently, thanks to better software.
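A small worked example makes the point concrete. The numbers below are invented for illustration (they are not Facebook's figures), and `work_per_cpu_point` is just a hypothetical helper name.

```python
def work_per_cpu_point(requests_per_sec, cpu_utilization_pct):
    """Useful work (requests/sec) delivered per percentage point of CPU used."""
    return requests_per_sec / cpu_utilization_pct


# Hypothetical server handling the same request rate before and after
# a software optimization that halves CPU utilization.
before = work_per_cpu_point(requests_per_sec=1000, cpu_utilization_pct=80)
after = work_per_cpu_point(requests_per_sec=1000, cpu_utilization_pct=40)

if __name__ == "__main__":
    print(f"before: {before} requests/sec per CPU point")
    print(f"after:  {after} requests/sec per CPU point")
```

A utilization-only metric sees 80% drop to 40% and concludes the work was cut in half. Counting requests shows the work is unchanged and the efficiency has doubled.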

Future posts will talk about ways to measure useful work.

--kb

Tuesday, February 16, 2010

Good IBM doc on cpufreq

Previous posts have talked about power management in Linux, including information about the cpufreq module. IBM has published a good document explaining many aspects of how to use the cpufreq module. This document is worth a look.

Monday, February 15, 2010

Intel® Energy Checker SDK Released

For nearly the last two years, I've been working with a colleague named Jamel Tayeb to develop a tool that could be used to help measure the energy efficiency of software (and data centers as an aggregate measure). I'm happy to say that the Intel® Energy Checker SDK has now finally gone public and is available for download from http://whatif.intel.com.

Most of the technology world's attention to energy efficiency has focused on hardware: better processors, better memory, better disks, better power conversion, etc. This is good, but it overlooks the substantial contribution that better software can make towards improving energy efficiency. An automobile driver who drives over the top of a hill may use more energy than someone who drives around the hill; likewise, software designed with energy efficiency in mind may use a different algorithm than a brute force approach that seems simpler at first.

The Intel® Energy Checker SDK provides developers and systems integrators a simple API that they can use to measure the amount of "useful work" performed by the system and then correlate the useful work with energy consumption. The useful work is not the number of instructions executed, cycles retired, or the average CPU utilization--that's not why you buy software. For example, you buy e-mail software to do things like send e-mails, so the measures of useful work can be the number of messages sent, the number of kilobytes in those messages sent, the number of messages received, and the number of kilobytes in those messages received. Software developers can choose what measures of useful work they export and how often they choose to export this information.

The SDK includes tools to measure the rate of power usage and to measure/calculate energy consumption over time. The SDK supports several external power meters as well as the ability to read energy consumption directly from power supplies having certain levels of instrumentation.
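To show what "calculating energy over time" means, here is a minimal sketch that turns timestamped power readings into energy using the trapezoidal rule. It does not use the SDK's API at all; the function name and the sample readings are assumptions made up for this illustration.

```python
def energy_joules(samples):
    """Integrate (time_seconds, power_watts) samples into energy in joules.

    `samples` must be sorted by time. Each interval contributes the average
    of its two power readings multiplied by the interval length.
    """
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


# Hypothetical one-second readings from a power meter.
readings = [(0, 100.0), (1, 120.0), (2, 110.0)]

if __name__ == "__main__":
    joules = energy_joules(readings)
    print(f"{joules} J ({joules / 3600:.4f} Wh)")
```

More frequent samples give a better estimate, since the trapezoidal rule assumes power changes linearly between readings.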

The software developer can easily aggregate/weight the work done in their application(s) with work done in other instrumented applications and compare that to the energy consumed by the system or systems under test to determine energy efficiency. This is an important step towards making software more energy efficient and may lead towards energy-aware algorithms in leading software packages. In turn, this will help administrators measure the aggregate useful work of their facilities, rather than simply measuring hardware-centric metrics that actually penalize more efficient software.
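As a rough sketch of that aggregation, the snippet below weights a few useful-work counters and divides by the energy consumed. Again, this is not the SDK's API: the counter names, weights, and energy figure are hypothetical, and how to weight different kinds of work is a judgment call left to whoever does the measuring.

```python
def work_per_joule(counters, weights, energy_joules):
    """Weighted useful work divided by energy consumed (work units per joule)."""
    useful_work = sum(weights[name] * count for name, count in counters.items())
    return useful_work / energy_joules


# Hypothetical counters exported by an instrumented e-mail server.
counters = {"messages_sent": 1200, "messages_received": 1800}
# Hypothetical weights: here every message counts equally.
weights = {"messages_sent": 1.0, "messages_received": 1.0}

if __name__ == "__main__":
    efficiency = work_per_joule(counters, weights, energy_joules=3600.0)
    print(f"{efficiency:.3f} messages per joule")
```

Comparing this figure before and after a software change, for the same workload, shows whether the change actually made the system more energy efficient.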

The SDK is available free of charge (and without royalties) from http://software.intel.com/en-us/articles/intel-energy-checker-sdk/. The SDK supports Windows, Linux, Solaris 10, and Mac OS X. Source code for the core API and many utilities is included, though Intel distributes some utilities in binary form only. Check it out!

--kb