Sunday, February 8, 2009

Using T-states with Phantom Servers

One of the more obvious ways to reduce power consumption in data centers is to shut off servers that are no longer needed (often referred to as phantom servers). Unfortunately, it's often difficult to identify which servers those are. Most well-run organizations have well-thought-out procedures for determining when and how to commission servers, but few have comparable policies for decommissioning them, so questions like these go unanswered:
  • Who is responsible for older servers?
  • Which services depend on a given server?
  • What would happen if that server went away?

If your organization has a good way of determining which servers are no longer needed, congratulations--you're one of the few. For those less sure which servers are still needed, what options are available?

Unplugging Servers

Some consultants advocate simply unplugging servers whose provenance can't be determined; anyone whose services depend on those servers will soon complain, and you can then find out from the complaining party how long the server will be needed.

Unfortunately, this approach induces a failure that can be rather drastic for your data center customers. In many cases, the service outage may not be acceptable; even worse, it may not be possible to recover cleanly when a server is shut down unexpectedly.

Monitoring Activity

Another approach is to look at processor utilization, network utilization, or both. This will tell you whether a server is busy--but a busy server is not necessarily performing a needed service. For example, a server could have been disconnected from the company's search function yet still be actively crawling the company's intranet to speed up internal searches. In other cases, a system could be furiously trying to connect to services that have been decommissioned, so it looks busy even though it is effectively unused.
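As a rough illustration of the processor-utilization check, the sketch below computes the busy fraction between two samples of the aggregate "cpu" line in /proc/stat on Linux. The function names and the sampling interval are assumptions made for this example, and the result only says how busy the server is, not whether its work is needed.

```python
import time


def parse_cpu_line(line):
    """Return (total_ticks, idle_ticks) from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    # Use user, nice, system, idle, iowait, irq, softirq, steal only;
    # guest time is already counted inside user time.
    values = [int(v) for v in fields[1:9]]
    idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
    return sum(values), idle


def busy_fraction(before, after):
    """Fraction of CPU time spent non-idle between two 'cpu' lines."""
    total_before, idle_before = parse_cpu_line(before)
    total_after, idle_after = parse_cpu_line(after)
    total = total_after - total_before
    if total <= 0:
        return 0.0
    return 1.0 - (idle_after - idle_before) / total


def sample_busy_fraction(interval_seconds=5.0):
    """Read /proc/stat twice, interval_seconds apart (Linux only)."""
    def read_cpu_line():
        with open("/proc/stat") as stat_file:
            return stat_file.readline()

    first = read_cpu_line()
    time.sleep(interval_seconds)
    return busy_fraction(first, read_cpu_line())
```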

Throttling Back Phantom Candidates

"Processor Power Management in Linux" discusses how T-states can be controlled from Linux. On most modern systems, T-states provide a way to throttle back performance in 12.5% increments (from T0, unthrottled, to T7, 87.5% throttled) by gating processor clocks between instructions.
  • Rather than shutting a server down completely, throttling the server by 50% (the T4 state) provides a "softer" way to test whether that server is needed. This way, the server stays up and continues to provide its services--albeit at a reduced performance level.
  • If there are no complaints after a week or two at the lower performance level, throttle the server further to the T7 state (87.5% throttled). This cuts the remaining performance by another 75%.
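The arithmetic behind these two stages can be sketched as follows. This assumes eight T-states in equal 12.5% steps, as described above; the 14-day hold is one reading of "a week or two," and the plan structure is hypothetical. The percentages are nominal clock-throttling figures, not measured application performance.

```python
T_STATE_COUNT = 8  # T0..T7, assuming the 12.5% increments described above


def throttled_percent(t_state):
    """Nominal percentage of clock throttling for a given T-state."""
    if not 0 <= t_state < T_STATE_COUNT:
        raise ValueError("T-state must be between 0 and 7")
    return t_state * 100.0 / T_STATE_COUNT


def remaining_percent(t_state):
    """Nominal percentage of full performance left at a given T-state."""
    return 100.0 - throttled_percent(t_state)


# Hypothetical plan: (T-state, days to wait for complaints before the next stage).
PLAN = [(4, 14), (7, 14)]


def describe_plan(plan=PLAN):
    """Summarize each stage, including the cut relative to the previous stage."""
    lines = []
    previous = 100.0
    for t_state, days in plan:
        now = remaining_percent(t_state)
        cut = (previous - now) / previous * 100.0
        lines.append(
            f"T{t_state}: {throttled_percent(t_state):.1f}% throttled, "
            f"{now:.1f}% of full speed "
            f"({cut:.1f}% cut from the previous stage), hold {days} days"
        )
        previous = now
    return lines


if __name__ == "__main__":
    for line in describe_plan():
        print(line)
```

Going from T4 to T7 leaves 12.5% of full speed, which is a 75% reduction from the 50% available at T4.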

If there are no complaints when the server is 87.5% throttled, there's a good chance the server is no longer needed. Shutting the server down at this stage can be done with higher confidence (and lower chance of reversal) than just unplugging the server without this analysis.
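For reference, here is a minimal sketch of how a T-state might be applied from a script. It assumes the legacy ACPI procfs files at /proc/acpi/processor/*/throttling, which accepted a T-state number on kernels of this era; that path is an assumption about your system, it may not exist on newer kernels, and writing to it requires root. The function names are hypothetical, and the default is a dry run that only reports what would be written.

```python
import glob

# Assumption: legacy ACPI procfs layout with one directory per processor.
THROTTLING_GLOB = "/proc/acpi/processor/*/throttling"


def throttling_files(pattern=THROTTLING_GLOB):
    """Return the throttling control files found on this system (possibly none)."""
    return sorted(glob.glob(pattern))


def set_t_state(t_state, paths=None, dry_run=True):
    """Request the given T-state on every processor.

    Returns the list of (path, value) pairs; nothing is written while
    dry_run is True.
    """
    if not 0 <= t_state <= 7:
        raise ValueError("T-state must be between 0 and 7")
    if paths is None:
        paths = throttling_files()
    actions = []
    for path in paths:
        actions.append((path, str(t_state)))
        if not dry_run:
            with open(path, "w") as control_file:
                control_file.write(str(t_state))
    return actions


if __name__ == "__main__":
    # Dry run: show what throttling to T4 (50%) would write.
    for path, value in set_t_state(4):
        print(f"would write {value} to {path}")
```

Setting the state back to 0 the same way restores full speed if someone does complain.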


P.S. If there are still concerns about shutting down the server at this stage, one further approach is to migrate that server to a virtual machine. However, this last step may not be needed if the performance throttling provides high enough confidence that the throttled server is no longer needed.
