Mitigating the Limits of Your Data Center

To help data center managers understand how best to employ new energy metrics and tools, we examine five ways to optimize use of available space, cooling capacity, and power.

By Clemens Pfeiffer, Founder and Chief Technology Officer, Power Assure

What is the limiting factor in your data center? Are you running out of space? Is the cooling system reaching its capacity? Have you exhausted the available power? A data center manager’s worst nightmare may well be outgrowing a data center, because the cost to build a new, state-of-the-art facility is currently about $1,000 per square foot.

The worst-case scenario is that there is simply no longer sufficient space and power to accommodate your equipment even when you’ve updated to a state-of-the-art facility and IT components. With today’s rapid improvements in high-density and virtualized servers and storage systems, however, such a situation is rare. A far more likely scenario is that the data center is full of old equipment that occupies space and draws power that could be consolidated, and that it is cooled by an inefficient system consuming power that could otherwise support additional IT capacity.

For this reason, data center managers are now paying close attention to power and space utilization. The industry has responded by creating both the metrics and the tools needed to improve how organizations measure and manage power consumption. Used correctly, these metrics and tools can maximize power efficiency, thereby extending the life of most data centers -- often indefinitely.

This article will help data center managers understand how best to employ these new metrics and tools in five different ways to optimize use of available space, cooling capacity, and power. However, as the old adage goes, you can’t manage what you can’t measure, so you must begin by getting an accurate inventory of your existing data center equipment and taking baseline measurements.

Get Started by Establishing the Baseline

The need to measure actual power consumption accurately and easily has recently been satisfied with a variety of new tools, the most powerful of which are classified as data center infrastructure management (DCIM) solutions. The measurements taken by a DCIM system establish the baseline needed to determine the progress an organization is making toward meeting its power usage effectiveness (PUE) and/or corporate average data center efficiency (CADE) goals. DCIM systems are also invaluable in optimizing use of a data center’s limited resources, potentially as part of achieving these efficiency goals.
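As a rough illustration of what a baseline looks like in practice, PUE is the ratio of total facility power to the power delivered to IT equipment. The short Python sketch below computes it from two readings; the function name and the kilowatt figures are hypothetical and do not come from any particular DCIM product.

```python
def pue(total_facility_kw: float, it_load_kw: float) -> float:
    """Power usage effectiveness: total facility power divided by IT power."""
    if it_load_kw <= 0:
        raise ValueError("IT load must be positive")
    return total_facility_kw / it_load_kw


# Hypothetical baseline readings in kilowatts (illustrative values only).
baseline = pue(total_facility_kw=1800.0, it_load_kw=1000.0)
print(f"Baseline PUE: {baseline:.2f}")
```

A value closer to 1.0 means that less of the facility's power is going to cooling, power distribution losses, and other overhead.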

The typical DCIM solution supports both the industry-standard and popular proprietary protocols used to measure power consumption, such as the intelligent platform management interface (IPMI), Modbus, and BACnet, which means there are no special agents to install or extra wires to run to measure power at the building, circuit, and device level. The better DCIM solutions make the implementation even easier with advanced capabilities such as auto-discovery, capacity planning, building energy management system integration, sophisticated yet intuitive dashboards, and comprehensive reporting.

Reclaim Stranded Power by Optimizing Equipment Placement

Because measuring power consumption was often too difficult prior to the advent of DCIM solutions, IT and facility managers have been forced to calculate loads based on nameplate ratings and/or datasheet specifications. The problem with this practice is that these ratings overstate what equipment actually draws, so the resulting estimate is inevitably too conservative. The effect is that actual power utilization is far below the power available, even though calculations show that the available power is fully allocated.

Stranded power exists wherever power distribution is matched incorrectly with actual power consumption. Some racks have capacity that remains unused or stranded, while other racks may not have enough, causing circuit breakers to trip. This is where the baseline measurements of DCIM come in. By knowing precisely the idle, actual, and peak power consumed by all equipment, systems can be reconfigured and/or relocated to match the power distribution in all rows and racks.
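The comparison described here can be sketched in a few lines of Python. The example below is a simplified illustration, not how any specific DCIM product works: the rack names, the kilowatt values, and the 20 percent safety headroom above measured peak are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Rack:
    name: str
    provisioned_kw: float    # power the circuit can deliver to the rack
    measured_peak_kw: float  # peak draw actually measured at the rack


def stranded_kw(rack: Rack, headroom: float = 0.2) -> float:
    """Capacity left over after reserving headroom above measured peak.

    A positive result is stranded power that could host more equipment;
    a negative result flags a rack at risk of tripping its breaker.
    The 20% default headroom is an assumed safety margin.
    """
    return rack.provisioned_kw - rack.measured_peak_kw * (1 + headroom)


# Hypothetical racks (illustrative values only).
racks = [Rack("A1", 5.0, 2.5), Rack("A2", 5.0, 4.5)]
for rack in racks:
    print(f"{rack.name}: {stranded_kw(rack):+.1f} kW")
```

In this made-up case, moving some load from the second rack to the first would relieve the overload and recover stranded capacity at the same time.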

Extend Cooling Capacity by Turning Up the Heat

The more advanced DCIM solutions also measure environmental conditions -- such as temperature, humidity, and airflow -- throughout the data center. Although this topic is beyond the scope of this short article, the same DCIM modeling tools used to minimize stranded power can also play a critical role in optimizing the placement of systems in suitable hot/cold aisles and even within the individual rack. What-if analysis allows the permutations and combinations of power and cooling considerations to be evaluated easily and accurately to achieve an optimal result.

Today, most data centers operate well below the 80°F (27°C) cold aisle temperature that the American Society of Heating, Refrigerating, and Air-Conditioning Engineers (ASHRAE) recommends. The reason for maintaining lower temperatures is a fear of hot spots, but overcooling the entire room to guard against them is no longer considered a best practice for energy efficiency in data centers. The more feature-rich DCIM solutions minimize the risk of hot spots by taking constant and accurate measurements of the server inlet temperature and adjusting the cooling accordingly, allowing the temperature to rise safely within established limits. This ultimately increases the cooling efficiency and allows more power to be used by IT equipment.
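The idea of raising temperature while watching server inlets can be illustrated with a very simple rule. The Python sketch below is only a conceptual example, not a real control algorithm: the function name, the 2°C safety margin, and the 0.5°C step size are assumptions, and an actual cooling system would need far more careful tuning.

```python
def next_setpoint(current_setpoint_c: float, inlet_temps_c: list[float],
                  limit_c: float = 27.0, margin_c: float = 2.0,
                  step_c: float = 0.5) -> float:
    """Return the next cooling setpoint based on the hottest server inlet.

    limit_c is the maximum allowed inlet temperature; margin_c and step_c
    are assumed tuning values chosen only for this illustration.
    """
    if not inlet_temps_c:
        raise ValueError("at least one inlet reading is required")
    hottest = max(inlet_temps_c)
    if hottest > limit_c:
        return current_setpoint_c - step_c   # hot spot: cool more
    if hottest < limit_c - margin_c:
        return current_setpoint_c + step_c   # spare margin: cool less
    return current_setpoint_c                # within the band: hold


# Hypothetical inlet readings in degrees Celsius.
print(next_setpoint(20.0, [22.0, 23.5, 24.0]))
```

Because the rule reacts to the hottest inlet rather than the room average, a single hot spot is enough to stop the setpoint from rising further.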

Save Space by Optimizing Server Refreshes

Performance improvements based on Moore’s Law make it prudent to replace servers, storage, and networking systems periodically. Sometimes the change is driven by new features or as part of a major consolidation and virtualization initiative, but determining the optimal time to refresh servers is not as straightforward. The new servers may offer better price/performance, but can their total cost of ownership (including power consumption as a major operating expense) be justified? If so, which old servers should be replaced first, and by which model of new servers?

To help IT managers make such choices more wisely, the EPA created an ENERGY STAR rating system for servers and other IT equipment. ENERGY STAR has a fundamental flaw, however -- it does not factor in the age of the equipment, and with an average 2X improvement in server performance every two years, this is a serious shortcoming. In other words, an ENERGY STAR rating provides no means to assess the energy efficiency of a given server relative to all servers available today.

To make a fully informed decision about a server refresh, IT managers are, therefore, forced to make estimates based on performance specifications. As noted, however, the use of specifications can produce inaccurate and misleading results.

To address this shortcoming, Power Assure (the company I founded and where I serve as CTO) developed PAR4, a new method for determining both absolute and normalized (over time) energy efficiency ratings for both new and existing equipment on a transactions-per-kilowatt-hour (kWh) basis. The “4” in PAR4 indicates that four such measurements are made: power-on spike, wave form, boot cycle, and 100% load. These measurements are then used to determine idle and peak power consumption, as well as transactions per watt and annualized rating details. This provides a more accurate means for IT managers to compare legacy servers with newer models, and newer models with one another, even if they are based on different hardware architectures. PAR4 ratings are particularly useful for determining the most power-efficient choice of server(s) during benchmark testing of actual applications.
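To show what a transactions-per-kWh comparison looks like, the Python sketch below computes that single ratio for two servers. It is not the PAR4 method itself, which involves the four measurements described above; the function name, throughput figures, and wattages are hypothetical values chosen for illustration.

```python
def transactions_per_kwh(transactions_per_second: float,
                         avg_power_watts: float) -> float:
    """Transactions completed for each kilowatt-hour of energy consumed."""
    if avg_power_watts <= 0:
        raise ValueError("average power must be positive")
    kwh_per_hour = avg_power_watts / 1000.0        # energy used in one hour
    return transactions_per_second * 3600 / kwh_per_hour


# Hypothetical benchmark results under full load (illustrative only).
legacy = transactions_per_kwh(transactions_per_second=500, avg_power_watts=400)
newer = transactions_per_kwh(transactions_per_second=2000, avg_power_watts=350)
print(f"Legacy server: {legacy:,.0f} transactions/kWh")
print(f"Newer server:  {newer:,.0f} transactions/kWh")
print(f"Improvement:   {newer / legacy:.1f}x")
```

Expressing both servers in the same unit makes the comparison independent of hardware architecture, provided the same workload is used for both measurements.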

Conserve Power by Maximizing Server Utilization

The best DCIM solutions also offer a dynamic power optimization (DPO) capability to achieve peak energy efficiency (and help meet PUE or CADE objectives) by migrating from today’s “always-on” practice of operating servers to an “on-demand” approach. DPO solutions work in cooperation with load-balancing or virtualization systems to continuously match server capacity with demand. When the DPO’s real-time calculation engine detects an impending mismatch between anticipated demand and current capacity (whether too little or too much), it automatically instructs the virtualization system to make the appropriate adjustment by powering servers up or down, respectively. This process is normally automated using runbooks (standard operating procedures) that outline the specific steps involved during the ramp-up and ramp-down -- from migrating applications to and from available virtual machines to adjusting cooling capacity. The result is far better energy efficiency with no adverse impact on performance or application service levels.
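The core of matching capacity to demand can be sketched as a small calculation. The Python example below is a simplified illustration rather than the logic of any actual DPO product: the function names, the per-server throughput, and the 20 percent safety buffer are assumptions, and a real system would also handle forecasting, runbook execution, and cooling adjustments.

```python
import math


def servers_needed(forecast_tps: float, per_server_tps: float,
                   buffer: float = 0.2, minimum: int = 1) -> int:
    """Servers required to serve forecast demand plus a safety buffer.

    The 20% default buffer and the minimum of one active server are
    assumed values for this illustration.
    """
    required = forecast_tps * (1 + buffer) / per_server_tps
    return max(minimum, math.ceil(required))


def adjustment(active_servers: int, forecast_tps: float,
               per_server_tps: float) -> int:
    """Positive: servers to power up. Negative: servers to power down."""
    return servers_needed(forecast_tps, per_server_tps) - active_servers


# Hypothetical demand forecasts in transactions per second.
print(adjustment(active_servers=8, forecast_tps=9000, per_server_tps=1000))
print(adjustment(active_servers=8, forecast_tps=2000, per_server_tps=1000))
```

The buffer is what protects service levels: capacity is always kept somewhat above anticipated demand, so servers are powered down only when there is clear surplus.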

Clemens Pfeiffer is a 22-year veteran of the software industry, where he has held leadership roles in process modeling and automation, software architecture and database design, and data center management and optimization technologies. Before co-founding Power Assure, he served as founder, president, and CEO for 10 years at International SoftDevices Corporation, focused on designing and implementing a multi-industry smart process automation platform and integration agents. Pfeiffer holds an MS degree in Computer Science.