In-Depth
Virtualization for High-End Computing Environments
Why aggregation is ideal for high-performance computing
By Shai Fultheim
Over the past five years, the standout success story in the server market has been the high-performance computing (HPC) segment. According to IDC, this segment has grown at a 20 percent compound annual growth rate in recent years, going from less than 10 percent of the overall server market to more than 20 percent, and there's no sign this growth will slow. HPC has grown from a small niche used mainly by large national and defense laboratories into an indispensable mainstream tool that many commercial companies depend on to innovate, compete, and survive.
HPC users play a key role in designing and improving many industrial products -- from automobiles to golf clubs -- as well as industrial and business processes such as finding and extracting oil and gas or modeling complex financial scenarios. HPC systems enable a company to create, evaluate, and modify large-scale digital models of products and processes. The results of using computers in the design and engineering phases include accelerated time to market, improved products and processes, and significant cost savings.
HPC applications require a large memory footprint, a large number of processors, or both. These requirements are addressed today in two deployment models:
Scale-Up: In this approach, the hardware is sized to fit the resources (processors, memory, and I/O) required by the application. An enterprise procures a large, shared-memory system, which is usually RISC-based, carries a high cost, and uses proprietary designs, raising the risk of vendor lock-in. Besides being the traditional way to obtain a large memory space, the key advantage of this approach is simplicity: a large system runs a single operating system and benefits from large, contiguous memory and a unified I/O architecture.
Scale-Out: In this approach, the application uses middleware that allows it to run on multiple separate systems; the middleware partitions the application so that it fits the hardware. Many HPC applications have been written to use the Message Passing Interface (MPI), middleware that enables this kind of deployment (a minimal sketch appears below). The key advantage of this approach is its ability to leverage low-cost, x86 industry-standard servers and avoid vendor lock-in. The major disadvantage is infrastructure complexity, which results in high installation and ongoing management costs. IT must manage multiple operating systems, distributed I/O (cluster file system deployment), the cluster interconnect, and application provisioning (load balancing, job scheduling, and resource management).
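To make the scale-out model concrete, here is a minimal sketch of an MPI program in C: each rank computes a partial result over its own slice of the data, and MPI_Reduce combines the pieces. The problem size and the work done per element are placeholders, not taken from any particular application.

    /* Minimal scale-out sketch: each MPI rank works on its own slice of
       the problem; the middleware (MPI) combines the partial results.
       The array size and the per-element "work" are illustrative only. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const long N = 1000000;                 /* total work items (illustrative) */
        long begin = rank * N / size;           /* this rank's slice */
        long end   = (rank + 1) * N / size;

        double local = 0.0, total = 0.0;
        for (long i = begin; i < end; i++)
            local += (double)i;                 /* stand-in for real computation */

        /* Gather the partial sums onto rank 0. */
        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("total = %f computed across %d ranks\n", total, size);

        MPI_Finalize();
        return 0;
    }

A program like this is typically built with mpicc and launched with mpirun, one process per core across the cluster; it is the middleware, not the operating system, that ties the separate machines together.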
A new virtualization paradigm addresses the shortcomings of both of these approaches and preserves the benefits. It is called aggregation.
Aggregation makes multiple physical systems appear to function as a single logical system. The building blocks for this approach are the same x86 industry-standard servers used in the scale-out (clustering) approach, preserving the low cost. In addition, by running a single logical system, customers manage a single operating system and take advantage of large contiguous memory and unified I/O architecture.
Benefits of Aggregation
There are five major benefits to this technology.
Aggregation creates a large memory system: For workloads that require large contiguous memory, customers have traditionally used the scale-up approach. Aggregation provides a cost-effective alternative to buying expensive, large, proprietary shared-memory systems for such workloads. It lets an application that requires large amounts of memory leverage the memory of multiple systems and reduces the need to use a hard drive for swap or scratch space. Application runtime can be dramatically reduced by running simulations with in-core solvers or by using memory instead of swap space for models with a large memory footprint.
Aggregation provides a cost-effective virtual x86 platform with a large shared memory that minimizes physical infrastructure requirements and can run both distributed applications and applications requiring a large memory footprint, at optimal performance, on the same physical infrastructure.
Aggregation supports compute-intensive, shared-memory applications: For workloads that require a large number of cores coupled with shared memory, customers have traditionally used proprietary shared-memory systems. Aggregation provides a cost-effective x86 alternative to these expensive, proprietary RISC systems.
Aggregation technology combines memory bandwidth across boards (in contrast to traditional SMP or NUMA architectures, in which memory bandwidth decreases as the machine scales). This enables solutions based on aggregation technology to show close-to-linear memory-bandwidth scaling, delivering excellent performance for threaded applications; a rough way to observe such scaling is sketched below.
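One way to see how memory bandwidth scales with thread count is a simple streaming kernel in the spirit of the STREAM triad benchmark. The sketch below, in C with OpenMP, is illustrative only: the array size, the single timed pass, and the byte accounting are simplifications, and it says nothing about any particular aggregation product.

    /* Rough memory-bandwidth sketch (triad-style): a[i] = b[i] + s*c[i].
       Not a calibrated benchmark; sizes and timing are simplified. */
    #include <omp.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        const long N = 50 * 1000 * 1000;        /* ~1.2 GB across three arrays */
        double *a = malloc(N * sizeof(double));
        double *b = malloc(N * sizeof(double));
        double *c = malloc(N * sizeof(double));
        if (!a || !b || !c) return 1;

        for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

        double t0 = omp_get_wtime();
        #pragma omp parallel for
        for (long i = 0; i < N; i++)
            a[i] = b[i] + 3.0 * c[i];           /* three arrays streamed per iteration */
        double t1 = omp_get_wtime();

        double gb = 3.0 * N * sizeof(double) / 1e9;   /* bytes moved, roughly */
        printf("%d threads: %.2f GB/s\n", omp_get_max_threads(), gb / (t1 - t0));

        free(a); free(b); free(c);
        return 0;
    }

Running it with increasing values of OMP_NUM_THREADS shows whether the measured bandwidth keeps growing as cores (and, on an aggregated system, boards) are added.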
Aggregation brings ease of use to existing environments: For workloads that would otherwise require a scale-out approach, the primary value of aggregation is the ease of managing a single system instead of the complexity of managing a cluster. A single system eliminates the need for a cluster file system, removes cluster-interconnect issues, and avoids per-node application provisioning, installation, and updates across multiple operating systems and applications. Using a single operating system instead of one per node significantly reduces installation time, initial equipment and software costs, and ongoing management costs.
Aggregation uses a simplified I/O architecture: I/O requirements for a scale-out model can be complex and costly, involving networked storage and the accompanying expense of additional host bus adapters (HBAs) and Fibre Channel (FC) switch infrastructure. Aggregation technology consolidates each individual server's network and storage interfaces. This I/O consolidation reduces the number of drivers, HBAs, NICs, cables, and switch ports, as well as the associated maintenance overhead. The user purchases, manages, and services fewer I/O devices and benefits from increased availability, resiliency, and run-time scalability of I/O resources.
Aggregation improves utilization: Even in large cluster deployments in data centers, it makes sense to deploy aggregation, because fewer, larger nodes mean less cluster complexity and better utilization of the infrastructure thanks to reduced resource fragmentation. For example, in the financial services industry, where organizations need to run hundreds or thousands of simulations at once, a common deployment model involves hundreds of servers, each running several simulations. If each cluster node runs a single application instance at 80 percent utilization, the remaining 20 percent per node is stranded. By using aggregation to create fewer, larger nodes, every four aggregated systems recover enough idle capacity (4 x 20 percent, or 80 percent of a node) to run another copy of the application, increasing throughput from four copies to five, a 25 percent gain. A short calculation illustrating this appears below.
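The arithmetic behind that claim is simple enough to write down; the short C sketch below merely restates it, with the node count and per-instance utilization taken from the example above as assumed inputs rather than measured figures.

    /* Illustrative consolidation arithmetic: four nodes, one application
       instance per node at 80% utilization, aggregated into one system. */
    #include <stdio.h>

    int main(void)
    {
        const int    nodes        = 4;
        const double util_per_app = 0.80;   /* assumed utilization per instance */

        double idle_capacity = nodes * (1.0 - util_per_app);        /* 4 x 0.20 = 0.80 of a node */
        int    extra_copies  = (int)(idle_capacity / util_per_app); /* room for one more instance */

        printf("idle capacity: %.2f node-equivalents\n", idle_capacity);
        printf("extra copies: %d (throughput +%.0f%%)\n",
               extra_copies, 100.0 * extra_copies / nodes);
        return 0;
    }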
Summary
Aggregation is ideal for the HPC segment. It works well for compute-intensive applications (numerical and engineering simulations) and memory-intensive applications (very large modeling and business intelligence). The benefits of this approach include cluster consolidation and infrastructure optimization (reducing the number of managed entities), improved utilization (reducing data center fragmentation), lower physical infrastructure costs (replacing traditional SMP systems and unifying I/O), and greener computing. The result: fewer systems to manage and a large shared-memory system at industry-standard cluster pricing.
- - -
Shai Fultheim is the founder and CEO of ScaleMP. Shai designed and architected the core technology behind the company and is now responsible for its strategy and direction. He has more than 15 years of experience in technology and business roles, including several years on the IT end-user side. He can be reached at shai@scalemp.com.