Enterprise Grid Computing: Three Deployment Strategies (Last in a 7-part Series)

The advance of grid computing into industries currently outside the sphere of this technology will bring in new capabilities, cut costs dramatically, enable increasingly ambitious projects, and offer more advanced capabilities to customers. While the implementation details are different for each industry, some strategies are common to all. We explore three of them.

As a first stage toward the implementation of grid computing in their business models, most businesses should explore the viability of deploying dedicated hardware resources rather than attempting to scavenge spare cycles from existing equipment. Initially, a homogeneous environment will simplify the effort and minimize the amount of application optimization required to run efficiently on the grid. Once the first-generation grid environment is in place, the organization can move toward refining its applications to take better advantage of the environment and toward incorporating future technologies as they become available.

Short-Term Deployment Strategies: Hardware Investment

For shops contemplating a first deployment of grid technology, cycle scavenging, despite its purported cost savings, is probably not the mode to try first. Instead, they should first consider deploying dedicated grid resources built from homogeneous nodes. The economics have shifted in favor of this approach. Semiconductor designers, for example, have been using grid-like infrastructures for the past few years, and their early goal was to raise the utilization of the expensive RISC workstations of that time. Since then, hardware costs have dropped by two orders of magnitude or more: a $1,000 desktop today is more powerful than a $150,000 workstation was then. Hardware acquisition is now a small fraction of the total cost of ownership (TCO). The larger components are the software stack, including applications, in both acquisition cost and maintenance cost over the system’s life.

Because it is hard to quantify, another factor seldom incorporated into TCO considerations is the quality of the user experience. Far from being secondary, however, user experience correlates with worker productivity, which affects the organization’s bottom line. In the worst case, even the lowest-cost system yields zero ROI if the targeted audience refuses to use it.

A dedicated, homogeneous environment also makes it easier to run parallel applications. Some of these applications run only in homogeneous environments; others run in heterogeneous ones, but only at the speed of the slowest node, leaving faster nodes idle until the stragglers catch up. Worse, if the owner of a workstation participating in the run takes it off the grid, the entire run may hang.
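The straggler effect is easy to quantify under a simple model. The sketch below assumes a bulk-synchronous application in which every node gets an equal share of the work and all nodes must finish a step before the next one begins; the node speeds are hypothetical relative values, not measurements.

```python
def step_time(work_per_node: float, speeds: list[float]) -> float:
    """Time for one synchronized step: the slowest node sets the pace."""
    return max(work_per_node / s for s in speeds)


def utilization(work_per_node: float, speeds: list[float]) -> float:
    """Fraction of total node-time spent computing rather than waiting."""
    busy = sum(work_per_node / s for s in speeds)
    return busy / (len(speeds) * step_time(work_per_node, speeds))


# Hypothetical speeds: four identical nodes versus three fast nodes plus one half-speed node.
homogeneous = [1.0, 1.0, 1.0, 1.0]
heterogeneous = [1.0, 1.0, 1.0, 0.5]

print(step_time(100, homogeneous), utilization(100, homogeneous))      # 100.0 1.0
print(step_time(100, heterogeneous), utilization(100, heterogeneous))  # 200.0 0.625
```

Under these assumptions, a single half-speed node doubles the step time and drops utilization to 0.625, because the three fast nodes spend half of each step waiting. A node that leaves the grid mid-run is the limiting case: it never reaches the synchronization point, which is why the run may hang.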

Applications can be optimized to run in a heterogeneous environment, but optimization takes time and money, thereby increasing the labor component of the TCO or introducing project delays until the ISV incorporates the optimizations. The user community may see this optimization as a hurdle and opt out of grid computing.
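One common way to optimize for heterogeneity is static load balancing: assign each node work in proportion to its speed so that all nodes finish at about the same time. The sketch below illustrates that general technique, not what any particular ISV implements, and it assumes node speeds are known in advance, which is itself part of the effort described above.

```python
def partition(total_work: int, speeds: list[float]) -> list[int]:
    """Split total_work units among nodes in proportion to their relative speeds."""
    total_speed = sum(speeds)
    shares = [int(total_work * s / total_speed) for s in speeds]
    # Rounding down loses fewer than len(speeds) units; give them to the fastest nodes first.
    leftover = total_work - sum(shares)
    for i in sorted(range(len(speeds)), key=lambda i: -speeds[i])[:leftover]:
        shares[i] += 1
    return shares


speeds = [1.0, 1.0, 1.0, 0.5]  # hypothetical relative node speeds
shares = partition(400, speeds)
print(shares, max(w / s for w, s in zip(shares, speeds)))  # [115, 114, 114, 57] 115.0
```

With an even split of 100 units per node, the half-speed node would need 200 time units; the proportional split finishes in 115.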

Even if a shop starts with a homogeneous environment, the installation will gravitate over time toward a heterogeneous system. As the system is upgraded, more advanced nodes will be incorporated. At some point, especially in large companies, additional grids or clusters will be added to the original grid. These may come from consolidation, from mergers and acquisitions, and from deployments in different divisions and geographical regions within the company. The new nodes are, of course, different from the originals and, by definition, make the system heterogeneous.

Medium-Term Deployment Strategies: Application Focus

A prime medium-term consideration is harnessing application parallelism. Parallel applications running over networked nodes may hit performance bottlenecks at the network level. One way to overcome these bottlenecks is to re-host the applications on a cluster, whose interconnect offers higher bandwidth and lower latency than Ethernet-based networks.
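The advantage of a cluster interconnect can be reasoned about with the standard latency-bandwidth model, in which sending a message costs a fixed startup latency plus its size divided by the bandwidth. The parameter values below are illustrative assumptions (roughly gigabit Ethernet versus a faster interconnect), not benchmarks of any product.

```python
def transfer_time(message_bytes: float, latency_s: float, bandwidth_bytes_per_s: float) -> float:
    """Latency-bandwidth model: fixed startup cost plus message size over bandwidth."""
    return latency_s + message_bytes / bandwidth_bytes_per_s


# Illustrative parameters only, not measurements of any specific network.
ETHERNET = {"latency_s": 50e-6, "bandwidth_bytes_per_s": 125e6}      # ~1 Gb/s
INTERCONNECT = {"latency_s": 2e-6, "bandwidth_bytes_per_s": 1.25e9}  # ~10 Gb/s

for size in (64, 1_000_000):
    eth = transfer_time(size, **ETHERNET)
    fast = transfer_time(size, **INTERCONNECT)
    print(f"{size:>9} bytes: Ethernet {eth * 1e6:8.1f} us, interconnect {fast * 1e6:8.1f} us")
```

Under these assumed numbers, small messages are dominated by latency, so the interconnect is about 25 times faster for a 64-byte message, while large messages are dominated by bandwidth and the gap narrows to about 10 times. Tightly coupled parallel codes that exchange many small messages therefore benefit most from re-hosting.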

Also in the medium term, applications will need to be optimized for the multi-level hierarchy of processors and memory within a node: the CPUs in an SMP node, the cores in a multi-core CPU, and multiple levels of cache.

For some classes of problems, multiple cores and large caches are beneficial. Dense linear algebra in high-performance computing (HPC) is one example: data size grows with the square of the problem size, while the number of operations grows with its cube. Dense linear algebra algorithms are designed to load chunks of data into the CPU’s caches and flush results to memory in a pipelined fashion. Large caches allow many operations between reloads, and multiple cores ensure that those operations complete quickly. These capabilities will come essentially for free, in the sense that the CPU share of the TCO will likely remain constant or shrink slightly. Realizing the gains, however, requires hard work and significant investment from all players in the ecosystem. Keeping computation balanced across the cores in a CPU, the CPUs in a node, the nodes in a multi-level cluster, and the clusters in a grid, while maintaining logical consistency across the entire system, is the architectural equivalent of juggling five balls.
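The chunk-at-a-time pattern is usually implemented as loop tiling (blocking). Here is a minimal sketch for matrix multiplication, written in plain Python for readability; the default block size of 32 is an arbitrary assumption that would in practice be tuned so that three tiles fit in cache. Python’s interpreter overhead hides the actual cache effect, so read this for structure, not speed.

```python
def blocked_matmul(a, b, block=32):
    """Multiply square matrices (lists of lists) tile by tile.

    Each (ii, kk, jj) step touches only three block x block tiles, so when
    the block size fits the cache, each loaded tile is reused many times
    before it is evicted.
    """
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, block):
        for kk in range(0, n, block):
            for jj in range(0, n, block):
                for i in range(ii, min(ii + block, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + block, n)):
                        aik, row_b = row_a[k], b[k]
                        for j in range(jj, min(jj + block, n)):
                            row_c[j] += aik * row_b[j]
    return c


print(blocked_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], block=1))  # [[19.0, 22.0], [43.0, 50.0]]
```

Within one tile step, each element of the A tile is used up to block times. Different (ii, jj) tiles of C are written independently, which is also what allows tuned libraries to spread the work across cores.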

For very large data sets, the same relationship between cache and memory also holds between memory and disk storage. In this case, instead of megabyte-sized buffers between cache and memory, memory serves as a gigabyte-sized cache for terabyte-sized data sets.

Sixty-four-bit addressing can be useful in two ways. First, its larger address space, compared with 32-bit addressing, makes it possible to fit large data sets (tens of gigabytes) entirely in the physical memory of a cluster. This has a significant impact on application design. In some HPC applications, a data set that does not fit in physical memory can be processed only by an application with “out-of-core” capability, which is essentially an application-optimized virtual memory system. An out-of-core version of an application can be 10X more expensive to develop than the plain vanilla version, because its developers need to be versed in OS architecture in addition to application-domain skills.
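The core of the out-of-core pattern is streaming a data set from disk through a fixed-size memory buffer, so that memory acts as a cache over data that may not fit in it. The minimal sketch below assumes a hypothetical file format of raw native-endian float64 values and computes only a sum; a real out-of-core solver must also schedule reads, write-backs, and reuse of blocks, which is where the extra development cost comes from.

```python
import array
import os
import tempfile


def out_of_core_sum(path: str, buffer_values: int = 1 << 20) -> float:
    """Sum a file of native-endian float64 values using a fixed-size memory buffer.

    At most buffer_values numbers are resident at a time, so the file may be
    far larger than physical memory.
    """
    item = array.array("d").itemsize  # bytes per float64 value
    total = 0.0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(buffer_values * item)
            if not chunk:
                break
            values = array.array("d")
            values.frombytes(chunk)
            total += sum(values)
    return total


# Usage: write 1,000 values to a temporary file, then read them back 64 at a time.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    array.array("d", range(1000)).tofile(tmp)
print(out_of_core_sum(tmp.name, buffer_values=64))  # 499500.0
os.remove(tmp.name)
```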

Computations against a very large database gain significantly in efficiency when the entire database fits in memory. One such database is associated with the Human Genome Project; the human genome consists of 30,000 genes and 3.2 billion base pairs. Assuming, roughly, one byte per base pair gives a 3.2GB data set, which pushes the limits of 32-bit addressing, since not all of the 4GB address space is necessarily available for data.
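This back-of-the-envelope estimate can be checked directly, using only the figures from the text plus the size of a 32-bit address space.

```python
BASE_PAIRS = 3.2e9          # from the text
BYTES_PER_BASE_PAIR = 1     # the text's rough one-byte encoding
ADDRESS_SPACE_32 = 2 ** 32  # bytes addressable with 32-bit pointers (4 GiB)

dataset_bytes = BASE_PAIRS * BYTES_PER_BASE_PAIR
fraction = dataset_bytes / ADDRESS_SPACE_32
print(f"{dataset_bytes / 1e9:.1f} GB = {fraction:.0%} of the 32-bit address space")
# 3.2 GB = 75% of the 32-bit address space
```

The data set alone would occupy about three quarters of the theoretical address space. In practice, 32-bit operating systems commonly reserve part of each process’s address space for the kernel (many configurations leave a process only 2GB or 3GB of user space), and code, libraries, and stacks need room too, so the data set may not fit at all.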

The second advantage of 64-bit addressing is that for applications whose number of operations grows faster than their data size, such as the linear algebra example mentioned earlier, fitting larger problems in memory raises the ratio of computation to I/O, so the system runs more efficiently overall. Storing a database in memory is an example of trading caching and data replication against the latency and limited bandwidth of accessing a data repository across the globe. Computational genomics problems are especially amenable to this kind of treatment.
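To make the growth rates concrete, consider classical multiplication of two n x n matrices, the canonical dense linear algebra kernel. The count below assumes each of the three matrices is read or written once and counts one multiply and one add per term of each inner product.

```latex
\begin{align*}
D(n) &= 3n^2 && \text{matrix elements moved: } A,\ B,\ C \\
F(n) &= 2n^3 && n^2 \text{ inner products of } n \text{ terms, one multiply and one add each} \\
\frac{F(n)}{D(n)} &= \frac{2n^3}{3n^2} = \frac{2n}{3}
\end{align*}
```

Doubling n quadruples the data but doubles the work done per element moved. A larger address space lets a larger n fit in memory, which is why the computation-to-I/O ratio rises.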

One behavioral trait of applications that has remained unchanged over the past 50 years is locality of reference: given a large address space, a program is likely to reference only a small portion of it over any short interval. This principle applies to both code and data, and it is why caches work. For instance, if all the code of a loop fits in the cache, that code is fetched from memory only once, when it is first loaded into the cache; every iteration after that runs from the cache, whether the loop executes 1,000 times or a million times. Most applications exhibit this desirable locality behavior.
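Locality of reference can be demonstrated with a small cache simulator. The sketch below models a fully associative cache with least-recently-used (LRU) replacement, a simplifying assumption since real caches are set-associative, and feeds it the instruction-fetch trace of a hypothetical loop whose body spans 16 cache lines.

```python
from collections import OrderedDict


def count_misses(trace, capacity_lines: int) -> int:
    """Simulate a fully associative LRU cache and count misses for a trace of cache-line IDs."""
    cache = OrderedDict()
    misses = 0
    for line in trace:
        if line in cache:
            cache.move_to_end(line)  # hit: mark as most recently used
        else:
            misses += 1
            cache[line] = None
            if len(cache) > capacity_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return misses


LOOP_BODY = list(range(16))  # hypothetical: the loop's code occupies 16 cache lines

for iterations in (10, 10_000):
    trace = (line for _ in range(iterations) for line in LOOP_BODY)
    print(iterations, count_misses(trace, capacity_lines=64))  # 16 misses either way
```

Both runs report 16 misses: the loop body is fetched once, and every later iteration hits in the cache. If capacity_lines is set below 16, the cyclic trace defeats LRU and every access misses, which is the case where the loop does not fit in the cache.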

The portion of the address space referenced by a program over a given interval is defined as the working set for that interval. Notably, this “lumpy” behavior occurs on several time scales at once, whether the interval is seconds, minutes, or even hours. This makes it possible to map working sets at different time scales to specific elements in the grid architecture. For instance, sub-second working sets are best handled at the cache level, while on a scale of minutes an application may work from memory buffers between flushing them to and reloading them from disk. Transactional workloads typical of enterprise applications also exhibit well-defined locality, running a relatively small portion of code and updating a few records in a database.
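Denning’s working-set model formalizes this idea: W(t, tau) is the set of distinct pages referenced in the last tau references before time t. The sketch below computes it for a hypothetical trace with two phases, showing how the answer depends on the window length.

```python
def working_set(trace: list, t: int, tau: int) -> set:
    """W(t, tau): distinct pages among the last tau references up to time t."""
    start = max(0, t - tau)
    return set(trace[start:t])


# Hypothetical reference trace: two phases, each cycling over its own four pages.
trace = [p for _ in range(50) for p in (0, 1, 2, 3)] + [p for _ in range(50) for p in (10, 11, 12, 13)]

print(sorted(working_set(trace, t=150, tau=8)))    # [0, 1, 2, 3]
print(sorted(working_set(trace, t=250, tau=8)))    # [10, 11, 12, 13]
print(sorted(working_set(trace, t=250, tau=200)))  # [0, 1, 2, 3, 10, 11, 12, 13]
```

A short window sees only the current phase’s four pages, while a long window spans both phases and reports eight. Each window length corresponds to a different time scale, and hence to a different level of the storage hierarchy that should hold that working set.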

One way to achieve efficiency in a grid environment is to make an application self-adjusting with respect to its working set at each level of abstraction in the system and at the relevant time scale. This is possible because optimizing for one level of abstraction can be done without undue interaction or interference with the layers above and below it. For instance, the optimization of cache utilization is embedded in a library routine, or perhaps in the code generated by the compiler; such optimizations rarely affect the way I/O buffering is managed. Applications can be written to allow for metadata exchange, in which the host passes information such as the available physical and virtual memory, the number of CPUs per node, the cache size, and memory performance parameters such as latency and bus bandwidth. The application can then adjust its operating parameters for a particular run, including how arrays are allocated, their sizes, and its buffering and I/O strategies.
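Metadata exchange of this kind might look like the following sketch. HostInfo and plan_run are hypothetical names, since the text does not define a protocol, and the sizing rules (three tiles in cache, half of physical memory for buffers) are illustrative heuristics rather than recommendations. Only os.cpu_count() comes from the standard library.

```python
import math
import os
from dataclasses import dataclass


@dataclass
class HostInfo:
    """Hypothetical metadata a host might pass to an application before a run."""
    physical_memory_bytes: int
    cpus: int
    cache_bytes: int


def plan_run(host: HostInfo, element_bytes: int = 8) -> dict:
    """Derive run parameters from host metadata (illustrative heuristics only)."""
    # Choose a square tile size so that three tiles (e.g. A, B, C blocks) fit in cache.
    block = math.isqrt(host.cache_bytes // (3 * element_bytes))
    # Reserve half of physical memory for I/O buffers, split evenly across threads.
    buffer_bytes = host.physical_memory_bytes // 2 // host.cpus
    return {"block": block, "threads": host.cpus, "buffer_bytes": buffer_bytes}


host = HostInfo(physical_memory_bytes=16 * 2**30, cpus=os.cpu_count() or 1, cache_bytes=2 * 2**20)
print(plan_run(host))
```

With a 2MiB cache and 8-byte elements, the tile size comes out to 295, since three 295 x 295 tiles of doubles (2,088,600 bytes) fit in 2,097,152 bytes. The same application binary would pick different parameters on a host reporting different metadata.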

A useful way to look at grids is as a composition of services and business processes that attain specific business goals. Hardware expenditures may not map well to ROI, because relating the two is difficult without considering the intervening logical layers. For instance, without that intermediate analysis, it would be very difficult to explain why four-way servers are more desirable than two-way servers.

Long-Term Deployment Strategies: Harnessing Future Technology Transitions

Contemplating grid deployments from a purely technological perspective, perhaps as a means of speeding up current tasks and processes, is likely to result in missed opportunities. It will be difficult for CIOs to justify grid deployments purely on the basis of the technological advantages they confer, although these benefits might be substantial to some stakeholders. The driver for success will be tangible, quantifiable business benefits that grid deployments deliver to the organization’s critical stakeholders. Technical arguments alone do not paint the complete picture.

Because of the emerging nature of grid technology, service organizations with prior experience in grid deployments play a valuable role in the ecosystem, accelerating grid adoption by sharing their experience. These service organizations can come in many forms, including in-house or external expertise. Outsourced expertise can come from pure-play consulting houses or from product-based companies. Each option has pros and cons. A detailed discussion of the subject is outside the scope of this article.

Successful deployment breeds additional success, and the sharing of prior experience can be a critical success factor. Conversely, organizations venturing out on their own can easily step into blind alleys with their first attempts. A negative initial experience can deter further attempts for months or years. Failures might be unrelated to inherent limitations of grids, but without the proper expertise, it may be difficult to tell. In such a case, potential benefits to the organization are not realized.

Consider the Possibilities

Because grid computing is an emerging technology, most grid applications, including its “killer” applications, have not been invented yet. Consider, for instance, whether the now-ubiquitous wireless access point could be enhanced to become part of a mesh-oriented sensor network, a particular case of an embedded grid. Only a few of these devices would be wired, functioning as gateways into the wired Internet. The rest of the access points would be truly wireless, talking to neighboring access points. The devices could be fitted with environmental sensors that, for example, could act as fire alarms, or they could function as relay stations for VoIP calls. Users with multi-modal communication devices could use VoIP for intra-company calls and fall back on the regular cellular network when no other medium was available. The system would take care of managing the multiple phone numbers, international access codes, credit card access codes, or IP addresses needed to reach a given person.
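The multi-hop relaying in this scenario amounts to finding a path from each wireless access point to some wired gateway. A minimal sketch, assuming a hypothetical mesh topology and using breadth-first search to pick the path with the fewest hops; real mesh networks use dedicated routing protocols that also weigh link quality.

```python
from collections import deque


def route_to_gateway(neighbors: dict, gateways: set, start: str):
    """Breadth-first search for the fewest-hop path from an access point to any wired gateway."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node in gateways:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in neighbors.get(node, ()):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None  # no gateway reachable


# Hypothetical mesh: only "gw" is wired; the other access points relay through neighbors.
mesh = {
    "ap1": ["ap2"],
    "ap2": ["ap1", "ap3", "gw"],
    "ap3": ["ap2", "ap4"],
    "ap4": ["ap3"],
    "gw": ["ap2"],
}
print(route_to_gateway(mesh, {"gw"}, "ap4"))  # ['ap4', 'ap3', 'ap2', 'gw']
```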

Such creative implementations are likely to become more prevalent in the next several years, and grid computing will generate tremendous benefits to the companies that deploy such solutions, as well as to the service.