Enterprise Grid Computing: Why the Buzz? (Part 1 of a 7-part Series)
Interest in grid computing has grown quickly. In this first part of a seven-part series, we begin an in-depth look at the technology by examining its popularity.
Engineering workstations became very popular in the 90s. The responsiveness of a fully dedicated resource (as opposed to waiting for a slot in a batched mainframe environment) was extremely attractive. Unfortunately, this environment was less than perfect.
Operational experience indicated that the systems’ good response times came from the machines sitting idle most of the time during daytime hours and going unused after hours. This was a poor production use of RISC workstations whose cost could range from $50,000 to $200,000. In addition, as fast as these machines were, once CPU utilization reached 100 percent, they could not work any faster, and that was not fast enough for some applications.
The concept of grid computing was first developed to address these issues: increasing the utilization factor of expensive equipment, and reducing the execution time of computation-intensive programs beyond what was possible with a single workstation by executing the program in parallel, that is, by applying more than one processor to a given job. In theory, a 4-CPU workstation could finish a job in a quarter of the time it would take with one CPU.
Beyond that, if the work of a program could be spread across two such workstations, the execution time could be halved again. This is not significant for short programs that finish in a fraction of a second, but some computation-intensive programs might take weeks to finish a single run. Most people work under deadlines and can’t wait that long.
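The arithmetic behind these figures can be sketched in a few lines. This is only an idealized illustration: it assumes the work divides perfectly among processors with no communication or coordination overhead, which real programs rarely achieve. The function name `ideal_run_time` and the two-week example job are hypothetical, not taken from any grid product.

```python
def ideal_run_time(single_cpu_hours: float, cpus_per_node: int, nodes: int = 1) -> float:
    """Best-case run time when a job is split evenly over every CPU.

    Assumption: the job is perfectly divisible and incurs no overhead,
    so run time shrinks in direct proportion to the number of CPUs.
    """
    total_cpus = cpus_per_node * nodes
    if total_cpus < 1:
        raise ValueError("at least one CPU is required")
    return single_cpu_hours / total_cpus


# Hypothetical job that needs two weeks (336 hours) on a single CPU.
two_weeks = 336.0
one_workstation = ideal_run_time(two_weeks, cpus_per_node=4)            # a quarter of the time
two_workstations = ideal_run_time(two_weeks, cpus_per_node=4, nodes=2)  # halved again
```

Under these assumptions the two-week job drops to three and a half days on one 4-CPU workstation and to under two days on a pair of them. Actual gains depend on how much of the program can run in parallel.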
Lengthy jobs (where computations last for several hours or even days and weeks) are more common in high performance computing than in the enterprise, where performance is measured in thousands of transactions per second. The acceleration of long jobs is essential to deadline-driven work such as data mining runs that must finish overnight, bond and securities calculations in the financial services industry, movie rendering in the film industry, and oil exploration analysis in the energy sector.
The need to increase utilization factors of expensive equipment, share resources, and reduce run times led to the development of grid computing concepts. The dynamic behind the development of grid computing is not significantly different from the one behind the infrastructure developed to share a mainframe resource in a large organization. Somewhat ironically, the desire to harvest idle cycles from swarms of workstations (also known as cycle scavenging) led to the return of a variant of a batched environment where long jobs are queued up and run as available resources permit. Processors are organized as pooled resources and handed out to jobs in much the same way as a maître d'hôtel allocates tables in a restaurant to arriving patrons.
We will elaborate on some of these ideas in this seven-part series. Now that the Y2K dragon has been slain, it is becoming clear that grid computing is useful for enterprise applications, not just for engineering and scientific applications. We will focus on enterprise applications, since these are the least documented.
Technology transitions are taking place (or will likely occur within the next five years) that will lower the barriers to deploying, maintaining, and running applications on computer grids. Most of the literature dwells on performance gains and application capabilities enabled by the new technologies. Perhaps a more interesting exercise is to take these transitions to their logical conclusions and speculate as to what new business models will become feasible. This knowledge will allow us to explore how businesses can benefit from this emerging technology and to design strategies for maximizing these benefits for organizations contemplating grid deployment.
While the outcome cannot be predicted with certainty, the process is intrinsically useful for developing a meaningful strategy to address today’s issues and for determining associated business decisions, down to the dollars requested in the next budgeting cycle.
The Physical Grid
The most visible part of a computing grid is the hardware on which it runs. For this discussion, we will use a simple three-level abstraction to describe grid hardware: nodes, clusters, and grids.
- A node is a computer in the traditional sense: a desktop or laptop PC, or a server in any incarnation: a self-standing pedestal, a rack module, or a blade, containing one or more CPUs
- A cluster is a collection of related nodes
- A grid is a collection of clusters
We call this abstraction the Visible Grid. We will extend this model in future articles in this series. Abstractions above the grid level include the software environment that makes grids run and the business considerations that drive grid procurement and operations. At the other end, we cover the architecture and technical aspects of how nodes are constructed.
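The three-level abstraction of nodes, clusters, and grids can be pictured as a simple containment hierarchy. The sketch below is illustrative only: the class names, fields, and sample machine names are assumptions made for this example, not part of any grid middleware.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A single computer: a PC, pedestal server, rack module, or blade."""
    name: str
    cpus: int  # one or more CPUs per node


@dataclass
class Cluster:
    """A collection of related nodes."""
    name: str
    nodes: List[Node] = field(default_factory=list)

    def cpu_count(self) -> int:
        return sum(node.cpus for node in self.nodes)


@dataclass
class Grid:
    """A collection of clusters."""
    clusters: List[Cluster] = field(default_factory=list)

    def cpu_count(self) -> int:
        return sum(cluster.cpu_count() for cluster in self.clusters)


# Hypothetical grid: two engineering workstations plus one blade.
grid = Grid(clusters=[
    Cluster("engineering", [Node("ws-1", cpus=4), Node("ws-2", cpus=4)]),
    Cluster("datacenter", [Node("blade-1", cpus=2)]),
])
total_cpus = grid.cpu_count()
```

Each level simply aggregates the one below it, so a question such as how many CPUs the grid holds is answered by summing over clusters and then over nodes.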
A grid is essentially a set of computing resources shared over a network. A grid differs from a more traditional distributed system, such as a classic n-tier system, in the way its resources are laid out and utilized. In a conventional environment, resources are dedicated: a PC or laptop has an owner and a server supports a specific application. In a grid, the resources are pooled, fungible, and virtualized. The resources to run a grid job are allocated from an anonymous pool of processors. They are fungible: one processor resource looks like any other, and the user does not know which individual processor will be allocated to a job.
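A minimal sketch can make the idea of an anonymous, fungible pool concrete. It assumes that every processor is interchangeable and that a job asks only for a count of processors, never for a particular machine. The `ProcessorPool` class and its method names are hypothetical, invented for this illustration, and a real grid scheduler would apply far richer policy.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional


class ProcessorPool:
    """Hands out interchangeable processors; callers never pick a specific one."""

    def __init__(self, processor_ids: Iterable[str]) -> None:
        self._free = deque(processor_ids)        # processors not currently in use
        self._busy: Dict[str, List[str]] = {}    # job id -> processors it holds

    def free_count(self) -> int:
        return len(self._free)

    def allocate(self, job_id: str, count: int) -> Optional[List[str]]:
        """Grant `count` processors to a job, or None if the job must wait."""
        if job_id in self._busy:
            raise ValueError(f"{job_id} already holds processors")
        if count < 1:
            raise ValueError("count must be at least 1")
        if count > len(self._free):
            return None  # not enough free processors right now
        granted = [self._free.popleft() for _ in range(count)]
        self._busy[job_id] = granted
        return granted

    def release(self, job_id: str) -> None:
        """Return a finished job's processors to the pool."""
        self._free.extend(self._busy.pop(job_id, []))


pool = ProcessorPool(f"cpu-{i}" for i in range(4))
first = pool.allocate("job-a", 3)    # granted three processors
second = pool.allocate("job-b", 2)   # only one left, so the job waits
pool.release("job-a")
third = pool.allocate("job-b", 2)    # succeeds once job-a has finished
```

The caller states how many processors a job needs and receives whichever ones happen to be free. Which physical processor runs the job is a detail of the pool, not of the user.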
Pooled usage introduces complexity. A grid becomes useful and meaningful when it both encompasses a large set of resources and serves a sizable community. These resources are allocated dynamically according to an enterprise’s established policy.
With the enormous flexibility and reliability afforded by computing grids, it may seem surprising that they are not more pervasive today. The likely reason is that grids are a combination of hardware and business processes, and adopting them requires a cultural shift in an organization.
From a cost perspective, it might be attractive to share resources across organizations, including different companies and even companies in different countries. Doing so implies additional overhead to ensure data integrity, security, and resource billing. The technology to support these functions is still evolving. Interoperability is essential to the capability to coordinate global resources. Considerable progress has been made in the past few years through the work of organizations such as the Global Grid Forum and OASIS, and through the emergence of technologies like Web services that have interoperability as a core goal.
Next week: We explore the value of resource pooling with a transportation analogy.
Enrique Castro-Leon is an enterprise architect and technology strategist at Intel Solution Services, where he is responsible for incorporating emerging technologies into deployable enterprise IT business solutions.