Cellular MultiProcessing: An Introduction

This article examines Cellular MultiProcessing and the related technologies that preceded it: asymmetric multiprocessing, symmetric multiprocessing, clustering, massively parallel processing, and cache coherent non-uniform memory access.

The Internet is a great place to do research on a given topic. For example, key "+symmetric +multi +processing +SMP" into the AltaVista Web search engine, and you get 2,859 matches. Similarly, typing "+massive +parallel +processing +MPP" yields 783 matches. The string "+cellular +multi +processing +CMP" produces 615 matches; and of the first 10 hits, seven reference Unisys. To fully understand CMP, it is worthwhile to take a look at some of these other related technologies first.

Searching for Scalability

With an ever-growing number of users accessing applications of ever-increasing complexity, and with database size seemingly exponentially expanding, the search for scalability becomes paramount. These power users push the limits of known system capacities and speeds and collectively drive high transaction rates. Further propelling scalability requirements are enterprise data warehouse applications and a recent bloomer, the World Wide Web, itself related to high-volume online transaction processing (OLTP).

Scalability implies handling growth in the data managed, the end user workload supported, and the types of applications and functionality provided. Ancillary to scalability are reliability, availability and flexibility. Scalability also means being able to apply additional resources dynamically as the need arises, minimally impacting user availability.

Hitherto, the only way to get true scalability was to buy a mainframe-class system. For a while, headlines were that mainframes were dead. But as long as massive scalability requirements exist, such as for the NASDAQ or the U.S. Internal Revenue Service, large mainframes represent just about the only architecture that fits the bill.

Applications are inherently progressive. Most start out on a satisfactorily- (or often under-) sized uniprocessor system, which is invariably quickly outgrown. Years ago, to expand required an entirely new, larger system. Now, multi-processor systems easily fulfill that need, coming in up to six flavors: asymmetric multiprocessing, symmetric multiprocessing, clustering, massively parallel processing, cache coherent non-uniform memory access (ccNUMA) and the Unisys Cellular MultiProcessing.

Asymmetric Multiprocessing

Packing multiple systems into a single cabinet, where each system has its own processor and memory, is nothing new. Packaging multiple processors together so that they share resources is also not new; many server vendors build such systems.

Asymmetric multiprocessing was an early attempt at multiprocessing, in which multiple processors do not have identical access to system resources. For example, one processor might handle all system interrupts, whereas the other processor does not receive any.

Because balancing work loads by the operating system becomes increasingly difficult as the asymmetry increases, developers of general purpose operating systems prefer symmetric multiprocessing.

Symmetric Multiprocessing

Packaging multiple processors together so they share the same memory, system bus (or switching technology) and I/O subsystem is called symmetric multiprocessing (SMP). This mechanism eases the operating system load-balancing problem. Many operating systems support SMP, including Microsoft Windows NT, Unix and Novell NetWare.

Complex coordination hardware is included to ensure the multiple processors do not operate on the same piece of data simultaneously. And the coordination hardware operates transparently to applications. This makes application development easy and similar to single processor application development.

The interconnection hardware operates over a system bus or switching mechanism that interconnects the processors to the common memory. Unfortunately, because the contention for the bus increases as the number of processors goes up, the bus becomes a bottleneck. This has a major implication.

Ideally, if a single processor operates at a speed of 1.0, two processors would operate at 2.0, and four would operate at 4.0. But consider an analogy: if it takes a programmer one year to develop a program, does it take 12 programmers one month?

Sometimes – actually, very seldom – in an SMP architecture, when only very few processors are added, 100 percent of the processor speed is preserved and utilized. But as the number of processors increases, bus contention increases almost exponentially. Consequently, when most SMP systems reach 12 processors, adding other processors does almost nothing to increase the processing power of the system. Some SMPs claim to scale up to 64 processors, but system bottlenecks are a serious consideration.
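The diminishing return described above can be pictured with a toy contention model. The sketch below is illustrative only: the function name smp_speedup and the contention coefficient of 0.08 are hypothetical choices, not measurements of any real SMP system, and the model lets contention grow only linearly with the processor count, which is gentler than the growth described above.

```python
def smp_speedup(processors: int, contention: float = 0.08) -> float:
    """Relative speed of an SMP system under a toy bus-contention model.

    Each additional processor adds a fixed (hypothetical) contention
    penalty that slows every processor on the shared bus.
    """
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return processors / (1.0 + contention * (processors - 1))


if __name__ == "__main__":
    # Compare ideal linear scaling with the modeled scaling.
    for n in (1, 2, 4, 8, 12, 16, 32):
        print(f"{n:2d} processors: ideal {n:5.1f}, modeled {smp_speedup(n):5.2f}")
```

In this model the relative speed can never exceed 1 / contention, no matter how many processors are added, which is one simple way to see why a shared bus caps SMP scaling.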

Clustering

To overcome the inherent limits of SMPs, clustering was devised, where multiple SMP computers are tied together as a single logical system. Usually, two-to-four systems, called nodes, are joined via a high-speed networking interconnect. This networking interconnect is usually of lower bandwidth and slower speed than an SMP bus or switch. Clustered servers are sometimes referred to as "loosely coupled" servers.

Clustering provides the obvious advantage of increasing the computing power over what single SMP systems can accomplish. Sometimes even more importantly, clustering can provide a fail-over resiliency that a single system often cannot provide; when one system fails, the other can take over the processing load of the two – albeit in a degraded mode. Two or more systems thus connected make a powerful scalable and resilient combination difficult to match with a single system architecture.

As with an SMP bus architecture, the clustering interconnect eventually becomes a bottleneck. But this is trivial compared to the problem that non-shared system resources introduce. To picture this, imagine conducting an orchestra in a concert hall – daunting for most of us, but doable. Now imagine every orchestra section in a separate rehearsal room – first violins in one room, horns in another, and so on. You can see them and they can see you, but they cannot see or hear each other. The former is SMP, the latter clustering.

Since each node has its own processors, memory, and bus, the main way the nodes communicate is by forcing awareness of the cluster into the application. The developer must code the application such that it sends messages across the interconnect containing data, status and related communication requests.
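The burden this places on the developer can be sketched in a few lines. The example below is a simplified, single-process simulation, not a real cluster API: the Node and Message classes, the message kinds "data" and "status", and the in-memory inbox standing in for the interconnect are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Message:
    sender: str
    kind: str        # hypothetical message kinds: "data" or "status"
    payload: dict


@dataclass
class Node:
    name: str
    memory: dict = field(default_factory=dict)    # private to this node
    inbox: deque = field(default_factory=deque)   # stands in for the interconnect

    def send(self, other: "Node", kind: str, payload: dict) -> None:
        # Nothing is shared: the payload is copied into the other node's inbox.
        other.inbox.append(Message(self.name, kind, dict(payload)))

    def process_inbox(self) -> None:
        # The application itself must interpret every incoming message.
        while self.inbox:
            msg = self.inbox.popleft()
            if msg.kind == "data":
                self.memory.update(msg.payload)
            elif msg.kind == "status":
                self.memory[f"status:{msg.sender}"] = msg.payload["state"]


node_a = Node("A")
node_b = Node("B")
node_a.memory["balance"] = 100

# Node B cannot read node A's memory; node A must send it explicitly.
node_a.send(node_b, "data", {"balance": node_a.memory["balance"]})
node_a.send(node_b, "status", {"state": "ready"})
node_b.process_inbox()
```

On an SMP system the second processor would simply read the shared value; here every exchange has to be designed, sent and handled by application code.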

Microsoft Wolfpack

Microsoft is entering the clustering arena with its Wolfpack clustering enhancement for Windows NT Server. With it, Microsoft intends to address fault tolerance (availability) and scalability. According to Microsoft, existing clustering solutions are complicated, hard to configure, and built with expensive and proprietary hardware. Microsoft Windows clustering will be based on open specifications to run on industry-standard hardware. Everything we have come to expect of Microsoft software, such as easy-to-use wizards, will be included.

Wolfpack does not eliminate the need for other high-availability technologies, such as RAID disk, UPS power backup and duplicated hardware components, for example power supplies and network interface cards.

Microsoft has also planned tools for application developers to cluster-enable their applications. However, if an application is "well behaved," it can run without being cluster-enabled.

Well behaved means the application keeps everything it needs to restart on a disk accessible from another clustered system, and its clients can satisfactorily handle service pauses of up to a minute. Most commercial applications already satisfy these two characteristics.
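The first of these characteristics can be sketched as a checkpoint-and-restart pattern. The example below is a generic illustration and does not use any Wolfpack or Windows NT interface: the function names, the JSON file format, the order counter and the temporary directory standing in for a shared disk are all assumptions made for the sketch.

```python
import json
import os
import tempfile


def save_checkpoint(path: str, state: dict) -> None:
    """Write the state to a temporary file, then rename it into place,
    so a failure never leaves a half-written checkpoint behind."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)


def load_checkpoint(path: str) -> dict:
    """Return the saved state, or a fresh state if none exists yet."""
    if not os.path.exists(path):
        return {"next_order": 1}
    with open(path) as handle:
        return json.load(handle)


# A temporary directory stands in for a disk that both nodes can reach.
shared_disk = tempfile.mkdtemp()
checkpoint = os.path.join(shared_disk, "orders.json")

# The first node processes three orders, checkpointing after each one.
state = load_checkpoint(checkpoint)
for _ in range(3):
    state["next_order"] += 1
    save_checkpoint(checkpoint, state)

# After a failure, the second node resumes from the shared checkpoint.
resumed = load_checkpoint(checkpoint)
```

The second node needs nothing from the failed node's memory; everything required to continue is on the shared disk.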

Massively Parallel Processing

To address the shortcomings of SMP, massively parallel processing (MPP) systems were developed. MPPs help particularly in applications such as huge data warehouses. They are similar to clusters in that the nodes of an MPP are connected by an interconnect. This helps them be fault tolerant. When one processor goes down, another can take up the slack. MPP systems are also modular and capable of being upgraded.

There are significant differences between MPPs and both SMPs and clusters. Each node in an MPP is usually a uniprocessor rather than an SMP. The interconnect of an MPP can handle hundreds of nodes, compared with the limited number of nodes in a cluster or processors in an SMP. And the interconnect’s bandwidth is much greater than that of a cluster, and it is designed to be highly scalable.

On the plus side, MPPs have the greatest scalability. On the minus side is the programming complexity required to coordinate the nodes of an MPP and to manage the processor interactions – magnified by their great number.

To fully utilize an MPP’s massive parallel capabilities, a database must be designed and built to take advantage of it. Specifically, the database must be able to perform database functions in parallel. Typically, a database is chosen to match the selected MPP system and vendor.
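What performing a database function in parallel means can be sketched with a partitioned aggregate. The example below is only an illustration under stated assumptions: the sales rows, the three partitions and the helper names are invented, and a thread pool on one machine stands in for the separate nodes of an MPP.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sales rows (region, amount), already split across three nodes.
partitions = [
    [("east", 120), ("west", 80)],
    [("east", 50), ("north", 200)],
    [("west", 30), ("north", 10)],
]


def partial_totals(rows):
    """Work done independently on one node: total its own rows by region."""
    totals = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0) + amount
    return totals


def merge_totals(partials):
    """Final step: combine the small per-node results into one answer."""
    merged = {}
    for partial in partials:
        for region, amount in partial.items():
            merged[region] = merged.get(region, 0) + amount
    return merged


# The thread pool stands in for the nodes working at the same time.
with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
    partials = list(pool.map(partial_totals, partitions))

totals = merge_totals(partials)
```

Each node scans only its own rows, and only the small partial results cross the interconnect, which is why the database has to be designed around the partitioning from the start.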

Cache Coherent Non-Uniform Memory Access

A ccNUMA architecture is constructed using SMP nodes, each running its own copy of the operating system, connected into a system that scales beyond bus-based SMPs, just as MPPs can be scaled. The nodes are connected using an interconnect whose speed and physical media vary greatly from one system to another.

All things being equal, memory nearest a processor can be accessed faster than memory residing at a greater distance. For example, the cache included on a 486 or Pentium processor chip is accessed much faster than the main memory on a PC’s motherboard. The term "non-uniform" comes from this dependence of access speed on the closeness of the memory.

A moderate amount of non-uniformity does not impact a system. However, as the ratio of remote-to-local memory increases, this drives programmers to switch to sending messages rather than simply using shared memory.
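The effect of non-uniformity can be estimated with a weighted average. In the sketch below the latencies of 100 ns for local memory and 400 ns for remote memory are hypothetical figures chosen for illustration, and the ratio is interpreted as the fraction of memory accesses that go to a remote node, which is an assumption of the sketch.

```python
def average_access_ns(remote_fraction: float,
                      local_ns: float = 100.0,
                      remote_ns: float = 400.0) -> float:
    """Average memory access time when a given fraction of accesses is remote.

    The default latencies are hypothetical, not measured values.
    """
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    return (1.0 - remote_fraction) * local_ns + remote_fraction * remote_ns


if __name__ == "__main__":
    for fraction in (0.0, 0.1, 0.25, 0.5, 1.0):
        print(f"{fraction:4.0%} remote: {average_access_ns(fraction):6.1f} ns on average")
```

Under these assumed figures a small remote fraction barely moves the average, while a large one pushes it toward the remote latency, at which point explicit messages may be the better design.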

This is one of the main distinguishing characteristics of ccNUMA systems: how each scales and performs when processors are added, and then how it allocates memory. Similarly, I/O travels along the interconnect unless all the nodes have their own controllers and devices, which creates a different, more complex set of issues.

An application written to execute on one ccNUMA system may need to be rewritten to handle memory differently on another configuration. The bottom line is that not all ccNUMA systems are equal.

Cellular MultiProcessing

With these other technologies as background, let’s turn our attention to Cellular MultiProcessing (CMP). In May 1998, Unisys announced the CMP server architecture based on Intel Pentium II Xeon and future Merced processors. It supports applications that run under both NT and UnixWare operating systems.

In the original CMP press release, the Unisys chairman and CEO, Lawrence A. Weinbach, was quoted: "This is our bid for leadership in the market for enterprise-class servers based on Intel technology. CMP, together with information services, is our engine for customer base expansion and revenue growth." Considering his remarks, you can see how important CMP is to Unisys.

Because CMP is based on Intel and runs Windows NT, it further strengthens the Unisys-Microsoft relationship already established. By applying to CMP the heritage of Unisys mainframes – high availability, high transaction rates and large databases – Unisys sees the CMP platform as an important step in bringing mainframe-class capabilities to a Windows NT Server environment.

The word cellular, while perhaps a little abstract, describes how computing elements can be broken down: into individual cells, or partitions. These cells can work together with other cells that may be running different operating systems as well as widely varied applications.

CMP is the foundation for the next generation of Intel-based server technology at Unisys. It is the first to support partitioning while still providing the scalability advantages of SMP servers and the high availability of clustering architectures.

The architecture eliminates the bottlenecks associated with traditional SMP bus architectures by using a mainframe-class crossbar switching technology. This allows a CMP system to have from four to 32 processors, up to 64 GB of shared memory, and up to 96 PCI peripheral slots for an extremely high-volume I/O bandwidth.

The architecture is optimized for what heretofore have been considered mainframe applications: data warehousing and high-volume online transaction processing. It promises high system availability, with the integrated maintenance subsystem used in Unisys mainframes, which has a call-home feature for reporting status conditions back to a service monitoring center. That maintenance subsystem can itself be replicated so that it is not a single point of failure.

Let’s review some of the advantages that CMP has over pure SMP and clustering.

CMP, SMP, and Clustering

CMP is unique in its partitioning capabilities. The CMP architecture allows a system to be configured as a single 32-way SMP system or into multiple cluster combinations of up to eight partitions. Each partition is then capable of running heterogeneous operating environments and applications simultaneously. These partitions can be adjusted dynamically, operating system permitting, by the administrator as business conditions change.
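The limits quoted above can be expressed as a simple configuration check. The sketch below is not a Unisys tool or interface: the function validate_partitions, the partition names and the idea of sizing partitions in whole processors on a fully populated 32-processor system are assumptions made for illustration. Only the limits of 32 processors and eight partitions come from the text.

```python
from typing import Dict

MAX_PROCESSORS = 32   # largest CMP configuration described in the text
MAX_PARTITIONS = 8    # maximum number of partitions described in the text


def validate_partitions(partitions: Dict[str, int]) -> int:
    """Check a hypothetical partition layout (name -> processor count).

    Returns the number of processors left unassigned, or raises
    ValueError if the layout breaks one of the stated limits.
    """
    if not 1 <= len(partitions) <= MAX_PARTITIONS:
        raise ValueError(f"need between 1 and {MAX_PARTITIONS} partitions")
    if any(count < 1 for count in partitions.values()):
        raise ValueError("every partition needs at least one processor")
    used = sum(partitions.values())
    if used > MAX_PROCESSORS:
        raise ValueError(f"{used} processors requested, only {MAX_PROCESSORS} available")
    return MAX_PROCESSORS - used


# Two hypothetical layouts an administrator might switch between.
daytime = {"production": 20, "testing": 8, "development": 4}
single_smp = {"production": 32}

spare_daytime = validate_partitions(daytime)
spare_single = validate_partitions(single_smp)
```

Both layouts use all 32 processors; moving from one to the other is the kind of adjustment the administrator would make as business conditions change.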

A huge CMP advantage is its shared memory. This enables server partitions to transfer information at memory speeds, rather than at a slower, conventional, clustering network interconnect speed. Since standard Windows NT APIs are supported, the nature of the CMP system is hidden from the application, requiring no changes to the application.

One of the challenges of clustering is administering multiple interconnected, though separate, systems. Administration is immensely simplified for a CMP platform, since it is a single system, with a single console, reduced footprint, and, ultimately, reduced total cost of ownership. For example, we have all seen that networking is a notoriously difficult environment to administer; compared with clustering, CMP virtually eliminates that burden.

Clustering’s fail-over ability is also provided by a CMP system. In fact, fail-over is even easier with a CMP because of its shared memory. However, fail-over can only be legitimately available when that single system provides fully redundant and resilient hardware, which is the case with CMP.

CMP Examples

CMP is being targeted for OLTP, decision support, data warehousing, multimedia and electronic commerce applications. Widely used databases, such as Oracle, Informix, Sybase, and SQL Server will be supported on a CMP system. And because a CMP platform is built using industry standard off-the-shelf processors and an open architecture, it supports all the other popular available applications.

The widespread enterprise resource planning (ERP) software from SAP or PeopleSoft would be ideally suited to run on a CMP. As an example, CMP partitioning allows all phases of client/server to be consolidated onto a single server: development, testing, and production can all run in separate partitions, yet on a single manageable CMP system. Partitions could be adjusted as demands change. For example, as development cycles down and production scales up, partitioning could be suitably balanced.

Ingo Hoffmann, Director of the Worldwide Microsoft Alliance for SAP AG, reinforces this. "We believe that Unisys is going in the right direction and is responding well to the market’s requirements for a Windows NT and Intel platform supporting true enterprise-class applications, such as SAP."

As another example, one partition could have an application tuned for optimum database performance, and another tuned for optimum application performance.

Another example is to run OLTP or database query processing during the day and then reconfigure the system to run optimally in batch mode overnight.

And yet another use is allowing clients to migrate applications at their leisure. This can be from one revision of an operating system to a later one, from UnixWare to NT, or from 32-bit applications to 64-bit, potentially running old and new in parallel.

According to D.H. Brown Associates, "the CMP project provides the technical basis for an excellent roadmap and will meet a number of industry needs." Cellular MultiProcessing "promises a major leap forward." Check the Unisys web site for release information. And watch for a future Unisphere article that will delve into the technical details of CMP.


ABOUT THE AUTHOR:
Charlie Young has been with Unisys 23 years, and is Director of U.S. Network Enable Solution Programs in the Global Customer Services (GCS) organization. His biography can be found in the current editions of several Marquis Who’s Who: in the East, America, the World, Science and Engineering, and Media and Communications. Contact Charlie at charlie.young@unisys.com.