In-Depth

Grids: Large BI Data Sets, Low Price

Grids offer an inexpensive alternative for processing terabytes of BI data

Although so-called “grid” computing hasn’t yet made the transition from academic darling to must-have enterprise technology, some vendors are already talking up the potential convergence of computational grids and business intelligence (BI) as the most natural of technology fits.

Many BI professionals are still asking, “What’s this grid computing technology, anyway?”

“Grid” computing describes an architecture in which systems are connected to share computing resources, forming a so-called “computational grid.” Grid computing has had its proving ground in a variety of highly successful public computing projects, including the University of California Berkeley’s SETI@Home distributed computing project (http://www.setiathome.com).

As technologies go, grid computing is a mature one. Most of the dedicated grid computing players – Avaki Corp. (formerly Applied Metacomputing), Entropia Inc., Parabon Computation Inc., Platform Computing Inc., Porivo Technologies, TurboLinux and United Devices, among others – first began marketing their solutions three or more years ago. Since that time, they’ve been joined by established vendors such as Hewlett-Packard Co., IBM Corp., Oracle Corp. and Sun Microsystems Inc., each of which has introduced grid computing initiatives of its own.

From the beginning, grid computing has been largely an academic exercise. In enterprise environments it has been confined to the high-performance technical computing space. Over the last twelve months, however, major vendors such as HP, IBM, Oracle and Sun have been pushing grid computing products and services designed for specific vertical markets, such as aerospace, automobile manufacturing, financial services and government.

A Natural Fit for BI?

Advocates say that grid computing is an ideal technology for applications that involve very large data sets—such as the SETI@Home project, which analyzes data collected by radio telescopes for weak signals that could provide proof of alien intelligence. SETI@Home works by parceling out data to hundreds of thousands of users, who download a portion and analyze it on a client module that runs on their computer when it’s not in use. Once complete, the results of their analysis are uploaded to a centralized server.
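To make that pattern concrete, here is a minimal Python sketch of the scatter/gather cycle described above: a coordinator splits a large data set into work units, independent workers analyze their units, and the results are collected centrally. The chunk size, the signal-scoring rule and the worker count are illustrative assumptions, not SETI@Home’s actual protocol.

```python
# Minimal sketch of the scatter/gather pattern behind projects like
# SETI@Home: a coordinator parcels a large data set into work units,
# "client" workers process units independently, and results are merged.
# The chunking, scoring rule and worker count are illustrative
# assumptions, not SETI@Home's real protocol.
from multiprocessing import Pool

def analyze(work_unit):
    """Stand-in for client-side analysis of one chunk of telescope data."""
    chunk_id, samples = work_unit
    # Hypothetical 'signal score': sum of samples above a noise threshold.
    score = sum(s for s in samples if s > 0.9)
    return chunk_id, score

def main():
    # Fake 'telescope data': one large sequence split into work units.
    data = [((i * 7919) % 997) / 997.0 for i in range(100_000)]
    chunk_size = 10_000
    work_units = [
        (i, data[i:i + chunk_size]) for i in range(0, len(data), chunk_size)
    ]
    # Each pool worker plays the role of a volunteer PC's client module.
    with Pool(processes=4) as pool:
        results = pool.map(analyze, work_units)
    # Central-server step: collect the uploaded results and rank them.
    for chunk_id, score in sorted(results, key=lambda r: -r[1])[:3]:
        print(f"chunk {chunk_id}: score {score:.2f}")

if __name__ == "__main__":
    main()
```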

Not surprisingly, some vendors argue that grid computing is also a natural fit for BI applications, many of which typically work with large data sets. When customers ask what grid computing is, “we try to explain to them that you’ve already been doing it, either by parallel processing or distributing workloads across a network,” comments Tho Nguyen, program director of data integration with SAS Institute Inc. “These things have been utilized already, but grid computing gives [customers] a more efficient way to utilize them. We’re finding that some customers are coming to us because they understand the potential value here.”

How does grid computing enable greater efficiencies than parallel or distributed processing? For starters, grid computing isn’t strictly server-centric. Instead, it proposes to exploit the unutilized or underutilized power of all computing resources in a network environment, including desktop PCs.

The typical desktop PC has changed a lot over the last 20 years: The term “PC” may once have described a low-end machine powered by an 8-MHz 8088 or 80286 microprocessor and outfitted with scanty memory resources, but today’s “PC” is more properly an entry-level server. That’s because it often sports a 1-, 2- or even 3-GHz processor, hundreds of gigabytes of hard disk storage and – frequently – a gigabyte or more of memory under its hood.

“There’s an opportunity there to take advantage of that computing horsepower, which is underutilized during the day and which is typically unutilized during off hours,” Nguyen argues.
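A hedged Python sketch of the “cycle scavenging” idea behind that claim follows: a desktop agent accepts work units only while the machine looks idle. The load threshold, the local work queue and the do_work() stub are assumptions made for illustration; production grid agents are considerably more sophisticated.

```python
# Hedged sketch of 'cycle scavenging': a desktop agent takes work only
# while the machine is otherwise idle. The load threshold, work queue
# and do_work() stub are illustrative assumptions, not any vendor's
# actual agent logic.
import os
import time

IDLE_LOAD = 0.5   # assumed threshold for "this PC is idle"

def do_work(unit):
    """Stand-in for analyzing one downloaded work unit."""
    return sum(range(unit * 1000))

def main():
    work_queue = list(range(5))        # pretend these came from a server
    while work_queue:
        load, _, _ = os.getloadavg()   # Unix-only idleness proxy
        if load < IDLE_LOAD:
            unit = work_queue.pop(0)
            print(f"unit {unit} -> {do_work(unit)}")
        else:
            time.sleep(10)   # back off while the user needs the machine
    # In a real agent, results would now be uploaded to the coordinator.

if __name__ == "__main__":
    main()
```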

SAS has put its money where its mouth is, introducing SAS/Connect, a grid computing technology that can be deployed along with its SAS 8 and forthcoming SAS 9.1 BI suites. SAS/Connect is based on a feature called MP Connect, which allows multiple SAS sessions to run in parallel, each performing a portion of a larger application. MP Connect lets SAS users take advantage of large SMP systems and distribute a workload to an unlimited number of workstations across a network.

“[SAS/Connect] enables the grid computing technology by identifying the computers in the network and going out there and using them,” Nguyen explains. “We’re offering this to customers who have a need today, but we plan to evolve it and add more intelligence to it within probably the next six to twelve months, working with existing customers as well as potential customers to really identify what features they most want to see.”
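Since the article doesn’t reproduce SAS syntax, the following language-neutral Python sketch illustrates the MP Connect pattern just described: several independent sessions each run a portion of a larger job in parallel, and the caller waits for all of them before combining results. The four-way split and the summarize() stub are assumptions for illustration, not SAS/Connect code.

```python
# Hedged sketch of the MP Connect pattern: spawn several independent
# sessions, hand each a portion of a larger job, then wait for all of
# them and combine the results. On one SMP machine this maps to local
# processes; SAS/Connect extends the same idea to remote workstations.
from concurrent.futures import ProcessPoolExecutor, as_completed

def summarize(portion):
    """Stand-in for one session's share of the work (e.g. one
    analysis step run against one slice of the input data)."""
    name, rows = portion
    return name, sum(rows) / len(rows)

def main():
    # Split one large workload into per-session portions (assumed split).
    portions = [
        ("q1", list(range(0, 250))),
        ("q2", list(range(250, 500))),
        ("q3", list(range(500, 750))),
        ("q4", list(range(750, 1000))),
    ]
    with ProcessPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(summarize, p) for p in portions]
        # Equivalent in spirit to MP Connect's wait-for-all step.
        for future in as_completed(futures):
            name, mean = future.result()
            print(f"session {name}: mean = {mean:.1f}")

if __name__ == "__main__":
    main()
```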

Vendors Climb Aboard

Vendors are taking stock of grid computing’s potential strength as an enabling technology for BI. Last year, for example, grid computing pure play Platform Computing established an original equipment manufacturer (OEM) relationship with Cognos Inc., under the terms of which it agreed to OEM Cognos' PowerPlay OLAP tool along with Cognos’ Upfront portal. Platform markets a line of grid-based products and services, including Platform Intelligence, an enterprise performance management solution. At the time, the grid computing start-up planned to integrate PowerPlay with its Platform Intelligence suite and said that it would tap Cognos’ Upfront portal to provide a Web-based front-end to performance management information.

More recently, grid specialist Avaki released a new version of its grid computing software—Avaki Data Grid 4.0—that boasts support for enterprise information integration (EII) in distributed environments (http://www.tdwi.org/research/display.asp?id=6779&t=y). Avaki says the newest version of its flagship grid computing platform lets customers provision, integrate and access data distributed across heterogeneous systems.
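As a generic illustration of what such EII support means (this is not Avaki’s API; the source names, schema and join logic below are invented), the following Python sketch presents two heterogeneous sources, a relational table and a flat-file feed, through one integration step:

```python
# Generic illustration of the EII idea the article describes:
# heterogeneous, distributed sources presented through one uniform
# access layer. NOT Avaki's API; source names, schema and the join
# are hypothetical.
import csv
import io
import sqlite3

def rows_from_sql():
    """Source 1: a relational table (stand-in for a remote RDBMS)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, region TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)",
                     [(1, "east"), (2, "west")])
    return {cid: region for cid, region in
            conn.execute("SELECT id, region FROM customers")}

def rows_from_csv():
    """Source 2: a flat-file feed (stand-in for a remote file share)."""
    feed = "customer_id,amount\n1,120.50\n2,80.25\n1,33.10\n"
    return [(int(r["customer_id"]), float(r["amount"]))
            for r in csv.DictReader(io.StringIO(feed))]

def main():
    # The 'integration' step: one logical view over both sources.
    regions = rows_from_sql()
    totals = {}
    for cid, amount in rows_from_csv():
        totals[regions[cid]] = totals.get(regions[cid], 0.0) + amount
    for region, total in sorted(totals.items()):
        print(f"{region}: {total:.2f}")

if __name__ == "__main__":
    main()
```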

Avaki may not be a well-known player in an EII space populated by IBM and a host of point players, but at least one prominent analyst—Wayne Kernochan, managing VP of platform infrastructure for consultancy Aberdeen Group—has said that Data Grid 4.0 “allows large global enterprises a way of scaling EII software across multiple servers and extending the scope of the data accessed across geographically distributed data sources and outside of an organization.”

In cases in which Data Grid 4.0 is used as a complement to EII solutions from leading vendors, Kernochan argues, Avaki’s grid-based approach confers other advantages. “In these cases, Data Grid should deliver all of the typical benefits of EII, including reduced administrative and programming costs and the ability to leverage proprietary competitive-advantage information now scattered in various data sources,” he writes.

A New Class of BI Applications

For his part, SAS’ Nguyen suggests that grid computing is a natural fit for data mining, along with specialized data warehousing applications. In addition, he points out, it can support a new generation of applications that exploit very large data sets in the hundreds of terabytes to multi-petabyte range.

“Data mining will probably be at the forefront of utilizing grid computing. However, any data intensive project will benefit from using grid,” he comments, citing a SAS customer—the National Institute of Environmental Health Sciences (NIEHS)—which implemented a grid that cut its execution time on research projects by as much as 95 percent.

Nguyen points out that retail and financial services customers who manage terabytes of historical data can exploit computational grids to perform deep analysis on that data inexpensively, searching for patterns, trends and anomalies across years of information.

“What [a SAS customer that is a] financial institution and NIEHS are doing is looking at years and years of data. They’re collecting back to five years ago, trying to see if there are some trends, some anomalies, things like that,” he explains. “Most of these customers have terabytes of data, but I am anticipating that it will eventually escalate to petabytes. It’s just not practical to keep all of this [data] in a data warehouse.”
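A hedged sketch of that workload: partition years of history, reduce each partition in parallel (on a grid, each partition would go to a remote node), and flag years whose totals deviate sharply from the rest. The synthetic data and the deviation rule below are invented for illustration.

```python
# Hedged sketch of the pattern Nguyen describes: partition years of
# historical data across workers, scan each partition in parallel, and
# flag anomalies. The synthetic data and deviation rule are invented;
# a real grid would ship each partition to a remote node.
import statistics
from concurrent.futures import ProcessPoolExecutor

def scan_year(partition):
    """One node's job: reduce a year's transactions to a summary figure."""
    year, amounts = partition
    return year, sum(amounts)

def main():
    # Five years of synthetic transaction history, one partition per year.
    history = {
        1999: [100.0] * 400,
        2000: [105.0] * 400,
        2001: [98.0] * 400,
        2002: [310.0] * 400,   # planted anomaly
        2003: [102.0] * 400,
    }
    with ProcessPoolExecutor() as pool:
        totals = dict(pool.map(scan_year, history.items()))
    mean = statistics.mean(totals.values())
    stdev = statistics.stdev(totals.values())
    for year, total in sorted(totals.items()):
        if abs(total - mean) > 1.5 * stdev:
            print(f"{year}: total {total:,.0f} looks anomalous")

if __name__ == "__main__":
    main()
```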

SAS recently became the first prominent BI vendor to join the Global Grid Forum, an industry advocacy group for grid computing. Nguyen believes that his company’s involvement is a sign of things to come. While grid computing isn’t a must-have complement to BI today, he allows, it will increasingly garner executive mindshare over the next two years. “I would encourage CIOs as well as CEOs to consider grid computing, if not in the next three to six months, probably out towards 12 to 18 months, due to the fact that data continues to grow, processing of your data continues to crunch time, and performance and scalability will be an issue.

"Grids aren’t an answer for everything, but they are an inexpensive alternative for many problems," Nguyen concludes.
