SAS’ Grid Gambit
Are computational grids still a technology solution in search of a market? Far from it, SAS officials claim.
SAS Institute Inc. recently announced a new version of its market-leading Enterprise ETL Server. Along with improved ease-of-use and change management features, the revamped SAS Enterprise ETL Server boasts new dynamic grid computing capabilities, which SAS officials say should help improve performance. But Enterprise ETL Server isn’t the only grid-enabled entry in SAS’ product line-up: the Cary, N.C.-based BI giant also touted new dynamic grid capabilities for its market-leading Enterprise Miner tool.
SAS, Informatica Corp. and IBM Corp. have long been regarded as the three ETL market leaders, with Informatica seen as the undisputed ETL market champ. (http://www.esj.com/business_intelligence/article.aspx?EditorialsID=7201)
But the ETL market has changed drastically over the last few months, starting with the acquisition of the former Ascential Software Corp. by IBM Corp. earlier this year. Ascential’s ETL, data quality, and data profiling technologies round out Big Blue’s federated information access stack and arguably make Armonk the player to beat in the data integration space.
Big Blue’s acquisition also helped shine a light on SAS’ own ETL prowess. After all, SAS boasts a best-of-breed data quality solution of its own, thanks to its DataFlux subsidiary, along with robust parallel processing capabilities and connectivity into about as many sources as its primary competitors. And while SAS probably isn’t the first name that comes to mind when you’re shopping for a platform-independent ETL solution, its extensive market reach (SAS is a player in nearly every vertical, with significant strength in the financial services and life sciences industries), coupled with its more than $2 billion in annual revenues and its dedicated user base, makes it a compelling option in environments that use the company’s data mining or statistical analysis software. And in organizations that tap SAS9 as an all-in-one BI suite, SAS’ ETL tool is all but a lead-pipe cinch.
In its newest iteration, SAS Enterprise ETL Server promises improved ease of use, more tolerance of change, and, thanks to its dynamic grid capabilities, better performance. “We certainly believe that… a huge differentiator for us is ease of use, in addition to ease of management and maintainability,” says Eric Hunley, a product marketing manager with SAS.
For example, says Hunley, the revamped Enterprise ETL Server incorporates new change management capabilities that let ETL designers model proposed changes to determine the impact they’ll have on ETL jobs. In addition, ETL designers can control how metadata changes are applied to their projects, which Hunley says should help improve overall efficiency.
“One thing that change analysis allows you to do is look at information and metadata and be able to easily track and determine what changes you’ve got versus what’s coming in,” he comments. “This helps [ETL designers] actually see the impact that those changes will have on their current environments, so they’ve got a good idea of how much work [these changes are] going to mean for [them], or is it going to be a short-term project or a long-term project.”
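To make the idea of change analysis concrete, here is a minimal sketch of the two steps Hunley describes: comparing the metadata you have against the metadata that's coming in, then tracing which downstream ETL jobs a change would touch. It is not SAS' implementation; the table names, job names, and the `diff_columns` and `impacted_jobs` helpers are all hypothetical, and the lineage is a hand-built dictionary rather than a real metadata repository.

```python
from collections import deque

# Hypothetical metadata for one source table: column name -> data type.
current = {"cust_id": "NUM", "name": "CHAR(40)", "region": "CHAR(2)"}
incoming = {"cust_id": "NUM", "name": "CHAR(60)", "segment": "CHAR(10)"}

# Hypothetical lineage: which jobs read each table, which tables each job writes.
reads = {
    "CUSTOMERS": ["load_cust_dim"],
    "CUST_DIM": ["build_sales_mart", "score_churn"],
}
writes = {
    "load_cust_dim": ["CUST_DIM"],
    "build_sales_mart": ["SALES_MART"],
    "score_churn": [],
}


def diff_columns(old, new):
    """Classify differences between two column -> type mappings."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(c for c in set(old) & set(new) if old[c] != new[c])
    return {"added": added, "removed": removed, "changed": changed}


def impacted_jobs(table):
    """Breadth-first walk downstream from a changed table to every job it feeds."""
    seen, queue, order = set(), deque([table]), []
    while queue:
        tbl = queue.popleft()
        for job in reads.get(tbl, []):
            if job not in seen:
                seen.add(job)
                order.append(job)
                # Tables this job writes become new starting points.
                queue.extend(writes.get(job, []))
    return order


if __name__ == "__main__":
    print(diff_columns(current, incoming))
    print(impacted_jobs("CUSTOMERS"))
```

A designer could read the length of the `impacted_jobs` list as a rough proxy for the "short-term project or long-term project" question in the quote above.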
While it’s probably much too early to speak of self-service with respect to ETL job design, Informatica, Ascential, SAS, and other vendors have incorporated quasi-self-service capabilities into their design toolsets, in the form of wizard-driven configuration tools and other ease-of-use features.
SAS’ revamped ETL tool, for example, can facilitate wizard-driven connectivity to data sources, which could make it possible for power users with little or no ETL programming experience to build simple ETL jobs. “In terms of connectivity to data, both source and target, it is a wizard-driven interface, so there’s a Source Designer wizard that allows you to go through the UI and basically select from a list a wide variety of data sources,” Hunley says. Ditto for metadata management. “Is it an Oracle table, is it a Teradata table, is it SAS data, or is it a flat file? Whether it’s structured or unstructured information, we walk you through the process of basically defining that metadata, so you don’t have to be a database expert or even really an ETL expert to” use the tool.
Have Grid, Will ETL
Elsewhere, SAS expanded its longtime partnership with grid computing powerhouse Platform Computing to grid-equip both Enterprise ETL Server and Enterprise Miner. Grids have often been perceived as a technology solution in search of a market, but—in the case of both ETL and data mining workloads—SAS officials say there’s immediate applicability.
Mike Schiff, a principal with BI and data warehousing consultancy MAS Strategies, says there’s some truth to this claim. ETL jobs that involve complex workflows or sophisticated transforms can almost certainly benefit from grid’s extra processing muscle. At the same time, Schiff points out, grid-enabled ETL raises many of the same issues as its putative technological cousin, parallel processing. “Grid is a lot like parallel processing across multiple computers, and [as with parallel processing] it’s tough coordinating all of the jobs – it’s not something that’s trivial, determining how best to parallelize it, what pieces to break up. But grid makes it even more complicated because you can have multiple installations [on non-standard compute resources],” he says.
To a large degree, grids are a refinement (or evolution) of the parallel processing capabilities most high-end ETL tools already possess, agrees Cheryl Doninger, director of research and development with SAS. But while parallel processing typically describes spanning a single, intact workload over multiple processor resources (as in a massively parallel supercomputer, for example), grids are different. Computational grids let organizations yoke together tens, hundreds, or even thousands of distributed computers into large, number-crunching grids. Instead of apportioning a single, contiguous workload across those resources, a grid breaks the workload into pieces, or chunks, and farms each chunk out to an independent compute resource.
“Grid picks up and takes off where parallel processing leaves off. We’ve had a variety of customers getting a tremendous benefit from the parallel processing capabilities that we’ve had for years, and [grid] will build on this,” Doninger says. “There may be situations where you need [to have] many independent paths that operate against different subsets of data [as in a grid], or it could also be the case that you need many independent paths that are processing a single input data source [as in parallel processing].”
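The chunk-and-distribute pattern Doninger describes can be sketched in a few lines. This is a minimal illustration under loudly stated assumptions, not how SAS or Platform Computing dispatch work: a local thread pool stands in for grid nodes, `transform` is a hypothetical per-chunk job that drops negative (bad) amounts and totals the rest, and no scheduler, data movement, or node failure is modeled.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk(rows, n_chunks):
    """Split a list of rows into at most n_chunks contiguous pieces."""
    if not rows:
        return []
    size = -(-len(rows) // n_chunks)  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def transform(rows):
    """Hypothetical per-chunk job: skip negative amounts, total the rest (in cents)."""
    return sum(r["amount"] for r in rows if r["amount"] >= 0)


def run_on_grid(rows, n_nodes=4):
    """Farm chunks out to independent workers, then combine the partial results.

    A thread pool is only a local stand-in for separate grid machines.
    """
    pieces = chunk(rows, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(transform, pieces))
    return sum(partials)


if __name__ == "__main__":
    sample = [{"amount": a} for a in range(-5, 100)]
    # The distributed result should match a single serial pass.
    print(run_on_grid(sample), transform(sample))
```

The combine step works here only because summing is associative: partial totals from any split add up to the serial answer. A transform that needs all the data at once, such as a global sort or deduplication, does not split this cleanly, which is the "what pieces to break up" problem Schiff raises.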
From the perspective of users of its ETL and data mining tools, says Doninger, SAS’ new dynamic grid capabilities are automatic; no programming is required to grid-enable jobs or applications. Instead, SAS’ design tools produce code that’s grid-ready, to the extent that it can dynamically provision pieces of a workload to compute resources in a grid. “We now have a development environment in the SAS product set that will generate SAS programs that are automatically enabled for grids,” she explains. “You can now use the GUI design environment of our ETL tool to create apps that are automatically capable of leveraging a grid.”
On the ETL front, Doninger says grids could be a boon to risk management and data quality processing, along with other compute-intensive algorithmic processes. “We are targeting more of our SAS solutions that have workflows that lend themselves to acceleration through a grid environment. So [that includes] our risk-management products, because they’re very data-intensive; they’re very compute-intensive,” she comments. “The same is absolutely the case with custom [data quality] transforms, because a custom transform ends up being a fast application that’s written in the SAS 4GL [fourth-generation programming] language, in the SAS syntax.”
MAS Strategies’ Schiff says there’s an element of what you might call keeping-up-with-the-Ellisons at work here, too. “I don’t think it’s quite as revolutionary as they’re trying to make it sound,” he concludes. “The real issue is that Oracle’s trying to run away with grids. They’ve branded themselves as the grid company [in the minds of many BI consumers]—more even than IBM. So there’s sort of an element of ‘you don’t let Oracle steal the message’ here.”
Stephen Swoyer is a Nashville, TN-based freelance journalist who writes about technology.