In-Depth
DataFlux 7.0 Introduces Single Platform for Data Quality
DataFlux adds support for real-time, heterogeneous connectivity to its market-leading data-quality tool
This week, DataFlux Corp. announced a major update to its popular data-quality suite.
Version 7.0 of the DataFlux Data Quality Integration Solution—or DataFlux 7.0, for short—debuts at a time when the data-quality market is itself in a sort of flux. Data cleansing as a technology has become increasingly commoditized—Microsoft promises limited support for data cleansing (via Analysis Services) in its forthcoming SQL Server 2005 database—even as features such as data profiling, data enrichment, and data monitoring have come to the fore.
The revamped DataFlux 7.0 addresses all of these requirements. At the same time, officials claim it also anticipates an incipient trend in data-quality products—namely, the demand for real-time cleansing, profiling, enrichment, and monitoring of data from disparate sources.
This is a lot more complicated than most traditional data-quality strategies, claims DataFlux president and general manager Tony Fisher. In most cases, he says, “doing” data quality involves cleansing data as it’s extracted—often as part of a batch process—from one system and loaded into another.
“Really what we’re trying to do with Version 7 is provide a single platform for data quality as organizations implement data integration deployments. So we’re building data quality into data integration,” he observes.
Call it a case of good timing, what with IBM Corp.’s $1 billion acquisition of data integration specialist Ascential Software Corp. less than two months gone. Truth is, Fisher says, IBM is on to something: data integration is a big business, and data integration involves more than just ETL.
“There are real problems that are pushing organizations to integrate across their organization. One of the big ones is compliance, but another big driver that we see is customer data integration, and there are a bunch of others,” he explains. “Regardless of why they’re [doing data integration], they need a way to take care of the data to ensure a consistent and reliable version of the data is moved forward, and that’s what DataFlux 7.0 is all about.”
DataFlux 7.0 is based on a new service-oriented architecture (SOA), which Fisher says users can configure to connect to just about any data source. It ships with a revamped point-and-click design environment (DataFlux dfPower Studio), as well as a new component, the DataFlux Integration Server. BI pros can use dfPower Studio to design data workflows and establish data-quality rules. The new DataFlux Integration Server supports batch, on-demand, and real-time data-quality processes, and functions as a hub for data-quality rules.
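To make the "hub" idea concrete, here is a minimal Python sketch of a central rule registry whose rules are defined once and then applied either to a whole batch or to a single record on demand. It is an illustration only, not DataFlux code: `RuleHub`, `trim_whitespace`, `standardize_state`, and the sample records are all hypothetical names invented for this example, and the actual DataFlux Integration Server interfaces are not described in this article.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]
Rule = Callable[[Record], Record]


class RuleHub:
    """Hypothetical central registry of data-quality rules (not a DataFlux API)."""

    def __init__(self) -> None:
        self._rules: List[Rule] = []

    def register(self, rule: Rule) -> Rule:
        # Rules run in the order they are registered.
        self._rules.append(rule)
        return rule

    def clean_record(self, record: Record) -> Record:
        # On-demand / real-time path: apply every rule to a single record.
        cleaned = dict(record)
        for rule in self._rules:
            cleaned = rule(cleaned)
        return cleaned

    def clean_batch(self, records: Iterable[Record]) -> List[Record]:
        # Batch path: the same rules, applied to many records at once.
        return [self.clean_record(r) for r in records]


hub = RuleHub()


@hub.register
def trim_whitespace(record: Record) -> Record:
    # Strip leading and trailing whitespace from every field.
    return {key: value.strip() for key, value in record.items()}


@hub.register
def standardize_state(record: Record) -> Record:
    # Map a few assumed spellings to a two-letter code; uppercase anything else.
    if "state" not in record:
        return record
    aliases = {"tennessee": "TN", "tenn.": "TN"}
    state = record["state"]
    return {**record, "state": aliases.get(state.lower(), state.upper())}


single = hub.clean_record({"name": "  Ada ", "state": " Tennessee"})
batch = hub.clean_batch([{"name": "Bob", "state": "nc"}, {"name": " Cy "}])
```

Because both paths call the same registered rules, a record cleansed during a nightly load and one cleansed as it is entered into an application end up in the same form, which is the consistency Fisher describes.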
“We’re providing logic mechanisms that can be invoked and operated in any different runtime environment. The interface that we have is primarily a business-user interface. The end-user interface is targeted toward business users. The business users know what the data is supposed to look like,” Fisher says. “It allows the business user to go in and define the workflow, define the rules that are required to maintain the data that you need.”
DataFlux is a SAS subsidiary, and SAS offers a market-leading ETL tool of its own. In this respect, Fisher acknowledges, some of the DataFlux data-quality technology has found its way into the SAS Enterprise ETL tool, as well as industry-specific SAS solutions, such as the company’s anti-money-laundering software. “There is a lot of cross-germination between DataFlux and SAS, especially from a technology perspective. There are tech components that come out of SAS and go into the DataFlux, and a lot of the underlying technology that goes into DataFlux surfaces in SAS as well.”
But DataFlux is a market-leading data-quality solution in its own right. Is Fisher concerned that the close technology-sharing relationship could leave DataFlux vulnerable to charges that it’s a SAS-only play? Not in the least, he says. “That won’t prevent us from making our primary lead SAS,” he notes. “It certainly is true that there are situations where it is sometimes difficult for DataFlux to partner [with a SAS competitor]. Has it been a huge problem? Not very often it hasn’t. We will continue to support all of these other vendor environments as well.”
Rob Lerner, a principal analyst for data warehousing and application infrastructure with consultancy Current Analysis Inc., says the DataFlux relationship with SAS has helped rather than hindered the company. “When SAS acquired them four or five years ago, they were a small player. But SAS has really helped them to grow to the point where they’re one of the market leaders,” he says. “DataFlux does have a lot of autonomy and it’s able to do a lot of things. Plus, SAS has a lot of great technology. It has a lot of respect in its marketplaces, so I don’t see [its relationship with SAS being] a drawback for DataFlux at all.”
More to the point, Lerner says, there are few extant pure data-quality products that address the range of data-quality issues covered by DataFlux. Last year, for example, Trillium was acquired by Harte-Hanks, while Pitney Bowes gobbled up Group 1. That leaves Innovative Systems Inc. and FirstLogic Corp. as the most prominent data-quality pure plays.
As for DataFlux 7.0 and its integration-centric packaging, Lerner thinks that’s the direction in which the data-quality market as a whole is heading.
“Data quality is not just about data cleansing. In order to make a data-quality routine more effective, you start by understanding your data with data profiling, and by understanding your data, you can attack the problem more efficiently, and add these other features, like enrichment and monitoring, on top of that,” he concludes. “What they’re really doing is they’ve enhanced their SOA and Web-services capabilities, and so they’ve created a solution that you can easily push out data-quality rules and routines and workflows to all enterprises, applications, and whatever, and this is across the extended enterprise.”
About the Author
Stephen Swoyer is a Nashville, TN-based freelance journalist who writes about technology.