In-Depth

Data Integration Specialist Aims to Upset DW Status Quo

Very large data warehouse vendor Compact Solutions will soon release an ETL testing and design tool that works across many environments.

Compact Solutions is a mere youngster by business intelligence standards. Founded just over five years ago, it has begun to aggressively market its wares (both human and technological) in the last year. Its ambitions are far from modest.

The company is pitching a full-blown very large data warehouse (VLDW) architecture, which it markets as a "Data Warehouse Without an RDBMS." Data warehousing on a very large scale calls for an explicit VLDW architecture, officials say, citing a supporting study from consultancy Forrester Research. Among other points of differentiation, the Data Warehouse Without an RDBMS uses compressed data stored in flat files (and not in an RDBMS) as its effective staging point, hence the name.

That's just the beginning. In the next month or two, Compact Solutions plans to deliver its Compact Data Integration and Migration Console (CDIMC), a one-stop shop for ETL testing and design. Most ETL design tools are vendor specific: Informatica tools support PowerCenter, IBM tools are for DataStage, SAP/Business Objects tools work with Data Integrator, and so on.

The CDIMC is a single console that can generate ETL code for PowerCenter, DataStage, and Ab Initio, as well as ANSI SQL or PL/SQL for Teradata and Oracle environments. It can also be tapped as a tool to accelerate migrations from one or many ETL environments to a single consolidated standard (or vice versa).
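The article does not describe how CDIMC works internally, so what follows is only a rough sketch of the general idea of keeping one tool-neutral ETL design and rendering it for a particular target. Every name here (`Mapping`, `ColumnMapping`, `render_ansi_sql`, the `stg_orders` and `dw_orders` tables) is hypothetical and invented for illustration; none of it comes from Compact Solutions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ColumnMapping:
    """One target column and the source expression that feeds it (hypothetical)."""
    source_expr: str
    target_column: str


@dataclass(frozen=True)
class Mapping:
    """A tool-neutral description of a single source-to-target load (hypothetical)."""
    source_table: str
    target_table: str
    columns: Tuple[ColumnMapping, ...]
    filter_expr: Optional[str] = None


def render_ansi_sql(mapping: Mapping) -> str:
    """Render the neutral design as a plain INSERT ... SELECT statement."""
    targets = ", ".join(c.target_column for c in mapping.columns)
    exprs = ", ".join(c.source_expr for c in mapping.columns)
    sql = (
        f"INSERT INTO {mapping.target_table} ({targets}) "
        f"SELECT {exprs} FROM {mapping.source_table}"
    )
    if mapping.filter_expr:
        sql += f" WHERE {mapping.filter_expr}"
    return sql


# Example design: load cleaned order rows from a staging table.
example = Mapping(
    source_table="stg_orders",
    target_table="dw_orders",
    columns=(
        ColumnMapping("order_id", "order_id"),
        ColumnMapping("UPPER(country)", "country_code"),
    ),
    filter_expr="order_id IS NOT NULL",
)

if __name__ == "__main__":
    print(render_ansi_sql(example))
```

In a sketch like this, supporting another target would mean adding another renderer over the same `Mapping` object, which is also why such a design could in principle help with migrations between tools.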

Pankaj Agrawal, Compact Solutions' CEO and CTO, argues that the timing is propitious for outside players that can approach both traditional and new DW problems from fresh perspectives, noting that customers must disabuse themselves of the notion that they can simply buy a data warehouse. That's what Compact Solutions aims to do.

"What we feel is missing a lot of times is the realization that the data warehouse needs to be architected and engineered, and that [this architecting and engineering] requires effort," Agrawal observes. "People think that if they buy Informatica or DataStage that using that, you'll have a good data warehouse, but these [products] are only tools. It's like crafting fine furniture: you can't just use power tools to build furniture. If you do, you end up with something crude. Instead, you use those power tools to actually craft that furniture," he adds.

Owing to a host of vagaries (and the vicissitudes of topology and deployment), developing a data warehouse is rarely an out-of-the-box proposition.

"I think people do underestimate the need to properly architect and design a warehouse and how important it is to pay attention to data quality [and] how important it is to create mechanisms in their ETL processes to address those data quality issues up front," Agrawal comments. "If you just use an ETL tool to take the data from your transactional system and dump it into a database, it doesn't automatically give you a data warehouse. If you didn't clean your data, integrate your data, replicate it properly in a fashion that will make it quicker and easier for the end users to analyze, you don't have a functional [data warehouse]. All of that requires a lot of investment beyond just the tools or the software."

More frequently, Agrawal says, shops suffer from a profusion of DI tools -- starting, of course, with ETL scripts or programmatic SQL. Competing DI tooling (or legacy holdouts) is another reason why DW projects fail to approach -- much less realize -- their potential.

"Almost every company that we have been working with has more than one [DI] technology in place. For example, they have a lot of legacy code. They don't migrate [it] generally because of the cost of migration. They realize that if they can migrate and consolidate on one technology, it will be easier and cheaper for them to manage. Clients are hesitant to get into migration just because it is a manual process. It takes a lot of time and money to do that, so these tools continue to proliferate," he comments.

Compact Solutions will soon have a solution (CDIMC), and Agrawal describes DW standardization as an unambiguously desirable end -- the kind of thing (especially in the present environment) that sells itself. "The few prospects we are talking to, they like this concept. They like it as a means to reduce the licensing costs for their standard ETL platform. One segment where we think it will have a lot of play is in small and medium-sized companies. They don't have any tool in place. This editor gives them a way to create and maintain those designs [via programmatic SQL] in a graphical environment and use their RDBMS [platform] [to] do the ETL," he concludes.
