Moving Data

In previous columns I have discussed key issues related to populating a data warehouse and keeping it up to date. When defining the data movement and management process for your data warehouse, the following issues should be considered.

Know how much data you will need to move and the movement scenario that makes sense for your environment. Some tools are designed for bulk movement of large volumes of data; others excel at quick updates of small amounts of data. The former are the better fit for batch loading a data warehouse, while the latter are better suited to real-time application integration.

Be sure that the tool you are going to use can handle the data sources you have in your IT organization. Some tools work only with relational databases; others can access a wide range of heterogeneous sources. These can include data from packaged applications -- such as SAP R/3, PeopleSoft, Baan, or J.D. Edwards -- mainframe data, or some of the more obscure legacy databases -- such as Adabas, MUMPS, Cincom, or COBOL-structure files -- that populate many useful production systems today. I remember a system integration project I worked on several years ago that required obtaining data from 12 different systems.

The third thing to look at is the richness of the data transformation functionality. In many cases you will need to consolidate data from multiple sources into a single record to go into the data warehouse. In other cases, you will need to decompose a single data record into multiple records for a target application. Ideally the tool you are looking at will have an abundant set of functions and capabilities that will give you several ways to parse, decompose, modify, and recombine the data. The software should also be extensible so you can add your own custom programming directly into the tool, using the vendor's SDK, or by calling out to external customized modules. This latter capability would be particularly important if data cleansing and quality checking are an important part of your data movement process. In many shops, users purchase specialized data quality tools that they use in conjunction with a data extraction and movement tool.
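To make the two transformation patterns concrete, here is a minimal Python sketch of consolidating records from several sources into one record per key, and of decomposing one record into several. The function names, field names, and sample data are all hypothetical, chosen only for illustration; a commercial tool would expose these operations through its own interface.

```python
from collections import defaultdict

def consolidate(key, *sources):
    """Merge records that share a key value into one record per key.

    Assumption: later sources overwrite earlier ones on field conflicts.
    """
    merged = defaultdict(dict)
    for source in sources:
        for record in source:
            merged[record[key]].update(record)
    return list(merged.values())

def decompose(record, repeating_field):
    """Split one record holding a list of values into one record per value."""
    base = {k: v for k, v in record.items() if k != repeating_field}
    return [{**base, repeating_field: value} for value in record[repeating_field]]

# Hypothetical extracts from two source systems.
billing = [{"cust_id": 1, "balance": 250.0}]
crm = [{"cust_id": 1, "name": "Acme Corp", "phones": ["555-0100", "555-0101"]}]

warehouse_rows = consolidate("cust_id", billing, crm)   # one row per customer
target_rows = decompose(warehouse_rows[0], "phones")    # one row per phone number
```

The overwrite-on-conflict rule is the simplest possible choice; in practice you would want an explicit rule for which source wins on each field.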

The development environment should have a well-designed graphical interface with drag-and-drop functionality, iconized representations of various operations, and the ability to string operations in an arbitrary sequence. For example, you should be able to have decision points with varying paths that are automatically selected, based on data values or other inputs.
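Behind the icons, a decision point amounts to a routing rule evaluated per record. The following Python sketch shows the idea under stated assumptions: the path names, the `amount` field, and the 10,000 threshold are hypothetical and do not come from any particular product.

```python
def route(record):
    """Pick a path for a record based on its data values (hypothetical rules)."""
    if record.get("amount") is None:
        return "reject"          # missing value: send to an error path
    if record["amount"] >= 10000:
        return "review"          # large value: send to a manual-review path
    return "load"                # everything else goes straight to the load step

def run_flow(records, steps):
    """Send each record down the path chosen by route() and collect the results."""
    outputs = {name: [] for name in steps}
    for record in records:
        path = route(record)
        outputs[path].append(steps[path](record))
    return outputs

# Each path is just a function here; a graphical tool would show these as icons.
steps = {
    "load": lambda r: {**r, "status": "loaded"},
    "review": lambda r: {**r, "status": "held"},
    "reject": lambda r: {**r, "status": "rejected"},
}

records = [{"id": 1, "amount": 120}, {"id": 2, "amount": 50000}, {"id": 3}]
results = run_flow(records, steps)
```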

Another consideration is a robust scheduling component. Many data warehouse loading operations have a finite production window. Often the various steps must be coordinated with other business operations. Make sure the products you evaluate have the ability to seamlessly tie into your production schedule.

Finally, check out support for capturing changes to source data. There are various methods for capturing changed data, and not all vendor products have that ability.
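One simple method, shown here purely as an illustration, is to compare the current extract of a source table against the previous one. The Python sketch below assumes each record carries a unique key; the function name and sample data are hypothetical, and other methods (such as reading database logs or using triggers) work quite differently.

```python
def capture_changes(previous, current, key):
    """Compare two snapshots and report inserted, updated, and deleted records."""
    old = {record[key]: record for record in previous}
    new = {record[key]: record for record in current}
    return {
        "insert": [new[k] for k in new if k not in old],
        "update": [new[k] for k in new if k in old and new[k] != old[k]],
        "delete": [old[k] for k in old if k not in new],
    }

# Hypothetical snapshots taken on two successive nights.
yesterday = [{"id": 1, "city": "Boston"}, {"id": 2, "city": "Austin"}]
today = [{"id": 1, "city": "Burlington"}, {"id": 3, "city": "Denver"}]

changes = capture_changes(yesterday, today, "id")
```

Snapshot comparison needs no cooperation from the source system, but it requires reading the full source on every run, which may not fit a tight production window.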

Despite recent consolidation, there are still many companies in the data warehouse space. It was interesting to review the vendor list I discussed last year, noting the absorption of Platinum Technology, Praxis Int’l, Red Brick, and Prism Solutions. The survivors include Acta Technology, Ardent Software, Carleton, Constellar, D2K, Data Junction, Data Mirror, ETI, Informatica, NEON, SAGA, and Tibco.

Given this tumultuous recent history, ask vendors how they can assure you that they'll still be around in a few years. It is, of course, impossible to predict the future, but you can find out about their reputation for service and reliability and about their financial strength. While the fact that a company has been acquired is not sufficient reason to eliminate it from consideration, keep in mind that the best people in the engineering and customer service units are often the first to bail out when an acquisition takes place, especially if it's a hostile one. If you are looking at a company that's been recently acquired, find out how good a job it has done retaining key people.

Robert Craig is vice president of marketing at WebXi Inc. (Burlington, Mass.) and a former director at the Hurwitz Group Inc. Contact him at rcraig@webxi.com.