Data Integration Woes

One of the most complex problems facing IT managers and developers today is data integration. Integrating data across platforms, databases and applications is the 500-pound gorilla sitting in the middle of every corporate IT manager's computer room.

Our current data integration woes stem from the fact that most applications were developed in a data architecture vacuum. This arose because application developers usually don't think about other applications that might benefit from, or require access to, the data that they are creating or manipulating. Given that the potential universe of data-sharing candidate applications is almost infinite, this is not entirely unreasonable.

Another data dis-integration driver is heterogeneous system architectures. It's difficult to design for data sharing between one application that uses VSAM files on a mainframe and another based on a relational database running on UNIX. As a result, data is often manually copied and re-entered, leading to inconsistency and duplication across applications.

The good news is that solutions are available; the bad news is that figuring out the best approach is not easy. What should you do when confronted with this issue? The first thing to consider is the level of data integration you want to implement. There are four data integration architectures to consider.

One is message-oriented middleware (MOM). MOM, as the name implies, is used to send messages back and forth between systems, and these messages frequently contain data. CORBA, COM/DCOM and DCE are also used to send information between systems, but they typically invoke application code on remote systems, rather than send data. MOM is asynchronous, and APIs are readily available. The asynchronous aspect is helpful because it relieves the sending application from having to wait for an acknowledgment before continuing. The disadvantage of MOM, and some of the other technologies we're considering, is that it is invasive and requires access to source code to implement.
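The asynchronous, decoupled nature of MOM can be illustrated with a minimal in-process sketch. This is not any vendor's API; it simply uses Python's standard queue and threading modules to show a sender that enqueues a message and continues immediately, while a receiver consumes it whenever it is ready.

```python
import queue
import threading

# Hypothetical in-process stand-in for a MOM queue; real middleware
# products expose similar put/get semantics across the network.
order_queue = queue.Queue()

def sender():
    # The sending application enqueues the message and moves on
    # immediately -- it does not wait for an acknowledgment.
    order_queue.put({"order_id": 42, "sku": "A-100", "qty": 5})
    print("sender: message queued, continuing with other work")

def receiver():
    # The receiving application consumes messages at its own pace.
    msg = order_queue.get()
    print("receiver: processing order %d" % msg["order_id"])
    order_queue.task_done()

t = threading.Thread(target=receiver)
t.start()
sender()
t.join()
```

Note that the sender's print statement can execute before the receiver has even seen the message; that is exactly the decoupling the asynchronous model buys you.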

Another category is data replication. Companies such as Sybase, DataMirror Corp. (Markham, Ontario) and Praxis Int'l Inc. (Waltham, Mass.) have tools that transfer data between systems at the database-to-database level. These tools are an excellent choice if you can't modify the application. You need three things to make replication work. One is a reliable network connection and access to the appropriate database interfaces. The second is access to the database schemas, so you can determine which data elements to replicate. The third is similar source and target data elements that are going to be replicated. It's difficult to copy data into an incompatible target field on a remote system without going through complex transformation logic.
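The transformation problem mentioned above can be sketched in a few lines. This toy example, which assumes two in-memory SQLite databases with slightly different schemas, copies rows from source to target with a simple mapping step; real replication tools work at a far lower level, but the schema-matching concern is the same.

```python
import sqlite3

# Source and target databases with nearly -- but not exactly --
# matching schemas. Table and column names here are illustrative.
src = sqlite3.connect(":memory:")
tgt = sqlite3.connect(":memory:")

src.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
src.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 20.0)])

# The target stores the amount in cents, so a transformation step
# is needed before each row can be written.
tgt.execute("CREATE TABLE orders (id INTEGER, amount_cents INTEGER)")

for order_id, amount in src.execute("SELECT id, amount FROM orders"):
    tgt.execute("INSERT INTO orders VALUES (?, ?)",
                (order_id, int(round(amount * 100))))

rows = list(tgt.execute("SELECT id, amount_cents FROM orders ORDER BY id"))
print(rows)  # [(1, 950), (2, 2000)]
```

When source and target fields diverge more than this, the mapping logic grows quickly, which is why similar data elements are listed above as a prerequisite for replication.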

The third category to consider is integration engines. These are tools that provide multiple interfaces and support various protocols and APIs, ranging from RPC and SQL to screen scrapers. Vendors include Century Analysis Inc. (Pacheco, Calif.), Software Technologies Corp. (Arcadia, Calif.) and Constellar Corp. (Redwood Shores, Calif.). The big advantage of these tools is their flexibility and their ability to pump large numbers of transactions. Also, for the most part, they are noninvasive, which means they can be used in situations where direct access to the data or application source code is unavailable.
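The core idea behind an integration engine, stripped of any vendor specifics, is a set of per-system adapters that normalize data from very different interfaces into one common record format, without modifying the source applications. A minimal sketch, with purely illustrative field layouts and names:

```python
# One adapter per source interface; each produces the same
# normalized record format. The fixed-width screen layout and the
# SQL row shape below are assumptions for illustration only.

def from_screen_scrape(raw):
    # e.g. a fixed-width terminal screen: columns 0-8 hold the
    # customer id, columns 8-28 hold the name.
    return {"id": raw[0:8].strip(), "name": raw[8:28].strip()}

def from_sql_row(row):
    # e.g. a row fetched from a relational database.
    cust_id, name = row
    return {"id": str(cust_id), "name": name}

records = [
    from_screen_scrape("C0017   ACME TOOL & DIE     "),
    from_sql_row((18, "Globex Corp.")),
]
for rec in records:
    print(rec)
```

Because the adapters sit outside the applications, this style of integration is noninvasive: neither the mainframe screen nor the database application needs to change.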

Finally, there are enterprise application integration vendors. Tools from companies such as New Era of Networks Inc. (NEON; Denver) and CrossWorlds Software Inc. (Burlingame, Calif.) are creating a big stir in the industry, primarily because they enable logical integration between enterprise resource planning applications, such as those from SAP America Inc., the Baan Co. and PeopleSoft. Keep in mind that these application integration companies are fairly new and have been most successful in the ERP market, so they may not be suitable if you aren't deploying a packaged application that they support.

So, which approach should you select? First, look for a generalized, modular design, with the flexibility to accommodate current and future systems. Evaluate programmability and maintainability, since these systems will be in place for a long time. Verify that the tool you select is compatible with your network infrastructure. Make sure the vendor proves its system has sufficient scalability to meet your organization's throughput needs. And, finally, validate that the system architecture is robust and the implementation is reliable. If a failure occurs in a linkage between two mission-critical systems, you want assurances that there will be no loss of data, and that recovery time will be minimal.

Robert Craig is director, Data Warehousing and Business Intelligence Division, at Hurwitz Group Inc. (Framingham, Mass.).