10 Simple Rules for Managing Next-Gen DI

TDWI Research's Philip Russom offers commonsense rules for handling the next generation of data integration.

In his latest Best Practices Report, Philip Russom, research director for data management (DM) with The Data Warehousing Institute (TDWI), outlines Ten Rules for Next Generation Data Integration.

As Russom's rule set makes clear, much of next-gen data integration (DI) is common sense. Even so, DI practitioners need to be wary of the occasional curve ball.

Russom's First Rule of Next-Gen DI says that DI isn't a single-bullet (or single-tool) fix. "Some data management professionals still think of DI as merely ETL tools for data warehousing or data replication utilities for database administration," he writes. "Those use cases are still prominent ... [y]et DI practices and tools have broadened into a dozen or more techniques and use cases."

Ditto for Russom's Second Rule -- namely, that effective next-gen DI will comprise both hand-coded and commercial offerings. In spite of the best-laid marketing strategies of DI vendors, Russom notes, a lot of DI legwork is still performed using hand-coded scripts, programmatic SQL, and other approaches. That's changing, however.

"TDWI survey data shows that migrating from hand coding to using a vendor DI tool is one of the strongest trends as organizations move into the next generation. A common best practice is to use a DI tool for most solutions, but augment it with hand coding for functions missing from the tool," he writes.

The first curve ball comes by way of Russom's Third Rule of Next-Gen DI: data integration will increasingly involve resources outside of a traditional DW -- or conventional DM -- context.

"DI is not just for data warehousing," he observes, adding that -- for the same reasons -- DI shouldn't be considered the province of operational DBAs, either. "[DI] now has many use cases spanning across many analytic and operational contexts, and expanding beyond DW and DBA work is one of the most prominent generational changes for DI."

Russom's Fourth Rule -- that DI is an autonomous discipline -- should be obvious to any practitioner of large-scale DI.

"Nowadays, there's so much DI work to be done that DI teams with 13 or more specialists are the norm; some teams have more than 100!" he writes "Due to this growth, a prominent generational decision is whether to staff and fund DI as is, or to set up an independent team or competency center for DI."

According to Russom's Fifth Rule, next-gen DI is "absorbing" complementary DM disciplines, including data quality (DQ), master data management (MDM), replication, synchronization, event processing, and other practices. Perhaps because of its over-arching nature, DI has become a "broadly collaborative" discipline, according to Russom. That's his Sixth Rule. "The larger number of DI specialists requires local collaboration among DI team members, as well as global collaboration with other data management disciplines ... plus teams for message/service buses, database administration, and operational applications."

Just as there's no single-bullet tool for performing next-gen DI, there's likewise no single-bullet development methodology. That's the basis of Russom's Seventh Rule, which argues that a host of trends -- including an emphasis on larger DI teams, a distinction between operational and analytic DI projects, and an emphasis on interoperability with non-traditional DI disciplines -- have combined to recast the DI development status quo.

There's also the interface issue, which has long bedeviled DI practitioners. For next-gen DI, the interface issue is poised to become an order of magnitude more bedeviling. Enter Russom's Eighth Rule, which stresses the importance of complementing Olde DI technologies (such as ODBC, JDBC, FTP, APIs, and data bulk loaders) with more recent -- and less explicitly DI-related -- innovations, such as Web services, SOA, and data services.

"The new ones are critical to next generation requirements for real time and services," Russom writes. "Furthermore, as many organizations extend their DI infrastructure, DI interfaces need to access data on-premises, in public and private clouds, and at partner and customer sites."

Russom's Ninth Rule -- that DI must scale -- may not surprise DI practitioners, although (as he stresses) scaling next-gen DI might be easier said than done. "With volume and complexity exploding, scalability is a critical success factor for future generations. Make it a top priority in your plans," he urges.

Lest the advent of "agile" and other cutting-edge data integration methodologies muddy the issue, DI, now as ever, requires an architecture. That's Russom's Tenth and final Rule.

"It's true that some DI tools impose an architecture [usually hub and spoke], but DI developers still need to take control and design the details. DI architecture is important because it strongly enables or inhibits other next generation requirements for scalability, real time, high availability, server interoperability, and data services."