
In Praise of Data Replication

If consolidation activity and new product pushes are any indication, data replication is finally getting the respect it deserves.

A new report from TDWI explores the usefulness of one of the most under-appreciated tools in the data integration (DI) kit: data replication.

Almost half of all DI efforts involve replication, according to TDWI survey data. That makes it the second most popular -- or second most used -- data integration tool, behind ETL.

Comparatively few shops appreciate the usefulness -- the flexibility and potential applicability -- of data replication, argues Philip Russom, research director for data management at TDWI.

"Fully modern replication tools can be configured to operate many different ways -- ranging from real time to batch, from single database brands to broadly heterogeneous environments, from one to many databases, from small data sets to big data, and from unaltered copies of data to transformed data," writes Russom, author of Data Replication for Real-Time Data Warehousing and Analytics, part of TDWI's Checklist Report series.
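The axes Russom lists -- real time versus batch, one source to many targets, unaltered copies versus transformed data -- can be sketched with a toy replicator. This is purely illustrative and not any vendor's API; all names are hypothetical.

```python
# Toy replicator illustrating the configuration axes Russom describes:
# delivery mode (row-at-a-time vs. batched), one-to-many targets, and an
# optional in-flight transform. Hypothetical names; not a vendor product.

def replicate(source_rows, targets, transform=None, batch_size=None):
    """Copy rows from one source to many targets.

    batch_size=None -> deliver each row as it arrives ("real time")
    batch_size=N    -> accumulate N rows, then deliver ("batch")
    transform       -> optional per-row function (None copies rows unaltered)
    """
    buffer = []
    for row in source_rows:
        row = transform(row) if transform else row
        if batch_size is None:
            for t in targets:
                t.append(row)            # immediate, row-at-a-time delivery
        else:
            buffer.append(row)
            if len(buffer) >= batch_size:
                for t in targets:
                    t.extend(buffer)     # periodic bulk delivery
                buffer = []
    if buffer:                           # flush a final partial batch
        for t in targets:
            t.extend(buffer)

# One source feeding two targets, uppercasing names in flight:
source = [{"id": 1, "name": "ada"}, {"id": 2, "name": "grace"}]
warehouse, reporting_db = [], []
replicate(source, [warehouse, reporting_db],
          transform=lambda r: {**r, "name": r["name"].upper()},
          batch_size=2)
```

The same function serves both ends of Russom's spectrum: drop `batch_size` for real-time delivery, drop `transform` for unaltered copies.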

"[R]eplication is straightforward to set up and maintain, it's less intrusive to source and target systems than most forms of data integration, and most data management professionals already have experience with it."

Thanks chiefly to its flexibility, data replication is experiencing a renaissance of sorts, writes Russom. "[R]eal-time [replication] configurations support fast-paced business practices, such as operational business intelligence and just-in-time inventory," he writes. "Replication can synchronize 360-degree views of customers and other business entities across heterogeneous applications. Data replicas are an important component of business continuity, and the use cases for replication span both operational and analytic applications."

Gaining the Respect it Deserves

One measure of replication's return to prominence is a surge in vendor interest.

Over the last half-decade, several replication specialists -- including DataMirror Corp., Golden Gate Software Inc., and Sybase Inc. -- were acquired by larger, non-specialty vendors.

That isn't all. As DataMirror, Golden Gate, and Sybase were absorbed or repackaged into new configurations, other vendors stepped in to fill the gap.

Informatica Corp., for example, aggressively pushes Universal Replication, its own data replication toolkit, which it picked up last year in its acquisition of the former WisdomForce Technologies. Also last year, veteran change data capture (CDC) specialist Attunity Inc. unveiled Attunity Replicate, its first-ever foray into replication.

The very different approaches of Informatica and Attunity illustrate the two-fold promise of data replication: it can be simple and straightforward, or it can be richer and more orchestrated.

In the former case, Attunity officials claim that they're delivering classic -- or straightforward -- replication technology, with a few optimizations.

"We created the platform with the intention of creating high-performance data replication, but with quick time to value. We tried to automate a lot of the ... sometimes discrete activities or aspects of database replication," Itamar Ankorion, vice president of business development and corporate strategy with Attunity, told BI This Week last November, describing activities such as specifying source and target databases or creating a target that mimics its source.

Unlike most classic tools, Attunity Replicate doesn't require an agent or other database-side software. "Our bulk reader [has] all kinds of tunings and optimizations [for different databases]," he said. "It's geared toward bulk extract. We use Oracle's client with OCI and we have a lot of experience in terms of knowing which OCI APIs to use and how to optimize them. We work against the log and not against the database."

Attunity's been around for more than two decades. It made its name in CDC, a technology segment that shares many of the characteristics of data replication. Why did it decide -- in 2011, of all years -- to jump into the data replication market? Ankorion cites both the void created by market consolidation and a surge in demand for scalable replication technology. Besides, he points out, Attunity has plenty of experience in mainframe connectivity, and once Oracle acquired mainframe mainstay Golden Gate, Big Iron customers started looking elsewhere for alternatives.

"Oracle's acquisition of Golden Gate was kind of like what happened with Informatica's acquisition of [mainframe data connectivity specialist] Striva: it became a challenge for anyone who used to work with Golden Gate because of concerns about what [Oracle] was going to do [with the technology]," said Ankorion. "We see a lot of IBM and Oracle. Both of those products are very complex and are very expensive. That's the problem, from [a customer's] point of view: there are very few solutions and they tend to be very expensive and very complex."

Informatica, conversely, largely eschews the classic approach to data replication: it positions replication as a complement to its passel of integration technologies, at the center of which is PowerCenter ETL. At the same time, Informatica proposes to enrich classic data replication -- i.e., the straightforward movement or synchronization of data between repositories -- with complementary technologies, such as data masking. While a vendor such as Attunity might pitch CDC or replication as an enabling technology for complex event processing (CEP) or information life cycle management (ILM), Informatica proposes to sell you CEP or ILM.

It's in this last regard that Informatica officials seem most enthusiastic about replication. "It has an almost staggering number of applications," said Scott Fingerhut, senior director of product marketing with Informatica's CEP practice, in a February interview. He cites ongoing enterprise transformation efforts involving real-time/right-time information access and CEP. "As [customers] begin eventifying [their IT operations], they're not just going to look at [ETL], they're going to look at tools [such as] replication," he continued, adding that data replication's strongest selling point might be its comparatively hands-off, laissez-faire character.

Informatica's Universal Replication, for example, uses a log-based CDC approach to replicate data between systems. As a result, Fingerhut claimed, it's "minimally disruptive" to source data systems. "[T]he conversation that [we can] have with the customer is instead of being intrusive to the application or the database, we can watch for changes and push the changes out. We can eventify [those changes]," he said.
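The log-based approach Fingerhut describes can be sketched in miniature: rather than querying source tables, the replicator replays entries from an append-only change log onto the target, so the source is never touched after a change is written. This is a hedged illustration of the general CDC technique, not Informatica's or Attunity's implementation; all names are hypothetical.

```python
# Minimal log-based change data capture: replay an append-only change log
# onto a target, never querying the source tables themselves.
# Illustrative only; not any vendor's implementation.

change_log = [
    {"op": "insert", "key": 1, "value": "ada"},
    {"op": "insert", "key": 2, "value": "grace"},
    {"op": "update", "key": 1, "value": "ada lovelace"},
    {"op": "delete", "key": 2, "value": None},
]

def apply_changes(log, target, from_position=0):
    """Replay log entries onto a target dict. Returns the new log position,
    so the next call can resume where this one stopped (continuous capture)."""
    for entry in log[from_position:]:
        if entry["op"] == "delete":
            target.pop(entry["key"], None)
        else:                        # insert and update are both upserts here
            target[entry["key"]] = entry["value"]
    return len(log)

replica = {}
pos = apply_changes(change_log, replica)   # initial replay of the whole log
```

Because the replicator only reads the log, the source database does no extra work per subscriber -- the "minimally disruptive" property the vendors emphasize.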

This enriched approach to replication is likewise a component of Informatica's "lean" data management strategy. "What we mean by 'lean' is making sure that you're not keeping lots of data that people are not querying on your most precious resources," Adam Wilson, general manager for Informatica's Information Lifecycle Management practice, told BI This Week late last year. Unused or infrequently accessed data impacts service levels, complicates (or protracts) the replication process, and (most saliently, from the perspective of users) depresses query performance, Wilson observed. "Lean" data management, à la Informatica, involves making smarter decisions about how, why, and when data is extracted, replicated, transformed, or archived.

"At the core, you've got too much dormant data in these systems: closed transactions that aren't delivering value other than for compliance purposes, or -- in the context of the data warehouse -- transaction-level details that nobody is looking at anymore because they're just looking at the aggregates. You still want that information online and available, but you don't want to run it in the context of your most expensive [data management] resources," he said.
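Wilson's point about dormant data reduces to a simple partitioning decision: keep open or recently touched records on the expensive online store, and move closed, stale ones to cheaper archival storage where they remain available for compliance lookups. The sketch below is a toy illustration of that idea only; the record shape and the one-year cutoff are hypothetical.

```python
# Toy "lean" partitioning: closed transactions untouched for a year move to
# an archive tier, keeping the online store small. Hypothetical names/cutoff.
from datetime import date, timedelta

CUTOFF = date.today() - timedelta(days=365)

transactions = [
    {"id": 1, "status": "closed", "last_access": date(2010, 3, 1)},
    {"id": 2, "status": "open",   "last_access": date.today()},
]

# Online tier keeps anything open or recently accessed; the rest is archived.
online  = [t for t in transactions
           if t["status"] == "open" or t["last_access"] >= CUTOFF]
archive = [t for t in transactions if t not in online]

# Queries against `online` now scan only active data; `archive` stays
# available on cheaper storage for compliance lookups.
```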

[Editor's note: Russom's report examines use cases of data replication. You can download it -- for free with registration -- from TDWI.]
