Selecting Your Data Replication Solution: Top 3 Criteria
Data replication plays an important part in risk management strategies.
by Bob Williamson
Many small- to mid-size businesses lack the resources necessary to employ a comprehensive disaster recovery plan. Perhaps they lack the proper IT support, or the cost of the solutions themselves is beyond their budget. Similarly, larger enterprises, with their complex IT infrastructures, frequently have incomplete disaster recovery solutions. They may have deployment scenarios in which some server systems are backed up appropriately while others remain vulnerable because of compatibility issues.
Server downtime costs organizations worldwide billions of dollars each year; IDC estimated that in 2007 alone it cost roughly $140 billion in lost worker productivity and revenue. For this reason alone, it is vital that IT professionals reduce or eliminate the negative impacts of planned and unplanned downtime. Disaster recovery implementation is fundamental to protecting business-critical applications and data and, ultimately, to ensuring business continuity in companies of all sizes.
Last month, a Gartner report noted that one explanation for many businesses’ ineffective approach to risk management is that the definition of the term has become diluted. Consequently, risk practices are not tailored to specific enterprise needs. Data replication technology is an integral part of that definition and of any comprehensive disaster recovery plan. To fully understand this technology, there are three key factors IT professionals should consider when assessing the best data replication solution for their companies. Careful consideration of these factors will provide insight into how incorporating data replication into your security architecture will promote business continuity and simplify disaster recovery.
Because every business has a unique set of network requirements, determining the best replication technology for your enterprise can pose a formidable challenge. You can simplify this process by assessing three primary factors:
- the rate of change of data across your network
- your existing hardware and platform configurations
- your budget
Taken together, the first two factors help paint a picture of your network server environment, highlighting key characteristics that will guide you in your assessment. The third will dictate your selection and implementation process.
Replication deployments frequently fail because the amount of data traveling across the network is measured incorrectly. To prepare for disaster recovery, all applications and data must be replicated from the primary server to the (usually remote) backup server. Providing adequate network bandwidth for this data to travel quickly and successfully from one server location to another ensures that the most recent data possible is available during a recovery. If the network connection is too narrow, replication efforts suffer in one of two ways: either server performance degrades while the primary server waits for acknowledgements of remote data writes, or data backs up in the network pipeline while traveling between locations. Both circumstances lead to adverse results -- you either have a slow primary server or a dated backup server.
To determine the necessary network bandwidth, measure current usage in two specific ways. First, gauge your average rate of change. This yields a standard hourly rate for data changing in your primary database and traveling across the network. You can determine a reliable number by monitoring data change rates over two or more weeks. Second, measure your peak rate of change: how much data changes during the period of greatest activity within the same timeframe. This value gives you an approximate maximum amount of data that must be replicated between the primary and backup server locations.
To find your optimal rate of change, select a value between your average rate and peak rate. Of course, anticipated growth in users and other IT infrastructure changes, as well as network quality and latency, may increase the network burden and should be factored into the equation. This optimal value helps you assess which method of replication (some transfer data faster than others) is most appropriate for your specific IT environment.
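The calculation above can be sketched as follows. All of the sample figures, the 25 percent growth headroom, and the choice of the midpoint between average and peak are hypothetical illustrations, not prescriptions; substitute your own monitoring data.

```python
# Sketch: estimating replication bandwidth needs from hourly change-rate
# samples. All figures here are hypothetical, for illustration only.

# Hourly data-change samples (GB changed per hour) collected over a
# monitoring window of two or more weeks (truncated here for brevity).
hourly_change_gb = [1.2, 0.8, 3.5, 2.1, 0.4, 5.0, 1.9, 2.7]

average_rate = sum(hourly_change_gb) / len(hourly_change_gb)  # average rate of change
peak_rate = max(hourly_change_gb)                             # peak rate of change

# Pick a planning rate between the average and the peak, then add
# headroom for anticipated growth and network overhead (assumed 25%).
growth_headroom = 1.25
planning_rate_gb_per_hour = ((average_rate + peak_rate) / 2) * growth_headroom

# Convert GB/hour to megabits per second to compare against the
# bandwidth of the link between the primary and backup sites.
planning_mbps = planning_rate_gb_per_hour * 8 * 1024 / 3600

print(f"average: {average_rate:.2f} GB/h, peak: {peak_rate:.2f} GB/h")
print(f"plan for about {planning_mbps:.1f} Mbit/s of replication bandwidth")
```

If the resulting figure exceeds the bandwidth of your WAN link, you face exactly the trade-off described above: a primary server stalled on remote write acknowledgements or a backup that lags behind.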
Hardware and Hard Dollars
You must also take existing hardware and platform configurations into consideration when determining which type of replication technology is ideal for your company. Many organizations, particularly large enterprises, have heterogeneous server environments. For example, an enterprise may rely on an HP ProLiant server running Microsoft Exchange for e-mail services, an IBM System x Linux-based server for database services, and a Dell server for network services. Similarly, a variety of storage platforms are likely to exist within the corporate infrastructure.
In these diverse environments, a data replication solution that is platform agnostic must be deployed. Host-based data replication best meets this requirement.
This type of replication technology generally relies on a filter driver, resident on both the source and target side of the replica (or mirror), which inserts into the I/O stream to manage the replication process.
There are two types of host-based data replication: volume-level (also called block replication) and file-level (also called byte-level replication). The faster volume-level, host-based replication is implemented below the file system and transfers only the blocks of data that have changed since the last replication, reducing the total amount of data transferred across the network. For a larger enterprise (which tends to have a higher rate of change), this type of replication can significantly improve performance. Implementing such a host-based replication technology is one of the best ways for a larger enterprise to ensure business continuity.
However, because this solution replicates entire volumes, if you need to replicate only certain directories or files, your data must be organized into volumes accordingly. Volume replication also has the advantage of being simple to integrate with high-availability clustering products, such as Microsoft Windows Failover Clustering, and lays the foundation for building a full WAN-based disaster recovery environment.
A less-expensive option, file-level replication, allows you to more easily target specific files or directories. With some implementations, however, the entire file, regardless of its size, is replicated if any data within it changes. For a larger enterprise, this approach can have a significant impact on replication performance. Additionally, file-level replication can incur file system overhead because its filter driver sits above the file system and must understand file system operations. File-level replication solutions often have trouble replicating continuously open files such as databases, requiring a separate piece of software to manage open file handles. For these reasons, file-level host-based replication technology is not well-suited to the larger enterprise.
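The performance gap between the two approaches can be illustrated with some back-of-the-envelope arithmetic. The file size, block size, and change count below are hypothetical, and the whole-file re-send behavior applies only to some file-level implementations, as noted above.

```python
# Sketch: why block-level replication can transfer far less data than a
# naive file-level implementation when a large file changes slightly.
# All numbers are hypothetical, for illustration only.

BLOCK_SIZE_KB = 64        # assumed replication block size
file_size_gb = 50         # e.g., a large database file
changed_blocks = 200      # blocks dirtied since the last replication cycle

# Block-level: only the changed blocks cross the network.
block_level_mb = changed_blocks * BLOCK_SIZE_KB / 1024

# Naive file-level: the whole file is re-sent if any byte changes.
file_level_mb = file_size_gb * 1024

print(f"block-level transfer: {block_level_mb:.1f} MB")
print(f"file-level transfer:  {file_level_mb:.1f} MB")
```

Here a few hundred changed blocks cost about 12.5 MB of transfer under block-level replication, versus the full 50 GB under a whole-file re-send, which is why block-level replication scales better as change rates rise.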
For the small and mid-size enterprise, where the burden of data replication across the network tends to be lower and lower price points tend to drive investment, file-level, host-based replication may be preferable. Ideally, you should implement whichever solution is optimal for the specific data being replicated; this may mean a mixture of volume-based and file-based replication coexisting within the corporate infrastructure and, perhaps, even on the same server.
To fully leverage data replication technology in the most efficient and cost-effective manner possible, organizations both small and large must carefully consider their options. Upfront evaluation of the rate of data change across their networks, their existing hardware configurations, and their budgets is invaluable in the selection process. By carefully analyzing each of these three critical factors, you will simplify IT decision-making and find an affordable, yet comprehensive, disaster recovery solution for your business, no matter what its size.
- - -
Bob Williamson has over 10 years of experience delivering application and data protection solutions. He is currently the senior vice president of product management at SteelEye Technology. You can reach the author at firstname.lastname@example.org.