In-Depth
CA XOsoft Delivers Continuity Confidence
There's no such thing as perfect disaster readiness, but you can still make important decisions to protect your data.
Whenever a pitch comes across my desk from a public relations agency claiming that it can provide access to a customer who has achieved “perfect security” or “perfect disaster readiness” using its client's product, I am instantly intrigued. Having had boots on the ground after most major disasters of the past 20 years, whether as a journalist or as a consultant supporting one of my clients, I have seen nothing to suggest that any strategy leveraging any disaster recovery (DR) product can deliver perfect preparedness for events such as blackouts, fires, floods, earthquakes, tornadoes, or hurricanes.
The reason is simple: Disasters do not evolve according to a predefined script or scenario. The only real constant about disasters is their lack of predictability: disasters throw proverbial curve balls at your continuity plan. They stress recovery strategies to their breaking point and test the innovation and tenacity of your recovery team.
This explains why there are no gurus of disaster recovery, no masters of disaster. The collective view of those involved in a recovery effort, assuming their companies survive the test, is more often than not, “We were lucky.” Hardly guru speak.
Last week, when such a story pitch arrived involving the University of Texas at Brownsville and Texas Southmost College (UTB/TSC) and their “readiness” for a repeat of a hurricane such as 2005’s Katrina or Rita, I had my doubts. When CA XOsoft was mentioned in connection with the claim, I confess that I was surprised by the hubris of the vendor’s marketers. I decided to chat with the folks at UT Brownsville to see whether their view matched the PR spin.
After speaking with Doug Ferrier, Dean of UTB/TSC, and Brian Matthews, Computer User Services Specialist II in UTB/TSC’s technical support organization, I found that they were more modest in their resiliency claims. As they explained their strategy, there wasn’t a hint of hype.
On a campus just 30 miles from the Gulf of Mexico, Ferrier explained, his group was keenly aware that it had dodged two major disasters in 2005, including Hurricane Rita, which played havoc with another college campus in Beaumont. The near misses sent Ferrier back to the campus disaster recovery plan, written several years earlier, to assess whether it would still suffice to recover critical applications.
He wanted to enhance the protection of campus e-mail systems as a priority so that communications could be preserved even if everyone was evacuated from the campus by a storm or other calamity. Matthews noted that the existing plan, which relied heavily on tape backup and offsite storage, would not deliver the immediate recovery time objective the group sought. They wanted to replicate their 24,000 e-mail accounts at an Internet service provider in Austin and simply fail over to that location if continued operations at UTB/TSC’s computer room, less than 40 feet above sea level on the second floor of a campus building, became impossible.
Data from the existing e-mail systems, about 600 GB, were stored in a 12 TB Fibre Channel fabric, but duplicating the entire SAN at a remote location was deemed too expensive. They wanted to host the e-mail data remotely, then update it on an ongoing basis across a VLAN already in place between Brownsville and Austin that provided 20 Mbps of bandwidth. With only about 50 to 100 MB of change data per day, this seemed feasible.
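To put those figures in perspective, consider a quick back-of-the-envelope check, a sketch that assumes the link runs at a dedicated 20 megabits per second (in practice it may be shared with other traffic):

    # Can the Brownsville-Austin VLAN absorb the daily change rate?
    # Figures from the case study; assumes a dedicated 20 Mbps link.
    LINK_MBPS = 20                 # link speed, megabits per second
    DAILY_CHANGE_MB = 100          # worst-case change data, megabytes per day

    daily_change_megabits = DAILY_CHANGE_MB * 8
    line_time_seconds = daily_change_megabits / LINK_MBPS

    print(f"{daily_change_megabits} Mb/day needs "
          f"{line_time_seconds:.0f} seconds of line time")
    # Output: 800 Mb/day needs 40 seconds of line time

In other words, even the worst-case daily change volume occupies well under a minute of line time, leaving an enormous margin for replication traffic.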
Another criterion was that the solution had to work with the server clustering configuration that Matthews and his co-workers had deployed to host their Exchange mail services, as well as the VMware virtualized server environment that comprised the failover environment at the Austin ISP. Exchange clustering improves the resiliency of Exchange Server in the face of server hardware problems, but combined with the attachment of servers to an FC SAN as the storage repository, it introduced some potentially challenging technical issues. Furthermore, VMware at the recovery site presented some knotty problems from a remote replication standpoint. In short, they needed a simple way to fail over from a physical clustered server setting to a virtualized server environment at the Austin ISP that wouldn’t be a pain to administer over time.
Searching for Solutions
The hunt was on for a solution to provide continuous data replication and failover. Among the solutions considered were EMC RepliStor, Double-Take Software’s Double-Take, The Neverfail Group’s Neverfail (which was already in use to protect other important apps on the campus), and CA XOsoft.
Cost eliminated the EMC product early on, according to Ferrier and Matthews. Double-Take was eliminated because “they wanted us to create additional resources inside clustered services, and we didn’t want to mess around with our physical Exchange Cluster environment.” Neverfail also required changes to the clustered environment that the planners did not want to make. “Our current infrastructure is working well. We wanted a solution that would not disrupt our day-to-day operations,” Matthews said.
Finally, Matthews discovered CA XOsoft and its concept of geo-clustering. He said that he downloaded the software from the Web to try it out. Installation was simple, and it took only a short time to configure the product and build a replication and failover scenario. “They have a lot of auto discovery built in, especially for a clustered Exchange Server environment, so they can quickly find the servers whose data you want to replicate and the servers where you want to replicate it to.”
By June, all of the products had been evaluated and only the CA XOsoft solution met their selection criteria. Testing continued in earnest through June and July 2006, and the solution went live at the end of summer.
“It has been an administrator’s dream,” Matthews said. “There have been several version updates that have improved the product since 2006, but none of them required any serious reconfiguration of our solution or re-thinking of our scenario for failover. Recently, they added the ability to integrate tape backup monitoring into the dashboard that oversees the replication process. With just two people managing e-mail and not a lot of time for overseeing disaster recovery processes, this product simply deploys and does what it says it will do with very little extra work.”
XOsoft was acquired by CA shortly before Ferrier and Matthews began their evaluation. Technically, the product is known as the CA XOsoft WANSyncHA Product Suite and is described by CA as a high-availability solution based on asynchronous real-time replication and automated failover and failback. It is especially tuned for Microsoft Exchange Server and SQL Server environments, Oracle RDBMS, and a variety of Web server and file server environments.
After installation, WANSyncHA performs autonomously and intelligently behind the scenes, with negligible impact on the daily work of servers and networks during normal operation.
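CA has not published WANSyncHA’s internals, but the general shape of asynchronous replication is easy to illustrate. The Python sketch below is a hypothetical simplification of the technique in general, not CA’s code: application writes complete against the master immediately, while a journal of changes is shipped to the replica in order, so the replica trails the master slightly rather than slowing it down.

    from collections import deque

    class AsyncReplicator:
        """Hypothetical sketch of journal-based asynchronous replication."""

        def __init__(self):
            self.master = {}        # primary copy (e.g., the mail store)
            self.replica = {}       # remote copy at the recovery site
            self.journal = deque()  # ordered changes awaiting shipment

        def write(self, key, value):
            # The application write completes against the master at once;
            # shipping the change happens later, off the critical path.
            self.master[key] = value
            self.journal.append((key, value))

        def ship(self, budget):
            # Drain up to `budget` journal entries across the WAN link.
            for _ in range(min(budget, len(self.journal))):
                key, value = self.journal.popleft()
                self.replica[key] = value

        def failover(self):
            # On disaster, promote the replica; whatever is still queued
            # in the journal is the small window of potential data loss.
            return self.replica, len(self.journal)

The trade-off this captures is the one that matters to planners: writes never wait on the WAN, but a failover can lose whatever changes were still queued, which is why change volume relative to bandwidth (modest in UTB/TSC’s case) determines how small that window stays.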
Ensuring Smooth Operations
According to the vendor, considerable work has been done to ensure smooth operation of the product with both clustered and virtualized server environments. CA points out that VMware-based approaches for resiliency and failover work well within an infrastructure controlled by the virtualization tools, but more—a layered solution—is needed to provide failover capabilities that extend beyond the VMware paradigm. Data must be replicated separately behind apps such as Exchange Server and SQL Server to avoid a catastrophic loss of access to the data that drives clustered and virtualized servers themselves. Additionally, organizations need a cross-site failover strategy that VMware alone cannot deliver. This is where XOsoft functionality adds value.
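The cross-site failover behavior that such a layered solution adds follows a familiar pattern: a monitor watches for heartbeats from the primary site and, once silence exceeds a threshold, promotes the standby and redirects clients. The sketch below is a generic illustration of that pattern, not WANSyncHA’s implementation; the timeout value and site labels are invented for the example.

    import time

    HEARTBEAT_TIMEOUT_S = 30  # invented threshold for the example

    class FailoverMonitor:
        """Generic cross-site failover pattern (illustration only)."""

        def __init__(self, primary, standby):
            self.primary = primary   # e.g., clustered servers in Brownsville
            self.standby = standby   # e.g., VMware guests at the Austin ISP
            self.active = primary
            self.last_heartbeat = time.monotonic()

        def heartbeat(self):
            # Called whenever the primary site proves it is alive.
            self.last_heartbeat = time.monotonic()

        def check(self):
            # Promote the standby if the primary has gone silent too long;
            # in practice, client redirection happens via DNS or rerouting.
            silent = time.monotonic() - self.last_heartbeat
            if self.active is self.primary and silent > HEARTBEAT_TIMEOUT_S:
                self.active = self.standby
            return self.active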
WANSyncHA delivers data replication and application switchover between both like and unlike environments, a capability essential to UTB/TSC’s requirements: clustered servers at the primary site fail over to virtualized servers at the ISP, and data stored in an FC fabric at home is replicated to internal server storage on the ISP’s rack servers.
The latter is a point that Matthews wanted to stress. “Imagine many users using Microsoft Outlook connected to a particular high-end, dedicated physical Exchange server utilizing yet another high-end Fibre Channel SAN infrastructure; then suddenly they are failed over to the Austin site, which has internal storage only. What happens to the performance when they are now connected to a virtualized Exchange Server utilizing local storage and being accessed remotely? We found that the performance hit was not as bad as initially thought. Honestly speaking, no users have ever reported significant slow access times. Bottom line: You do not need a high-end SAN at your remote site to do this kind of failover.”
Support for a broad range of WAN connectivity options is another way that WANSyncHA complements VMware, whose own replication techniques are limited to the virtual domain it creates and usually to replication within the same subnetwork.
Matthews is very confident that his CA XOsoft-based e-mail failover strategy will work if the time ever comes to use it. This confidence is reinforced by the product’s ability to monitor the status of replication activity across the WAN, visibility that is often lacking in hardware-based storage replication schemes. Matthews notes that the strategy not only provides disaster recovery failover but plays an active role in Exchange Server maintenance activity as well.
He said, “We can fail over any particular Exchange server to a VM in Austin and do maintenance, either software or hardware, completely transparent to the user. Suddenly, they are composing e-mails from the Austin site. We definitely take advantage of this ability and do not wait for a disaster before we use it.”
He also noted that other key applications are not currently included in the CA XOsoft-based strategy. The university’s Blackboard application, for example, which will enable students to continue to take classes across the Web should campus facilities become unavailable, is hosted remotely by an application service provider based in Virginia. Also, Neverfail, a CA XOsoft competitor, is currently used to protect university Web servers under a pre-existing arrangement.
This underscores a point emphasized by many corporate DR planners: there is no one-size-fits-all strategy for disaster recovery, and most complex organizations will feature a mixture of tape backup, continuous data protection, and application failover capabilities among the methods used to protect their mission-critical operations. One nice feature of CA XOsoft is that the developers continue to add third-party software monitoring capabilities to the product dashboard, beginning with integration of CA ARCserve tape backup monitoring, so that it might one day provide a single view for monitoring, and even testing, all of the different protection schemes a company has deployed.
Bottom line: UTB/TSC is not daring the gods of weather to visit a disaster upon them to validate their strategy. They are making expedient and intelligent decisions that they hope will enable the university to continue its educational mission if and when an interrupting event occurs, and they believe that CA XOsoft adds considerably to their continuity capability.
In our next column, we will look at another case study of disaster recovery solutions, focused on the special requirements imposed by the wildly popular VMware server virtualization paradigm. Until then, please share your VMware and disaster recovery experiences via e-mail: [email protected].