Wrappers and Recovery
How prepared are you for a disaster?
Almost a year ago I wrote about the University of Texas and its use of CA XOsoft to enable the continuity of its e-mail systems in the wake of an interruption event such as a hurricane (see http://esj.com/storage/article.aspx?EditorialsID=2819). In part, the University's interest in such a solution was driven by the experience of Hurricane Rita several years ago.
Recently, the people at UT needed to execute the strategy as heavy gales and rain associated with Hurricane Dolly began to slam the campus in Brownsville. The XOsoft solution performed, in a word, flawlessly. Systems specialists Alejandro Herrera and Brian Matthews of The University of Texas at Brownsville/Texas Southmost College (UTBSC) told the tale.
According to Matthews, the business continuity plan at UTBSC had built-in triggers for activating "hurricane mode" as the Category 1 storm approached the Texas coastline. On Tuesday, July 22, once summer session classes adjourned at 5 PM, the word came down from school officials to go "IT Black" -- that is, to shut down the IT department's 104 servers and Fibre Channel fabric and to unplug everything, including battery backup and UPS systems, from the wall in preparation for a storm that was still, by best guess, 48 hours away.
The process involved failing over Brownsville's e-mail cluster and SharePoint server (with SQL Server back end) to a virtual hosting environment in the rack of an ISP in Austin. This failover scenario, recalled Herrera, had been the tipping point in the selection of CA XOsoft over competitors that included Neverfail Group, Double-Take Software, and EMC RepliStor nearly a year before.
"We have a physical cluster for e-mail connected to a SAN," said Herrera. "The other vendors we evaluated required that we break our local cluster for their product to work for failover. We didn't want to disrupt our primary environment. CA XOsoft let us failover to [unlike] platforms without having to make any major infrastructure changes."
Matthews said that the failover was almost routine, noting that he fails over e-mail and SharePoint, which is used to host the university's Web site, whenever maintenance is needed on the hardware or software in the production shop. "We do it fairly often," he said, adding that the solution supports all 31,000 mailboxes without difficulty, although fewer users were working with e-mail during the school's summer program at the time of the hurricane.
By the time Hurricane Dolly, now a Category 2 storm with top winds of 100 mph, made landfall at South Padre Island on July 23, the university data center was dark and locked down. A few days later, when the winds and rain had died down, the failback process began. Matthews noted that failback is a bit more protracted than failover.
"With failover, the data is already synchronized to the recovery site. With failback, first you must start the storage in the SAN, then restart Active Directory, then restart all of the servers in the proper sequence -- but first you need to start replication going from Austin to Brownsville to resynchronize the data between the sites. That takes about six to eight hours."
Once the data was resynchronized, the failback process proceeded without incident. Thanks to CA XOsoft, e-mail and Web services that support students and faculty at UT's southern campuses were never interrupted.
CA XOsoft is one of the leaders in an increasingly crowded space of replication engines with failover capabilities -- what I call infrastructure failover wrappers. UT uses CA XOsoft for e-mail and SharePoint/Web site high availability, while using competitor Neverfail Group's Neverfail Continuous Availability Suite for other applications. Many other organizations have turned to products such as Double-Take Software's Double-Take and EMC RepliStor for similar replication and failover clustering capabilities.
Even VMware is getting into the game with Site Recovery Manager; it provides no data replication services (the underlying storage hardware products need to provide those services) but claims to enable failover across a WAN. That capability may become more urgently needed if VMware software patching snafus -- such as the one last week that shut down VMware clusters running the company's flagship ESX 3.5 software -- become more commonplace.
An interesting newcomer in this space is Asempra Technologies. Targeting midrange companies with its Asempra Business Continuity Server, and providing recovery services for Microsoft servers and applications, the vendor has found what it thinks is a big differentiator in the wrapper market -- the ability to restart applications while data is still being re-synchronized.
According to spokespeople for Asempra, companies need immediate access to Windows-based applications to avoid costly and disruptive downtime. The Asempra BCS enables companies to send and receive e-mail or process SQL transactions while the complete data set is being recovered in the background.
Asempra introduced me to its customer, Vivek Vasudeva, vice president of product development and operations for the firm Quality Planning, Inc. A division of Insurance Services Office (ISO), Quality Planning provides validation services for information used by automobile insurers to rate policies based on risk assessment.
Vasudeva noted that his company's services are data intensive. Policyholders are contacted directly to learn about driving patterns and to collect other information that is blended with analytical models and third-party reports to give insurers more accurate data on which to base insurance policy rates. The data sets created in this process are massive, in the low double-digit terabytes, according to the VP.
"We decided 18 months ago that we needed to make this data as secure and available in near-real time as possible," Vasudeva reported. He said that he evaluated both Asempra and EMC, but, in the end, "there was a huge value-for-cost differential that favored the Asempra bid."
His implementation story, however, did not illustrate the importance that Asempra was placing on rapid recovery of partially loaded data. According to Vasudeva, the Asempra solution was first deployed without disaster recovery in mind, to provide a local form of data protection through continuous data replication. That way, if a file is accidentally deleted, it can be reloaded rapidly from an Asempra copy.
"Asempra provides the first layer of data protection, but we are now considering adding it to our DR strategy so we can replicate the Asempra data layer across the WAN to a recovery site rather than using tape and offsite storage," he said.
He reported that Asempra's pitch, "six guys and a very professional 40-page report," was matched by an extremely helpful implementation support effort. "Implementing the product was not without its challenges; there are always a few kinks. The biggest problem was that the implementers needed to cancel lunch so that the product could be fully implemented by the end of the day."
As these stories illustrate, wrapper software, whether CA XOsoft or Asempra BCS, is coming into greater use by organizations seeking to reconcile their disaster recovery plans with the increasingly "always-on" demands of contemporary business. Each product has nuances that you must fully understand before you make a buying decision, however, and try-before-you-buy is the watchword on wrappers today.
Your comments are welcome: firstname.lastname@example.org.