Dashing to Dashboards for Business Continuity
If you need to augment your continuity strategy testing and management capability, RecoverGuard is worth a look.
One of the challenges of contemporary disaster recovery/business continuity is getting a handle on the various data protection processes in your organization. It is not uncommon to find multiple storage arrays from different vendors, each supporting an application (or set of applications), and each using its own behind-the-array replication process to mirror data to another storage array at a backup data center. Add application-level data replication processes from Oracle, Microsoft, and others, and the problem for the DR/BC planner is like herding cats.
Traditionally, the only way to ensure that the correct data was being replicated was to stop the applications, break each mirroring process, and run a file comparison to verify that the data you thought you were copying was actually what was being copied. But what if you had applications that you couldn't disrupt?
Ed Goldberg, business continuity/disaster recovery coordinator for Northeast Utilities, New England's largest power company, located in Hartford, CT, confronted just such a problem. According to Goldberg, the company's concerns for business continuity predated 9/11, and so did its testing problem.
"We have some legacy systems that can't tolerate downtime. So, once a year we would test and we would always end up with some applications whose recoverability we couldn't verify through testing," Goldberg reported.
With that issue in mind, Goldberg was very interested when newcomer Continuity Software approached him last September to explain the value proposition of their pre-release data protection monitoring software, RecoverGuard.
"We had limited confidence in our tabletop testing exercises as a means for validating the recovery strategies we had developed for those business critical applications that we couldn't interrupt. We tried Continuity Software's product basically to bolster our confidence."
There was some pushback on the idea of "automating" the monitoring of processes for data protection, he noted. No one had the time to run reports, do nightly scans, or even to tackle the complexities of installing and configuring a product such as RecoverGuard. That's when Goldberg decided to "call their bluff." He asked the company, "If it is so easy to install the product, why don't you do it?"
To his surprise, Continuity Software accepted the challenge. Goldberg says that installation and configuration of the server required less than a week, with engineers working on-site to collect information on application-hosting configurations and the utility's storage infrastructure, and working with Goldberg and other planners to map recovery-time objectives related to the applications themselves.
RecoverGuard is programmed with these details to enable the monitoring of state consistency between the production environment and the recovery environment and, if desired, to evaluate whether recovery time objectives can be met on an ongoing basis. According to Brian Schwarzentruber, a systems engineer who supports the installation of RecoverGuard for clients and customers, "The production environment changes daily, but most companies don't update their DR plan daily. Testing is the only way to find out what has changed and to adapt the plan to account for them."
Schwarzentruber says that there are three categories of "gaps" that can be identified by RecoverGuard. The first is data replication gaps. He says that by querying storage at the recovery site and comparing it to the production site, his product can identify differences in "data states" that consumers need to know about prior to any disaster.
The second category covers how LUNs are configured and mapped to hosts. Analysis often reveals gaps in the mapping of LUNs in the recovery platform that will delay business recovery until they are addressed.
The third category of gap analysis focuses on databases. Schwarzentruber notes that understanding how databases are configured in the file system and what storage volumes they use are critical pieces of information for ensuring successful recovery at an alternate location.
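To make the three categories concrete, here is a minimal sketch in Python of what such a comparison might look like. It is not RecoverGuard's method, which the vendor did not describe at this level of detail. Every name in it (`Volume`, `find_gaps`, the LUN and host labels, the lag threshold) is hypothetical, and it assumes the inventory of each site has already been collected into simple dictionaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Volume:
    """One LUN as seen at a site. All fields are illustrative assumptions."""
    lun_id: str
    mapped_hosts: frozenset   # hosts that can see this LUN
    last_write_epoch: int     # seconds; most recent write present on this copy


def find_gaps(production, recovery, db_volumes, max_lag_seconds):
    """Return (category, resource, detail) tuples for the three gap types."""
    gaps = []
    for lun_id, prod in sorted(production.items()):
        rec = recovery.get(lun_id)
        if rec is None:
            # Replication gap: the volume is not being copied at all.
            gaps.append(("replication", lun_id, "no copy at recovery site"))
            continue
        lag = prod.last_write_epoch - rec.last_write_epoch
        if lag > max_lag_seconds:
            # Replication gap: the copy exists but has fallen too far behind.
            gaps.append(("replication", lun_id, f"copy is {lag}s behind"))
        if not rec.mapped_hosts:
            # LUN mapping gap: data is there, but no recovery host can use it.
            gaps.append(("lun_mapping", lun_id, "not mapped to any recovery host"))
    for db, luns in sorted(db_volumes.items()):
        missing = sorted(lun for lun in luns if lun not in recovery)
        if missing:
            # Database gap: a database depends on volumes the recovery site lacks.
            detail = "volumes missing at recovery site: " + ", ".join(missing)
            gaps.append(("database", db, detail))
    return gaps


# Hypothetical inventory for two sites.
production = {
    "lun-01": Volume("lun-01", frozenset({"prod-db1"}), 1000),
    "lun-02": Volume("lun-02", frozenset({"prod-db1"}), 1000),
    "lun-03": Volume("lun-03", frozenset({"prod-app1"}), 1000),
}
recovery = {
    "lun-01": Volume("lun-01", frozenset({"dr-db1"}), 990),
    "lun-03": Volume("lun-03", frozenset(), 400),
}
db_volumes = {"orders": {"lun-01", "lun-02"}}

gaps = find_gaps(production, recovery, db_volumes, max_lag_seconds=300)
for gap in gaps:
    print(gap)
```

In this made-up inventory, one volume has no copy at the recovery site, one copy is stale and unmapped, and one database depends on a volume that was never replicated. None of these would be visible until someone tried to bring the application up at the alternate site.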
"Conventional testing," he notes, "often misses these gaps because testing procedures are orderly. We offer a proactive view that can spot gaps that exist in day-to-day preparedness."RecoverGuard is built around a ticketing system for service delivery. A ticket is generated when there is a best-practices or strategy "mismatch," he says, and a customer-configurable dashboard is provided that can provide a graphical topology of the error so it can be corrected. Many companies, including Northeast Utilities, elect to map recovery processes to infrastructure, while others build maps to show affected lines of business.
Customizing the product to specific customer requirements can take some time. Often, Schwarzentruber notes, companies don't have a clear or comprehensive idea of how data is being replicated or how storage is being provided to applications -- especially databases. Up-front analysis is required to get RecoverGuard deployed and operating at its full potential value.
That's the heavy lifting that Ed Goldberg wanted to avoid. Within a week, RecoverGuard was deployed. Thirty days later, he said, the reports and alerts from the product had been refined to exclude issues "that we already knew about or didn't care about."
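The ticket-and-dashboard workflow, along with the tuning Goldberg describes, can be pictured with a short Python sketch. This is an illustration only, not the product's actual data model: the `Ticket` class, the `open_tickets` and `group_tickets` functions, the suppression set, and the array and business-unit names are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import count


@dataclass
class Ticket:
    """A hypothetical ticket raised for one detected mismatch."""
    ticket_id: int
    category: str
    resource: str
    detail: str


def open_tickets(mismatches, suppressed=frozenset()):
    """Open one ticket per mismatch, skipping (category, resource) pairs
    the customer has marked as already known or not of interest."""
    ids = count(1)
    return [
        Ticket(next(ids), category, resource, detail)
        for category, resource, detail in mismatches
        if (category, resource) not in suppressed
    ]


def group_tickets(tickets, view):
    """Group ticket ids under whatever the chosen view maps each resource to
    (a storage array, a line of business, and so on)."""
    grouped = defaultdict(list)
    for ticket in tickets:
        grouped[view.get(ticket.resource, "unmapped")].append(ticket.ticket_id)
    return dict(grouped)


# Hypothetical mismatches reported by a scan.
mismatches = [
    ("replication", "lun-02", "no copy at recovery site"),
    ("lun_mapping", "lun-03", "not mapped to any recovery host"),
    ("replication", "lun-09", "copy is 900s behind"),
]
# Issues the customer already knows about and has chosen to exclude.
suppressed = {("replication", "lun-09")}

tickets = open_tickets(mismatches, suppressed)

# Two alternative dashboard views over the same tickets.
by_infrastructure = {"lun-02": "array-A", "lun-03": "array-B"}
by_business = {"lun-02": "billing", "lun-03": "billing"}

print(group_tickets(tickets, by_infrastructure))
print(group_tickets(tickets, by_business))
```

The same two tickets appear under separate arrays in the infrastructure view and under a single line of business in the business view, which is the choice between mapping styles that Schwarzentruber describes.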
He pointed out that RecoverGuard is only used to monitor the recoverability of his open systems platforms and data replication schemes between his data centers north and south of Hartford. Calls are held weekly with the Continuity Software team that includes, to his surprise, personnel from both Boston and Israel, where the company was founded. "Despite the seven- or eight-hour time difference, the Israeli engineers always show up," he said.
For now, he is content to have the close interaction with the development team and prefers to have RecoverGuard operated as a service rather than as another platform he needs to manage. He believes that, once the reporting engine is "trained" to show the customer what he is most concerned about, it delivers considerable value for planning and preparedness.
If you are looking to augment your continuity strategy testing and management capability, RecoverGuard is worth a look. Your comments are welcome: firstname.lastname@example.org.
Jon William Toigo is chairman of The Data Management Institute, the CEO of data management consulting and research firm Toigo Partners International, as well as a contributing editor to Enterprise Systems and its Storage Strategies columnist. Mr. Toigo is the author of 14 books, including Disaster Recovery Planning, 3rd Edition, and The Holy Grail of Network Storage Management, both from Prentice Hall.