Overcoming Disaster Recovery Resistance

In this Internet era, there is no tolerance for systems outages that last more than a few hours. Specialists from IBM Business Recovery Services (BRS, www.brs.ibm.com), however, recently disclosed that the best recovery time most NT shops can expect after a disaster is 48 hours, assuming the correct replacement equipment arrives on time, the backup media is consistent and reliable, and there are no major issues with the versions of applications being restored.

One of the biggest difficulties may be recovering a large chunk of missing transaction data from just before a crash. Most systems rely on backup data from the night before. "If you are already in the middle of a disaster, what makes you think you will get lucky?" asked Timothy Ging, manager with IBM BRS, at a recent summit hosted by IBM. "Everyone will be asking you, ‘how long, how long?’"

There is a disturbing lack of planning for disasters at Windows NT sites, recovery experts agree. But disaster planning is not as cut and dried as it was for centralized computing environments. For example, prioritization becomes tricky, particularly since every user group sees its data as the most important. Add to this the fact that most organizations run different applications on different Windows NT servers, says Donna Scott, vice president of software infrastructure at GartnerGroup (www.gartner.com).

"Each of those applications have different priorities from a recovery perspective," Scott says. "You need to make sure that you mount the right tapes to recover the applications in sequence. If you have hundreds of servers, which many large organizations have, then the sequence recovery and the prioritization of recovery becomes important."

This is one reason why IBM BRS recently unveiled new services targeted at Windows NT and PC server sites that range from data mirroring facilities to mobile offices that can be trucked in and set up on short notice. IBM BRS’ mobile recovery center can provide 200 PC workstation seats, with WAN capabilities, in less than 48 hours, Ging says. For 400 users, the system can be up and running within 72 hours. A mobile center with 1,000 seats can be built within a week, he adds.

Only recently were PC LANs recognized as essential to ongoing operations by larger companies. Smaller companies, meanwhile, never gave a thought to disaster recovery. Windows NT systems are the "last bastion of disaster recovery," says Susan Wagner, a specialist with IBM BRS.

The costs of overlooking Windows systems recovery are more extensive than they were just a few years ago. One IBM BRS client determined that it would face $50,000-a-day losses if its 50 sales agents were unable to perform their jobs. The solution the company arrived at was to store critical PC server data offsite on a mirrored PC server at a recovery center. The estimated cost to have the company up and running the same day was $5,000 a month, relates Wagner.
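As a rough illustration of the arithmetic behind that decision, the short Python sketch below compares the quoted figures. The two dollar amounts come from the client example above. The assumption that losses accumulate evenly over time, and the use of the 48-hour best-case recovery as the comparison point, are added here for illustration only.

```python
# Figures quoted in the article for one IBM BRS client.
DAILY_LOSS = 50_000          # dollars lost per day if the sales agents cannot work
MIRROR_COST_MONTHLY = 5_000  # dollars per month for the offsite mirrored server

# Assumption (not from the article): losses accumulate evenly over time.
OUTAGE_DAYS = 2              # the 48-hour best-case recovery without mirroring

annual_mirror_cost = MIRROR_COST_MONTHLY * 12
outage_loss = DAILY_LOSS * OUTAGE_DAYS

# Days of downtime per year at which a year of mirroring pays for itself.
breakeven_days = annual_mirror_cost / DAILY_LOSS

print(f"Annual mirroring cost: ${annual_mirror_cost:,}")
print(f"Loss from a {OUTAGE_DAYS}-day outage: ${outage_loss:,}")
print(f"Break-even downtime per year: {breakeven_days:.1f} days")
```

Under these assumptions, a single 48-hour outage would cost more than a full year of mirroring.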

These systems bring unique issues that IT planners have never had to grapple with before, Wagner says. "With client/server, the end-user becomes more critical," driving the focus down to the business unit level, she points out. But only about half of the business units in companies Wagner has worked with have data recovery plans.

Without such planning, there is no way IT managers can guarantee recovery times of less than 48 hours, Ging notes. IT planners need to sit down with user groups and draw up a business impact assessment (BIA) to get an idea of which users and systems are critical to the immediate functioning of the company. "It’s tough to recover and restore 10,000 users at once," Wagner states.

If a user group will not sit down and draw up a BIA plan, "all you can tell users is that it may take more than 48 hours to recover," Ging says.
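One way to picture what a BIA feeds into is a simple restore schedule. The Python sketch below is only a minimal illustration, not a method described by IBM BRS or Gartner: the system names, priority ranks and restore times are hypothetical, and it assumes systems are restored one at a time in priority order.

```python
from dataclasses import dataclass


@dataclass
class System:
    name: str
    priority: int         # 1 = most critical, as ranked in the BIA (hypothetical)
    restore_hours: float  # estimated time to restore this system (hypothetical)


def recovery_schedule(systems):
    """Return (name, cumulative_hours) pairs, restoring in priority order."""
    schedule = []
    elapsed = 0.0
    # Sort by BIA priority; among equal priorities, restore the quicker one first.
    for system in sorted(systems, key=lambda s: (s.priority, s.restore_hours)):
        elapsed += system.restore_hours
        schedule.append((system.name, elapsed))
    return schedule


# Hypothetical inventory for illustration only.
systems = [
    System("file-and-print", priority=3, restore_hours=6.0),
    System("order-entry", priority=1, restore_hours=8.0),
    System("email", priority=2, restore_hours=4.0),
]

schedule = recovery_schedule(systems)
for name, done_at in schedule:
    print(f"{name}: restored after {done_at:.0f} hours")
```

Even this toy version shows why the ranking matters: the lowest-priority system waits for everything ahead of it, so its users should be told to expect the longest outage.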

Currency of data is another issue. "Lost data is becoming more and more unacceptable," Ging says. "Recovering to the point of the last back-up is becoming an unacceptable option." With e-commerce and other initiatives taking place, "we’re driving toward a one-minute recovery time window."

For PC LAN environments, IBM BRS announced it is deploying the Double Take mirroring solution from NSI (www.nsisw.com). By mirroring data, it’s possible to recover to the point of impact, Ging points out. The tool employs transaction-based replication, as opposed to replicating from the hard drive.

Other, faster solutions include one-to-one, many-to-one, one-to-many and many-to-many replication. These involve some form of a "data-catcher" system, where only data is moved to a backup system, Ging explains.

"The disaster recovery process is the same for client/server as it is for legacy," Gartner’s Scott says. "You back up your data on your applications, move them offsite and make arrangements for alternate processing. The issue is that client/server is more complex, with components spread across multiple systems. You need to do a better job of planning—not a discipline normally associated with client/server."