Disaster Recovery: An Overview

With the tremendous growth of the PC industry, the implementation of LANs and WANs, great strides in communications and a faster-paced society, little consideration was given to the possibility of losing an entire physical facility to fire, terrorism, or Mother Nature's hurricanes and floods. What would you do?

A short 25 years ago corporate disaster recovery was virtually non-existent. The Information Technology (IT) industry was predominantly legacy systems (big mainframes) whereby all software and data, including databases, were backed up on some predetermined cycle such as daily, weekly, monthly or incrementally to round reel magnetic media or tape. These backup tapes were stored either on site in a "fireproof" safe or off site at another location owned by the particular corporation. Departments within the corporation other than IT such as Accounts Payable, Human Resources, etc. utilized IT services to enhance productivity, make information available on a wider scale and provide better service to their internal and external clientele.

That was about it in a nutshell. The plan for recovery was simple: "Our building is solid and secure. Our facilities infrastructure, in terms of utility power and communications services, is very reliable. If our computer experiences an outage, we will simply contact our vendor to repair it. If our disks ‘crash,’ we will rebuild our legacy system from the backup tapes in either our safe or our off site storage. Corporate departments dependent on our services will process manually, as they did for many years, until IT is once again a viable entity."

This was a great plan 25 years ago. So, what happened? A whole heck of a lot, actually. Several factors changed the way we process: the tremendous growth of the personal computer (PC) industry and the subsequent implementation of LANs and WANs; great strides in communications, including TCP/IP and the infamous router; a faster-paced society demanding instantaneous information in order to make competitive business decisions; and the fact that in the early 1980s the federal government mandated that any bank that wanted to be federally insured (FDIC) must have a disaster recovery plan, including a recovery hot site, and demonstrate to federal auditors that the plan works. Through all of this, little consideration was given to the possibility of losing an entire physical facility due to such things as fire, terrorism or Mother Nature’s hurricanes and floods. Collectively, these factors and others created a gigantic dependency on IT that would not allow a corporation to revert to manual methods. In addition, a corporation that lost its entire facility needed to find an alternate site to process its corporate data. True, some small shops may have been able to "cheat" for a while, but for the most part, if IT or your facility were unavailable, your corporation was in severe jeopardy from your competition.

Hence, the birth of disaster recovery (DR) in the late 1970s. Initially, disaster recovery was very rudimentary. Vendors would provide a facility with compatible equipment and very basic networks in the form of dial connectivity via modems. Clients would back up their data and software on some designated cycle and store that information off site. At the time of a disaster declaration or a test of their disaster plan, the client would take their backup tapes from their off site storage to the vendor’s facility and attempt to rebuild a viable system that emulated their home site computer. It soon became apparent that DR was an entity that needed a lot of attention. Why? DR required a tremendous amount of coordination to make sure all the pieces were in place for a successful recovery. This requirement spawned a new position in many corporations called a DR Coordinator (DRC). The DRC was responsible for all aspects of recovery, including hardware configurations, networking, software applications, system software, defining critical data and/or applications, and interfacing with the recovery vendor to ensure viability. With regard to the critical data and applications, corporations soon realized that not all data or applications needed to be processed in disaster mode. Therefore, the critical data and applications had to be defined and backed up to some off site storage.

Off-site storage, in conjunction with disaster recovery, enhanced the off site vendor industry tremendously. Not every corporation could afford, or wished, to provide its own off site storage. Instead, they contracted with vendors that were dedicated to off site storage and provided a suite of services, which included keeping an inventory of tapes, picking up backup tapes bound for off site storage and delivering cycled tapes returning from it. In addition, the off site vendor would deliver tapes to their client’s disaster recovery site for testing purposes or during actual disasters.

Clients soon learned that they had a new set of rules to follow to ensure a successful recovery. The first and foremost is that "you do not prepare for disasters, they happen." This means that at some point in time during the testing phase of recovery one must test their off site storage for correctness and not create special backups for testing. It is imperative that the client facility have a disaster recovery plan. Why? Because, when a disaster happens there is no guarantee which personnel will be available. Your recovery must be independent of particular personnel. To drive the point home, the client should rotate staff on subsequent recovery exercises in hopes of having someone on staff familiar with DR at time of disaster.

Current disaster testing or recovery processes come in many flavors. The most typical is to back up to tape, send the backups to an off site storage facility, retrieve the backups during testing or disasters, go to the vendor’s site and recover.

Remote testing or recovery is becoming a very strong option with the level of communications today. In this scenario, the client does not show up at the vendor’s hot site. They send their backups to the vendor; the vendor configures the system, loads system software, and brings up communications to a point where the client can access the network via dial-in, Ethernet, LAN, etc. At that point, control is passed to the client to run their exercise remotely. The remote facility could be the client’s home site or a conveniently located site provided by the vendor.

A third possibility for testing recovery is vaulting, which takes on various scenarios. One methodology is to vault to tape. The tape drive(s) can be located at the vendor’s facility or at a remote location of the client’s. The advantage of vaulting to the vendor’s site is that in a test or disaster environment the data is already on site; no transportation is required.

A second methodology is to vault to disk. This is a very expensive option, because the client must provide disk drives at an off site location, either their own or the vendor’s. Each time a write is done to disk on their home site computer, a write is also issued to the alternate drives. If the drives are located in the vendor’s facility, then testing or recovering from a disaster requires shifting the I/O cables from the client’s alternate disk to the vendor’s mainframe. No loads are required. After booting the system, the client data, including databases, is available. If the alternate drives reside at the client’s remote site, then the data would have to be electronically transferred to the vendor’s disk drives.
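To make the vault-to-disk idea concrete, here is a minimal sketch in Python of the "every home site write is also issued to the alternate drives" behavior. It is a toy model only: the class name MirroredVolume, its methods, and the use of in-memory dictionaries in place of real disk drives are all assumptions made for illustration, not a description of any vendor's product.

```python
class MirroredVolume:
    """Toy model of vaulting to disk (hypothetical names throughout)."""

    def __init__(self):
        self.primary = {}    # block number -> data on the home site disk
        self.alternate = {}  # block number -> data on the off site disk

    def write(self, block, data):
        # Each write to the home site disk is also issued to the alternate drive.
        self.primary[block] = data
        self.alternate[block] = data

    def lose_primary(self):
        # Simulate losing the home site facility.
        self.primary = None

    def recover(self):
        # No tape loads are needed: the alternate copy is already current.
        return dict(self.alternate)


vol = MirroredVolume()
vol.write(0, b"ledger")
vol.write(1, b"payroll")
vol.lose_primary()
recovered = vol.recover()
```

After the simulated loss, recovered holds every block written before the disaster, which is the property that lets the client boot at the vendor's site without reloading from tape.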

The vendor’s challenge in this whole process is, first, to provide a facility and the necessary hardware for the client to recover during tests and disasters. Second, the vendor must aid the client in setting up their recovery network such that a user out in the field will notice no difference except that his/her workstation and/or terminal name has changed.

There is one more piece to this recovery puzzle that has been born out of necessity: contingency planning, or disaster recovery planning. To really have a viable recovery, a client must have a plan in place that is constantly modified and updated, because personnel come and go and processing requirements change all the time. As you may have guessed, this has spawned a whole new industry. There are vendors out there that do the whole package; however, there are also vendors that do only one or the other.

From a corporate perspective, it is important for an organization serious about disaster recovery to have business interruption insurance. Costs can and will be very high when in disaster mode, depending on the type of disaster, and the corporation should be prepared to cover those costs through insurance. Obviously, during the testing cycle of the recovery plan a corporation should attempt to keep costs to a minimum, as testing is a recurring cost. For some clients, it is impossible to test every phase of a disaster plan in the time allotted. Therefore, the corporation must plan to test different parts of its recovery on successive tests. For example, one time it may choose to test its network, a second time the accuracy of its off site storage, and possibly a third time its actual recovery plan document.
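The idea of covering different parts of the plan on successive tests can be sketched as a simple round-robin schedule. The Python below is only an illustration: the function name plan_exercises and the three focus areas (taken from the example above) are assumptions, and a real schedule would also weigh cost, risk and time allotted.

```python
from itertools import cycle, islice


def plan_exercises(focus_areas, count):
    """Assign one focus area to each of `count` successive tests, in rotation."""
    return list(islice(cycle(focus_areas), count))


# Hypothetical focus areas, one per test.
AREAS = ["network", "off site storage accuracy", "recovery plan document"]

schedule = plan_exercises(AREAS, 5)
for number, area in enumerate(schedule, start=1):
    print(f"Test {number}: {area}")
```

With three focus areas and five tests, the fourth test returns to the network, so every part of the plan is revisited on a regular cycle.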

To summarize, disaster recovery is here to stay as long as IT as we know it exists. As a matter of fact, the need for recovery will grow even more critical as we grow more dependent upon our IT environment. It should be pointed out that disaster recovery is not only an IT issue. IT only provides the vehicle to recover. All other departments within an organization must participate if the corporation as an entity is to survive.

Each fall the UNITE organization (representing Unisys users) holds its annual conference. The conference is attended very heavily by Unisys as well as a host of other vendors that represent various areas of the industry, including disaster recovery. A wealth of information is provided via breakout sessions, general sessions, and a sizable vendor display room. Whether you are interested in disaster recovery, NT, Unisys hardware, etc., there is definitely something for everyone. You can find out more about our organization at www.unite.org.