Preparing for the Unthinkable
Control the damage
The disasters at the World Trade Center and Pentagon on Sept. 11, while appalling in terms of their origin and horrific in terms of their human cost, could've been far worse, according to most emergency management experts. If the hijacked aircraft had struck the WTC an hour later, as many as 50,000 occupants and an incalculable number of tourists might have been counted among the missing. Similarly, had the Pentagon not been undergoing significant remodeling at the time it was attacked, many more civilian and military personnel might have been packed into the damaged office spaces.
While these facts in no way diminish the human tragedy that occurred, they did significantly reduce the overall death toll. As smoke-and-rubble disasters go, the direct impact of the Sept. 11 terrorist attacks was actually more limited than that of many others. Within the past decade, for example, earthquakes in Japan and Turkey claimed more lives in a much shorter time.
Aside from the social and political consequences of Sept. 11, perhaps the most extraordinary thing about the disaster was how few of the affected companies appear to have had any sort of disaster recovery plan. Of the 440-odd businesses occupying the WTC, and the numerous government entities in the Pentagon, only a minority, perhaps as few as 200, showed evidence of pre-planned continuity strategies. My estimate is based on press accounts of the number of firms that formally declared a disaster and activated their contracts with one of the leading "hot site" vendors, such as Comdisco, IBM Business Continuity and Recovery Services, SunGard Recovery Services, and HP Business Continuity and Recovery Services. A hot site contract provides a facility, computer equipment and networks that can be put into service rapidly to replace a subscriber's "production" IT infrastructure if normal operations are interrupted by a disaster.
To be generous, a few companies may not have needed the services of a hot site vendor in the wake of the disaster. In some cases, only "branch office operations" were hosted within the WTC or the Pentagon, rather than a primary headquarters or important data center. In a few other cases, companies may have instead used homegrown recovery strategies and capabilities that didn't require the participation of a commercial service provider.
Even with these exceptions factored in, however, the companies that had prepared for the possibility of a disaster were clearly in the minority. The sad truth is that, as with the 143 companies that simply disappeared in the months and years following the 1993 bombing of the WTC, many of the companies that endured the Sept. 11 tragedy without a continuity plan will not be around this time next year. These companies will learn the importance of disaster recovery planning the hard way, adding further pain and anguish to the already sad memory of that awful event.
Based on published reports, along with interviews I conducted in early October with several WTC survivors, some lessons can be gleaned that may help planners to safeguard mission-critical business processes from future disasters. Most have to do with the settings and circumstances in which the disaster recovery capability may need to be activated and used.
1. Plan for Total Disaster
Assume the worst: that critical infrastructure components over which businesses have no control (including telecommunications, power and transportation) will be unavailable. This was certainly the case for hundreds of businesses located near the WTC. Most lost power and communications and, in many cases, physical access to their facilities because of police and emergency management cordons. According to one person with offices near the disaster area, "Our facility wasn't directly impacted by the attack, but we lost all telephone communications with our [financial] clients and all of our overseas lines for almost a week following the event. We had to activate our disaster plan, despite the fact that we had no visible damage." The company and many of its neighbors were caught up in the secondary disaster that almost inevitably follows any regional disaster: Their business operations were stopped cold, even though they were not directly in the path of the terrorist-controlled aircraft. A well-designed disaster recovery plan takes the worst-case scenario as its premise and is built in modules that can be activated selectively in response to any "lesser" disaster events that confront the business.
2. Focus on Key Assets
The most important assets of an organization cannot be replaced. Disaster recovery strategies come in two basic flavors: replacement or redundancy. Since it's impossible to replace skilled personnel or data, a redundancy strategy should be implemented. In terms of data, this means implementing a data mirroring or tape backup strategy and ensuring that it's scrupulously observed and periodically tested.
Many spokespersons for companies in the World Trade Center reported that they had plans in place to account for everything except the loss of key personnel that resulted from the attack. According to a spokesperson for the Port Authority of New York and New Jersey, "We didn't anticipate an instantaneous loss of so many senior managers. In the blink of an eye, the fire department lost most of its upper echelon. I remember being asked by a fireman whether I had seen his battalion commander, captain, lieutenant or any other officer. I hadn't and I didn't realize until long afterward that we had lost so many of the hierarchy when the towers collapsed."
As awful as it is to contemplate such human loss, a redundancy strategy for personnel may mean cross-training numerous employees to perform mission-critical tasks, then dispersing them around the corporate campus or into field offices. The companies most profoundly affected by the Sept. 11 attacks will be those that lost not just electronic- and paper-based information assets, but irreplaceable personnel.
3. Size Doesn't Matter
A lesson that many organizations are learning the hard way from this incident is that they've seriously underestimated the importance of data stored on PC hard disks. When disaster recovery planners and IT managers in most organizations think about mission-critical data, they tend to focus on "big iron": large disk arrays, large network-attached storage volumes, storage area networks or mainframe DASD (direct access storage devices).
However, the lowly PC, with its 30 to 100GB hard disk drive and innocuous Excel spreadsheet used to track key corporate financial hedges, may have significantly greater importance from a business recovery standpoint than all the data in the corporate ERP system. The company that survives a disaster will be one that has ferreted out all of these small-but-critical apps so that they can be replicated off-site.
One planner whose company is recovering from the attack reports, "We are discovering that the really important data was on hard disks of PCs and laptops that were never backed up and didn't survive the disaster."
4. Work with Law Enforcement
For the first full week following the disaster, police and emergency managers were not allowing personnel into the cordoned area, which extended several blocks from the actual WTC disaster site. When authorities began opening the area to some traffic, only those who were able to show that they resided in the restricted areas were allowed to pass.
Surviving companies recognize that their continuity plans will need to execute under the aegis of law enforcement and public safety professionals, who are less interested in how companies will get into their offices to power down equipment or take last-minute backups than in preventing looting and ensuring public health and safety. It's a good idea to meet with local civil emergency management and police agencies and to obtain "clearances" for your corporate ID badges in advance of any disaster. But don't count on those "clearances" carrying much weight in a major calamity.
5. Expect People's Best and Worst
The spirit is willing, but the flesh is weak. The stories of courage coming out of the WTC and Pentagon disasters are more than "hero building." It is almost a signature characteristic of disasters that they bring out the best, and the worst, in people. Consider the story of the man who lost his briefcase under a car where he sought shelter from the dust and debris of the second tower collapse. Several days later, the briefcase was returned by a rescue worker who had discovered it while removing the automobile from the rubble. The man was delighted to see that the briefcase still contained his wallet, which was filled with cash and other personal effects. However, a day or two later, he discovered that his credit-card number was being used to purchase goods and services all over the city, apparently by the same man who had returned the case, and that this had been going on almost from the day of the disaster. The point is that you cannot assume universal heroism in a disaster. The purpose of rehearsing and testing a disaster recovery plan is not to teach recovery team members how to perform procedures by rote, but to get them accustomed to thinking rationally in the face of great irrationality. Human nature also requires that security safeguards be provided in the recovery environment.
6. Watch Out for Affected Third Parties
Be careful about planning assumptions that involve third parties. Some companies continue to use a "next-box-off-the-line" approach in their strategies for replacing hardware in the wake of a disaster. That is, their plan is to replace damaged components by requesting a priority shipment of new gear from suppliers as soon as possible following the disaster. The shutdown of air transportation following the Sept. 11 calamity compromised quick hardware replacement for many firms.
One survivor reported that his company was running on batteries in the days following the attack. "Power was sketchy and we wanted to mirror our NAS [network-attached storage] across the Hudson River. Our vendor told us that because of a change in their sales territories, we couldn't get more product from the local reseller who had originally sold us the gear. It had to be delivered to us by our new direct sales account representative, located on the West Coast. Our reseller was livid—he was close by with gear on his shelf that he could get to us in an hour, but he was being told he would violate his reseller agreement if he supported us. Ultimately, he told them to go to hell and brought us what we needed."
For reasons like this, it's a good idea to maintain critical spare components at a secure off-site facility.
7. Plan Employee Work Space
Many disaster recovery plans stop at provisioning replacement systems and networks. They don't provide for new user work locations. This not only compromises the recovery timetable but can also lead to employee confusion. Going forward, companies may want to consider using application service providers or managed Web-hosting providers to make mission-critical applications accessible via dial-up connections or the World Wide Web. Such an approach would let workers who are equipped to do so work from home, a useful hedge until suitable replacement work areas can be located.
8. Try to Avoid the Media
You'll want to establish a command center away from the media. After the bombing of the WTC in 1993, one "Big Five" accounting firm established an ad hoc command center in a nearby office complex. The location proved to be a favorite backdrop for television journalists covering the event. TV cameras inadvertently captured images of a whiteboard containing two credit-card account numbers intended for use by the company's recovery teams as they secured supplies and equipment. The numbers were broadcast around the country and were repeatedly misused by unscrupulous viewers of the program. After Sept. 11, no such mistakes were made. The need to establish a command center away from the disaster site, and to deal with the media through corporate communications or experienced PR firms, seemed to be recognized by most disaster-stricken firms.
9. Consider Your Workers
Any disaster, whether the result of terrorism or some natural calamity, exacts a toll on the psyches of workers. Shorter workdays, half shifts, on-site counseling and other compassionate measures may contribute more to a successful recovery than all the logistics and plans combined. The good news is that most companies, including many affected by the WTC incident, find that concerns about employee availability in the aftermath of a disaster are often unfounded. In most cases, disasters have a galvanizing impact on company teams: More than one planner reported that he needed to turn away personnel offering to assist recovery efforts. It's important to keep employees apprised of the situation, rather than leave them to draw conclusions from TV reports. When you need employees' help, most planners report, it's available.
10. Reward Innovation
Disaster recovery plans are not scripts for recovery efforts; they're guidelines at best. Given the "shifting battlefield" of recovery efforts, preplanned approaches sometimes need to give way to expediency.
According to one company spokesperson, "We had planned to keep in touch with our recovery teams by cell phone. But there were times after the incident when the cells in Manhattan were completely saturated. We started using runners to carry verbal messages from one site to another. [Sticking to the cell phone plan] would have slowed down our recovery a lot, but some teams simply abandoned the planned procedures and moved forward on their own initiative. As a result, we were able to keep the recovery effort on track."
Reports like this emphasize the importance of encouraging innovation and creativity on the part of recovery teams. When team leaders feel empowered to take the initiative, planners should reward this, either immediately or in debriefing meetings well after the fact. If mistakes are made, forgive them, at least in the short term, since customers, shareholders and others who are waiting for a return to normalcy will do the same. Most will understand that disaster recovery is difficult work.
Tough Jobs All Around
Disaster recovery planning is difficult work, but it pales in comparison to actually executing the plan. This is true whether the disaster situation confronting a company is a minor software glitch that threatens to put the company on the front page of the Wall Street Journal in a less-than-flattering light, or a terrorist attack that kills thousands of people and reduces towering buildings to mountains of dust and rubble in the space of a few hours.
In the final analysis, the events of Sept. 11 have clearly illustrated to complacent companies that disasters do happen. The result has been a surge of interest in disaster recovery planning that may or may not last beyond the current crisis. Strike while the iron is hot: take the opportunity to get senior management signatures on purchase orders for high availability and recovery provisions.
But as the WTC and Pentagon incidents demonstrate, there are no silver-bullet technologies or best-of-breed solutions that guarantee the successful outcome of a disaster recovery strategy. Disaster recovery is less about a product or document than about a process: It's a way of thinking about the unthinkable.