Q&A: Look Beyond Natural Disasters in Your Disaster Recovery Planning

Effective business continuity planning must address several different kinds of worst-case scenarios

There’s been a lot of talk about the ways in which business continuity planning has changed since 9/11, but Brenda Zawatski, vice-president of products at Veritas Software Corp., says that some of the forces that have reshaped business continuity planning were at work long before that tragedy.

In disaster recovery planning, she says, it’s the disruptions caused by viruses, worms, and grid-wide power failures that most often trigger outages and expensive downtime. The upshot, she says, is that effective business continuity planning must address several different kinds of worst-case scenarios.

One thing that’s struck me when I talk to people in the business continuity profession, whether they’re analysts or vendors or even potential customers, is that companies still aren’t investing heavily in business continuity planning—even after 9/11. Are you finding that this is changing at all?

I think it’s definitely starting to change. Gartner recently surveyed CIOs on the number one thing on their 2004 agenda, and the top-ranked issue was security breaches, business disruptions, or outages that result in some period of unplanned downtime.

People always think about 9/11 when we think about disasters, but those are the big ones where everyone expects outages and expects businesses to be impacted. The ones that have been causing more angst are the worms and the outages. They’re a pain in the business for the IT shop, but the shop isn’t given any emotional forgiveness for them. What I mean is that on 9/11, everyone was running fast and furiously to get their businesses back up, but everyone knew how difficult it was and what was involved, so there was a certain understanding, a certain kind of forgiveness.

But if I’m dealing with a virus at Veritas, and I’m not answering the e-mails of a client, he doesn’t know that I’m impacted by this virus and that I can’t get into my e-mail, so that forgiveness just isn’t there.

That’s a good point. When you think of business continuity, especially in the enterprise space, you most often think of the horror stories, planes crashing into buildings, incidents of some kind. But is concern over these other kinds of disruptions starting to drive business continuity planning efforts?

Yes, it is. I’d say in the last two-and-a-half to three years, we’ve actually seen a tremendous amount of interest because of the virus, as well as the other natural disasters, like power outages. Take the outage that we had last August. Mayor Bloomberg [of New York City] said it was $1.1 billion, that’s what it cost back in August. And when the Bellagio [casino resort] was down for three or four days [in April], they weren’t able to do any business, there were some estimates that they were losing $1 million a day.

So whether it’s a security breach or a virus, or somebody trips over the plug, or something as disastrous as 9/11—they’re all driving this.

Is that also because customers are able to put a number on the potential for loss in the event of a relatively common disruption, like a virus?

Absolutely. What we’re finding also is that [customers] are increasingly asking, “What is the cost of the disruption and how do you measure it?” You can measure it in one of three ways. There’s the obvious loss of revenue and loss of profit while you’re offline. There’s the loss of confidence that your customers have in you. I think the third area that’s also being more and more recognized by businesses as well [is] what’s the loss of the market cap—what happens to their stock price? So now you have not just the CEO worrying about outages, it’s the CIO and the board of directors as well.

And with a virus or worm, the potential for disruption is also great because even if you “remove” it, these companies don’t know if their systems have been compromised, so they have to conduct extensive audits and security analyses and maybe even take them offline?

And it’s not just these mission-critical systems—it’s these thousands of desktop systems, too. Because just as you can’t function without having these mission-critical applications up and running, you also can’t function without having desktop systems for your employees. I think I read in the paper that the Sasser virus infected a million PCs within 20 hours, for example.

Disasters happen at all levels, and more and more critical data is sitting on individuals’ laptops rather than on servers. I think it was [Enterprise Storage Group] that recently did a survey that found that 50 percent of mission-critical data is currently sitting on laptops, so if you’re not backing them up, it could be a huge cost for your business. We have a desktop/laptop backup solution that works perfectly on our Backup Exec product as well as our enterprise NetBackup.

You’ve talked about some of the compelling drivers for business continuity planning, but are there still some difficult impediments, do you think?

Certainly. One problem is that we’re not adding value to the product, so there’s the accurate perception that [business continuity] expenses aren’t revenue-generating. What we’re doing is protecting against loss, on their business side as well as their market cap side. Our task is to make this easy so that it doesn’t take a hundred people to do this, so one product can go across all of these different types of areas. You’ve got to do it in a way that customers aren’t going to buy new hardware, where they can use the knowledge that they have, the hardware that they have.

When I’ve spoken with business continuity professionals about effective planning, they stress that organizations should try to standardize on fewer applications and on only one or a few platforms, which, they say, makes it easy to manage backup/recovery, replication, and restoration. Is this something that Veritas finds is practical for customers? Are customers doing it? If not, how does Veritas software help customers with a heterogeneous mix of applications and platforms implement effective business continuity plans?

I think it’s not really realistic, this idea of standardizing on fewer applications. It’s realistic if you’re a brand-new business and you’re just starting out, but most businesses have spent hundreds of millions of dollars in their IT infrastructure, and that’s both on the infrastructure of hardware as well as applications, so in most cases, that’s not a feasible answer.

The way we do it is we offer very heterogeneous coverage, everything from HP to AIX to Solaris to Linux to Windows—those are kind of the major platforms we cover. We create the infrastructure for our customers, and they can plug and play any hardware they want, any servers, any storage devices, as well as any operating systems. They can do backup or replication or clustering. They can do vaulting, if that’s what they want. They decide or we help them decide what kind of [recovery point objective] and [recovery time objective] they need, and we figure out what they’re going to need to support that.

Do you find that some customers have consolidated their applications or workloads to help streamline their recovery process, however?

Yes, where it makes sense—like if there’s a flurry of mergers and acquisitions, you’re seeing consolidation of data centers. We see some consolidation of applications occasionally, or just consolidation of storage or servers, and that really comes into play with storage virtualization. If you can actually virtualize what’s below the application in such a way that they don’t care what storage they have, you can actually increase your utilization very fast and then move the workload from one storage device to another.

You mentioned clustering earlier. Usually, clustering has meant tying together two or more systems that weren’t geographically separated—or, at least, were on the same business campus. I’m wondering if in the post-9/11 world, you’re seeing more customers opting for wide-area clusters, maybe of several kilometers or more?

Absolutely. We have metropolitan clustering, and we see a lot of companies doing this kind of clustering. Something that’s not yet integrated is the technology from [utility computing specialist] Ejasent, which we bought back in February. It enables us to [fail over] the application in all of its states, which makes it more powerful. We have customers, like [hosting provider] Bluestar Solutions, that have 50 clusters working right now, because their recovery-time objective is four hours or less, and that’s what their agreement is with their customers.

The other area with clusters is the cluster file system, the [Real Application Clusters (RAC)] solutions that Oracle’s now offering. You want a clustered file system when you’ve got multiple instances of your database running, as you do with RAC. So we’re finding customers that need this [clustered file system].