Take the Storage Scout Pledge

Your challenge is protecting huge amounts of enterprise data stored on diverse, multi-vendor systems. Our expert's advice: Be prepared.

No one likes to dwell on it, but the dependence of business on electronic information—both its integrity and its availability—creates huge vulnerabilities for organizations counting on that information. IT and business managers everywhere need to be concerned about threats to data stored electronically.

I probably don't need to remind you of the importance of taking steps to prevent avoidable data disasters and to minimize the impact of disruptive events that can't be prevented.

Planning for data storage protection and recovery involves a balancing act. Every storage recovery strategy should:

1. Ensure quick and reliable restoration of data to a usable form, without degrading application performance in normal operation.

2. Safeguard the data, but don't impair its accessibility to the applications and users that need it.

3. Deliver the shortest possible "time to data" (the interval between an interruption and the restoration of data to a usable form), but don't generate more implementation and operating costs than your organization considers reasonable. A rough way to weigh this trade-off appears in the sketch following this list.
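To make the third trade-off concrete, here is a minimal sketch comparing two recovery strategies on "time to data" and annual cost. Every figure in it (the strategy names, throughput numbers and costs) is a hypothetical placeholder, not a benchmark or vendor quote.

```python
# Hypothetical comparison of two recovery strategies on "time to data" and cost.
# All figures are illustrative assumptions, not measured or quoted numbers.

strategies = [
    # (name, effective restore throughput in MB/s, annual cost in dollars)
    ("Nightly tape backup, offsite vault", 10, 25_000),
    ("Synchronous remote mirror with failover", 400, 250_000),
]

data_set_gb = 500  # amount of data that must be returned to usable form

for name, mb_per_sec, annual_cost in strategies:
    hours = (data_set_gb * 1024) / mb_per_sec / 3600
    print(f"{name}: time to data ~{hours:.1f} h, cost ~${annual_cost:,}/yr")
```

With these invented numbers, the mirror buys back roughly 14 hours of "time to data" at ten times the cost; your own figures will tell you whether that trade is reasonable for your organization.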

Rapidly Moving Targets
As if balancing these three factors isn't challenging enough, your efforts in planning for effective storage recovery are complicated by the fact that data itself is a moving target. The volume of data in the average organization is growing at an extraordinary rate of 80 percent to 100 percent a year. This puts enormous pressure on managers to define storage recovery strategies that are robust and flexible enough to scale with that growth.
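To see what growth at that pace means for recovery planning, consider a back-of-the-envelope projection; the 1 TB starting point is an arbitrary illustration:

```python
# Project storage capacity under the 80-100 percent annual growth cited above.
capacity_tb = 1.0  # arbitrary starting point
for year in range(1, 6):
    low, high = capacity_tb * 1.8 ** year, capacity_tb * 2.0 ** year
    print(f"Year {year}: {low:.1f} - {high:.1f} TB")
```

At these rates, a recovery strategy sized for today's volume falls short by an order of magnitude within five years.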

Adding to this issue is the dynamic nature of data. While we like to think of data as a mass of stored bits residing on a static disk drive "repository," the fact is that a lot of important data is in motion constantly. Applications operate 24x7, in accordance with the clock of the Internet and global business, and constantly update databases. Moreover, data is increasingly mirrored—replicated in multiple locations on a synchronous basis—to facilitate efficient access across an increasingly dispersed enterprise or e-business setting.

Storage recovery planning is also complicated by the lack of uniformity, standardization and interoperability among the storage products currently offered. Many current "standards" in the storage field leave substantial wiggle room to allow for different—and often incompatible—implementations of the same standard by different vendors. This fact has necessitated interoperability events—so-called "plug fests"—that have mainly focused on platform stability, allowing storage products to work and play well together. Generally, defining and demonstrating interoperable methods for recovering data stored on these new platforms hasn't been a primary concern of vendors and standards makers. Thus, if your organization is deploying the latest storage platform technologies, you may be surprised to find that the data stored on them is more vulnerable than ever before.

The Bigger Picture

September's tragedies at the World Trade Center and the Pentagon have made disaster recovery planning a more compelling issue for many companies, and this month's column, written months in advance of those horrific events, takes on a special significance. The need to provision storage for "time to data"-based recovery extends beyond the storage platform itself.

In the real world, strategies for storage recovery do not play out in a vacuum. All continuity plans must be able to execute within the complicated milieu of government-executed public safety strategies, vendor and supplier logistical plans, and infrastructure service restoration plans. Remember: Your storage infrastructure does not have to be directly impacted by a major cataclysm, such as a hurricane, fire or flood, to find itself in harm's way. Often, companies find that denial of access to data and systems is an indirect consequence of a disaster located elsewhere. However, an interruption is an interruption, regardless of its root cause.

My recommendations for disaster preparedness are four-fold:

1. Protect people first. Beyond that, make sure that personnel are cross-trained in critical tasks—whether IT- or business-related. Consider rotating staff between different offices if your organization is geographically dispersed.

2. Prevent avoidable disasters. If you have limited cash to spend on disaster recovery, implement alarm and fire-suppression systems first. Now is the time to add high-availability capabilities—redundant network paths, redundant switches, servers and storage platforms, and infrastructure redundancies for power and telecommunications—to help withstand interruptions and limit their impact.

3. Protect data. Mirror it, back it up and replicate it. Data is not replaceable.

4. Develop strategies for application/server recovery, network recovery and end user recovery—and bounce them off local law enforcement and emergency management agencies for comment. In too many cases of natural or man-made disasters with broad geographic effects, those tasked with recovery will not be granted access to areas where they cannot prove residence. While many of us feel like we live at the office, our drivers' licenses and telephone book entries say otherwise. Clearing disaster plans and personnel with civil emergency management may provide some workarounds.

In the final analysis, storage recovery is part of a larger disaster prevention and recovery capability that every business must develop if it wishes to remain in business. Testing is key to ensuring that the plan stays synchronized with changing business needs, changing technology infrastructure and the milieu in which it may need to be executed.

—J.W.T.

Consider this: We already know that in a simple RAID array, data restore speeds—how quickly data can be restored from tape to disks in the array—are usually a fraction of data backup speeds. This is the result of a "write penalty" that accrues when writing data to the "virtual volumes" (and their parity drives or stripes) created from hard disks in the array cabinet. The array controller must filter the data being written and perform the necessary parsing and replication of data before it's written to the volumes. None of that work is required when data is simply read off the volumes, so backups (which read from the array) can generally be performed more quickly than restores (which write to it).
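To illustrate the penalty, here is a simplified sketch of the small-write behavior of a parity-protected (RAID-5-style) volume. The XOR arithmetic is standard parity math, but the four-I/O accounting is the textbook read-modify-write simplification, not a model of any particular vendor's controller.

```python
# Simplified RAID-5 small-write penalty: updating one data block requires
# reading the old data and old parity, recomputing parity by XOR, then
# writing new data and new parity -- four disk I/Os for one logical write.
# Reading the same block back is a single I/O; parity is never consulted.

def xor_blocks(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length blocks byte by byte."""
    return bytes(x ^ y for x, y in zip(a, b))

old_data   = bytes([0b1010] * 4)   # toy 4-byte "blocks"
old_parity = bytes([0b0110] * 4)
new_data   = bytes([0b1100] * 4)

# Standard RAID-5 parity update: new_parity = old_parity XOR old_data XOR new_data
new_parity = xor_blocks(xor_blocks(old_parity, old_data), new_data)

print("I/Os per small write: 4 (read data, read parity, write data, write parity)")
print("I/Os per small read : 1")
```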

The backup/restore speed gap is narrowing, of course. After more than a decade of engineering on array controllers, most high-end array products have largely alleviated the write penalty and deliver much improved restore speeds.

Separate SAN Issues
Storage area networks (SANs), however, are another issue. The current-generation SAN can be thought of as a huge array. The SAN's "array controller" is a software or hardware-based "virtualization engine" that aggregates physical storage devices (as well as volumes, or partitions, offered by arrays in the SAN) into a manageable number of "virtual volumes."
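In rough terms, the engine maintains a table mapping the logical blocks of each virtual volume onto extents of physical storage, and every block written during a restore must be routed through that table. The toy model below shows the idea; the device names, extent size and resolve function are invented for illustration.

```python
# Toy model of SAN virtualization: a virtual volume is a table mapping
# logical block ranges onto (physical device, starting block) extents.
# Device names and the extent size are invented for illustration.

EXTENT_BLOCKS = 1000  # logical blocks per extent in this toy layout

extent_map = [
    ("array1_lun0", 0),       # logical blocks 0-999
    ("array2_lun3", 50_000),  # logical blocks 1000-1999
    ("jbod_disk7", 0),        # logical blocks 2000-2999
]

def resolve(logical_block: int) -> tuple[str, int]:
    """Translate a virtual-volume block address to a physical device and block."""
    device, base = extent_map[logical_block // EXTENT_BLOCKS]
    return device, base + logical_block % EXTENT_BLOCKS

print(resolve(1500))  # -> ('array2_lun3', 50500)
```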

SANs have been on the market only four years or so, but their use as a data storage platform is growing at more than 65 percent a year, according to industry analysts. A leading driver of SAN adoption is data backup. Companies seek to leverage the high-speed "back-end network" environment of a SAN to enable data backups without impairing the operation of "front-end" production LANs. This thinking, however, runs afoul of the realities of SAN virtualization. While it's possible to perform a bit-for-bit backup and restore of a SAN node (an array or disk) fairly efficiently, backing up and restoring file systems stored on a SAN is considerably more problematic. Performing a restore from tape requires that all data pass through the virtualization engine, which, like a RAID controller, must parse the data to the physical disk drives (and other storage elements) comprising virtual SAN volumes. More than one SAN user has reported tremendous speed problems with file restoration. Restoring a terabyte of files can take upwards of 100 hours—longer than the amount of time a well-prepared company requires to restore systems and networks in the wake of a "smoke and rubble" disaster!
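The arithmetic behind that figure is sobering: a terabyte in 100 hours works out to an effective throughput of under 3 MB/s, while a single 1 Gb/s Fibre Channel link can carry on the order of 100 MB/s.

```python
# What a 100-hour terabyte restore implies about effective throughput.
terabyte_mb = 1_000_000                     # 1 TB expressed in MB (decimal)
restore_hours = 100
mb_per_sec = terabyte_mb / (restore_hours * 3600)
print(f"Effective restore rate: {mb_per_sec:.1f} MB/s")  # roughly 2.8 MB/s
```

The fabric, in other words, is not the bottleneck; the virtualization engine is.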

SANs are not the only problem confronting managers concerned about storage recovery. In fact, the implementation of a SAN is rarely accompanied by the removal of all other storage deployed in the company. For the next decade, analysts say, SANs will coexist with server-attached storage arrays and network-attached storage within the same storage infrastructure. This storage diversity creates both challenges and opportunities.

As managers, you need to consider the impact of storage technology on the recoverability of data itself following a disaster. Make vendors go the extra mile to explain—and demonstrate—how storage recovery can be provided on a specific platform within an acceptable timeframe.

The bottom line is this: In order to compel vendors to deliver recoverable solutions, we all need to make data recoverability a key criterion when selecting storage products. Once vendors comply, the challenge of storage recovery will become less daunting for everybody.
