The State of Data Archiving

Green alone isn't going to sell archive. Recognition of fundamental data differences must make the case.

Recently, the key investors in Plasmon, a heavyweight among archival systems solutions vendors with an emphasis on greener, high-longevity optical media, put out the word that the company was up for sale. The decision, predicated upon a dismal revenue outlook, was not entirely unexpected.

Plasmon's core market had been targeted by disk array manufacturers over the past few years with the same sort of campaign that has been waged against magnetic tape. EMC's "shatter the platter" marketing attack took aim at a core issue in the storage market -- the only issue that has, in recent years, rewarded disk makers with increased sales: media capacity.

EMC and other array makers argued that disk drives were more capacious than optical media and getting cheaper (at the per-GB level on a single disk drive) every year. That much was true. Even Plasmon's Ultra Density Optical (UDO) media -- which provided a means for storing data reliably for up to 50 years (at least 10 times the longevity of magnetic disk) and offered a dramatically lower total cost of ownership than its hard-disk competitors -- provided a comparatively "paltry" 60GB capacity in its latest iteration. The UDO road map called for capacity to reach only 240GB per optical platter in the next few years, a function of refinements in media materials, read/write head, and laser technology.

Looking at operational data capture and utilization, disk is clearly the preferred medium. The question usually sidestepped in the debate between disk and optical for archival storage was whether archival data presented different requirements that made capacity less important than, say, resiliency. It boiled down to whether it was more cost effective and operationally efficient to store data with long-term retention requirements on durable optical media or on more-prone-to-failure magnetic disk.

Plasmon did its best to argue that optical was the medium for data preservation over the long haul, but one could argue that the company did not have the means to penetrate the market with its message or its wares. It lacked, for example, the golden Rolodex of a three-letter disk storage vendor's direct sales force, which too often translates into "front office sales" -- that is, sales made to senior managers of a company who often lack technical acumen, rather than to the IT people in the back office who might actually understand the value proposition of optical storage.

It also didn't help that the Optical Storage Trade Association (OSTA) for the past few years was too busy fighting over the relative merits of next generation DVD standards to pay much attention to the decline of commercial sales for optical in general. The last OSTA meeting I attended had only a handful of representatives of member companies in attendance and more time was spent searching Robert's Rules of Order for an appropriate framework for discussing any issue than for actually discussing the issue at hand.

In truth, the disk array purveyors know that optical is more durable than disk drives and greener. The failure rates of SATA disk are legendary, documented in studies by both Google and Carnegie Mellon. Deploy 10,000 disks and you will lose one disk drive per week on a statistical basis; hardly the earmark of a high-longevity data-retention solution. Moreover, boxes of disk drives have usurped servers as the leading consumers of electrical power among IT hardware deployed in data centers today.
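
To put the one-drive-per-week figure in perspective, the arithmetic can be sketched as follows. This is only an illustration: the 10,000-drive population and the weekly loss come from the paragraph above, while the function names, the assumption of a constant failure rate, and the five-year service life are hypothetical choices made for the example.

```python
WEEKS_PER_YEAR = 52


def implied_annual_failure_rate(drives, failures_per_week):
    """Annual failure rate implied by a steady weekly loss (assumes a constant rate)."""
    return failures_per_week * WEEKS_PER_YEAR / drives


def expected_failures(drives, annual_failure_rate, years):
    """Expected drive replacements over a period, assuming failed drives are swapped out."""
    return drives * annual_failure_rate * years


# Figures from the text: 10,000 drives, one failure per week.
afr = implied_annual_failure_rate(10_000, 1)
print(f"Implied annual failure rate: {afr:.2%}")

# Assumed five-year service life (an illustrative horizon, not a measured one).
print(f"Expected replacements over 5 years: {expected_failures(10_000, afr, 5):.0f}")
```

Under these assumptions the population loses roughly half a percent of its drives each year, which adds up to a few hundred replacements across a five-year span. Real-world rates vary with drive age and workload, so the result is a lower-bound illustration rather than a prediction.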

The disk array makers argue that technologies such as block level de-duplication (crushing data so that it occupies less space on a drive) and data compression can help green disk-based storage. Although these claims may be true, the wisdom of de-duplicating archival data remains a subject of considerable debate.

Medical research projects, for example, abstain from using almost any technology that might alter the raw scientific data collected, fearing that such technology will limit the accessibility of that data by new research tools developed in the future. Financial firms and publicly traded companies are fearful that de-dupe may alter data in a manner that violates Securities and Exchange Commission rules and routinely hold back certain types of data from their block de-duplicating systems. Given that research data preservation and regulatory compliance requirements are two of the most frequently cited drivers for archive system acquisitions, the suggestion that de-duplication puts magnetic disk on a par with optical from an energy efficiency perspective begs the issue of how the technology conflicts with the data immutability concerns of these potential users.

Another "energy efficiency" enhancement proposed by the advocates of hard-drive-based archives is hierarchical storage management (HSM). HSM does not alter data, but moves data over time, usually on the basis of a simplistic policy keyed to access frequency, from many low-capacity Fibre Channel or Serial Attached SCSI drives onto fewer high-capacity SATA drives. Since SAS and SATA drives consume about the same power, the advocates argue, storing the same data on fewer drives is good for the environment.
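
The kind of simplistic policy described above can be sketched in a few lines. This is a hypothetical illustration, not any vendor's HSM implementation: the `FileRecord` type, the tier names, the file names, and the 90-day idle threshold are all assumptions, and time since last access stands in as a rough proxy for access frequency.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileRecord:
    path: str
    last_access: datetime
    tier: str = "fast"  # "fast" = FC/SAS drives, "capacity" = SATA drives (assumed labels)


def select_for_migration(files, now, max_idle=timedelta(days=90)):
    """Return files on the fast tier that have sat idle longer than max_idle."""
    return [f for f in files if f.tier == "fast" and now - f.last_access > max_idle]


# Hypothetical catalog entries for illustration only.
now = datetime(2008, 6, 1)
catalog = [
    FileRecord("scan-2007.dat", datetime(2007, 11, 1)),
    FileRecord("ledger-q2.dat", datetime(2008, 5, 20)),
    FileRecord("old-on-sata.dat", datetime(2006, 1, 1), tier="capacity"),
]
print([f.path for f in select_for_migration(catalog, now)])  # ['scan-2007.dat']
```

Note that the policy looks only at idle time. It knows nothing about retention requirements or the business value of the data, which is why it is fair to call it simplistic.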

The ultimate value of disk-to-disk HSM as an archive modality, however, remains debatable. High-capacity drives have failure rates equal to or greater than those of their lower-capacity cousins. They are, as a rule, less resilient than lower-capacity drives owing to a number of manufacturing differences. What's more, if you lose a high-capacity drive packed with a TB or more of data, you lose substantially more data than if you lose a low-capacity drive storing only 70 to 140 GB of data.

Given the propensity of big SATA drives to fail, combined with the extremely slow drive rebuild times when using RAID 5 (or even RAID 6) with TB-plus-sized disks, and adding in the falling costs of big SATA disks, the green story quickly falls prey to the use of mirroring schemes (RAID 1) to ensure data availability. That means that companies are deploying two disk drives for every drive's worth of data in high-capacity drive arrays, rather than a set of drives sharing one or two drives for parity striping (as in RAID 5/6). So much for green.
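
The drive-count penalty of mirroring can be illustrated with a rough calculation. The sketch below is a simplification under stated assumptions: the `drives_needed` helper, the eight-drive RAID group size, and the 100 TB archive built from 1 TB drives are hypothetical, and hot spares and formatting overhead are ignored.

```python
import math


def drives_needed(usable_tb, drive_tb, scheme, group_size=8):
    """Estimate total drives for a usable capacity under RAID 1, 5, or 6."""
    data_drives = math.ceil(usable_tb / drive_tb)
    if scheme == "raid1":
        # Mirroring: every data drive has a full copy.
        return 2 * data_drives
    # Parity schemes: each group gives up one (RAID 5) or two (RAID 6) drives' worth of space.
    parity_per_group = {"raid5": 1, "raid6": 2}[scheme]
    data_per_group = group_size - parity_per_group
    groups = math.ceil(data_drives / data_per_group)
    return data_drives + groups * parity_per_group


# Assumed scenario: 100 TB of archive data on 1 TB drives.
for scheme in ("raid5", "raid6", "raid1"):
    print(scheme, drives_needed(100, 1, scheme))
```

With these assumptions the estimate comes to 115 drives for RAID 5, 134 for RAID 6, and 200 for RAID 1. Every one of those extra drives spins, draws power, and generates heat.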

The bottom line is that any argument for replacing optical disc with magnetic disk almost always distills down to a need to enhance magnetic disk in order to improve its resiliency and power consumption characteristics. When you do the math, doubling up on the number of drives and applying half-baked de-duplication or compression technologies increases the price of the disk-based solution above the price of an optical solution -- and by a significant factor.

Rational consumers would see the difference. At least, that is what Plasmon thought. Apparently, in the mental vacuum of the large enterprise, no one can hear an optical vendor scream.

Jim Wheeler, director of marketing and business development for archive solutions vendor QStar Technologies, says that green archive technologies such as optical and tape have long been appreciated by European companies and by small to midsize businesses (SMBs) even in the U.S. "QStar always resonated in parts of the world where power was expensive, but that awareness is just happening in the U.S."

Wheeler further contends that archiving to tape or optical is more affordable to SMBs than are the latest 'de-duplication-enhanced' disk-based systems in the market today. "To the SMBs, who may confront the same need to save everything as many larger companies feel compelled to do, archive is a more affordable solution."

Wheeler believes that large enterprises will come around in time: "These companies have changed all of the light bulbs that they can, and turned up the thermostats on the AC as high as they can stand. Now, IT is getting questioned about its power usage, and it's a learning curve. IT managers are just beginning to understand their impact on the environment. There is not a lot that is green about IT, from toxics released in manufacturing to the lack of recyclability of the finished goods. Still, most IT managers don't care about what goes into the landfill. Green is important when energy costs need to be cut."

"Hard drives are spec'd at five years: from an archive perspective, that is temporary storage," Wheeler notes, adding that some vendors are tweaking their marketing around disk improvements to suggest that more energy-efficient and less-expensive disk drives are keeping pace with the climbing cost of energy. Green alone isn't going to sell archive; recognition of fundamental data differences must make the case.

"Some data simply doesn't need to be [retrieved] fast," Wheeler contends. This important characteristic, he argues, should make the case for placing the data on tape or optical. QStar makes hybrid archive platforms that store data on tape or optical, and metadata (data about the data as well as index information) on a bank of disk drives with some product offerings. Such "hybrid solutions," as Wheeler calls them, get to the best of both magnetic media and optical.

The selection of the appropriate archival medium, he argues, is a function of "how long the data needs to be served up." He cites the recent problems at the U.S. Geological Survey (USGS). Recently, as the government moved to open up surveyed land for oil exploration, USGS went back to its 60-year-old tapes to retrieve survey data, only to find them "rotting like old films in the vaults of movie studios." The data, Wheeler observes, will cost a lot of money to replace with new surveys.

Optical might have provided a more resilient storage medium for the data. Magnetic media is not as durable without ongoing migration to fresh media. In the final analysis, this is the lesson that many companies, preferring to follow their disk vendor's lead to commit archival information to spinning rust, may learn the hard way.
