In-Depth
Happiness, Nothingness, and Long-Term Storage
So-called regulatory compliance solutions are just data movers.
I ran across this message the other day when searching for an article on the Web: “If true happiness can only be achieved through a state of nothingness, you're going down the right path. Actually, we couldn't find the page you requested. Please check the URL.”
For a moment, I chuckled over the wit of the anonymous author. Then, I thought about those who find themselves in need of a long-term data retention solution. And it wasn’t funny anymore.
In a previous column, I was critical of those vendors who are leveraging fear, uncertainty, and doubt to market “regulatory compliance” wares. Truth be told, regulatory compliance does not require technology. By and large, it is a people-and-process issue. Folks need to identify what data is subject to regulatory requirements and mark that data so that it can be included in appropriate data protection strategies.
What gets my goat is the fact that most of the so-called regulatory compliance solutions are just data movers. They don’t tell you what to move or where to move it—they just provide a way to move it. That is about as helpful as new and improved income tax filing systems: they may save you the trouble of licking a stamp and mailing your tax return, but they do nothing to help you surmount the big problem of sorting through your shoeboxes full of receipts and deciding what is and is not deductible.
One regulatory issue that does have technological ramifications is long-term storage. And, other than very low tech approaches such as hardcopy or microform, the technology doesn’t seem to be up to par.
Long-term data storage was once the domain of optical media. Lacking susceptibility to the many magnetic fields that can turn hard-disk-based data to mush or tape-recorded data to spaghetti, optical was touted to be the archivist’s dream technology. Like cockroaches, optical disks could withstand the blast and electromagnetic pulse from a thermonuclear device (provided that it was not directly under the nosecone of the bomb when it detonated).
However, testing by laboratory geeks using accelerated ultraviolet aging of optical media has demonstrated fairly conclusively that even optical media will let us down, and it may happen sooner rather than later if we don’t pay extremely close attention to environmental factors such as temperature and exposure to radiant energy.
Optical, as it turns out, has a vampiric allergy to daylight. Vendors may promise 20 years of reliable storage, but lab tests suggest that actual life expectancy under normal conditions may be half that for industrial-strength optical media and half that again for consumer-grade media such as DVD, DVD-R, DVD+R, CD-R, and CD-RW.
While this may be good news to the entertainment industry, which thrives on our willingness to repurchase our movie or album collection every couple of years when the media gives out (remember that these were supposed to be more durable than tape or vinyl), it may be bad news to the regulatory compliance crowd. Rather than a sturdy Klingon-esque “WORF” (write once read forever), optical media may well be just as “WORN” (write once, repeat as needed) as its magnetic storage peers. (Since I have contributed to the lexicon of storage acronyms, please assume that there are trademarks or service marks beside these words until I get around to filing my claim!)
What is needed is a media management methodology that copies data from one disk to another or one tape to another once the storage medium is about to live out its useful life. In other words, once a tape has been read or written a certain number of times, or once a disk has whirred on for X number of years, the data on the media needs to be copied to new media.
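In practice, that amounts to little more than a threshold check run against a media catalog. The Python sketch below shows one way it might look; the cycle and age limits are hypothetical numbers of my own choosing, not anything a media vendor has blessed.

```python
from datetime import datetime, timedelta

# Hypothetical policy thresholds -- real values would come from the media
# vendor's ratings and your own tolerance for risk.
MAX_ACCESS_CYCLES = 5000   # combined reads/writes before a refresh is due
MAX_AGE_YEARS = 7          # assumed service life, well under vendor claims

def needs_refresh(media):
    """Return True if the data on this tape or disk should be copied
    to fresh media under the thresholds above."""
    age = datetime.now() - media["first_written"]
    too_old = age > timedelta(days=365 * MAX_AGE_YEARS)
    too_worn = media["access_cycles"] >= MAX_ACCESS_CYCLES
    return too_old or too_worn

# Example: a tape first written in 1998 that has been mounted 1,200 times
tape = {"first_written": datetime(1998, 3, 1), "access_cycles": 1200}
if needs_refresh(tape):
    print("Schedule migration:", tape)
```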
This simple observation masks a passel of problems. For one, who will remember what data needed to be moved years from now? For another, who will ensure that the data is still useable before and after it moves from one disk or tape to another?
The second question is the biggest onion. Peeling back its layers, you quickly discover that the file systems used to store data may not exist in their current form in a couple of years. For example, Network Appliance will have to adapt its product to the Andrew File System and replace the Berkeley Fast File System in a couple of years if it wants to put the Spinnaker technology it acquired last year to any purpose. Microsoft is saying that it will once again migrate customers to a new file system whenever it gets around to releasing Longhorn, the next version of its Windows Server operating system.
Peeling back another layer of the onion, it is quite likely that the application used to create the file will not exist (or at least not in its current form) in five years. Furthermore, backward compatibility with earlier versions is by no means assured. One of my books from 1995 continues to be in print and ships with a diskette that contains forms and checklists I created using an application that simply no longer exists. Every once in a while I will get an e-mail from some unfortunate who is having difficulties with the files on the diskette—assuming that he even still has a floppy disk drive. Take that forward a few clicks of the calendar and you begin to wonder whether any of the data you currently store will be readable if the regulators ever need to sample it.
I had the pleasure last year of chatting with an archivist from the Australian government. He told me that they were busily converting electronic data to .PDF files, using the Adobe Acrobat tool set. The decision was predicated on Adobe’s willingness to give them source code to its format and reader tools so that a researcher in 2525 would be able to adapt the code to whatever computing architecture was being used at that time and still be able to read the data. I wondered how Moore’s Law would treat such a scheme over time: would data still be treated as bits organized into sectors and cylinders, or would it be DNA strands in a microscopic vacuum tube? How will you revive Excel spreadsheet data from a disk or tape when the wildly popular way to store data is through the near-field optical addressing of luminescent photoswitchable supramolecular systems dispersed as dopants in inert polymer matrices (i.e., molecular storage)?
Some vendors are saying that the solution is as simple as dumping the data into a document management system and encoding it with a message digest header and tagging it with an IP address that can be monitored using a proprietary controller/reader as it migrates from array to array over time. That could work, I suppose, if you buy only one vendor’s software and controllers for the next several decades.
The best approach for now seems to be a solution based on good old common sense: write the data to tape or disk at least twice, then implement a media management system to tell you when to migrate the data to fresh media over time. If the software used to write the data undergoes a change, you will need to migrate the data into the latest version of the software, then save it out again in its new format—again, at least twice. This should evolve from a cottage industry in 2004 into a major assembly-line business within the decade.
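To put that common-sense approach in concrete terms, here is a minimal Python sketch of the “write it at least twice and remember who wrote it” bookkeeping; the catalog fields, function names, and the choice of a SHA-256 digest are my own illustration, not the behavior of any particular product.

```python
import hashlib
import shutil
from pathlib import Path

# Version of the application currently used to write the data; when this
# changes, archived entries written by older versions need re-saving.
CURRENT_APP_VERSION = "9.0"

def archive_twice(source, targets):
    """Copy a file to at least two target media and record a digest so
    each copy can be verified against the original later."""
    if len(targets) < 2:
        raise ValueError("write the data to tape or disk at least twice")
    digest = hashlib.sha256(Path(source).read_bytes()).hexdigest()
    copies = []
    for target in targets:
        dest = Path(target) / Path(source).name
        shutil.copy2(source, dest)
        copies.append(str(dest))
    return {"source": str(source), "digest": digest,
            "copies": copies, "app_version": CURRENT_APP_VERSION}

def needs_format_migration(entry):
    """Flag data written by an older version of the application, so it can
    be re-saved in the current format and archived again, at least twice."""
    return entry["app_version"] != CURRENT_APP_VERSION

# Example usage (paths are placeholders):
# entry = archive_twice("Q3_receipts.xls", ["/mnt/tape0", "/mnt/tape1"])
# if needs_format_migration(entry):
#     reopen in the current application, save in the new format, re-archive.
```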
I could be wrong, but chances are that data in this column will have eroded beyond reclamation long before that happens. Your thoughts? [email protected]
About the Author
Jon William Toigo is chairman of The Data Management Institute, the CEO of data management consulting and research firm Toigo Partners International, as well as a contributing editor to Enterprise Systems and its Storage Strategies columnist. Mr. Toigo is the author of 14 books, including Disaster Recovery Planning, 3rd Edition, and The Holy Grail of Network Storage Management, both from Prentice Hall.