Sanity Check on Tape, VTL, and De-Duplication

VTL and de-duplication, tape versus disk -- a possible new answer looms on the horizon.

Last week, the Linear Tape Open (LTO) organization released a report commissioned from The Clipper Group extolling the cost-of-ownership and energy-efficiency advantages of tape technology over disk technology -- even de-duplicated disk operating as a virtual tape library (VTL). My initial reaction was one of incredulity -- not at the findings of the TCO analysis, which were predictable, but at LTO's decision to fund and promote such research.

Everyone knows that tape is cheaper to own and operate than disk. Equipment is far less costly to buy: from a hardware-cost standpoint alone, storing a gigabyte of data to disk will currently run between $44 with SATA and $89 with Fibre Channel. By contrast, storing the same data on tape will cost roughly 44 cents. (You can add $100 to the per-GB cost if the disk array or tape library is in a Fibre Channel fabric, by the way.)
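To put those per-gigabyte figures side by side, here is a minimal Python sketch of the arithmetic. The dollar amounts are the ones quoted in this column; the function name and the 10,000 GB repository size are illustrative assumptions, and the Fibre Channel fabric surcharge is left out.

```python
# Hardware cost per GB as quoted in the column (US dollars).
COST_PER_GB = {
    "sata_disk": 44.00,
    "fc_disk": 89.00,
    "tape": 0.44,
}

def hardware_cost(media: str, gigabytes: float) -> float:
    """Return the hardware-only cost of storing `gigabytes` on `media`."""
    return COST_PER_GB[media] * gigabytes

if __name__ == "__main__":
    size_gb = 10_000  # hypothetical repository size, chosen for illustration
    tape = hardware_cost("tape", size_gb)
    for media in ("sata_disk", "fc_disk"):
        disk = hardware_cost(media, size_gb)
        print(f"{media}: ${disk:,.0f} vs tape ${tape:,.0f} ({disk / tape:.0f}x)")
```

On these figures, the hardware cost of disk works out to roughly 100 times (SATA) to 200 times (Fibre Channel) that of tape, whatever the repository size.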

In addition, tape is greener than disk. The energy consumption per GB of stored data for tape-based storage versus disk-based storage favors tape hands down.

The old saw about labor being a huge cost-of-ownership accelerator in tape has been mostly deflated. A little over a year ago, both Google and Carnegie Mellon issued studies that placed disk failure rates at ten to fifteen times those advertised by manufacturers. Based on these findings, companies that place all data on spinning rust will likely see labor costs for maintaining disk repositories and replacing failed drives that equal or exceed what they would see with tape. This fact is rarely, if ever, mentioned by the "tape is dead" crowd.

The findings of The Clipper Group cited by the LTO folks sound impressive. For example, the report claims that the total cost of ownership of SATA disk archiving solutions is about 23 times that of tape-based archive solutions. The researchers go on to say that, although employing technology such as de-duplication to shrink the data stored on the disk archive can lower the overall TCO of such solutions, tape-based archiving still has an estimated 5:1 advantage over de-duplicated disk from a cost-of-ownership standpoint. Finally, tape is the more energy-efficient choice for the data center, providing up to a 290:1 advantage on energy costs.

These numbers were probably what LTO was looking for: statistics of its own to compete with those that the disk array manufacturers' preferred analysts create to support the manufacturers' arguments. Search the Web, and you can find reports stating that 1 in 10 (Gartner) or 1 in 30 (Forrester) tapes fail on restore. Since there must be consumers who still take the analyst community seriously, the LTO folks may have simply wanted their own numbers to respond to numbers referenced by their competitors in the disk business.

It might be better for all involved, including consumers, if someone would simply commission a "sanity check" study that identifies the core issues around tape, most of which have little to do with tape itself. Such research would have the merit of identifying how tape and disk can get along to provide real value to the user.

This thought actually crossed my mind a day or two before LTO issued The Clipper Group findings -- as I listened to Miki Sandorfi, chief technology officer, and Asim Zaheer, vice president for marketing and product management at Sepaton, talk about the forthcoming refresh of their company's Virtual Tape Library (VTL) platform.

Sepaton, whose name, I am reminded, is "No Tapes" spelled backwards, will soon be making detailed announcements about its new and improved technology. More important to this column was the "tone" of their presentation.

Both men were low key in their "VTL as tape replacement" messaging during our briefing. Whether this was because I had articulated a pro-tape view in past meetings or they had learned a thing or two from their customers is unclear. Bottom line: they talked of "integration with the existing ecosystem" rather than hauling out the same tape-bashing rhetoric I hear from so many others in the VTL/de-duplication space.

VTL, I suggested, had morphed through the years from a meaningful technology concept into a vacuous marketing term. The original VTLs were intended to provide a temporary disk buffer for backup datasets headed to tape, providing a staging area for stacking tape images until they were of sufficient size to completely fill an entire tape cartridge in an automated tape library. Today, VTL has become an empty vessel to be filled with whatever meaning a vendor wants it to have. To my surprise, they agreed.
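The staging role of those original VTLs can be sketched in a few lines of Python. This is an illustration only, not any vendor's implementation: the class name, the 800 GB cartridge capacity, and the backup sizes are assumptions, and a real product would also have to split oversized images across cartridges.

```python
from dataclasses import dataclass, field

@dataclass
class StagingBuffer:
    """Disk buffer that stacks backup images until they fill a cartridge."""
    cartridge_gb: float                            # assumed cartridge capacity
    staged: list = field(default_factory=list)     # images waiting on disk
    written: list = field(default_factory=list)    # one entry per cartridge

    def staged_gb(self) -> float:
        return sum(size for _, size in self.staged)

    def add_image(self, name: str, size_gb: float) -> None:
        # If the new image will not fit alongside what is already staged,
        # write the staged images to tape first.
        if self.staged and self.staged_gb() + size_gb > self.cartridge_gb:
            self.flush()
        self.staged.append((name, size_gb))
        # Simplification: an image at or above capacity goes out on its own.
        if self.staged_gb() >= self.cartridge_gb:
            self.flush()

    def flush(self) -> None:
        """Write all staged images to one cartridge and empty the buffer."""
        if self.staged:
            self.written.append(list(self.staged))
            self.staged.clear()

buffer = StagingBuffer(cartridge_gb=800)
for name, size in [("mail", 300), ("db", 400), ("files", 250), ("web", 100)]:
    buffer.add_image(name, size)
buffer.flush()  # end of the backup window: write whatever is left
print(buffer.written)
```

In this run the four hypothetical images end up on two cartridges, "mail" and "db" on the first and "files" and "web" on the second, rather than on four partly filled ones.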

Many vendors see VTLs as a new storage tier, a twilight zone between capture disk storage and that tape repository where 70 percent of the corporate world's data goes to its eternal sleep. De-duplicating the data on this disk buffer is simply a way to pack more of it onto the disk drives, though hardly anyone recommends committing the de-duplicated data to tape. Among other problems, doing so would require an additional step in an already laborious and time-consuming tape-restore process: de-duplicated backup sets would need to be restored, then re-inflated, prior to use. We all agreed that this would be a bad idea.
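To make the re-inflation step concrete, here is a minimal sketch of chunk-level de-duplication in Python. It assumes fixed-size chunks identified by SHA-256 digests, which is one common textbook approach and not necessarily how any product named here works; the function names and the tiny chunk size are illustrative.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; unrealistically small so the example is easy to follow

def deduplicate(data: bytes, store: dict) -> list:
    """Split data into chunks, keep each unique chunk once, return a recipe."""
    recipe = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)   # store the chunk only if it is new
        recipe.append(digest)             # the recipe is a list of pointers
    return recipe

def reinflate(recipe: list, store: dict) -> bytes:
    """Rebuild the original data by following the recipe's pointers."""
    return b"".join(store[digest] for digest in recipe)

store = {}
backup = b"ABCDABCDABCDWXYZ"
recipe = deduplicate(backup, store)
assert reinflate(recipe, store) == backup
print(len(recipe), "chunks referenced,", len(store), "chunks stored")
```

The recipe is useless without the chunk store, which is why a de-duplicated backup set committed to tape would have to be restored and then re-inflated before anyone could use it.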

I noted that most VTL/de-duplication vendors have been telling me lately that their products, once established in a customer shop, can replace tape. Instead of committing backup data to tape, they say, just replicate it to another box (from the same VTL/de-dupe vendor, naturally), which is located at the off-site storage company or at the designated business recovery center that will be used if a disaster occurs. This strategy, akin to electronic data vaulting, eliminates a lot of the manual work associated with off-site storage and tape-based restore in a disaster recovery situation. It also reduces exposure to accidental data disclosure (no more tapes falling off the backs of trucks), expedites restoration of single files or entire datasets, and maybe even eliminates the need for encryption -- assuming that de-duplicated data is sufficiently deflated to obfuscate viewing without the de-dupe engine itself.

Interestingly, as compelling as these benefits sounded, Sepaton's spokespersons preferred to take a different tack. Zaheer observed that many of his 1,000 installations were, indeed, interested in replacing tape with a de-duplicated VTL solution that supported replication over distance. However, the majority, he said, were trying to fit VTL into a disk-to-disk-to-tape strategy that could enable a much richer solution for backup as well as data management. The Sepaton advantage, he suggested, would not be realized in the form of a tape replacement strategy but as an enabler of better data management.

This was echoed by Sandorfi, who referred to de-dupe as "a feature, not a product" and went to some lengths to contrast Sepaton's approach to de-dupe with that of its competitors. Essentially, Sepaton's technology "cracks open the backup dataset container" to index what is inside, even as its de-duplication process works to reduce the number of bits used to describe the data itself.

To their credit, Sepaton goes to great pains to describe their de-duplication process, and how it leverages the latest dataset to hold the pointers to the de-duplicated data of past datasets. Just about everyone else does the opposite, Sandorfi said, relying on the initial dataset to hold pointers to every subsequent dataset that is de-duplicated. Over time, he said, the entire pointer system "becomes fragmented and unwieldy."

Sepaton's slides showed something even more interesting: their real objective. Sandorfi observed that his company's product is a "Trojan Horse" intended to "get our hands on the customer's data in an easy way." There is no malice intended in this statement. Sandorfi believes that by working with VTL and de-dupe he can help companies build intelligent archives of backup data that can be searched readily and managed more granularly than is currently possible using only anonymous backup containers (a quasi-archiving approach used by many companies today).

Obviously, if intelligent archiving is the ultimate goal of Sepaton's technology, then "VTL" is not a completely accurate description of its products. Eventually, they may need to re-cast themselves as a disk-based archive platform vendor -- a segment of the storage industry with its own set of inter-vendor battles and imprecise terminology.

For now, however, VTL and de-duplication are at the top of IT's new technology wish list, surpassing even server virtualization in some current surveys, so Sepaton plays in this space.

The idea of de-duplicating data so that more of it fits on a tier of buffer disk makes sense from a risk-management perspective. Such a process may facilitate fast restore of data that has become corrupted on primary disk and prevent a crisis from developing into a disaster. However, it also stirs up a hornet's nest of potential problems -- some of which are legal.

Sandorfi said as much when he noted that many of his customers turn off de-duplication for certain data, especially data that is subject to regulatory retention requirements. SEC rules, for example, require that "a complete and unaltered copy" be maintained of certain types of data. Some of Sepaton's customers take this rule literally and do not subject the relevant databases or file types to any sort of de-duplication or compression. Other customers, Sandorfi noted, follow the guidance popularized by many product vendors in this space: if the de-dupe algorithm is validated, then you do not risk running afoul of SEC rules if you apply it to data.

The need to de-dupe selectively means that organizations need to analyze their data before using de-duplication or compression technologies, segregating what should be exposed to these services from what should not. This, in turn, reaffirms the point made in past columns here that there are no technological silver bullets for regulatory compliance embedded in either de-duplication or VTL products. It also suggests that the labor-cost component of VTL and de-dupe might actually be greater than what we currently see in tape.
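The kind of pre-analysis described above amounts to a classification pass over the data before any reduction service touches it. Here is a minimal sketch in Python, assuming a hypothetical rule set keyed on path prefix and file extension; neither the rules nor the names come from Sepaton or from any regulation.

```python
from pathlib import PurePosixPath

# Hypothetical policy: data under these prefixes or with these extensions
# is treated as retention-regulated and must be stored unaltered.
REGULATED_PREFIXES = ("/finance/", "/trading/")
REGULATED_EXTENSIONS = {".pst", ".mdf"}

def may_deduplicate(path: str) -> bool:
    """Return False for files that the policy says must stay unaltered."""
    if path.startswith(REGULATED_PREFIXES):
        return False
    return PurePosixPath(path).suffix.lower() not in REGULATED_EXTENSIONS

def partition(paths):
    """Split paths into (de-dupe eligible, store as-is) lists."""
    eligible = [p for p in paths if may_deduplicate(p)]
    as_is = [p for p in paths if not may_deduplicate(p)]
    return eligible, as_is

eligible, as_is = partition(
    ["/home/a/notes.txt", "/finance/q1.xls", "/home/a/mail.PST"]
)
print("de-dupe:", eligible)
print("store unaltered:", as_is)
```

The code is the easy part. Someone still has to write, validate, and maintain the rules, and that is where the labor cost comes in.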

In the final analysis, we are left with more questions than answers about the utility of VTL and de-duplication. For one, what problem, beyond fast restore of damaged files, do these products actually solve? Sepaton is meeting this question head on by evolving its product direction into the archive space.

What are the other vendors doing? We've asked them in a questionnaire, and many of the leading product vendors have responded. In part two of this column, I will summarize their responses. Your comments are welcome.