In-Depth

Tape is Key to Containing IT Spending

If you want to economize on storage for the massive amount of data you're accumulating, the choice is clear: tape.

In the current economic malaise, the quarterly guidance from most publicly traded storage vendors is a bit less high and to the right than in previous years. The reasons are many, but slowing demand from the big buyers in the financial services industry is one that sticks out like the proverbial sore thumb.

FACT: At a recent storage trade show in New York, organizers put on an additional half-day seminar track, with its own cadre of vendor sponsors, to specifically address storage issues in the financial sector. One of my spies attended and saw only a handful of attendees. The balance of the prospective audience, I would assume, were either out looking for new jobs or trying to keep their current employment by looking busy at the office -- developing plans for projects that probably won't be funded until some remedies are found for the current problems confronting their industry.

FACT: At another NY trade show last week, a representative of Diligent Technologies, an in-line de-duplication appliance vendor recently acquired by IBM, complained that traffic at her booth had not met her expectations. There were lots of folks walking by, she said, but none from firms with the massive amounts of data that, in her estimation, mark a likely buyer of a Diligent de-duplicating solution and a lead worth a follow-up call from sales reps. "Where are the huge financial shops?" she asked aloud, the crooked smile on her face suggesting that she already knew the answer.

FACT: At a recent meeting with a large value-added distributor (VAD), a representative told me that his company was becoming more adept at "harvesting storage vendors," a puzzling comment that prompted me to inquire further. He said his company sells many brands of many storage products but has recently begun prioritizing the promotion of vendor products based on the lines of credit that the vendors make available to support sales. Vendors that offer credit to help distributors and resellers sell their storage wares, and customers to buy them, are moved ahead of those that don't. When a vendor's credit line is exhausted, another vendor is pushed to the forefront of the VAD's promotional activities. This is done, he explained, to preserve the line of credit that the VAD itself uses to run its business, a smart move given the current economic outlook.

Interestingly, this means that VADs and value-added resellers (VARs) may not be pushing the solution set that a customer needs, but rather championing whichever vendors and products meet the VAD's or VAR's financial objectives at any given time. Some might argue that this is no different from the prioritization scheme that resellers and distributors have always used, preferring to present to clients the products that return the greatest margin based on discounts and mark-ups. The current economy just adds a new twist.

The point here is that current economic conditions are having a ripple effect in storage. Everyone from the storage trade show operator to the storage vendor and the VAD is beginning to feel the pinch. IT practitioners sit in the middle: striving to build business IT solutions for their firms while dreading the possibility of suspended projects or outsourcing arrangements, both of which portend potentially career-ending pink slips.

Complicating things are the continuing demands of regulatory and legal mandates, which are collectively turning IT mavens, in the words of one recent pundit, into pack rats. Depending on the law or regulation, data handling requirements differ. Most require a mix of policy-driven data retention (and deletion) schedules, data preservation, data protection (both in terms of security and continuity), data auditability, and "discoverability." The message is clear: companies need to manage data or someone in government is going to come after their executives. IT usually gets the job.

This trend is accelerating the volume of data that organizations are stockpiling. Because the job of data management is falling to IT, a group that has heretofore viewed data as an anonymous set of ones and zeros, the quest is on for a quick and inexpensive fix that will enable storage staff to stockpile all the undifferentiated bits created by the rest of the company without buying additional gear. It may come as no surprise that every storage array vendor on the planet is offering (or preparing to offer) a de-duplication or thin provisioning "solution," in some cases both, to help IT cope with the problem (even in the mainframe world, based on chats I have had with IBM).

Both of these technology darlings are being loudly hyped as innovations. In fact, they are old news. De-duplication encompasses both file-level duplicate removal and block-level transformation of data so that fewer bits are used to store content; the latter approach is mostly what the vendors are talking about. Thin provisioning is a high-tech shell game designed to free the "allocated but unused space" on disk arrays that has been assigned to all of those pesky applications, thereby boosting the virtual capacity available for storing more bits, whether the bits are de-duplicated or not. Both schemes sound a lot like the much-vilified "mortgage derivatives" of the financial industry: neither block de-dupe engines nor thin-provisioning space-forecasting algorithms are backed by standards, and most require consumers simply to take the vendor's word that bad things won't happen if the regulators ever take a look.
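To make the "shell game" concrete, here is a minimal sketch of thin provisioning, assuming a single shared pool of fixed-size blocks. `ThinPool` and its methods are hypothetical names for illustration, not any vendor's API. Volumes advertise a logical size, but physical blocks are drawn from the pool only on first write, so the pool can promise more capacity than it physically has.

```python
class ThinPool:
    """Hypothetical thin-provisioned pool: volumes advertise a logical size,
    but physical blocks are taken from a shared pool only on first write."""

    def __init__(self, physical_blocks: int):
        self.physical_blocks = physical_blocks
        self.used_blocks = 0
        self.volumes = {}  # name -> {"logical": size, "mapped": set of written blocks}

    def create_volume(self, name: str, logical_blocks: int) -> None:
        # Nothing is reserved here: the volume's size is only a promise.
        self.volumes[name] = {"logical": logical_blocks, "mapped": set()}

    def oversubscription(self) -> float:
        # Ratio of capacity promised to applications vs. capacity that exists.
        promised = sum(v["logical"] for v in self.volumes.values())
        return promised / self.physical_blocks

    def write(self, name: str, block: int) -> None:
        vol = self.volumes[name]
        if not 0 <= block < vol["logical"]:
            raise IndexError(f"block {block} is outside volume {name!r}")
        if block in vol["mapped"]:
            return  # rewriting an already-backed block needs no new space
        if self.used_blocks >= self.physical_blocks:
            # The shell game fails: the promised space does not physically exist.
            raise RuntimeError("thin pool exhausted")
        vol["mapped"].add(block)
        self.used_blocks += 1
```

With two 80-block volumes carved from a 100-block pool, the pool is 1.6 times oversubscribed. Everything works until the 101st distinct block is written, at which point an application that was promised space gets a write failure. Real arrays try to forecast and alert on pool usage before that happens, which is the space-forecasting the column says customers must take on faith.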

Complementing the drive to de-dupe and thin provision data, advances in the real capacity of magnetic media continue. Bigger disks make convenient, cost-effective, and energy-efficient platforms for storing the data deluge, say the vendors.

Many IT managers, however, have discovered that the vendor-recommended strategy of re-driving arrays with higher-capacity drives has significant hidden costs. What they haven't been told is that larger drives can compromise the effectiveness of some older RAID schemes, forcing a forklift upgrade to the vendor's latest array frame on top of the purchase of the new drives at a hefty mark-up from the vendor, VAR, or VAD.
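The column does not spell out the mechanism, but one commonly cited explanation goes like this: rebuilding a single-parity (RAID 5) set means reading every surviving drive end to end, and drive datasheets quote an unrecoverable read error (URE) rate, often 1 error per 10^14 bits for desktop-class drives. The more bits a rebuild must read, the likelier it is to hit an error it cannot correct. The sketch below estimates that probability; the function name, the 10^14 figure, and the independence assumption are illustrative, not measurements.

```python
import math


def rebuild_ure_probability(drive_tb: float, surviving_drives: int,
                            ure_per_bit: float = 1e-14) -> float:
    """Chance of at least one unrecoverable read error (URE) while reading
    every surviving drive end to end during a RAID 5 rebuild.

    Simplifying assumption: errors are independent and occur at the
    datasheet rate ure_per_bit; real-world rates can be better or worse.
    """
    bits_read = drive_tb * 1e12 * 8 * surviving_drives  # decimal TB, as marketed
    # 1 - (1 - p) ** n, computed with log1p/expm1 to stay accurate for tiny p
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))
```

Under these assumptions, a five-drive RAID 5 set (four survivors to read) built from 1 TB drives has roughly a 27% chance of hitting a URE during a rebuild; swap in 4 TB drives and the estimate climbs to roughly 72%. That is the sense in which simply re-driving an array with bigger disks can undermine the protection the original RAID scheme was chosen to provide.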

In these times, when both trust and money are hard to come by, the safe bet is to go with what you know. To some IT pros, this means going with a trusted supplier. From an architectural perspective, I believe that this means going with tape.

A few weeks ago I had a long chat with Rob Sims, CEO of Crossroads Systems in Austin, TX. Although Crossroads is not a tape automation vendor, Sims has been fighting the fight for tape throughout his career. He says that tape has its problems, which usually boil down to a combination of improper management and deficient skills training, but the reliability issue fanned by the disk industry, directly and indirectly (that is, via paid analyst houses), is not one of them.

Says Sims, "All media is vulnerable, but there is no such thing as an inherent failure propensity in tape like there is in disk. Why do you think we need RAID and other redundancy technologies on disk arrays? We understand their vulnerabilities and the industry adapts to deal with them. The tape manufacturers, on the other hand, know that tape has its problems, but there has been little effort made by manufacturers to explain the fundamental issues."

Sims says that tape has taken a bad rap mainly because of "business laziness" on the part of tape automation vendors. "They have a razor/razorblade mentality: the real objective is to sell media and maintenance." He notes that cost comparisons of the two technologies are pointless or skewed, whether they are paid for by EMC or by the LTO Consortium.

"What troubles tape is that vendors don't seem to be trying to improve the customer experience," Sims argues. "They have actually confused things by allowing the tape drive manufacturers to use a mix of native data rates and compressed data capacities to come up with some data points that compete with disk from a capacity and speed standpoint. Nobody talks about streaming rates, which is where tape beats disk hands down."

He says that the speed issue (how fast a backup is completed) is confusing. Drive manufacturers press customers to deploy ever-faster drives without recognizing that, for many reasons, customers do not run the drives they already own at their rated streaming rates. Says Sims, "It's like selling someone a faster car to drive on a gridlocked highway. What's the point?"

Sims also notes that "the simple fact is that tape streaming rates beat all other methods for recording data. When you drop below the streaming rate, shoe-shining results, which, in LTO tape, brings in the potential for media and mechanical damage."
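Sims' "faster car on a gridlocked highway" point can be illustrated with a toy model. In the sketch below, a host feeds a drive buffer at a fixed rate; the drive starts the tape only when the buffer is full, drains it at its native rate, and, if the buffer runs dry mid-job, must stop and reposition (the back-and-forth motion behind shoe-shining) before streaming again. The function `simulate_backup`, its default buffer size, and its reposition time are hypothetical, and the model ignores mitigations found in real drives, such as variable-speed (speed-matching) operation.

```python
EPS = 1e-9  # tolerance for floating-point comparisons


def simulate_backup(feed_mb_s: float, drive_mb_s: float, total_mb: float,
                    buffer_mb: float = 256.0, reposition_s: float = 2.0,
                    dt: float = 0.01) -> tuple[float, int]:
    """Return (elapsed_seconds, backhitches) for a toy tape-write model.

    Rates must be positive. All parameters are illustrative assumptions.
    """
    fed = buffer = elapsed = timer = 0.0
    host_done = False
    state = "idle"
    backhitches = 0
    while not host_done or buffer > EPS:
        if not host_done:
            # Host pushes data into the drive buffer, blocking while it is full.
            push = max(0.0, min(feed_mb_s * dt, buffer_mb - buffer))
            if push >= total_mb - fed:
                push, host_done = total_mb - fed, True
            fed += push
            buffer += push
        if state == "repositioning":
            timer -= dt
            if timer <= EPS:
                state = "idle"
        elif state == "idle" and (buffer >= buffer_mb - EPS or host_done):
            # Start the tape only with a full buffer (or the final tail of the job).
            state = "streaming"
        if state == "streaming":
            buffer -= min(drive_mb_s * dt, buffer)
            if buffer <= EPS:
                buffer = 0.0
                if not host_done:
                    # Buffer ran dry mid-job: the drive must stop and reposition.
                    backhitches += 1
                    state, timer = "repositioning", reposition_s
        elapsed += dt
    return elapsed, backhitches
```

With the host feeding 40 MB/s, a 10 GB job takes about 250 seconds on either an 80 MB/s or a 160 MB/s drive, because the host is the bottleneck. The faster drive simply empties its buffer sooner and stops and repositions noticeably more often. Only when the feed rate keeps up with the drive does the tape stream without interruption.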

Crossroads Systems offers products, such as its Read Verify Appliance (RVA), that can monitor tape media and automation usage and wear. RVA provides information that can help administrators optimize their tape operations and investments in media and hardware. To hear Sims explain it, the greatest enemy of effective tape operation is usually backup software, or a lack of skills in operating backup software. Fix that, with good software selection and proper training, and tape has a lot of runway ahead as an effective choice for storing backups and infrequently accessed data.

"That's most data over 30 days old," Sims says. He argues that disk-to-disk strategies do not replace tape; they complement it.
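Sims' rule of thumb can be expressed as a simple selection policy. The sketch below, using a hypothetical helper `tape_candidates`, walks a directory tree and reports regular files whose last modification is older than a cutoff. Using modification time as a proxy for "infrequently accessed" is an assumption; access times are often unreliable because many file systems are mounted with `noatime` or `relatime`. Actually moving the selected files to tape is left to whatever backup or archive software is in use.

```python
import os
import time
from pathlib import Path


def tape_candidates(root, max_age_days=30, now=None):
    """Yield (path, size_bytes) for files under root whose last modification
    is older than max_age_days (hypothetical tiering policy)."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86_400  # seconds per day
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                st = path.stat()
            except OSError:
                continue  # vanished or unreadable: skip rather than abort the scan
            if st.st_mtime < cutoff:
                yield path, st.st_size
```

Summing the sizes this yields gives a quick estimate of how much primary disk capacity a 30-day policy could free.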

"Realistically, tape, disk, and solid state all have value. Those who don't seem to see that are selling just one type of media or system," Sims asserts. He notes that backup to tape is far more efficient than backup to other media; in his view, substituting disk for tape as the backup target makes little sense.

As he explains it, "A backup is supposed to be a point-in-time copy of data written to media. With tape, you only need to know the utility that wrote the data, the format of the utility container, so you can restore individual data sets in that container, and the media itself. That means that you need an indexing system for the media and a copy of the software used to make the backup. By comparison, with disk-to-disk approaches, you need a whole system to restore data. I can see, based on its ability to restore a single file, how disk-to-disk might work well enough for data retained for no more than 30 days. Further out, however, disk-based backup becomes problematic."
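What Sims describes (the media, the container format written by the backup utility, and an index) can be illustrated with the tar ("tape archive") format, which was designed for tape. In the sketch below, Python's standard `tarfile` module stands in for "the utility that wrote the data"; the helpers `write_backup` and `restore_one` and the sample file names are hypothetical. The restore opens the archive in stream mode (`"r|"`), which reads strictly front to back with no random access, the way a tape is read.

```python
import io
import tarfile


def write_backup(archive_path, files):
    """Write a point-in-time copy of {name: bytes} into a tar container and
    return a catalog (name -> size), playing the role of the media index."""
    catalog = {}
    with tarfile.open(archive_path, "w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
            catalog[name] = info.size
    return catalog


def restore_one(archive_path, name):
    """Restore a single file using only the container format reader."""
    # Stream mode reads sequentially, as from a tape device.
    with tarfile.open(archive_path, "r|") as tar:
        for member in tar:
            if member.name == name:
                return tar.extractfile(member).read()
    raise KeyError(name)
```

Getting one file back requires nothing beyond the archive and a tar reader; no array, RAID set, or de-dupe metadata has to be rebuilt first.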

He argues that data de-duplication makes disk-to-disk approaches to backup/restore even more vulnerable: "If you are de-duplicating your backups so you can fit more of them on disk-to-disk backup platforms, restoring a single file becomes more problematic. You need a bigger description of the data, including the index of the de-dupe bits themselves (what was pulled out of the data to store it more compactly), to identify the right container containing the right file. Given that disk itself is vulnerable -- blocks go bad, cylinders go bad, platters go bad -- it doesn't take much for a one TB disk to fail. When it does, you had better hope that the hash annotation associated with de-duplicated bytes doesn't get corrupted."
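Sims' concern about the de-dupe index can be made concrete with a simplified sketch: fixed-size chunks, a SHA-256 content-addressed chunk store, and a per-file "recipe" listing the chunk hashes needed to rebuild the file. `DedupStore`, the 4 KiB chunk size, and the recipe layout are assumptions for illustration; commercial engines typically use variable-size chunking and their own metadata formats.

```python
import hashlib

CHUNK = 4096  # assumed fixed chunk size; many products use variable-size chunks


class DedupStore:
    """Hypothetical block-level de-duplicating backup store."""

    def __init__(self):
        self.chunks = {}   # SHA-256 hex digest -> chunk bytes, each stored once
        self.recipes = {}  # backup name -> ordered chunk digests (the de-dupe index)

    def backup(self, name, data):
        digests = []
        for i in range(0, len(data), CHUNK):
            piece = data[i:i + CHUNK]
            digest = hashlib.sha256(piece).hexdigest()
            self.chunks.setdefault(digest, piece)  # duplicate chunks are not stored again
            digests.append(digest)
        self.recipes[name] = digests

    def stored_bytes(self):
        return sum(len(c) for c in self.chunks.values())

    def restore(self, name):
        # Without the recipe, there is no way to know which chunks form the file.
        out = bytearray()
        for digest in self.recipes[name]:
            piece = self.chunks[digest]
            if hashlib.sha256(piece).hexdigest() != digest:
                raise ValueError(f"chunk {digest[:12]} is corrupt; {name} cannot be restored")
            out += piece
        return bytes(out)
```

Two backups that share a chunk consume three chunks of storage instead of four, but flipping a single byte in the shared chunk makes both backups unrecoverable, and losing the `recipes` map would strand every chunk in the store. That concentration of risk is what Sims is warning about.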

Of course, a catastrophic disk failure scenario is what RAID was invented to prevent, Sims acknowledges. But he observes that RAID, in addition to driving up the cost of disk-based solutions, is covered by more than 10,000 patents that do nothing to ensure that one vendor's system is compatible with another's. If you lose a disk, restore times stretch to include the time required to rebuild the failed drive, then the RAID set, then the indexes and hashing schemes for the data itself. Simply put, Sims notes, "De-duplication is not data generic, tape is."

For storing massive quantities of data for long periods of time, Sims says there is no substitute for tape. California's high-performance computing center, he notes, writes its data, which is infrequently accessed, directly to tape because tape offers extremely fast write speeds, keeps data indefinitely (with proper management), and is affordable at about 44 cents per GB or less.

If you want to economize on storage for the massive amount of data that companies are accumulating today, Sims says there is no better choice than properly managed tape technology. We fully agree.

Your views are welcome: [email protected].
