In the Tape vs. Disk War, Think Tape AND Disk
Here's a simple solution to the disk vs. tape debate: use the technologies together and get the best of both worlds.
By Mark Ferelli
As data protection becomes a growing part of the IT budget, it seems like every technology that records data wants to claim primacy and indispensability. Too frequently in the frantic dash for market share, technology that has stood the test of time is falsely accused of being hopelessly out of date. So it is with magnetic tape in the data center. The veteran technology is inaccurately identified as "yesterday's technology." In fact, tape technology is experiencing a renaissance, a rebirth based on the growing recognition that tape responds regularly and reliably to today's vital business requirements.
Companies across all industries and of all sizes are looking for reliable solutions to protect their data and ensure that it is readily retrievable.
Naturally, the first step is for IT to commit to data management. This includes management of the data itself, the software resources and the hardware resources. Best practices in data classification, metadata management, capacity utilization, and similar disciplines help IT managers grow into tightly integrated storage resource managers.
Analysts and disk vendors have been challenging the use of tape in the data center. A recurring death knell has been rung for tape technology, only to be repeatedly proven false. Contrary to frequent predictions, disk technology has not swept the backup and archiving IT markets. Those who condemn tape as "yesterday's technology" are reciting yesterday's marketing messages from competing technologies. When it comes to a full-featured hardware data protection strategy, the focus should be on disk and tape, not disk or tape.
Storage administrators understand that a large part of corporate data is accessed infrequently. Yet storage volumes are multiplying each year as corporations generate more electronic content and struggle to meet complex business transactions, regulatory compliance mandates, and e-discovery requirements. Expanding storage on the storage area network (SAN) simply by adding expensive Fibre Channel (FC) disks is typically not a cost-efficient option. By implementing a tiered storage architecture, an IT organization can dramatically improve storage capacities, manage storage costs, and improve performance.
The lure of added capacity is compelling; a 1 TB SATA hard drive costs far less than a 146 GB FC drive. When multiplied out over dozens, hundreds, and even thousands of drives, the added storage and lower disk cost can be substantial. Administrators can then further reduce costs by migrating reference data onto tape, which is cost efficient, reliable, and removable. Although tape cannot match the access time of high-end FC disks, businesses may realistically translate tiers of storage as "tiers of service," offering storage users an appropriate level of reliability, accessibility, and performance at each level.
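The "tiers of service" idea can be illustrated with a minimal sketch that places data on a tier according to how recently it was accessed. The age thresholds, the tier labels, and the function name assign_tier are illustrative assumptions, not values from any product or standard; a real policy would be driven by the organization's own service levels.

```python
from datetime import date

# Hypothetical thresholds (days since last access); tune to your own service levels.
TIER_RULES = [
    (30, "FC disk"),     # frequently accessed data stays on high-end disk
    (180, "SATA disk"),  # less active data moves to lower-cost disk
]
ARCHIVE_TIER = "tape"    # rarely accessed reference data migrates to tape


def assign_tier(last_access: date, today: date) -> str:
    """Return the storage tier for data last accessed on `last_access`."""
    age_days = (today - last_access).days
    for max_age_days, tier in TIER_RULES:
        if age_days <= max_age_days:
            return tier
    return ARCHIVE_TIER
```

With these assumed thresholds, data touched two weeks ago stays on FC disk, data untouched for about three months lands on SATA disk, and anything older than 180 days is a candidate for tape.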
Overall storage performance can benefit from tiered storage. With all data on a single hard disk tier, the effort to access stored data can slow performance to a crawl. That problem is eased once storage is reorganized into tiers across two or more storage subsystems. Any loss of access speed on the lower tiers is offset by reduced contention for data access. Furthermore, a secondary tape tier reduces the number of data requests arriving at the first tier, allowing faster access at that first tier.
Disk as a tier-2 solution is now popular for backup, but it is less useful for long-term archives of reference information.
Tier 2 is not Tier 3
What is bewildering to some is the common practice of confusing backups with archives. Revisiting definitions is worthwhile here.
Backups are used to restore an application or dataset (a logical or related grouping of data) to a specific point in time. Backups are incremental operations; copies of data are typically retained for a limited time (maybe a few days) until the new increment is created. A variation on the backup is the snapshot, a copy of specific data files that is kept for a very short time, perhaps hours.
Data replication and data backup should not be confused. Replication allows for rapid (in some cases) restoration of data (usually an entire logical drive or a volume) to a relatively recent point in time. The goal of replication is ready availability.
Backup and recovery is optimized to restore data to a particular (and flexible) point in time. The desired data may have been backed up daily for the last month or two, and the data center might need a file from a week past. A backup and recovery strategy provides an easy methodology for skimming different versions of a file and executing a restore.
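As a rough sketch of point-in-time recovery, the function below picks the most recent backup taken at or before a requested moment. The name pick_backup and the idea of representing each backup by its timestamp are assumptions made for illustration; real backup software tracks versions in its own catalog.

```python
from bisect import bisect_right
from datetime import datetime


def pick_backup(backup_times, target):
    """Return the most recent backup time at or before `target`, or None."""
    ordered = sorted(backup_times)
    # bisect_right counts how many backups were taken at or before the target.
    count = bisect_right(ordered, target)
    return ordered[count - 1] if count else None
```

Given nightly backups, a request for the state of a file at noon on a given day resolves to that morning's backup; a request that predates the oldest retained backup returns None, which is exactly the case where a longer retention tier is needed.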
On the other hand, an archive is a data repository designed for holding rarely changed information. It is migrated information, not just copied files. Archiving includes tools and software that are integrated with the application, be it files, databases, e-mail systems, or audio and visual records. An archive has a metadata and indexing system so that you can search for exactly the data that you want to retrieve and then go directly to the media to get it.
It is easy to understand that hard disk arrays can provide increased backup performance and faster restore operations. They also add RAID-based fault tolerance to backup operations. It is equally easy to understand why tape technology plays a vital role. Tape continues to provide high-capacity, cost-efficient storage that is secure, removable, and safe from viruses. Small wonder, then, that integrating disk and tape in a tiered-storage architecture has wide appeal.
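The metadata-and-index idea can be sketched as a small keyword index that maps each archived record to the media holding it. The class name ArchiveIndex, the record IDs, and the cartridge labels are all hypothetical; a production archive would use its own catalog and a far richer metadata schema.

```python
from collections import defaultdict


class ArchiveIndex:
    """Toy keyword index: search metadata first, then go straight to the media."""

    def __init__(self):
        self._by_keyword = defaultdict(set)  # keyword -> record IDs
        self._location = {}                  # record ID -> media label

    def add(self, record_id, keywords, media_label):
        """Register a record, its descriptive keywords, and where it is stored."""
        self._location[record_id] = media_label
        for keyword in keywords:
            self._by_keyword[keyword.lower()].add(record_id)

    def find(self, *keywords):
        """Return {record_id: media_label} for records matching all keywords."""
        if not keywords:
            return {}
        candidates = [self._by_keyword.get(k.lower(), set()) for k in keywords]
        matches = set.intersection(*candidates)
        return {rid: self._location[rid] for rid in sorted(matches)}
```

A search returns only the media labels that need to be mounted, so retrieval touches one cartridge instead of scanning the whole repository.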
Disk-to-Disk is Inadequate Alone
Although a two-tier system focused on disk has some limited application, it doesn't replace tape completely. Tape continues to play an important role in the data center. We'll examine three critical features of tape.
The first important feature is ease of portability. In his book Disaster Recovery Planning, Jon William Toigo notes: "Many experts believe that effective off-site storage of critical data is the single most important determinant of successful business recovery following a disaster." Too many disk-to-disk backup solutions reside in the same SAN as the primary disk storage. One disaster, be it fire, flood, hurricane, power surge (or the like), would make both the primary data source and the backup data source irretrievable at the same time.
Tape cartridges, on the other hand, are easily dismounted from the tape library and shipped by a variety of carriers to secure off-site facilities. A tape solution is a must, then, for disaster recovery best practices, where portability and cost-efficiency are stated goals.
In the case of databases, corruption can remain undiscovered for some time, and recovery might be required from an earlier version of the DB than the current one. Some disk-based backup products either lack the capacity or are too expensive to hold many days' worth of backup versions. The alternative might be to write the current backup to second-tier disk and then copy all older backups from disk to tape. This same reasoning applies to version-intensive records such as CAD files.
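The disk-then-tape alternative described above amounts to a simple placement rule, sketched here under stated assumptions: backups are identified by ISO-format date strings, and the function name plan_placement and its keep_on_disk parameter are invented for this example.

```python
def plan_placement(backup_dates, keep_on_disk=1):
    """Split backups into (disk, tape): the newest `keep_on_disk` stay on disk."""
    if keep_on_disk < 0:
        raise ValueError("keep_on_disk must be zero or greater")
    # ISO-format dates (YYYY-MM-DD) sort chronologically as plain strings.
    newest_first = sorted(backup_dates, reverse=True)
    return newest_first[:keep_on_disk], newest_first[keep_on_disk:]
```

With the default setting, only the current backup occupies second-tier disk and every older version is copied to tape, so an earlier, uncorrupted version of a database remains recoverable without paying for disk capacity to hold it.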
Scalability is a continuing data center concern with the meteoric rise in the amount of data the average data center must store and retrieve. Disk scales by the addition of drives (and controllers, software upgrades or replacements, and floor space); tape scales by the addition of cartridges. Does this mean that tape scales to infinite capacity? No, but scaling is easy and tape cartridge capacities keep growing. Although disk prices per GB continue to drop, innovation in tape technology continues to reduce tape's cost per GB, helping tape remain the least expensive solution for storing large amounts of data over time. Scalability provided by tape is cost efficient and can contribute to a positive total cost of ownership (TCO). The Linear Tape-Open (LTO) tape format, currently the market leader in the computer tape industry, pursues a road map that anticipates doubling capacity every two years.
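To see what a doubling-every-two-years road map implies, the sketch below projects cartridge capacity forward using simple exponential growth. The function name projected_capacity and any starting capacity passed to it are illustrative assumptions; actual product capacities depend on what vendors ship.

```python
def projected_capacity(start_capacity_gb, years, doubling_period_years=2.0):
    """Project capacity after `years`, doubling every `doubling_period_years`."""
    if doubling_period_years <= 0:
        raise ValueError("doubling_period_years must be positive")
    return start_capacity_gb * 2 ** (years / doubling_period_years)
```

Under that assumption, a cartridge format that holds 100 GB today would hold 400 GB in four years and 3,200 GB in ten, which is why adding cartridges remains an easy way to scale.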
It's All About the Data
Use of disk drives as a tier-2 solution in the data protection hierarchy offers several benefits. It can reduce reliance on the backup window; store short-term, incremental backups; and serve as a staging layer to prepare data for archiving. This last advantage is important, as it permits transmission from disk to tape without impacting server or network duty cycles.
The target technology for long-term data retention is tape. Combinations of disk and tape are already deployed in which disk serves as a cache for the tape repository. Ultimately, the easy-to-transport tape can be migrated to an off-site venue for disaster preparedness and business continuity; shelf life for tape exceeds 30 years.
Nathan Thompson, CEO at Spectra Logic, compares the cooperation between disk and tape technologies to the surge in hybrid automobiles. He says: "In ten years, 25 percent of all cars will be hybrids, using battery power for the cities and gas for the highways. But even then, there will still be petroleum-based vehicles and battery-only vehicles. Tape and disk will both continue to be used in backup, and longer term data retention will flow from disk to tape."
Rather than facing the endless chain of one-off decisions of tape versus disk and adding complexity, organizations would be wise to consider committing to a tiered disk and tape integration model wherever possible to meet their applications' requirements in the most cost-effective manner. This will enhance IT's ability to contribute directly to the business, serve current needs, and prepare for future needs in preserving data for corporate governance and for legal requirements. Taking a systemic view of your organization's information management requirements and mapping out a long-term plan to achieve an effective tiered data protection strategy should yield significant benefits and make IT a strategic contributor to competitive advantage.
Disk and tape technology both have a place in your data-protection hierarchy. Disk and tape leveraged in partnership for best-of-breed data protection solutions pay off in TCO, operational efficiencies, and, ultimately, the preservation of mission-critical data. In our information-intensive era, it's all about the data.
Mark Ferelli is a freelance technology journalist who has written about mass storage and information management technologies since 1988.