Q&A: Optimizing Your Data Archives

What defines "old" data, and how do you solve compatibility issues of legacy backups?

Archiving "old" data can free resources for "new" data -- but what defines "old" and how can you turn archiving into a competitive advantage? Then there's the problem of retrieving "old" archives from backup media. Can compatibility issues be solved? For answers, we turned to Jim O'Connor, director of product marketing at Bus-Tech, which specializes in virtual tape and DASD emulation on open-systems storage, Intranet-to-mainframe connectivity solutions, and high-speed data movement products.

Enterprise Strategies: How closely should service-level agreements (SLAs) be considered when a company evaluates its data archiving plan? Do SLAs relax as the data ages, and how often should they be reevaluated and renegotiated?

Jim O'Connor: Service level agreements should be reviewed, at a minimum, each time an organization reviews its archiving plans and systems. With laws and regulatory requirements changing quite frequently, a best practice should be to review the SLAs on a quarterly basis, thereby ensuring constant compliance.

Another advantage of conducting frequent reviews of the agreements is that it allows for greater agility in the event that changes need to be made (and they inevitably will). Knowing that change is inevitable, it would be foolish to allow SLAs -- in particular, data archiving SLAs -- to relax over time.

What are some of the stipulations that determine the amount of time that data is archived, and what media are used?

There are currently no stipulations on the type of media used to store archival data, but on a case-by-case basis organizations need to determine whether the technology they currently have deployed can meet the other requirements.

Depending on the specific conditions set forth in the SLAs and vertical regulatory requirements, the mandated data retention timeframe for records can vary from as few as seven years to as many as 100. At the upper end of the spectrum, a significant amount of planning and infrastructure investment is necessary to ensure constant compliance. No single medium will remain relevant or practical for that period of time, so it becomes a question of migration and maintenance costs.

Can effective archiving be a competitive advantage?

Absolutely. Although the data is static and a company will rarely, if ever, need to access it, those with the ability to do so quickly and efficiently are light years ahead of those that treat it otherwise. Today's investments in archiving have foolishly become focused entirely on media capacity. Unfortunately, cramming more information into historically unreliable, complicated, and inefficient media will only exacerbate any future problems that arise.

What can be done to overcome generational compatibility issues when retrieving archival data stored on tape?

Unlike newer open systems technologies, tape is an "all-or-nothing" medium when it comes to backward or cross-vendor compatibility. Tape continues to evolve, but rapid changes in the technology have, in many cases, done more harm than good.

For example, one of the leading tape manufacturers introduced seven new "generations" of tape technology in a 10-year span. Not only are the migrations between generations slow, but they're costly, and new tape technologies still fall short in meeting the performance or lifespan requirements of e-discovery and compliance.

There is little that can be done to overcome existing generational compatibility issues. If a company chooses to keep tape as the primary medium for archives, it needs to accept that it is at the mercy of the tape vendors and will need to perform full migrations for each and every new generation of tape. The alternative, which has become the modus operandi for far too many companies, is to pour time, money, and prayers into maintaining old technology just so they can read the earlier generations of tape -- should they ever need to.

How can organizations prepare for frequent regulatory changes with regard to e-discovery?

Simply put, organizations need to ensure that both their systems and infrastructure are optimized to handle this type of change, because it's an inevitable -- and recurring -- challenge. In general, regulatory mandates to date have centered around two core areas: security (as seen in state data breach laws that have recently gone into effect) and longevity (as is the case within the health-care industry). Successful e-discovery will be a by-product of these two core areas.

To effectively prepare and optimize storage environments, it is best to begin with a critical scrutiny of the processes involved at each stage of the data lifecycle. Far too often, organizations make huge investments in technology to improve the performance of one area, and these siloed investments continue to add up until the system's complexity reaches a critical level where forward progress is impossible. Avoid this by establishing a smart, long-term investment strategy based on processes.

Why is tape still the predominant medium for storing archival data? Do you foresee other media overtaking it any time soon?

Although storage technology continues to evolve, cost and data growth rates are still prohibitive factors in the widespread adoption of these new technologies. When we're talking about data archives, there really are only two realistic media options available today: tape and disk. The advantages of tape are that it is a legacy technology with deep market penetration (meaning less upfront investment), and that it is a relatively stationary medium. Only in the case of e-discovery litigation would archived data need to be accessed, so most organizations optimize their environments for tape, which has little to no impact on data center energy utilization.

Tape cartridge capacity is currently approaching one terabyte. While this increased capacity is good because fewer cartridges are needed to store the same amount of data, it comes at the cost of making specific pieces of data harder to retrieve. Say, for instance, that a required piece of data sits near the end of the cartridge: the cartridge contents must be read sequentially from the beginning until that point is reached.
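The cost of that sequential scan is easy to estimate. A minimal back-of-the-envelope sketch in Python -- the cartridge size and drive throughput figures here are illustrative assumptions, not vendor specifications:

```python
def tape_retrieval_seconds(offset_gb, read_rate_mb_s=120):
    """Time to reach data stored offset_gb gigabytes into a cartridge
    that must be read sequentially from the beginning.
    read_rate_mb_s is an assumed sustained drive throughput."""
    return offset_gb * 1024 / read_rate_mb_s

# Data near the end of a ~1 TB cartridge vs. near the start:
worst_case = tape_retrieval_seconds(1000)  # ~8,533 seconds -- well over two hours
best_case = tape_retrieval_seconds(1)      # under ten seconds
```

Whatever the exact drive speed, the point stands: retrieval time on tape grows linearly with the data's position on the cartridge, while a disk seek is effectively constant.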

The advantages of disk are that the data is more secure, easier to transport, and much quicker to access. In an age where regulatory standards reign supreme, higher-performance disk systems are gaining ground as the medium of choice. Tape was king for many, many years, and although disk is now becoming the standard, it, too, will eventually be replaced.

The upper capacity of disk drives today is three terabytes, which, compared to tape, means greater storage density without any penalty in how quickly specific data can be accessed. Disk capacity is also increasing at a faster rate than tape's, nearly doubling every 24 months. The same data center footprint will be able to store and retrieve more data as time goes on.
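That doubling rate compounds quickly. A small sketch of the projection, assuming the roughly 24-month doubling period holds (an extrapolation, not a guarantee):

```python
def projected_capacity_tb(base_tb=3.0, years=10, doubling_period_years=2.0):
    """Project per-drive capacity assuming it doubles every
    doubling_period_years -- the ~24-month rate cited above."""
    return base_tb * 2 ** (years / doubling_period_years)

# Starting from today's 3 TB drive, a decade of doubling every 24 months:
decade_out = projected_capacity_tb()  # 3 * 2**5 = 96 TB per drive slot
```

Five doublings in ten years turns a 3 TB slot into a 96 TB slot -- which is the basis for the footprint claim above.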

Data center footprint is a big issue when considering the associated environmental issues such as cooling, fire suppression, or cost per square foot.

The typical price debate between the two media usually boils down to power consumption. Tape consumes no power when offline. Disk subsystems will typically use more power (a figure that is roughly halved each year as drive capacity increases), but MAID or spin-down technologies are a great way to minimize power consumption.
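The power trade-off can be framed as a simple duty-cycle calculation. A sketch with illustrative wattages (assumed figures, not measurements of any particular subsystem):

```python
def annual_kwh(watts, duty_cycle=1.0, hours_per_year=8760):
    """Annual energy draw for a shelf of drives, where duty_cycle is the
    fraction of the year the drives are spun up. MAID/spin-down systems
    keep idle drives powered down, lowering the effective duty cycle."""
    return watts * duty_cycle * hours_per_year / 1000

always_on = annual_kwh(500)                  # disks spinning year-round
maid = annual_kwh(500, duty_cycle=0.1)       # spun up ~10% of the time
offline_tape = annual_kwh(0)                 # a cartridge on a shelf draws nothing
```

The comparison makes the archiving point concrete: for rarely accessed data, the duty cycle dominates, which is why offline tape and spin-down disk both fare well here.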

How does storage in the cloud impact your archiving plans?

The location where data is stored -- whether onsite, offsite, or in the cloud -- should not impact archiving plans. Cloud computing adds complexity to the mix, so the actual archiving process will have to be adapted. Remember what the purpose of archiving is: long-term retention and protection of static data. If that goal can be accomplished adequately and efficiently in the cloud, then it is certainly worth consideration.