In-Depth
Data Archiving vs. Data Backup: A Closer Look
Savvy organizations are embracing data archiving as a means to reduce costs, improve performance, and satisfy compliance requirements
To a storage manager, the verbs “backup” and “archive” mean very different things. To most of the rest of us, they’re frequently used to describe the same action—namely, the process of moving data from an online storage tier to near-line or off-line storage. But backing data up and archiving data are distinct technology practices that have very different requirements.
They also have very different advantages. To the extent that organizations are able to embrace data archiving as a means to reduce costs, improve performance, and—of course—satisfy regulatory compliance requirements, it’s a potentially important distinction.
“In general, our customer base does not implement archiving on as wide a scale as they could,” says Kami Snyder, market manager for information lifecycle management (ILM) solutions with IBM Corp. “There is a difference between backup and archiving and customers don’t necessarily call out the difference in their own mind.”
Archiving itself can mean several things, depending on whom you’re talking to. In general, however, it describes the process of consolidating and moving data from a primary online storage medium—such as a fibre channel disk array—to a less-expensive near-line or (in some cases) off-line storage medium. In some cases—e.g., for compliance—archiving emphasizes data longevity and authenticity, especially for e-mails, instant message transcripts, documents, and other kinds of semi-structured or unstructured data.
In all cases, archiving presupposes (comparatively rapid) file-level access to data, coupled (in many cases) with robust search and retrieval capabilities. “It’s a repository, a large index repository of data that’s designed for people to get to it, for people to be able to search it, but a lot of times it’s not going to be accessed very often,” says Steve Whitner, product marketing manager with disk and tape storage manufacturer Advanced Digital Information Corp. (ADIC).
In this respect, Whitner says, archiving is fundamentally different from enterprise backup, which involves taking frequent snapshots of data to protect it against both routine and catastrophic loss. Organizations typically back up operating system- or application-specific data and configuration settings, frequently directly to tape, and sometimes retain backups for only a few days, at which point they’re replaced (or over-written) by newer volumes.
Whereas archiving is typically done onsite, backup can be done both on- and off-site, with deltas sent over a WAN connection to off-site libraries. In most large organizations, aged backup data is frequently managed (“vaulted”) by an off-site provider, such as Iron Mountain Inc. And although it is possible to recover individual files or—in some cases—search for individual pieces of information within backup sets, this isn’t the best or most efficient means of doing so.
But there’s an even more fundamental difference, says Anders Lofgren, a senior vice-president with Computer Associates International Inc. (CA). “In backup, you’re really copying data, and in archive, you’re actually moving the data. You may leave a stub file behind, but you’re actually moving the data,” he explains. Lofgren, who heads up CA’s BrightStor storage resource management (SRM) family, adds that there are several different flavors of archiving. “There’s file archiving, there’s e-mail archiving, there’s database and application archiving. The point is that as far as all of these [e-mail, database, or applications] are concerned, the data is still there. It looks like it’s still on [tier-one] disk.”
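The copy-versus-move distinction Lofgren describes can be sketched in a few lines of Python. This is a minimal illustration only, not how BrightStor or any other product works: the function names, the `.stub` suffix, and the plain-text stub format are all assumptions made for the example, and real archiving software typically makes the stub transparent to applications rather than requiring an explicit recall.

```python
import shutil
from pathlib import Path

STUB_SUFFIX = ".stub"  # hypothetical naming convention for this sketch


def backup_file(src: Path, backup_dir: Path) -> Path:
    """Backup: copy the file. The original stays on primary storage."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    dest = backup_dir / src.name
    shutil.copy2(src, dest)  # copy contents and timestamps
    return dest


def archive_file(src: Path, archive_dir: Path) -> Path:
    """Archive: move the file, leaving a stub that records its new location."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    dest = archive_dir / src.name
    shutil.move(str(src), str(dest))
    stub = src.with_name(src.name + STUB_SUFFIX)
    stub.write_text(str(dest))  # the stub holds only a pointer, not the data
    return dest


def recall_file(stub: Path) -> Path:
    """Bring an archived file back to primary storage and remove its stub."""
    archived = Path(stub.read_text())
    original = stub.with_name(stub.name[: -len(STUB_SUFFIX)])
    shutil.move(str(archived), str(original))
    stub.unlink()
    return original
```

After `backup_file` runs, two copies of the data exist. After `archive_file` runs, only one copy exists, on the archive tier, and primary storage holds nothing but the small pointer file.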
File System-Level Access
The key, says Whitner, is that archiving software such as ADIC’s AMASS near-line storage management archive provides file system-level access to archive data, such that it can be exposed to third-party storage management tools or—alternately—to collaborative and other kinds of applications. This is important, Whitner says, because archiving has more uses than just compliance.
“I don’t think compliance is really driving to a great extent the classic sense of archiving. What compliance is driving is a class of storage that is available and protected that will let people get to it under specific circumstances, and this is a subset of archiving,” he comments.
“For example, we have people who are looking at big archives of data on rich media, digital assets on rich media. For that, we have our SAN File System, a portable, very high-speed file system that essentially presents data storage on disk as if it were one large file system, so you could look at it as essentially being a SAN file system on NAS.”
Why archive? Or more to the point, why archive any more than you have to—i.e., for the purposes of compliance?
There are several reasons, maintains CA’s Lofgren. First, archive media—which can consist of inexpensive, SATA-powered NAS devices or (more frequently) SATA-based content management systems—is less expensive than tier-one serial-attached SCSI (SAS) or fibre channel storage. Second, he notes, archiving can help boost performance. Infrequently accessed files can be moved from primary storage into SATA-based archive storage, for example, or project members, collaborative teams, or workgroups can have archive storage of their own. In this respect, he says, important project information, documents, or other data can be both protected and accessible.
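One way to picture the “infrequently accessed files” policy is a scan that flags anything not read within a given number of days. The sketch below is a simplified illustration under stated assumptions: the function name and the 90-day threshold in the usage note are invented for the example, and it relies on file access times, which some systems update lazily or not at all (for instance, volumes mounted with `noatime`).

```python
import time
from pathlib import Path
from typing import List, Optional

SECONDS_PER_DAY = 86400


def find_archive_candidates(
    root: Path, max_idle_days: float, now: Optional[float] = None
) -> List[Path]:
    """Return files under root whose last access is older than max_idle_days."""
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * SECONDS_PER_DAY
    candidates = []
    for path in root.rglob("*"):
        # st_atime is the last-access timestamp reported by the file system
        if path.is_file() and path.stat().st_atime < cutoff:
            candidates.append(path)
    return sorted(candidates)
```

Calling `find_archive_candidates(Path("/data/projects"), 90)` would list files untouched for roughly three months; the path is hypothetical. A production policy engine would likely weigh file size, type, and ownership as well.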
Then there’s compliance. From a storage management point of view, a dedicated e-mail archiving system—such as those marketed by CA (which last week acquired e-mail archiving specialist iLumin), IBM, EMC Corp., and others—makes a lot of sense. These are hardware appliances that provide out-of-box support for IBM’s Lotus Notes/Domino, Microsoft Corp.’s Exchange, Novell Inc.’s GroupWise, and other e-mail systems. “Certainly from an application standpoint, I would say [interest in archiving is] overwhelmingly due to regulatory requirements,” says IBM’s Snyder. “From a file system point of view, we find that 85 percent [of interest] is file-system data, unstructured data. People embrace archiving just for efficiency. So there really are two different drivers.”
So, says ADIC’s Whitner, companies could be making far more effective use of archiving. “Customers tend to use archiving only in a niche in their environments and probably don’t take as much advantage of it as they could,” he says. “I think people could expand their use of archiving. People tend to sort of leave all of the data sitting around—or else they get rid of it altogether,” he concludes. “Organizations could set up another tier of storage that is less expensive than primary storage but would allow you to keep a lot of those assets. To some degree, this is already happening in a lot of relatively specialized application areas where people have a lot of data to manage, but it could be more pervasive than it is.”
About the Author
Stephen Swoyer is a Nashville, TN-based freelance journalist who writes about technology.