
Q&A: Open Source Backup and Data Protection

Protecting your data doesn't have to break your budget, thanks to open source options.

Data can be your company's most valuable asset, but protecting it doesn't have to break IT's budget. Fortunately, open source alternatives exist.

I spoke with Chander Kant, CEO of Zmanda (an open source backup and recovery solutions provider), about the data protection and disaster recovery landscape and the role open source can play in the enterprise. Chander founded and led LinuxCertified, Inc. (an open source product and services company), was a business development executive at VERITAS Software, and served as a product line manager for storage software at SGI.

Chander also offers several best practices that can help IT ensure that its backup and recovery plans are efficient and effective.

What are the general trends in open source?

First, let me say that open source is a development, delivery, and business model for software. It is a compelling alternative to proprietary software in all layers of the IT infrastructure. We all know the success of the Linux operating system, the Apache web server, and the MySQL database. It is common for IT managers to have at least one open source alternative on the shortlist whenever they are looking to deploy a new piece of software.

We continue to see the bulk of new infrastructure being built around the open source stack -- whether it is a customer-facing Web application or an internal collaboration application. Open source software continues to grow faster than the overall software market. According to IDC, the open source software market is projected to grow 26 percent annually to reach $5.8 billion by 2011.

What role does open source play in backup and data recovery?

Of course, all the benefits of open source software apply to open source backup software as well, including freedom and flexibility in deploying the software; very high source code quality (a result of transparency during the development process); lower cost of acquisition and maintenance; and extensive support from a motivated community of users and developers.

In the case of backup and archiving software, the advantages of using open source software go beyond these well-understood benefits. 

The future profitability of proprietary software vendors depends on locking customers into their formats and components. If a proprietary backup product is used to write to tape, the only way to recover the data is to use the same product (in most cases, the exact same version). If you needed to restore from that tape seven years from now (seven years is the document retention requirement at many companies), you would need to find that exact version of the backup and archiving software and either have a valid license or be ready to pay a premium price to recover your own data.

On the other hand, using open source and open standards-based backup solutions makes it simpler to restore your data in the future. When choosing open source backup software, be sure the software uses well-known and standard file formats for storing the archives.
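To make the "well-known and standard file formats" check concrete, here is a minimal Python sketch, assuming the backup software writes its archives as ordinary files you can inspect. The helper name `detect_archive_format` is hypothetical; it relies only on the standard library's `tarfile.is_tarfile` and `zipfile.is_zipfile`, so answering the question needs no vendor software.

```python
import tarfile
import zipfile
from pathlib import Path


def detect_archive_format(path: str) -> str:
    """Report whether a backup file uses a well-known open archive format.

    Hypothetical helper: returns "tar", "zip", or "unknown".
    """
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(path)
    # tarfile.is_tarfile also recognizes gzip-, bzip2- and xz-compressed tar.
    if tarfile.is_tarfile(str(p)):
        return "tar"
    if zipfile.is_zipfile(str(p)):
        return "zip"
    # Not a standard format this sketch knows about; may need vendor tools.
    return "unknown"
```

An "unknown" result does not prove a format is proprietary, but it is a signal to find out what can read it before relying on it for years of retention.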

Why is IT considering open source when there are several commercial products (like Tivoli) already on the market?

Network-based backup is a well-understood problem, and the need for backup software cuts across industry segments and geographies. Still, proprietary backup vendors sell packages that are expensive to buy and maintain, and complex to use. This creates a perfect environment for open source-based commoditization of this large segment of IT infrastructure software.

While products based on open source and open standards almost always have a lower initial acquisition cost, the greater benefit comes over the life cycle of the deployment. The inherent freedom these products provide lets IT managers significantly lower the cost of ongoing maintenance.

Let's say your organization is using an operating system that is popular today but falls out of favor in a few years. It is possible -- actually probable -- that a proprietary backup vendor will withdraw support for this "unprofitable" operating system. That forces you to choose between using some ad hoc mechanism to back up that system and replacing it with a different OS -- both costly choices. Open source communities are known to support older (and sometimes obscure) platforms. Furthermore, because the source code is available, you can compile the software for a particular operating system yourself.

What's the difference between open source and open format?

In my opinion, for a format to be truly open, there needs to be an open source reference implementation that reads and writes that format. Otherwise, a supposedly "open format" is too prone to subtle differences in interpretation, which can let a dominant software vendor effectively treat it as a closed format. So open source is key for a format to be truly open.

Each operating environment has its "native" archive formats, which are ideal for backup purposes because system administrators have numerous tools that can read and manage them. For example, tar is a great file format for backup archives in UNIX and Linux environments, as is zip on Windows.
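As a sketch of why native formats help, the following Python code writes a directory to a gzip-compressed tar archive and restores it using only the standard library `tarfile` module. The helper names `archive_directory` and `restore_archive` are hypothetical, not part of Amanda.

```python
import tarfile
from pathlib import Path
from typing import List


def archive_directory(src_dir: str, archive_path: str) -> None:
    """Write src_dir into a gzip-compressed tar archive (PAX format)."""
    with tarfile.open(archive_path, "w:gz", format=tarfile.PAX_FORMAT) as tar:
        # arcname keeps member paths relative, so the archive restores anywhere.
        tar.add(src_dir, arcname=Path(src_dir).name)


def restore_archive(archive_path: str, dest_dir: str) -> List[str]:
    """Extract every member into dest_dir and return the member names."""
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
        # Use the safe "data" extraction filter where this Python supports it;
        # it rejects absolute paths and members that escape dest_dir.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest_dir, **kwargs)
    return names
```

Because the result is ordinary tar, on a UNIX or Linux system it can also be listed with `tar -tzf` or unpacked with `tar -xzf`, without the program that created it.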

What open-source projects are available to enterprise IT?

There are hundreds of very high-quality projects available for enterprise IT. A good starting point is the top one percent of projects on SourceForge. In the backup arena, the most popular open source project in the world is Amanda (http://amanda.zmanda.com/).

Tell me more about the Amanda project. Who are its users and developers, and what are Amanda's key features?

Amanda enables a system administrator to set up a single server to back up multiple hosts over the network to tape- or disk-based storage. It uses native operating system facilities (e.g., tar and zip) for data archival and can back up a large number of workstations or servers running various versions of Linux, UNIX, Mac OS X, or Microsoft Windows.

Amanda was initially developed at the University of Maryland and is estimated to protect more than 500,000 systems worldwide.

What role does Zmanda play in open source?

Zmanda was formed out of the Amanda community. Zmanda's goal is to bring the benefits of open source backup (including Amanda) to the enterprise. We fund most of the ongoing development in the Amanda project. We have also initiated new open source backup projects, including Zmanda Recovery Manager for MySQL.

Isn't there resistance to open source backup from IT or upper management? If IT's interested in a project like Amanda, how can it overcome management's resistance to open source?

Open source has certainly had to deal with its fair share of fear, uncertainty, and doubt -- largely propagated by proprietary software vendors who fear losing billions of dollars in future profits. The best proof of Amanda's enterprise readiness can be seen by scanning the active posts by Amanda users on its forum (http://forums.zmanda.com/). This, by the way, is a unique feature of open source software. No proprietary backup vendor lets you read about the everyday experiences of its users on a publicly available forum -- all such conversations are hidden in their customer support databases. Zmanda offers Amanda Enterprise with 24x7 support, with customer service that I believe is unsurpassed in the industry.

Data backup and recovery used to mean backing up data to tape every night (and again on weekends and at the end of the month), then shipping the tapes off site. Today, there are many more options, such as backing up data directly offsite or to "cloud" sites. What destinations or scenarios does Amanda support?

Amanda backs up to traditional media such as disks, tapes and optical disks. You can use Amanda to create your weekly or monthly disaster recovery tapes and ship them off-site, and many of our customers do exactly that. However, Amanda offers other options as well.
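The rotation described in the question (nightly backups, plus extra copies on weekends and at month end, with some sets shipped off-site) can be expressed as a small scheduling rule. This Python sketch is a hypothetical illustration, not Amanda configuration: the set names, and the choice of Saturday as the weekly night, are assumptions.

```python
import calendar
from datetime import date


def tape_set_for(day: date) -> str:
    """Assign a backup date to a hypothetical rotation set.

    "monthly": last calendar day of the month (shipped off-site)
    "weekly":  any other Saturday (shipped off-site)
    "daily":   every remaining night (kept on site)
    """
    days_in_month = calendar.monthrange(day.year, day.month)[1]
    if day.day == days_in_month:
        return "monthly"  # month-end takes precedence over the weekly set
    if day.weekday() == 5:  # date.weekday(): Monday is 0, Saturday is 5
        return "weekly"
    return "daily"


def ships_off_site(day: date) -> bool:
    """Weekly and monthly sets are the ones sent off-site in this sketch."""
    return tape_set_for(day) != "daily"
```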

Innovation is a "feature" of open source development. Amanda is the only enterprise backup software that can back up to industry-standard storage clouds such as Amazon S3. This combines the two-step process (first writing to media, then shipping it off-site) into a single step. You back up directly to a robust off-site storage cloud, and even a small business gets all the disaster tolerance of Amazon's data centers.
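A minimal sketch of the single-step idea, assuming a pluggable `upload(key, data)` callable (hypothetical) that stores bytes in an object store: the archive is built and handed straight to the uploader, with no intermediate tape. It does not depict Amanda's own S3 implementation, and the `host/timestamp.tar.gz` key layout is an assumption.

```python
import io
import tarfile
from datetime import datetime, timezone
from typing import Callable, Dict

# Hypothetical signature for whatever actually stores the bytes off-site.
Uploader = Callable[[str, bytes], None]


def backup_to_object_store(files: Dict[str, bytes], host: str, upload: Uploader) -> str:
    """Pack files into a tar.gz in memory and send it directly off-site.

    Returns the object key that was passed to upload().
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    key = f"{host}/{stamp}.tar.gz"
    upload(key, buf.getvalue())  # one step: no local media in between
    return key
```

With the AWS SDK for Python, `upload` could wrap `boto3.client("s3").put_object(Bucket=..., Key=key, Body=data)`, with a bucket of your choosing. Building the archive in memory keeps the sketch short; a real backup of large data sets would stream it instead.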

What are some of the best practices IT can follow to develop effective data backup and recovery plans?

Capacity planning is a key task for any backup administrator, who needs to answer two key questions. First, how much network bandwidth does the backup software need to finish the backup run before employees show up for work? Second, how much backup media is needed so that backup runs don't run out of media to write on? Amanda helps considerably in this area. Its unique scheduler automatically determines the right backup level for each protected system at the time of the backup run, so administrators no longer have to guess which systems need full versus incremental backups on which days.
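The two capacity questions reduce to simple arithmetic. The sketch below uses hypothetical helper names and example figures (500 GB per night, an 8-hour window, 400 GB per tape) chosen only for illustration, with decimal units (1 GB = 8,000,000,000 bits). It does not model Amanda's scheduler.

```python
import math

BITS_PER_GB = 8_000_000_000  # decimal gigabyte: 10**9 bytes * 8 bits


def required_throughput_mbps(data_gb: float, window_hours: float) -> float:
    """Sustained throughput in megabits/s needed to move data_gb within window_hours."""
    if window_hours <= 0:
        raise ValueError("backup window must be positive")
    bits = data_gb * BITS_PER_GB
    seconds = window_hours * 3600
    return bits / seconds / 1_000_000


def media_needed(data_gb: float, media_capacity_gb: float) -> int:
    """Tapes or disks one run needs, rounded up (ignores compression)."""
    return math.ceil(data_gb / media_capacity_gb)


if __name__ == "__main__":
    # Hypothetical site: 500 GB per night, 8-hour window, 400 GB tapes.
    print(round(required_throughput_mbps(500, 8), 1))  # prints 138.9
    print(media_needed(500, 400))  # prints 2
```

Real plans should add headroom for protocol overhead, changing compression ratios, and data growth.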

If backing up to Amazon S3, system administrators need to make sure they have enough network bandwidth to move locally stored data to Internet-based storage. One best practice is to use traffic shaping to prioritize backup data while still leaving bandwidth available for users who are active during the backup run.
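Traffic shaping is usually configured in the network (on routers, or on Linux with the `tc` tool). The same token-bucket idea can also be applied inside an uploader, sketched below in Python. The class and function names are hypothetical, and this is an application-level stand-in rather than network-level shaping.

```python
import time
from typing import Callable, Iterable


class TokenBucket:
    """Caps average throughput at rate_bps bytes/second, allowing bursts
    of up to burst_bytes. The clock is injectable for testing."""

    def __init__(self, rate_bps: float, burst_bytes: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if rate_bps <= 0 or burst_bytes <= 0:
            raise ValueError("rate and burst must be positive")
        self.rate = rate_bps
        self.capacity = burst_bytes
        self.tokens = burst_bytes  # start full so the first chunk goes out at once
        self.clock = clock
        self.last = clock()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last check, up to capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_consume(self, nbytes: int) -> bool:
        """Spend tokens and return True if nbytes may be sent now."""
        self._refill()
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False

    def wait_time(self, nbytes: int) -> float:
        """Seconds until enough tokens exist to send nbytes."""
        self._refill()
        return max(0.0, (nbytes - self.tokens) / self.rate)


def throttled_send(chunks: Iterable[bytes], bucket: TokenBucket,
                   send: Callable[[bytes], None],
                   sleep: Callable[[float], None] = time.sleep) -> None:
    """Send each chunk only when the bucket allows it, sleeping otherwise."""
    for chunk in chunks:
        if len(chunk) > bucket.capacity:
            raise ValueError("chunk larger than burst size would never be sent")
        while not bucket.try_consume(len(chunk)):
            sleep(bucket.wait_time(len(chunk)))
        send(chunk)
```

Setting `rate_bps` below the link capacity leaves the remainder for interactive users; `burst_bytes` bounds how far a single burst can exceed that average.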

Security is, of course, also a key consideration when dealing with backups. Backup operators would do well to go through the backup security checklist available at http://www.zmanda.com/backup-security.html.
