Why the Cloud Needs No Backup

With the cloud, it's backup without the actual backup.

By Andres Rodriguez, CEO, Nasuni

Traditional backup systems have many problems. Backup is a notoriously slow process, both when IT makes the copy and when it retrieves it. Indeed, a single week's backup may take as long as most of a weekend to copy to disk or tape, and this backup window only increases as data volume grows.

Above all else, backups are very expensive. In a world where data doubles every 18 months, traditional backup approaches inevitably create storage sprawl with skyrocketing hardware and infrastructure costs. The cost of recovery only adds to an already expensive problem -- a 2011 study by the Ponemon Institute estimated the cost of downtime at about $11,000 per minute for many businesses, and a single megabyte of lost data can cost as much as $10,000. If recovery takes days, as it often does in the case of a serious IT failure or natural disaster, these costs can become astronomical.
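To put the per-minute figure in perspective, here is a rough back-of-the-envelope calculation. It assumes, purely for illustration, that the cited $11,000-per-minute rate stays constant for the whole outage; real costs would vary by business and over time.

```python
# Cost of downtime cited above (USD per minute, Ponemon Institute, 2011).
COST_PER_MINUTE = 11_000

def downtime_cost(days: float) -> int:
    """Estimated cost in USD of an outage lasting `days` days.

    Assumes a constant per-minute cost, which is a simplification.
    """
    minutes = days * 24 * 60
    return round(minutes * COST_PER_MINUTE)

for days in (1, 3):
    print(f"{days} day(s) of downtime: ${downtime_cost(days):,}")
# 1 day(s) of downtime: $15,840,000
# 3 day(s) of downtime: $47,520,000
```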

To make matters worse, traditional backup systems are not completely reliable. Industry studies show that approximately 20 percent of nightly backups do not successfully copy all of the intended data, and 40 percent of tape recoveries fail completely. Tape is a fragile medium that can break and return damaged data. Even worse, the tapes themselves can be lost, leaving the organization vulnerable to a security breach.

Disk-to-disk (D2D) backup improves the reliability and speed of backup by using disks as the target media and using de-duplication technology to keep costs reasonable. However, D2D leaves the root cause of the problem, the backup process itself, unresolved.

As a result, backup can actually put businesses at risk by giving them a false sense of security. A better approach would be to integrate data protection into a single, infinitely scalable storage system that gives IT complete control over version restores and is capable of a quick full-system restore from an offsite location. This type of system would dramatically slash the long-term costs of backup and ensure true business continuity.

The cloud is not quite to the point of providing a total solution, where applications and data can all reside offsite in the cloud, completely protected with acceptable performance. Even so, the cloud does currently offer a solution for at least one part of the puzzle (the file server) and points toward a day when it can encompass the entire data center.

The Cloud and Built-In Backup

When Amazon was first establishing itself in the 1990s, its network architects were not primarily focused on creating a service for the outside world. They were working to create a storage infrastructure that could scale to handle the massive amounts of data their business would produce. Given the incredible magnitude of the company's data -- including inventory, order history, and Web archives -- traditional backups were simply not feasible. They needed an infrastructure with tremendous scalability, cost, and speed-of-deployment benefits that could provide 100 percent availability to ensure data and services are always available to consumers.

What they created was a network of tens of thousands of servers, housed in multiple data centers around the globe, that continually made exact copies of the data. If one server or data center went down, the architecture's massive, automatic replication meant there was no impact on the business.

Today, Amazon, as well as a host of other organizations, leverages that once-private network to provide public cloud storage. Even though built-in backup is an integral part of cloud storage, there are a number of additional factors enterprises must consider before adopting cloud storage.

Security, for example, is not a primary focus of cloud storage providers, so all data should be encrypted prior to being moved to the cloud. In addition, cloud storage providers' business models are geared to organizations with very large storage requirements. For enterprises with more modest storage needs, getting customer service from cloud providers can be challenging.

This can make a relatively simple task, such as recovering an accidentally deleted file, a painful experience. Enterprises whose storage requirements fall below the "massive" category will require a layer of intelligence on top of the raw cloud storage offered by Amazon, Microsoft Azure, and Rackspace (among others) to manage things like system snapshots, storing those snapshots in the cloud, and protecting the data through encryption.

Then there is the issue of bandwidth to consider, because no matter how reliable the cloud storage provider may be, access to data stored in the cloud will be no faster than the enterprise's Internet connection.

Think of it this way: no enterprise would buy commodity disk drives and manually deploy them as its storage infrastructure. Instead, enterprises go to a vendor such as EMC that packages those commodity drives with complementary technology to create a complete, intelligent system. Likewise, no enterprise should rely on raw cloud storage for its storage needs. It should instead seek out partners that can deliver intelligent systems that, in effect, use public cloud storage on the back end with private-cloud-like features and functionality on the front end.

Typically, these systems will employ a so-called hybrid approach in which the most frequently accessed data is stored in a local cache, providing local-like performance. New data is likewise stored on the cache until forwarded to the cloud in encrypted form, which occurs at regular intervals throughout the day. If an end user calls for data not stored in the cache, that data is pulled into the cache from the cloud.
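The hybrid approach described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the `HybridCache` class, its method names, and the dict standing in for the cloud are all hypothetical, and `toy_cipher` is a placeholder that is not real encryption. A production system would use a vetted encryption library and a real object store.

```python
from collections import OrderedDict

def toy_cipher(data: bytes) -> bytes:
    """Placeholder only: XOR with a fixed byte is NOT real encryption."""
    return bytes(b ^ 0x5A for b in data)

class HybridCache:
    """Hypothetical local cache in front of a cloud object store."""

    def __init__(self, cloud, capacity, encrypt, decrypt):
        self.cloud = cloud          # dict-like stand-in for cloud storage
        self.capacity = capacity    # max number of files kept locally
        self.encrypt = encrypt
        self.decrypt = decrypt
        self.local = OrderedDict()  # path -> bytes, least recently used first
        self.dirty = set()          # written locally, not yet sent to the cloud

    def write(self, path, data):
        # New data lands in the local cache first.
        self.local[path] = data
        self.local.move_to_end(path)
        self.dirty.add(path)
        self._evict()

    def read(self, path):
        if path not in self.local:
            # Cache miss: pull the file from the cloud and decrypt it.
            self.local[path] = self.decrypt(self.cloud[path])
        self.local.move_to_end(path)
        data = self.local[path]
        self._evict()
        return data

    def flush(self):
        # Called at regular intervals: forward new data in encrypted form.
        for path in sorted(self.dirty):
            self.cloud[path] = self.encrypt(self.local[path])
        self.dirty.clear()
        self._evict()

    def _evict(self):
        # Drop least recently used files, but never unsent (dirty) ones.
        for path in list(self.local):
            if len(self.local) <= self.capacity:
                break
            if path not in self.dirty:
                del self.local[path]

cloud = {}
cache = HybridCache(cloud, capacity=2, encrypt=toy_cipher, decrypt=toy_cipher)
cache.write("report.doc", b"quarterly numbers")
cache.flush()                      # the cloud now holds the encrypted copy
print(cache.read("report.doc"))    # served from the local cache
```

Keeping unsent files pinned in the cache is one possible design choice: it lets the cache exceed its nominal capacity for a while rather than risk losing data that has not yet reached the cloud.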

Zynga offers a good example of how to accomplish this feat. In an August 2, 2011 article at Wikibon, "Zynga Cloud Case Study: The Journey to a Real Private Cloud," David Cahill argues

… that this is a commodity cloud inside an enterprise, not an "enterprise" cloud inside an enterprise. There is a material difference between the two. Building private clouds using expensive commercial solutions are a submission to existing, brand conscious, IT strategies. I struggle to understand how companies can extract real cloud economics out of these architectures. Instead, as Cloudscaling's Randy Bias has argued, it is much easier to start with a commodity cloud and layer services on top of this to achieve the desired level of feature/function you need.

With the right intelligence on top of commodity cloud storage, enterprises can finally take advantage of massively redundant cloud architectures, eliminating the need to back up any data stored in the cloud.

There's No Backup Like No Backup

Traditional backup fails to deliver a single, comprehensive solution for data protection; it requires a resource-intensive series of systems with many moving parts: hardware, vendors, storage providers, backup providers, disaster recovery services, and maybe even more, depending on the organization's particular needs. Although the cloud is not yet ready to host and protect the entire data center, it most certainly can perform this function for the file server, providing a model for future data center expansion into the cloud.

Cloud technology can be leveraged to transform and unify how organizations protect data. Because the system can take unlimited snapshots, there is no need to make an entire, massive copy of the data set each time. Instead, it can make incremental snapshots, each capturing only what has changed, as often as once an hour. With the right intelligence on top of the raw cloud storage, data is offsite, secure, and easily accessible.
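One way to picture incremental snapshots is a store that uploads a file's contents only when they have changed, while keeping a small manifest for every snapshot so that any version can be restored. The sketch below is an assumption about how such a system might work, not a description of any particular product; `SnapshotStore` and its methods are hypothetical names, and the in-memory dict stands in for cloud storage.

```python
import hashlib

class SnapshotStore:
    """Hypothetical incremental snapshot store (in-memory stand-in for the cloud)."""

    def __init__(self):
        self.blobs = {}      # content hash -> file bytes, each stored once
        self.snapshots = []  # one manifest per snapshot: {path: content hash}

    def take_snapshot(self, files):
        """Record the current state; return how many blobs had to be uploaded."""
        manifest = {}
        uploaded = 0
        for path, data in files.items():
            digest = hashlib.sha256(data).hexdigest()
            if digest not in self.blobs:
                # Only content not already stored is sent.
                self.blobs[digest] = data
                uploaded += 1
            manifest[path] = digest
        self.snapshots.append(manifest)
        return uploaded

    def restore(self, version):
        """Rebuild the full file set as it was at snapshot number `version`."""
        manifest = self.snapshots[version]
        return {path: self.blobs[digest] for path, digest in manifest.items()}

store = SnapshotStore()
files = {"plan.txt": b"draft 1", "budget.xls": b"2011 figures"}
print(store.take_snapshot(files))   # 2: both files are new
files["plan.txt"] = b"draft 2"
print(store.take_snapshot(files))   # 1: only the changed file is uploaded
print(store.restore(0)["plan.txt"]) # b'draft 1'
```

Each hourly snapshot costs only as much as the data that changed in that hour, yet every earlier version remains restorable from its manifest.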

It makes possible limitless versioning, with an offsite copy of data that is accessible instantly -- all built into a single, complete data protection solution.

In other words, it's backup without the backup.

Andres Rodriguez is CEO of Nasuni, which provides storage services through its storage services network. You can contact the author at AndresRod@nasuni.com.