
Changing How We Physically Back Up Data

Backing up data should extend further than just taking a "snapshot" of a drive.

Is the way you currently back up data really the way it should be done? Is it time to flip the script on physical backups?

In a book I wrote a few years back, "Definitive Guide to Windows Application and Server Backup 2.0" (Realtime Publishers, 2010), I postulated a "mission statement" for backup and recovery: Backups should prevent us from losing any data or losing any work, and ensure that we always have access to our data with as little downtime as possible.

But here's the truth: Traditional backup and recovery products don't typically do a very good job of meeting this simple statement.

Traditional backup and recovery has essentially relied on snapshots: grabbing the data at a certain point in time and dumping it to tape as quickly as possible, so that we capture as much data as we can within the backup window. Sometimes, our backup windows are so small and the data so large that we have to rely on differential and incremental backups, which capture the data faster but make a recovery take even longer, because a restore has to start from the last full backup and then apply the later backup sets on top of it. In the book, I coined the term "Backup 1.0" for this old-school style of backup, which has been basically unchanged since the 1960s.
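To see why incremental and differential schemes slow down recovery, here is a minimal Python sketch that works out which backup sets a restore would have to read. The function name `restore_chain` and the (day, kind) tuples are hypothetical, made up for illustration. The sketch assumes the simplified textbook rules (a differential holds everything changed since the last full backup, an incremental holds everything changed since the previous backup of any kind), not any particular product's behavior.

```python
from typing import List, Tuple

# Hypothetical representation: (day, kind), where kind is "full", "diff" or "incr".
Backup = Tuple[int, str]


def restore_chain(backups: List[Backup], target_day: int) -> List[Backup]:
    """Return the backup sets that must be read, in order, to restore target_day.

    Raises ValueError if there is no full backup on or before target_day.
    """
    usable = [b for b in sorted(backups) if b[0] <= target_day]

    # Every restore starts from the most recent full backup.
    full_idx = max(i for i, b in enumerate(usable) if b[1] == "full")
    chain = [usable[full_idx]]
    later = usable[full_idx + 1:]

    # Only the newest differential is needed; it supersedes everything before it.
    diffs = [b for b in later if b[1] == "diff"]
    if diffs:
        last_diff = diffs[-1]
        chain.append(last_diff)
        later = [b for b in later if b[0] > last_diff[0]]

    # Every remaining incremental has to be applied, one after another.
    chain.extend(b for b in later if b[1] == "incr")
    return chain
```

Under these assumptions, a full backup on day 0 followed by five daily incrementals needs six backup sets to restore day 5, while the same week with differentials needs only two. Either way, anything written after the last backup set is simply gone.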

We Can Do Better
I began using the term "Backup 2.0" to refer to a new way of thinking about backups. Backup 2.0 is fundamentally the concept of continuous data protection, where our servers and applications are backed up in real time or near-real time, so we never really have any at-risk data. A Backup 2.0 solution provides a way to reconstruct anything up to and including an entire disk volume to a very specific point in time, so that we can "roll back" a server to that point in time, or just access particular files or objects from that point in time without actually restoring the data anywhere.

The way this works technically is typically through a file system "shim," the same technology used to implement third-party disk quota systems. The shim is a sort of file system driver that gets notified of every disk change at the block level. It can grab each disk block as it changes and transmit that information -- along with a timestamp -- to a central backup server. The backup server can then apply de-duplication and compression, if necessary, so that the backups are smaller (potentially much smaller) than the source data.
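The server side of that arrangement can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `BackupServer` class, its `receive` method and the in-memory dictionaries are hypothetical, and a real product would sit behind an actual file system driver and persist to disk. It uses SHA-256 content hashes for de-duplication and zlib for compression, which are reasonable stand-ins, not a claim about what any vendor ships.

```python
import hashlib
import zlib
from typing import Dict, List, Tuple


class BackupServer:
    """Hypothetical in-memory stand-in for a central backup server."""

    def __init__(self) -> None:
        # Content hash -> compressed block data (each unique block stored once).
        self.store: Dict[str, bytes] = {}
        # Ordered record of every change: (timestamp, block number, content hash).
        self.journal: List[Tuple[float, int, str]] = []

    def receive(self, timestamp: float, block_no: int, data: bytes) -> None:
        """Record one changed block as reported by the (assumed) shim."""
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.store:
            # De-duplication: identical content is compressed and kept only once.
            self.store[digest] = zlib.compress(data)
        self.journal.append((timestamp, block_no, digest))

    def stored_bytes(self) -> int:
        """Total size of the compressed, de-duplicated block store."""
        return sum(len(blob) for blob in self.store.values())
```

If the shim reports three 4 KB block changes and two of them carry identical content, the journal holds three entries but the store holds only two compressed blocks, which is where the space savings come from.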

Most importantly, the backup server can reconstruct disk volumes to a specific point in time by simply assembling the disk blocks leading up to that point in time. With the right tools, you could mount a backup image and browse it through the OS. If the solution had the right knowledge of database structures for popular products, you could restore anything from an individual message or document up to an entire data store, all to a specific point in time -- and all much more rapidly than streaming that same information from tape (although you'd likely still make copies of the backup data to tape for off-site storage, they wouldn't be your first line of defense).
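Assembling a volume "as of" a given moment amounts to replaying the recorded block changes in time order and stopping at the chosen point. The Python sketch below shows the idea. The `volume_at` function and the (timestamp, block number, data) tuples are hypothetical names chosen for illustration, and a real solution would work against an on-disk journal instead of a list in memory.

```python
from typing import Dict, List, Tuple

# Hypothetical journal entry: (timestamp, block number, block contents).
Change = Tuple[float, int, bytes]


def volume_at(journal: List[Change], point_in_time: float) -> Dict[int, bytes]:
    """Rebuild the block map of a volume as it looked at point_in_time."""
    volume: Dict[int, bytes] = {}
    # Replay changes oldest first; later writes to a block overwrite earlier ones.
    for timestamp, block_no, data in sorted(journal, key=lambda change: change[0]):
        if timestamp > point_in_time:
            break  # Everything from here on happened after the requested moment.
        volume[block_no] = data
    return volume
```

Because nothing in the journal is modified by the replay, the same data can be "rolled back" to any number of different points in time, or simply browsed, without restoring it anywhere first.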

Habits Are Horrible
I guess the real lesson here is that old habits -- like the backup techniques we've relied on for more than 40 years -- can die hard. But can you honestly say that you're satisfied with your old-school backup techniques? That you yearn to dig through tape indexes and wait for data to stream off tape? That you've never been let down by a corrupted tape, or a missing tape, or data that was lost in between backups? We should be constantly questioning the shortcomings of our technologies and processes, constantly defining our "pie in the sky" wishes for how they should work, and constantly pressuring vendors to deliver newer and better techniques and technologies.

About the Author

Don Jones is a multiple-year recipient of Microsoft’s MVP Award, and is Curriculum Director for IT Pro Content for video training company Pluralsight. Don is also a co-founder and President of PowerShell.org, a community dedicated to Microsoft’s Windows PowerShell technology. Don has more than two decades of experience in the IT industry, and specializes in the Microsoft business technology platform. He’s the author of more than 50 technology books, an accomplished IT journalist, and a sought-after speaker and instructor at conferences worldwide. Reach Don on Twitter at @concentratedDon, or on Facebook at Facebook.com/ConcentratedDon.
