ServerGraph to the Rescue

Backups do fail. ServerGraph provides a troubleshooting tool to track down and help you solve problems.

While I do not agree with the analyst assertions flying through the trade press and networks right now exclaiming that about 30 percent of all tape backups fail, it is clear that some consumers are being suckered by them. One storage administrator told me recently that senior management was dogging him to replace his $20 million tape backup operation with some sort of disk-to-disk solution because of things they had heard or read about the vulnerability of tape.

Truth be told, if anything close to 30 percent of your backup jobs were failing on a routine basis, you should be looking for new employment—maybe in the exciting world of fast food. Such failure rates signify one of two things: either you aren’t proficient at the technical aspects of your job (scheduling backups, manipulating software, managing tapes and resources, or maintaining equipment) or you aren’t proficient at the administrative aspect of your job (cracking the whip on your vendors to fix their mess, returning substandard products for replacement, or suing vendors for selling you lousy gear).

I am reminded of a true story told to me by a field support person for a leading tape backup software vendor. His company had been part of a sale of a backup solution to a dark agency of the government (if he told me the name, he’d have to kill me). A few weeks after it was installed, the customer called the software company tech support line to complain that he hadn’t been able to read any data from any backups he had made. It was clearly the software’s fault (it always is), the disgruntled customer argued, and he wanted the problem fixed immediately.

Problem was that no one in the software company had the required security clearances to go into the agency building. Getting them took six weeks, during which the ire of the customer ballooned.

When this fellow arrived to troubleshoot the problem, he was escorted around at gun point (my recommendation for how all storage vendors should be handled, by the way) checking this and that until he discovered the real problem. All of the tapes in the library system were cleaning cartridges. Some overzealous tech had slapped TOP SECRET stickers over the labels on the cleaning tapes that clearly stated they were not to be used for recording data.

The point of the story is that backups do fail, and the cause is as often rooted in human error as in any purported technology flaw. The difficulty, as Mark Roberts, CEO of ServerGraph in Austin, TX, correctly observes, is that tools for solving tape problems have long been lacking.

“Reporting tools are weak,” Roberts said in an interview recently, “and analysis tools are non-existent.” This is something that the small crew at ServerGraph set out to address with their products back in 1999. Proof that they have made some progress was their acquisition by Boston-based Rocket Software in late May for $10 million.

Roberts took me on a ride through his product architecture and interface and showed me how it could be used to generate reports that were actually useful and enabled backup managers to drill down into the particulars of backup failures. Often the root cause was evident from the data provided, and going forward, he said, the developers were looking for ways to initiate corrective actions automatically for commonly encountered problems.

ServerGraph is a backup maven’s dream software. It uses no agents and it simply monitors the processing of backup jobs, collecting all the information available and correlating it into meaningful charts and graphs that can be perused at several levels of granularity.

You can view the data daily, find specific servers, disks or files that are problematic, and notify the server admins of the corrective actions needed. As the database grows, you can easily measure improvements (or degradation) in the backup system over a designated time period. This is an especially useful feature if you want to quantify the value received from the deployment of new gear.
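To make the idea concrete, here is a minimal sketch of that kind of measurement. It is not ServerGraph's interface or data model; the job records, server names, dates and function names are all hypothetical, chosen only to show how a history of job outcomes can be turned into a per-server failure rate for a chosen period.

```python
from collections import defaultdict
from datetime import date

# Hypothetical job history: (run date, server name, job succeeded?).
# A real reporting tool would pull this from the backup software's logs.
JOBS = [
    (date(2005, 6, 1), "mail01", True),
    (date(2005, 6, 1), "db01", False),
    (date(2005, 6, 2), "mail01", True),
    (date(2005, 6, 2), "db01", False),
    (date(2005, 6, 3), "mail01", False),
    (date(2005, 6, 3), "db01", True),
]


def failure_rates(jobs, start, end):
    """Return {server: fraction of failed jobs} for runs dated start..end inclusive."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for run_date, server, ok in jobs:
        if start <= run_date <= end:
            totals[server] += 1
            if not ok:
                failures[server] += 1
    return {server: failures[server] / totals[server] for server in totals}


def worst_servers(rates, threshold):
    """List (server, rate) pairs at or above the threshold, worst first."""
    flagged = [(server, rate) for server, rate in rates.items() if rate >= threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    rates = failure_rates(JOBS, date(2005, 6, 1), date(2005, 6, 3))
    for server, rate in worst_servers(rates, 0.5):
        print(f"{server}: {rate:.0%} of jobs failed")
```

Running the same calculation over the period before and after a hardware change gives a rough before-and-after comparison, which is the sort of evidence you need when someone asks what the new gear actually bought you.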

I suggested that integrating this product with an application-facing SRM product and/or a technology-performance-measurement product such as StorSentry (previously covered in this column) might deliver the one-two punch everyone is seeking for backup problem diagnosis. I introduced Mark to Fernando Moreira, CEO of Hi-Stor (makers of StorSentry) in Toulouse, France, and the two became buddies immediately, based on a shared perception of synergies between their products.

Relating ServerGraph’s historical data to StorSentry’s hardware performance data would provide, arguably, the industry’s first comprehensive troubleshooting guide for backup administrators.

Having done my good deed, I returned to the other source of backup ills: the “human factor.” Clearly, user errors will not be detected by any troubleshooting system, regardless of its deep blue math features. Nor will problems relating to the larger problem of unmanaged data: the ultimate user error.

Simply put, the more undifferentiated data that amasses in a company, the larger the volume of data that must be pushed through the backup process within a steadily shrinking operational window. When the load exceeds what can be moved in the time available, backups fail.
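The arithmetic behind that statement is simple enough to sketch. The figures and function names below are hypothetical, and the calculation assumes a single sustained throughput number, which real backup streams rarely deliver; treat it as a back-of-the-envelope check rather than a sizing tool.

```python
def backup_hours(data_gb, throughput_mb_per_s):
    """Hours needed to move data_gb at a sustained rate (1 GB taken as 1024 MB)."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = data_gb * 1024 / throughput_mb_per_s
    return seconds / 3600


def fits_window(data_gb, throughput_mb_per_s, window_hours):
    """True if the whole data set can be moved inside the backup window."""
    return backup_hours(data_gb, throughput_mb_per_s) <= window_hours


if __name__ == "__main__":
    # Hypothetical shop: 100 MB/s sustained, 8-hour nightly window.
    for data_gb in (2000, 4000):
        hours = backup_hours(data_gb, 100)
        verdict = "fits" if fits_window(data_gb, 100, 8) else "overruns"
        print(f"{data_gb} GB needs {hours:.1f} h and {verdict} an 8 h window")
```

In this made-up case, doubling the data from 2,000 GB to 4,000 GB pushes the job from inside the window to well past it, with no change at all in the hardware or software.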

Enter the disk-to-disk vendors: several hundred vendors in this space have sprung up in the last year or so. I spoke to one of the latest arrivals: FilesX. Next week I’ll tell you what they had to say.

Your comments on backup problems are welcome at jtoigo@toigopartners.com

About the Author

Jon William Toigo is chairman of The Data Management Institute, the CEO of data management consulting and research firm Toigo Partners International, as well as a contributing editor to Enterprise Systems and its Storage Strategies columnist. Mr. Toigo is the author of 14 books, including Disaster Recovery Planning, 3rd Edition, and The Holy Grail of Network Storage Management, both from Prentice Hall.