Value-Add Software: The Risks for Storage Managers
Storage buyers have been conditioned to look for ever-greater "value." We explain why this behavior is so risky.
In last week's column ("Why Data Protection Budgets are in Trouble," http://esj.com/Storage/article.aspx?EditorialsID=3481) I raised the conflict between what vendors offer and what storage professionals need. I explained that placing functionality where it belongs often conflicts with the vendor community's desire to appear to deliver more value in its products every quarter by adding features and functions to the array controller. Vendors have been effective at setting consumer expectations. Buyers now look for ever-greater "value," and vendors usually respond by providing more embedded software functionality on each generation of their array products.
That's risky. As promised, this week I'll explain why.
To begin, increasing the complexity of the embedded system on the array controller by adding new "value-add" functionality seems to increase the number of obstacles to any sort of common management scheme. That is a bad thing because status monitoring is the first line of defense in data disaster prevention. We need advance notice of hot spots, burgeoning disk failures, and volume usage if we are to avoid calamity.
I am not alone in stressing the importance of management. Ask Mark Urdahl, CEO of Virtual Instruments, whose NetWisdom wares are pointed squarely at analyzing and reporting on traffic in a Fibre Channel fabric. He observes that a core problem of the contemporary FC "SAN" is that you can't see what is going on inside the contraption: "It's a black box." Some hardware vendors provide their own management tool sets with their boxes, Urdahl notes, "but some of these tools actually stop applications while they take measurements." Stopping an application distorts the very thing being measured, to say nothing of the problems it creates from an operations standpoint: think of the observer effect, popularly filed under the Heisenberg Principle.
Given Urdahl's observations, it should come as no surprise that surveys find SAN failures occurring more frequently than server failures these days (excluding, perhaps, virtual servers). Bottom line: common management is essential to effective disaster prevention, and stovepipe array makers are impairing our ability to manage our fabrics and arrays. The vendors' solution is to sell the consumer two, or even three, copies of their gear and some expensive "value-add" software for replicating data between them -- just to be safe.
A second problem has to do with the definition of "value-add" software. Does thin provisioning add value on the array? Some vendors, including 3PAR and Compellent, have been saying so for years and have many customer testimonials to the value that thin provisioning delivers in terms of time-savings and labor-cost savings for capacity management.
In a thin provisioning array, secret algorithms forecast the capacity demands of applications using the storage kit. This lets the array take capacity that has been allocated to one application but is not currently in use and lend it to other applications sharing the array. This thin provisioning intelligence notifies the consumer when more capacity needs to be added, presumably well in advance of any shortfall. If this works as advertised, nobody needs to manage capacity anymore and storage professionals can be reassigned to other duties (or pink slipped altogether).
So successful have the small vendors been in selling this technology to smaller enterprises that the big vendors, including EMC, are going to add thin provisioning to their array controllers in 2009. The question is whether doing this on the array makes sense for the large enterprise consumer.
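To make the mechanics concrete, here is a minimal Python sketch of a thin-provisioned pool. It is a toy model under stated assumptions, not any vendor's implementation: the class name ThinPool, its methods, and the 80 percent alert threshold are all hypothetical, and the real forecasting algorithms are proprietary. The sketch shows only the basic bookkeeping: volumes are promised more capacity than physically exists, and trouble starts when applications actually write to the space they were promised.

```python
class ThinPool:
    """Toy model of a thin-provisioned pool (hypothetical, not a vendor design)."""

    def __init__(self, physical_tb, alert_ratio=0.8):
        self.physical_tb = physical_tb   # real disk capacity in the array
        self.alert_ratio = alert_ratio   # assumed threshold for "buy more disk"
        self.allocated = {}              # volume -> logical TB promised
        self.written = {}                # volume -> physical TB actually consumed

    def create_volume(self, name, logical_tb):
        # Promise capacity without reserving any physical space.
        self.allocated[name] = logical_tb
        self.written[name] = 0.0

    def used_tb(self):
        return sum(self.written.values())

    def overcommit_ratio(self):
        # Above 1.0 means more capacity has been promised than exists.
        return sum(self.allocated.values()) / self.physical_tb

    def needs_capacity(self):
        return self.used_tb() >= self.alert_ratio * self.physical_tb

    def write(self, name, tb):
        if self.written[name] + tb > self.allocated[name]:
            raise ValueError(f"{name} would exceed its logical size")
        if self.used_tb() + tb > self.physical_tb:
            # The application owns this space on paper, but it is not there.
            raise RuntimeError("pool exhausted")
        self.written[name] += tb


pool = ThinPool(physical_tb=10)
pool.create_volume("mail", 8)
pool.create_volume("db", 8)
pool.write("mail", 5)
pool.write("db", 4)
print(pool.overcommit_ratio())  # 16 TB promised on 10 TB of disk
print(pool.needs_capacity())    # 9 TB used is past the 8 TB alert line

try:
    pool.write("db", 2)         # within db's 8 TB promise, yet it fails
except RuntimeError as err:
    print("write failed:", err)
```

The final write is the case that should worry a storage manager: the database is well within its allocation, but the pool cannot honor it because the alert did not lead to new capacity in time.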
DataCore Software and other storage virtualization folks say no. Smaller firms may be better suited to an on-array implementation because the one array is all the storage they have. In a more complex and grandiose FC SAN environment, one with many boxes potentially sourced from different vendors, on-array thin provisioning may just make capacity management more difficult. The storage virtualization people make a compelling case for placing thin provisioning-like functionality in the network, as a service, deployed either on servers, on appliances, or in a switch, where it can be leveraged to balance capacity across more spindles in the heterogeneous infrastructure.
This implementation preference supports the sales objectives of those selling storage virtualization, of course, but that does not diminish the architectural thinking of Ziya Aral, CTO of DataCore, or the other smart people pushing this model. Networked architecture trumps embedded architecture when you are dealing with hundreds, thousands, or tens of thousands of spindles.
Thin provisioning on individual arrays is another example of embedded "value-add" technology that may be placing organizations and their data at risk. Let's be realistic: the TP algorithms are secret (like de-duplication algorithms), so we have only vendor assurances that (1) they are not going to play havoc with storage by failing to anticipate a "margin call" issued by an application that wants the space it owns and (2) they are not going to prompt us to buy more capacity than we need based on erroneous algorithmic forecasts. Much work needs to be done on this nascent technology to determine best practices both for its use and for its deployment from an architectural perspective.
The same holds true for de-duplication. To reiterate a description offered last week, de-dupe is essentially a technology for squeezing more data into the same amount of space. Most of the approximately 17 vendors offering de-dupe are suggesting that it is the "natural" replacement for tape: replacing mylar tape cartridges and human management factors with spinning rust that can replicate its contents to an identical kit at a vaulting site or the recovery site, while providing a local copy of data for fast restore of damaged files in the corporate data center. Sounds like a great value proposition, especially when it is included in the metaphor of tape by leveraging virtual tape library technology that makes a disk array look like many tape drives.
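For readers who want a feel for how de-dupe squeezes more data into the same space, here is a minimal Python sketch. It uses fixed-size chunks and SHA-256 fingerprints, a common textbook approach that I am assuming purely for illustration; commercial products keep their algorithms secret and may work quite differently. The function names dedupe and restore and the tiny 4-byte chunk size are hypothetical.

```python
import hashlib


def dedupe(data: bytes, chunk_size: int = 4):
    """Split data into fixed-size chunks and keep each unique chunk once."""
    store = {}    # fingerprint -> chunk bytes, stored a single time
    recipe = []   # ordered fingerprints needed to rebuild the original
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fingerprint, chunk)  # skip chunks already stored
        recipe.append(fingerprint)
    return store, recipe


def restore(store, recipe):
    """Rebuild the original byte stream from the chunk store and recipe."""
    return b"".join(store[fingerprint] for fingerprint in recipe)


# A highly repetitive "backup": six identical chunks plus one different one.
backup = b"ABCD" * 6 + b"WXYZ"
store, recipe = dedupe(backup)
stored_bytes = sum(len(chunk) for chunk in store.values())

print(len(backup), "bytes in,", stored_bytes, "bytes of unique chunks stored")
assert restore(store, recipe) == backup
```

Note that the savings depend entirely on how repetitive the data is, and that a restore needs both the chunk store and the recipe intact, which is part of why where this function lives in the architecture matters.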
There is nothing wrong with the technology itself. Like thin provisioning, the value of de-duplication depends on where the "value-add" functionality is inserted into storage architecture. I recently had a chat with a Sepaton customer, Eric Zuspan, who serves as a senior system administrator, SAN / UNIX, within the IS department of MultiCare Health Systems in Tacoma, WA. Zuspan described in detail his experience with fitting de-duplication into his data protection architecture.
Zuspan was quick to point out that his original interest was in deploying a virtual tape library solution for the organization to help reduce the backup window for Windows applications, which were then protecting their data using LTO 2, 3, and 4 libraries, primarily from HP. De-duplication, he said, was an afterthought -- "a nice to have" for enabling more backup data to be maintained in a disk repository created by the VTL. To be clear, he was not out to replace tape, only to implement a buffer of disk that could be used both to stage tape writes and to serve as a platform for quickly restoring damaged or deleted files.
Diligent Technologies (now an IBM company) and Data Domain first sought the opportunity. Diligent offered an appliance or gateway that performed de-duplication of data prior to writing it to disk volumes presented via a virtualized FC fabric. Data Domain, by contrast, offered a complete kit with de-duplication software wedded to a dedicated disk array.
Data Domain was already being used by one group within IS that had purchased it with 50 TB of storage three years prior. Said Zuspan, the organization's experience with the product was not good. "We kept running out of disk capacity and hit significant scaling issues. Basically, the product doesn't scale. You have to keep buying more boxes, and each one must be managed separately."
He said he was also concerned about the performance of the Data Domain solution: "It was always slow and, to my knowledge, is still not meeting our backup windows."
Those facts made his technology evaluation team more open to a pitch from Diligent, which said it could deliver an efficient VTL plus de-duplication story for the company's UNIX and 400 TB Fibre Channel SAN environment. After a month's testing, Zuspan was unimpressed.
"We found that the Diligent solution was a two-edged sword," Zuspan recalled. "On the one hand, it let us use our own storage -- an idea we liked -- but on the other hand it was very complicated to set up given that we are using IBM SAN Volume Controller (SVC) to virtualize our storage."
The organization kept hunting for a solution for over a year until a VAR introduced them to Sepaton, another VTL/de-dupe provider. Zuspan said he hadn't heard of the vendor, but the insistent solution integrator arranged for the product to be delivered for testing. After several weeks, he said that the Sepaton product, a disk array with VTL and de-dupe functionality, delivered the right price/performance mix.
He noted, "We eventually did a bake-off between our existing Data Domain solution in the Windows area and the Sepaton product. Backups to the Data Domain VTL took four to five hours to complete, while the Sepaton VTL completed the backup in 45 minutes."
This is not to say that the Sepaton deployment was problem free, although it started smoothly. "Sepaton's engineer arrived to help with the install and we were running backups the same day," he reported. There were "a few drive failures at first" in the Sepaton equipment, and Symantec's NetBackup (used for Windows) and HP's Data Protector software (used for UNIX systems) "didn't support the Sepaton VTL at first." Even so, Zuspan said that the 50 percent cost savings on the 75 TB VTL platform helped seal the deal. He added that the initial issues were quickly resolved and "the support was great."
An important sidebar offered by Zuspan was that potential regulatory problems with de-duplicating data were sidestepped. "The most recent copy of data written to the Sepaton VTL is not being de-duplicated, so management has peace of mind about any potential information governance or HIPAA issues."
Zuspan was also quick to point out that the deployment of a VTL, de-duplicating or not, does not eliminate the value of tape in his environment. His VTL stores 30 days of backup data on local disk, but weekly backups of approximately 10 TB of data are still being made routinely and shipped offsite. In this way, MultiCare is leveraging the benefits of the new technology while preserving the benefits of the tape investment they have already made.
My impression from the interview with Zuspan is simply this: data protection continues to be a service-oriented architecture. It typically combines server-based components (backup software), potentially some network-based components (asynchronous replication and failover across a LAN or WAN to protect "always on" applications and their data), and possibly some fabric-based services (virtualization, specialty appliances, and a mix of disk, tape, and optical technology as needed). An effective combination of these services must be visible and manageable and must respond to the business continuity objectives, the business information governance requirements, and the budgetary sensitivities of the organization.
Zuspan's story also suggests to me that technology like de-duplication (and perhaps thin provisioning too) is still evolving. That Sepaton outperformed Data Domain or installed more easily than Diligent in this round says nothing about the future performance, preferred deployment model, or long-term utility of the solution. I would wager that a real service-based model will see more of these functions deployed in the network where data can more readily avail themselves of the set of services that are appropriate for them.
Your comments are welcome: firstname.lastname@example.org.