Are You Ready to Organize Your Junk Drawer?
The term ILM still makes people cringe. Dataglobal is showing some guts by re-introducing the need for information life cycle management into the contemporary storage discussion.
Recently, I spent a couple of hours interviewing representatives of Dataglobal, a successful German data management software firm now entering the U.S. market. Taking a close look at their product seemed like the right thing to do given that other media outlets appeared to have written their stories directly from the company's press release. In fact, most press coverage seemed to create, rather than dispel, confusion about the actual problem that Dataglobal addresses and whether the firm is selling storage management, data management, archiving, or something else. If for no other reason than to get some clarification, I decided to give the company a call and find out what they were really up to.
First things first: Dataglobal is a data center software vendor. They have written, or added through their acquisition of Inboxx (an archiving software firm and partner of several years), a number of applications that now link under a centralized Control Center to provide what they describe as a unified storage and information management (USIM) platform.
In operation, the core of the USIM platform is a Java 2 Platform, Enterprise Edition (J2EE) server installed to manage agents that run on a broad range of hosts and that, in turn, provide a range of services including archiving, storage resource management, and, in the words of Dataglobal, information life cycle management (ILM).
Yes, you read correctly. Dataglobal is actually planning to market a solution in the U.S. that they term "ILM." Like many readers of this column, I still cringe when I hear that expression, especially recalling the extraordinary blowback that accrued to EMC's failed efforts to market "ILM solutions" a decade ago. That said, I had to learn more about Dataglobal to discover whether something had been lost in German-to-English translation.
Prior to the interview, I re-read my old columns decrying EMC marketecture about ILM. Core findings still held: real ILM is not a product or a platform, but a process requiring four things: (1) the ability to classify data so that it can be exposed to policies that govern its storage and retention, (2) the ability to classify storage targets and to evaluate their status at all times so that you know where to move data, (3) a policy engine for setting the rules for moving data around and the triggers for automating when to move the data, and (4) a data mover that actually performs all the data migration work.
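To make the four requirements concrete, here is a minimal Python sketch of how the pieces fit together. It is purely illustrative: the class names, the path-based classifier, the policy table, and the day thresholds are all assumptions made for this example and do not describe any vendor's product.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    path: str
    data_class: str         # result of data classification, e.g. "financial"
    days_since_access: int
    location: str           # name of the storage target holding the file


@dataclass
class Target:
    name: str
    tier: str               # e.g. "primary" or "archive"
    free_gb: float


# (3) Policy table (hypothetical): data class -> (idle-days trigger, destination tier)
POLICIES = {
    "financial": (90, "archive"),
    "scratch": (30, "archive"),
}


def classify(path: str) -> str:
    """(1) Toy data classifier keyed on the path; a real one would use content and metadata."""
    return "financial" if "/finance/" in path else "scratch"


def pick_target(targets, tier):
    """(2) Storage classification: pick the target in the wanted tier with the most free space."""
    candidates = [t for t in targets if t.tier == tier]
    return max(candidates, key=lambda t: t.free_gb) if candidates else None


def plan_moves(files, targets):
    """(3) Policy engine: return (file, target) pairs for files whose trigger has fired."""
    moves = []
    for f in files:
        trigger_days, tier = POLICIES[f.data_class]
        if f.days_since_access >= trigger_days:
            target = pick_target(targets, tier)
            if target is not None and f.location != target.name:
                moves.append((f, target))
    return moves


def move(file: FileRecord, target: Target) -> None:
    """(4) Data mover stub: only updates the record; a real mover would copy and verify bytes."""
    file.location = target.name
```

The point of the sketch is the separation of concerns: classification, target knowledge, policy, and movement are four independent parts, and a weakness in any one of them undermines the whole process.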
These four distinct components have never been delivered to market -- except in a very rudimentary way on the mainframe with its Systems Managed Storage (SMS) and Hierarchical Storage Management (HSM) utilities. In the distributed world, ILM remained a holy grail.
Not that it hasn't received attention from marketers in the storage industry. In the waning years of the previous century, EMC tried to reinvigorate the idea of ILM by cobbling together its enterprise content management (ECM) software, Documentum, with its proprietary storage boxes -- first Symmetrix, then Centera. This resulted in a hard-to-manage-and-maintain technology kluge that never caught on.
The reasons for Hopkinton's stunning failure in this effort were many, but one was the reluctance of firms to abandon office productivity applications (Microsoft Office, for example) in favor of the pre-defined data entry screens of ECM. The proof of this rejection is visible today in the Global 2000 business world. Since the mid-2000s, the volume of disorganized user files has far exceeded the volume of "structured data" (databases, ECM, and e-mail) in terms of the total data produced by the average medium to large company.
Another key reason why ILM did not catch on when EMC last pitched it had much to do with the unwieldy nature of the storage infrastructure itself. Obtaining a clear picture of what storage was deployed and its current state in terms of capacity and performance was -- and is -- a significant challenge.
Despite EMC's claims to have rectified storage resource management issues with the announcement of WideSky, then ECC, then through the adoption of the SNIA's Storage Management Interface-Specification (SMI-S), their customers still report that they must use a nine-page spreadsheet to track capacity usage on each component. The problem is worse when the customer has a heterogeneous storage environment that includes "off-brand" products from NetApp, IBM, HDS, or any number of other disk array vendors (not to mention tape or optical storage).
For ILM to work, you need real-time knowledge of the storage targets where data will be placed. You need all kinds of information, from the queue depths and write wait times on volumes to the depreciated asset value of the storage target itself. That's the only way to ensure that the right data is being placed on the right target, given performance and cost characteristics of the target and the business value and re-reference characteristics of the data. That kind of information didn't exist in the late 1990s, and it still doesn't in the first decade of the New Millennium.
Changes Affecting ILM
One thing that has changed since EMC tried out its unsuccessful ILM offering is the arrival of Microsoft's File Classification Infrastructure (FCI), which Dataglobal supports with its platform. FCI was announced in 2009 as "a new feature within the File Server Role and File Server Resource Manager (FSRM) in Windows Server 2008 R2" touted by Microsoft to provide a capability for file classification, reporting, and policy-based management of Windows file systems. A Microsoft partner, Dataglobal uses integration points exposed by FCI to classify files and apply appropriate policies and actions to them.
Spokespersons for Dataglobal say that they supplement FCI with their own analysis engine, Dg Analyze -- going beyond the basic capabilities of FCI to enable the creation of additional categories of data that are defined by business requirements. They further provide a module called Dg Classification that is used to assign classification criteria to specific files under management.
From their acquisition of Inboxx, Dataglobal is able to integrate these functions with what appears to be a set of robust archiving tools aimed at different types of data -- Dg File (user file systems, including Windows, NTFS, and NetApp), Dg Office (for Microsoft Office and SharePoint files), Dg Mail (e-mail archiving), Dg Voice (audio files and other data from Call Center applications), Dg ERP (for data associated with enterprise resource planning systems from Microsoft, JD Edwards, SAP, and others), and Dg Connect -- an SDK used to connect legacy or homegrown applications that are not supported out of the box to the archive. These are all modules of Dg Hyparchiv, which is described as a unified archiving backbone for the organization.
This archive functionality is ambitious and, according to Dataglobal, proven in more than 1500 companies that have Inboxx deployed today. The archive modules enable policies to be set up centrally and applied locally to data of different classes, moving the data from primary storage to archival storage where it is managed in accordance with retention rules. If it lives up to its hype, this unified archive solution would probably be a huge improvement over deploying numerous archival stovepipes dedicated to specific data types that lack any sort of centralized policy management.
What About Storage Management?
That leaves two additional functional modules of the Dataglobal offering: Dg Storage Control, which monitors storage infrastructure including disk and optical, and Dg Chargeback, which facilitates the creation of an accounting and chargeback system for storage resources and their use. To accomplish the storage resource management tasks effectively, Dataglobal is using any and all available management hooks into the storage infrastructure, including equipment APIs and SMI-S providers on hardware and storage volume abstractions offered on servers -- that is, storage as the file or application host sees it.
They do not currently support open standards for infrastructure management, such as those based on Web Services REST and SOAP interfaces, because few "enterprise class" storage boxes are enabled with them. This is a bit short-sighted, in my view, given the current interest in the approach, pioneered by Xiotech, in many storage development houses today.
Consider the clear advantages that would accrue to a universal, vendor-neutral storage management methodology: a management protocol standard is readily available in Web Services, applications and operating systems already speak Web Services, "free" code for enabling storage with REST-based management is accessible (see Cortexdeveloper.org), and interest is growing in Web Services as a means of solving management and interoperability issues in nascent "cloud" offerings. Embracing a Web Services management paradigm would buy more credibility for Dataglobal as a cutting-edge information management player for the enterprise. Without support for Web Services, Dataglobal risks looking a bit like yesterday's news. Following my interview, in which this issue was raised, the company stated that REST support is on its roadmap for next year.
The Constancy of Change
For Dataglobal's offering to truly deliver on its ILM promise, the USIM platform needs to provide a low-effort solution for coping with the problem of constant change: change in the rules governing how organizations must manage their electronic information assets, and change in the infrastructure used to store those assets.
Changing the policies for classifying and managing data over time, in response to shifts in business and regulatory policy, must be simple and straightforward. My sense is that many aspects of policy setting and policy management in the Dataglobal offering are still very manually intensive. To their credit, Dataglobal doesn't attempt to minimize this point, stating that accommodating change requires human intervention if it is to be done within an acceptable time frame.
To modify the policies affecting how data is managed, the operator would first need to have an extraordinary amount of knowledge about what the data is -- its business and regulatory context. The vendor says that using their product to modify the policies themselves is easy, "but when you're talking about reclassification, that's when IT and the other departments have to work together."
Although Dataglobal has gone to pains to enable policy "inheritance" among and between many data classes, the classes themselves seem to lack granularity. The vendor says that FCI provides the means to enhance metadata sets on files so you are not restricted to properties such as date last accessed or last modified as your only means to determine when and where a file should be moved. The spokespersons I chatted with say that the product has a search engine that enables conceptual searching, so that groups of files could be identified and exposed to a management policy. This functionality was not demonstrated during my interview.
Moreover, the operator must know how current policies are implemented to understand how and if they must be edited to satisfy new rules. With the USIM platform, getting to the policies associated with a certain data class requires the navigation of multiple pull-down menus. It might help to have a visual display -- perhaps a user-friendly animation of sorts -- that would show how data is currently being handled. This would satisfy much of the information prerequisite for policy change management for operators who are likely to change jobs over the decades that data is to be retained.
Another important dimension of data management is replica tracking. I would want to know, for example, where copies of a file are located -- on disk and on other media -- so I could implement a coherent data retention/deletion strategy. This would need to include copies of data on primary, secondary, and archival storage as well as copies made for disaster recovery purposes. That way, when a file expires, I can delete all instances of it everywhere. I don't see disaster recovery volumes integrated in the current data tracking paradigm embraced by Dataglobal.
The vendor responds that "it is impossible to do that for offline storage, i.e., CD's not in the jukebox" and adds that "offline data only represents two to three percent of the stored data." I'm not sure what shops they support currently, but any company with a healthy tape backup operation is using it to host far more than three percent of its data.
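The replica tracking I have in mind can be pictured as a simple catalog that records every copy of a file and, at expiry, produces a deletion plan. The Python sketch below is a hypothetical illustration only; the class, method names, and the online/offline flag are assumptions for this example, not a description of how Dataglobal or anyone else tracks copies. Offline copies are not ignored; they are listed separately so that someone can pull the media.

```python
from collections import defaultdict
from datetime import date


class ReplicaCatalog:
    """Hypothetical catalog of every known copy of each file, online or offline."""

    def __init__(self):
        self._copies = defaultdict(set)   # file_id -> {(location, online)}
        self._expiry = {}                 # file_id -> expiration date

    def record_copy(self, file_id: str, location: str, online: bool = True) -> None:
        """Register a copy on primary, secondary, archive, or DR media."""
        self._copies[file_id].add((location, online))

    def set_expiry(self, file_id: str, expires: date) -> None:
        self._expiry[file_id] = expires

    def deletion_plan(self, today: date) -> dict:
        """For each expired file, list copies that can be deleted now and
        copies on offline media that need manual handling."""
        plan = {}
        for file_id, expires in self._expiry.items():
            if expires <= today:
                copies = self._copies.get(file_id, set())
                plan[file_id] = {
                    "delete_now": sorted(loc for loc, online in copies if online),
                    "needs_manual": sorted(loc for loc, online in copies if not online),
                }
        return plan
```

Even if offline copies cannot be deleted automatically, a catalog of this kind at least tells the operator that they exist and where to find them.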
Storage infrastructure management is also a critical area for ILM and archive. I need to know the status of all of the storage targets to which data might be copied or moved. Passive polling of targets is unacceptable; I want the equivalent of an RSS feed from a blog -- an ongoing trickle charge of status and accounting information from each storage device that lets my policy engine make a smart decision whether to write the data to target a, b, or c based on how busy the target is, how much space is available, what disaster recovery services are available to the target, how much it costs to store data there, and so on.
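As a rough illustration of that kind of decision, the Python sketch below keeps the latest status pushed by each target and picks a destination for a write. Everything in it is an assumption made for the example: the field names, the queue-depth ceiling, and the "cheapest eligible target wins" rule are hypothetical, and a real policy engine would weigh many more factors.

```python
from dataclasses import dataclass


@dataclass
class TargetStatus:
    name: str
    free_gb: float
    queue_depth: int       # how busy the target is right now
    cost_per_gb: float
    dr_protected: bool     # whether disaster recovery services cover this target


class TargetRegistry:
    """Holds the most recent status pushed by each target (no polling)."""

    def __init__(self):
        self._latest = {}

    def on_status(self, status: TargetStatus) -> None:
        """Called whenever a target publishes a status update."""
        self._latest[status.name] = status

    def choose(self, size_gb: float, needs_dr: bool, max_queue_depth: int = 32):
        """Return the name of the cheapest eligible target, or None if none qualifies."""
        eligible = [
            s for s in self._latest.values()
            if s.free_gb >= size_gb
            and s.queue_depth <= max_queue_depth
            and (s.dr_protected or not needs_dr)
        ]
        if not eligible:
            return None
        # Cheapest first; queue depth breaks ties between equally priced targets.
        return min(eligible, key=lambda s: (s.cost_per_gb, s.queue_depth)).name
```

The decision is only as good as the freshness of the status feed, which is why the way the data gets from the device to the policy engine matters so much.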
The problem is that you can't get this data in any way other than RESTful storage management. When asked about this point, Dataglobal offered its support for SMI-S as proof that it was paying attention. This reflects the success that the SNIA and its key participants, EMC and a few other brands, have enjoyed in marketing SMI-S in Europe. The fact that the standard is implemented only on a small fraction of storage wares in the market today doesn't seem to bother Dataglobal, because the standard seems to be present on most "enterprise-class" platforms in their client installed base. The U.S. market will likely present a very different case, especially as more companies elect to change out their expensive name-brand gear for lower-cost alternatives that deliver the same value for less money.
Spokespersons for Dataglobal respond that SMI-S is a bigger deal in the shops they serve, most of which are using brand-name storage vendors whose gear is SMI-S enabled: "Remember, Dataglobal goes after the mid-sized and very large companies, not the SMB market." It is worth keeping in mind, however, that even among brand-name vendors that publicly support the SNIA storage management standard, SMI-S is not deployed consistently or evenly on all products, limiting its efficacy as a storage target status collection methodology.
In the final analysis, Dataglobal is showing some guts by re-introducing the need for information life cycle management into the contemporary storage discussion. For all of my criticisms, they are well along a path toward creating a viable platform for delivering the goods in a holistic way.
The only competitor I have seen that comes close is Bridgehead Software, with whom I caught up in London a few weeks ago. The difference is that Bridgehead found it too challenging to try to deal with ILM generically -- that is, by presenting a solution for every company. They have found, instead, a successful niche market in health care. We will look at their offering in a future column.
For now, your comments are welcome. firstname.lastname@example.org