Has Information Lifecycle Management Arrived?
Is enterprise content management the next new panacea?
The top storage technology story of 2004, covered in most trade press publications, was how information lifecycle management (ILM) was going to solve all of our problems with respect to regulatory compliance, burgeoning storage costs, and capacity utilization efficiency. EMC seized on the expression, first coined by StorageTek over a decade ago, and spent a good deal of money hyping itself as the one true purveyor of the solution.
Other vendors aligned their marketing messages with EMC’s hype, seeking to draft off the Hopkinton company's advertising budget like so many NASCAR drivers chasing each other in circles. They all recast themselves as “ILM solution providers” despite the fact that ILM isn’t a technology, a product, or even an architecture. Rather, it is a way of thinking about IT service: from the perspective of data and its management requirements.
ILM involves managing data through its useful life and disposing of it when it is no longer needed. Ideally, it is about moving data from one storage platform to another based on three things: the requirements of the data itself (in terms of access number, type, and frequency); an evaluation of storage platform cost and performance capabilities; and careful consideration of factors such as regulations and laws, continuity planning requirements, and intellectual property protection.
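To make the idea concrete, here is a minimal sketch of what a data-driven placement rule might look like. It is an illustration only, not any vendor's product: the tier names, the thresholds, and the `DataSet` fields are all hypothetical assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    """Hypothetical description of a body of data and its requirements."""
    name: str
    days_since_last_access: int
    reads_per_month: int
    retention_days: int  # how long law or policy says it must be kept
    age_days: int


def choose_tier(d: DataSet) -> str:
    """Pick a storage tier from the data's own requirements.

    All thresholds and tier names below are assumptions for illustration.
    """
    # Data past its retention period has reached the end of its useful life.
    if d.age_days > d.retention_days:
        return "dispose"
    # Frequently or recently accessed data stays on the fastest platform.
    if d.reads_per_month >= 100 or d.days_since_last_access <= 7:
        return "primary-disk"
    # Occasionally accessed data moves to cheaper disk.
    if d.days_since_last_access <= 90:
        return "sata-array"
    # Everything else is kept only because it must be retained.
    return "tape-archive"


if __name__ == "__main__":
    for ds in (
        DataSet("orders", 1, 500, 2555, 30),
        DataSet("reports", 30, 5, 2555, 200),
        DataSet("old_mail", 400, 0, 2555, 900),
        DataSet("expired", 1000, 0, 365, 1000),
    ):
        print(ds.name, "->", choose_tier(ds))
```

Note that the rule is only as good as its inputs: it presumes someone already knows the access pattern and retention requirement of each body of data, which is exactly the hard part.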
Even so, by the end of the year, the blush seemed to be off the ILM rose. It was clear to most consumers that no one had this mystical tiger by the tail. Vendors were simply spinning their existing products to make them more palatable, as political campaign strategists do with candidates for public office.
Frankly, I was happy when EMC quieted its fanaticism about ILM and turned its attention to a new panacea, enterprise content management (ECM). Following the company’s acquisition of Documentum, a lot of EMC’s sound bites in the trades took on the tone of a pharmaceutical ad: “Got data management problems? Take a dose of Documentum and call me in the morning.”
This has awakened other ECM vendors to the opportunity of positioning their products as The One and Only Data Management Solution. Among them are very long-tenured companies that have been doing document imaging and management, computer output to laser disc (COLD), and similar things since way back when I was running mainframe shops. These folks seemed to say, “If EMC can tout content management as the fix for data management, why can’t we?”
So today, we have a boatload of ECM suppliers joining the ranks of vendors hawking everything from SATA disk arrays to hierarchical storage management (HSM), e-mail and database archive, and even backup software, all claiming to be purveyors of The One and Only Data Solution. The problem, of course, is that they are all correct, up to a point.
Looking at data from 50,000 feet, we can see four discrete types of data: files created by knowledge workers, workflow-centric data, semi-structured data (such as e-mail and groupware), and structured data (such as databases).
The preponderance of data, according to UC Berkeley, consists of unruly files created by end users, rugged individualists who resist any sort of common data naming scheme. Something like 60 percent of the data produced in companies is in the form of files. The rest is a mix of workflow data (fill-in-the-blanks forms, Web content, etc.), e-mail, and databases.
Databases avail themselves of data discipline via components of the relational database management system (RDBMS), should a DBA decide to use this functionality. If management of stale data in the DB comes as an afterthought, there are third-party products from companies such as Princeton Softech and OuterBay that can be bolted on to help separate the wheat from the chaff.
Semi-structured data also avails itself of management via software tools provided by e-mail software providers and by third-party software companies (such as Mimosa Systems in the case of Microsoft Exchange). These products archive older e-mail and separate attachments from messages, placing both in a secondary indexed repository for ease of access and grooming.
Workflow data is the traditional domain of document management software (now called ECM). In addition to Documentum, some of the big names in this space include FileNet, OpenText, and Interwoven. Talk about hyperbole: if you add up their collective claims about the percentage of all data that they manage using their applications, it exceeds 60 percent of all data! I kind of doubt it.
The Association for Information and Image Management (AIIM.org) has become the self-appointed broker of all information pertaining to ECM and defines the term as “the technologies used to capture, manage, store, preserve, and deliver content and documents related to organizational processes.” Recently, the organization extended its definition of ECM to include “tools and strategies allowing the management of an organization's unstructured information, wherever that information exists.”
What AIIM is saying is what we already see happening in the market: the ECM guys are trying to move outside of their traditional domain and apply their tools to managing files that aren’t part of any defined workflow or managed content environment. I wish them the best of luck, but I suspect that it will be a good deal more difficult to bring the most democratic of data—knowledge worker files—into lockstep with any sort of managed workflow.
Let’s look at the facts. To manage files effectively, you first need to surmount the problem of missing metadata (data about data). That’s a fancy way of saying that nobody names their files in such a way that they can be easily found, indexed, cross-referenced, or otherwise organized.
File systems don’t provide a means for doing this: instead of providing associative or self-referencing bits or hashes, file systems have been designed for maximum flexibility, enabling end users to call a file just about anything. To date, end users have shown no inclination to conform their file naming to any common schema, even when mandated by the powers that be.
Efforts have been made to address the wild world of file systems at the application layer. For example, Microsoft has a form in its Office applications that can optionally be turned on to enable users to add detailed information about the file (spreadsheet, word processing document, line drawing, or presentation deck) they have created. Hardly anyone uses it. Where the function is turned on by a corporate desktop installer, the first thing users typically do is turn it off.
So, how can the ECM folks impose order on this enormous and growing data set where others have failed so miserably? Some suggest that you simply capture all of the output from a department into a category or bucket. Others let you establish a user profile and manage all of the files produced by that user based on his or her profile.
I can almost guarantee that these strategies will deliver no level of granularity in terms of data management. Users, including me, tend to produce a lot of file foo in addition to creating important and business-centric fare. Capturing all the data from my disk drive will give you a boatload of stuff you really don’t want or need to include (for example, in backups).
ECM comes to full fruition when it is applied to an established workflow. Loan originators at a mortgage company fill out forms that are submitted to loan processors, who in turn submit them to loan reviewers for audit and approval or rejection. The reviewers send the work back to the processors, who then either repair files for resubmission or send approved applications to title companies or banks to complete the loan. You can overlay this established workflow with a content management system and maybe improve the performance of everyone involved.
Now, contrast that with the workflow-less world of files. At present, I’m writing this column while, at the same time, converting video from one format to another, sending e-mails, doing an FTP transfer of art I created to my blog site, and taking notes on stocks I am interested in buying. I am also being pinged constantly by instant messaging, updating a calendar of meetings and phone briefings, checking my travel arrangements for the trips I am making shortly to Dubai, Johannesburg, Lisbon, Paris, Kuala Lumpur, Sydney, Tokyo, Frankfurt, and a number of U.S. cities, and creating a PowerPoint deck. If I simply capture all of this data based on my profile or my machine ID or my department classification, I can guarantee that my files will be no more organized in that repository than they are on my disk drive.
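A small sketch shows why. Suppose a capture tool buckets files by the owner's profile and department, as some products propose. The file names, the user ID, and the `capture_by_profile` function below are hypothetical, invented purely for illustration.

```python
from collections import defaultdict

# Hypothetical output of one afternoon's work: (file name, owner, department).
FILES = [
    ("column_draft.doc", "analyst01", "editorial"),
    ("vacation_clip.avi", "analyst01", "editorial"),
    ("blog_art.jpg", "analyst01", "editorial"),
    ("stock_notes.txt", "analyst01", "editorial"),
    ("briefing_deck.ppt", "analyst01", "editorial"),
]


def capture_by_profile(files):
    """Group files by (owner, department), the only metadata the tool has."""
    buckets = defaultdict(list)
    for name, owner, department in files:
        buckets[(owner, department)].append(name)
    return dict(buckets)


if __name__ == "__main__":
    for profile, names in capture_by_profile(FILES).items():
        # Business documents and personal clutter land in the same bucket.
        print(profile, names)
```

Every file ends up in a single bucket, because owner and department say nothing about whether a given file is a business record or clutter. The repository simply mirrors the disorder of the disk drive.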
That’s the main reason that when I hear vendors talk about ECM applied to files, my initial response is a big “So what?”
I could be wrong. Last week in New Orleans, EMC held a user event where top executive after top executive from the company took the stage and painted a picture of how the company has now solved ILM: Documentum running over a virtualized storage and server infrastructure composed entirely of EMC gear.
While I am dubious of the actual value of their solution, I do invite vendors of ECM products to broaden my horizons and inform me of exactly how their product is going to bring order to my universe. A conversation is already underway with FileNet and I will report the details here shortly.
If you'd like to contribute your experience or view, I'd love to hear from you. Please e-mail me at firstname.lastname@example.org.