In-Depth
Why It's Time to Get Serious About Enterprise Content Management
IBM pushes ECM as a complement to bread-and-butter data warehousing
As a data warehousing (DW) professional, you might not give much thought to content management. Your world is a relational (or multidimensional) world, after all, and spending time thinking about the fuzziness -- the questionable provenance, lineage, and even dimensionality -- of unstructured or semi-structured data may even give you a headache.
Like it or not, however, unstructured or semi-structured data is a problem you're going to have to get increasingly serious about tackling. After all, more than ever before, the information that organizations want to expose to their DW or BI processes is taking the form of unstructured or semi-structured content.
That's probably because organizations have already pumped most of their structured relational content into existing DW or BI systems, so their relational bases are covered. However, because comparatively few existing integration tools can adequately identify, index, or catalog unstructured or semi-structured content, most enterprises haven't done anything with this data.
"The growing use of BI search and text analytics is part of a larger trend toward leveraging unstructured data in BI and DW, fields that previously have relied almost exclusively on structured data," writes Philip Russom, senior manager of research with TDWI, in a recent TDWI report, BI Search and Text Analytics. "Another way to put it is that unstructured data is playing a larger role in BI and DW over time, and that role is today supported largely by tools and techniques for BI search and text analytics."
Today, most of the data consumed by DW and BI processes is structured: TDWI Research puts that number as high as 77 percent. That's going to change -- and change substantially -- over the next several years, says Russom: as enterprises continue to un-silo and expose once-isolated information assets, they're going to need tools and technologies to reach into and intelligently catalog this information.
"The view of corporate performance seen from a data warehouse is incomplete unless it represents -- in a structured way -- facts discovered in unstructured and semi-structured data," he points out. It's for this reason that Russom and TDWI expect BI search and, to a lesser degree, text analytics, to mushroom in popularity over the next few years.
"TDWI suspects that adoption of both BI search and text analytics … will increase over five years, until they are as commonplace as Web GUIs and dashboards are today," he argues.
The ECM Juggernaut
One upshot of this, content management vendors say, is that organizations are turning to enterprise content management (ECM) solutions to help them get a handle on their unstructured and semi-structured assets. There's a further wrinkle here, too, argues Josh Payne, classification module offerings manager for IBM Corp. and a mover and shaker in Big Blue's ECM product efforts. Whereas organizations might once have tapped data federation or enterprise search tools to help them expose, query, and even manage unstructured and semi-structured content, they're increasingly opting for full ECM solutions, driven by a number of factors, including regulatory compliance and the growing importance of unstructured data.
"ECM is maturing, and more and more large enterprises are standardizing on ECM platforms and choosing enterprise standards for their ECM choice, moving beyond these sort of point solutions," Payne explains.
IBM, of course, has a stake in ECM. Over the last several years, Big Blue has substantially fleshed out its content management portfolio (as have several competitors), acquiring the former Green Pasture Software, Aptrix, Venetica, and (just last year) venerable document management specialist FileNet.
Last month, Big Blue introduced a new Classification Module for its FileNet P8 content management platform. The module helps categorize unstructured content that's either already stored in (or freshly arriving into) FileNet repositories, Payne says.
Likewise, it helps automate the process of determining just how important content is, as well as how it should be handled.
What's more, Payne continues, the new Classification Module can classify large amounts of unmanaged content (or reclassify content that's already managed), which makes it ideal for records management. It's a significant deliverable, he notes, because it marks the deeper IBM-ification of the former FileNet assets.
"This is a situation where we've taken a heritage IBM product -- the IBM FileNet P8 product -- and have really worked hard to innovate it into the IBM ECM architecture to address some of these main use cases and main scenarios where [customers are] facing classification pains," he explains.
While some aspects of Big Blue's ECM strategy might smack of what it's doing on the WebSphere data federation or enterprise search fronts, Payne says there's a world of difference between the two practices: both data federation and search are ancillary to the core unstructured content management capabilities provided by Big Blue's FileNet P8 content management platform.
Rather, Payne stresses, data federation and search complement IBM's core data warehousing strategy, which is distinct from its content management push.
A product such as WebSphere Information Integrator (Big Blue's data federation tool) is really designed to plug into (and take advantage of) an organization's mature, established data warehousing infrastructure. IBM's FileNet content management platform, in turn, aims to bring to unstructured content the same kind of standardization -- of policies, processes, and methods -- that the data warehouse has brought to structured data.
"The Information Integrator product is really a solution to the scenario that you've found you've invested in multiple repositories, multiple different technologies – maybe you've merged your bank and you've bought another small bank, and they had a whole other standard that they wanted to unify," Payne indicates.
Just as Big Blue markets BI search and data federation technologies that complement what it's doing on the data warehousing front, it also markets analytic tools that interface with both FileNet P8 and its new Classification Module.
"We have analytics offerings to span both of those, [such as our] OmniFind Analytics Edition, which is complementary to [the Classification Module], but Analytics Edition covers both content management and data warehousing -- sort of the two key distinctions in the IT architecture," he says.
The salient takeaway, Payne concludes, is that organizations are finally getting serious about bringing the same kind of order or accountability to unstructured or semi-structured data as they've already achieved -- via mature, established data warehousing practices -- with their structured relational data.
"That's sort of the trend behind the standardization, where IT organizations are driving standardization decisions for their ECM choice, and by standardizing and having a single catalog, content that's in that single catalog -- in turn it's more organized, it's more reusable, it's more accessible to the end user in the enterprise. That's what classification is going to do," Payne says.