Unified Theories of Data Management

How do we manage the combined output of all of the event logs produced by all the available, and frequently used, management tools?

In the world of physics, there is an ongoing quest for a unified theory: a single set of principles, backed by agreed-upon theorems and practical demonstrations, that will explain everything once and for all. That may well be a holy grail, but it seems that the practice of data management is fast becoming a convergence point in contemporary computing initiatives, whether for information security, data protection in the disaster recovery context, or for good old-fashioned capacity management. A conversation last week with Scott Gordon, vice president of WorldWide Marketing for SenSage, “a five year old startup” in the realm of data forensics, underscored the point.

Gordon says that SenSage is well along the path of correlating the event logs generated by application, system, and network management systems, an endeavor begun during the dot-com era, to provide a uniform analytical repository that enables intelligent and strategic planning. Let’s break that down.

In the realm of systems, network, and applications management—not to mention storage management—specialized monitoring tools are used to count and measure things deemed pertinent to performance, availability, and resource consumption. The idea is that these tools enable administrators to (1) monitor a lot of gear and processes with fewer hands, and (2) detect burgeoning problems before they cause noticeable performance degradation or downtime.

Many management tools have come and gone over the years, and a few vendors have been so bold as to suggest that they possessed a “framework” product that would enable multiple management tools to be administered from a single console. Usually, frameworks were only as good as the support they received from the independent software vendors who wrote the separate management tools. In more than one case, frameworks fell apart when vendors began competing rather than cooperating with each other. Today, after nearly 25 years of trying to create a unified framework, management tools continue to proliferate, each with its own reporting format.

There are some subdomains of management that have achieved something close to log correlation. In the realm of backup management, for example, both Tek-Tools and Bocada talk a good game. They can correlate the logs generated by a wide range of backup software products and bring the overall backup process into finer resolution, aiding predictive analysis and troubleshooting. The limitation is that they only see things from the backup server’s point of view, on occasion aided by a handout of information from a proprietary disk array or tape device manufacturer.

The same holds true of technologies like SMARTS, which now carries the EMC brand. To the extent that vendors are willing to support the underlying software protocols, SMARTS provides a mechanism for aggregating data about your fabric-attached storage and switching devices, as well as some applications and IP network components, delivering a fairly coherent view of the IT infrastructure. EMC says you can use SMARTS to instrument your infrastructure, monitor components, model workflows, and predict problems.

Gordon is quick to point out that the problem with many of these subdomain managers, including SMARTS, is that they require buy-in from vendors of the technologies being monitored. Moreover, they do not retain data for an extended period of time, precluding analyses over the long haul.

“Longer-term data is absolutely required for what I call ‘incident scope analysis.’ That’s the kind of trend analysis that can really expose the root cause of problems that you are having and that can provide information that will let you define a better IT strategy.”

Gordon uses security management as a metaphor. He says, and I wholly agree, that there are many products on the market that can spot, in near-real-time, a script kiddie trying to break through a firewall or a hacker trying to muscle his way through a system or application password. “Insider threats” are much more difficult to fight. Gordon notes, “Insiders have legitimate credentials and their attacks are typically not observed by security management tools, and are not quickly discovered, usually between nine and 24 months.” Without longer-term logging, it is virtually impossible to develop strategies for reducing the exposure of data assets to insiders.

That metaphor plays out in other management domains, where tactical, rather than strategic, management tools are currently being deployed. Deploying some kind of management—any kind of management—certainly makes sense in this do-more-with-less world. However, the unanswered (and often unasked) question remains: without the long view, are we really getting sufficient data to effectively manage and resolve the root cause of our problems, or are we simply buying better tools for putting out daily fires?

Just as physics seeks a unified theory, management needs a unified knowledgebase, but building one poses difficult problems.

Tackling the Problems

Let's assume that we have separate management tools for storage, security, applications, host operating systems, and networks—possibly more than one management tool for each. How do we manage the combined output of all of the event logs produced by each of these management tools? Every key press, every network signal, every bit that is accessed gets logged. We are talking about a huge amount of data over time.

Gordon notes that the problem is even more complex the closer you get to it. You need to know how logs and reports can be collected from the management tools: some use streaming techniques while others do batch processing at discrete intervals. Going further, you need to understand the formats of each log or log message being generated by every app. Nuances are very important here, since you don’t want to lose the context of the log.
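To make the format problem concrete, here is a minimal Python sketch of normalizing two log formats into one common event record while keeping the original line verbatim, so that context is not lost. Both formats, the Event fields, and the function names are assumptions invented for illustration; they are not SenSage's formats or method.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
import re


@dataclass(frozen=True)
class Event:
    timestamp: datetime  # always stored as UTC
    source: str          # which management tool produced the line
    host: str
    message: str
    raw: str             # original line, kept verbatim to preserve context


# Hypothetical format A: "2006-03-14T09:21:07Z host42 backup: job 17 failed"
FORMAT_A = re.compile(r"^(?P<ts>\S+) (?P<host>\S+) (?P<tool>\w+): (?P<msg>.*)$")


def parse_format_a(line):
    m = FORMAT_A.match(line)
    if m is None:
        return None
    try:
        ts = datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%SZ")
    except ValueError:
        return None
    return Event(ts.replace(tzinfo=timezone.utc), m["tool"], m["host"], m["msg"], line)


# Hypothetical format B: "time=1142328067 src=firewall host=fw01 msg=denied tcp 10.0.0.5"
# Assumes exactly four key=value fields, with the free-text msg field last.
def parse_format_b(line):
    tokens = line.split(" ", 3)
    if len(tokens) != 4 or any("=" not in t for t in tokens):
        return None
    fields = dict(t.split("=", 1) for t in tokens)
    try:
        ts = datetime.fromtimestamp(int(fields["time"]), tz=timezone.utc)
        return Event(ts, fields["src"], fields["host"], fields["msg"], line)
    except (KeyError, ValueError):
        return None


def normalize(line):
    """Try each known parser in turn; return None if no format matches."""
    for parser in (parse_format_a, parse_format_b):
        event = parser(line)
        if event is not None:
            return event
    return None


a = normalize("2006-03-14T09:21:07Z host42 backup: job 17 failed")
b = normalize("time=1142328067 src=firewall host=fw01 msg=denied tcp 10.0.0.5")
```

The same normalize function could be fed from a streaming source one line at a time or from a batch file read at intervals; only the collection step differs. A line that matches no known format returns None here, and a real collector would want to keep such lines rather than drop them.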

The next set of issues has to do with the structure of the repository itself. Originally, managers of managers tried to collect all data into a massive relational database. To a one, however, the framework people quickly ran into issues of scaling and performance. As Gordon notes, “You need to find a way to place the event data into a repository without a lot of ongoing fine tuning based on load.”

Finally, the repository needs to feature a rich set of tools that will let you ask questions and identify correlations that you didn’t intuit yourself. That is where the black art of data mining comes into play.

Gordon says that nobody, including SenSage, has realized the holy grail of the perfect unified event management knowledgebase, but with five years of development under its belt, SenSage has learned a few tricks. For one, the company supports over 200 mainstream log formats and doesn’t lose syntax as it stores the data in its columnar repository. For another, it uses off-the-shelf compression technology to squeeze the data it amasses in its clustered server environment. Once stored, data is never deleted or updated. To speed ingestion, you simply write to more clustered SenSage servers.
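The general idea of an append-only, column-oriented, compressed store can be sketched in a few lines of Python. This is only an illustration of the technique under stated assumptions: the class name, the JSON encoding, and the use of zlib as the "off-the-shelf" compressor are all choices made for this example, and nothing here describes SenSage's actual implementation.

```python
import json
import zlib


class AppendOnlyColumnStore:
    """Toy column store: batches are appended, never updated or deleted."""

    def __init__(self, columns):
        self.columns = tuple(columns)
        self._segments = []  # one dict per batch: {column name: compressed bytes}

    def append_batch(self, rows):
        # Store each column of the batch separately, so similar values sit
        # together and compress well.
        segment = {}
        for col in self.columns:
            values = [row[col] for row in rows]
            segment[col] = zlib.compress(json.dumps(values).encode("utf-8"))
        self._segments.append(segment)

    def scan(self, column):
        # Reading one column only decompresses that column's data.
        for segment in self._segments:
            yield from json.loads(zlib.decompress(segment[column]).decode("utf-8"))

    def stored_bytes(self):
        return sum(len(blob) for seg in self._segments for blob in seg.values())


# Illustrative data only: 1,000 made-up events from a single host.
rows = [{"host": "fw01", "severity": "warn", "message": f"denied tcp port {i % 10}"}
        for i in range(1000)]
store = AppendOnlyColumnStore(["host", "severity", "message"])
store.append_batch(rows)
raw_bytes = len(json.dumps(rows).encode("utf-8"))
```

Because the class offers no update or delete method, the stored history can only grow, which is the property long-term trend analysis depends on. In this toy, scaling ingestion would mean running more store instances and routing batches among them, loosely mirroring the idea of writing to more clustered servers.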

The original intent of the SenSage technology was to aid in security-management log correlation. They have since expanded into the wonderful and wacky world of compliance-related data forensics, providing all of the necessary audit-trail creation and analysis features required by current regulations in the U.S. and abroad.

The potential for their technology is much grander, if you listen to Gordon. He can see a time when the techniques they are bringing to bear in compliance forensics might also be used to manage company information in a much broader sense. As things presently stand, SenSage’s internal log-data management functionality might well provide a model for continuous data protection, archiving, and even policy-based migration. They have already worked together with EMC to leverage the latter’s content-addressable storage platform, and Gordon insists that he is open to discussions with newcomers in that market who can help him define and address a larger data management opportunity.

While Information Lifecycle Management remains as elusive as a Unified Theory of Physics, in this little corner of the universe—managing event-log data—SenSage may have pointed out a fresh approach.

Your comments are welcome: jtoigo@toigopartners.com.