Q&A: Applying Semantics to Data Management

Semantics helps bring data management down to the user level.

Every organization struggles with ongoing tensions between what business users want (easy access to data) and what IT wants (adequate control over that data). In this interview, Sean Martin of Cambridge Semantics explains how operational intelligence solutions work to support the best of both worlds. Let users continue to use tools such as Microsoft Excel for flexible, quick prototyping, Martin says, while securely storing a master copy of the data on a server linked to user spreadsheets. "Probably the most exciting thing that we can now do with the semantic technology approach," he says, "is to more practically connect data from any source."

Sean Martin founded Cambridge Semantics in 2007 after spending 15 years at IBM, where he was a founder of the IBM Advanced Internet Technology group. At IBM, Martin was responsible for bringing numerous sporting events live to the Web for the first time, including Wimbledon, the PGA Championship, and the U.S. Open in 1995, and the 1996 Summer Olympics.

BI This Week: You use the term "operational intelligence" to describe your technology. What does that mean? How does it differ from business intelligence?

Sean Martin: Operational intelligence starts where today's business intelligence ends. It is the obvious next step, connecting the insights achieved through a BI dashboard or data analysis, and using those insights as a trigger to drive all manner of business processes.

The key is to make it dead simple for business users to create these connections and even automate data-driven business processes themselves. If users at the operations level can quickly locate and include all the data they need to give them an understanding of some area of the business, then easily turn that data into a visualization to achieve an insight, and then go on to turn that insight into an active data monitor that can trigger a business process -- for example, an exception report or an optimization activity -- then you've created a closed loop.

Now imagine that sort of activity scaled up and across an organization, so that all the operations people can help themselves in order to achieve all kinds of lightweight data-driven business process automation, with just the right amount of IT oversight. The potential for efficiencies is staggering. You've provided your organization with a grassroots means to capture and automate many of the informal or manual business operational insights and experience -- knowledge that walks out the door every night.

The sort of system I've described can also easily be linked to the formal IT function, thus providing IT with a means to do far more with existing resources.
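To make the closed loop described in this answer concrete, here is a minimal sketch in Python. It is an illustration only, not Cambridge Semantics' product or API: the `Monitor` class, the `run_monitors` function, and the inventory figures are all invented for the example. It assumes an insight can be written as a simple rule over rows of data, and that the triggered "business process" is just a line in an exception report.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# All names and data here are hypothetical, chosen only for illustration.
Row = Dict[str, float]


@dataclass
class Monitor:
    name: str
    condition: Callable[[Row], bool]  # the insight, expressed as a rule
    action: Callable[[Row], str]      # the business process to trigger


def run_monitors(rows: List[Row], monitors: List[Monitor]) -> List[str]:
    """Check every row against every monitor and collect triggered actions."""
    triggered = []
    for row in rows:
        for monitor in monitors:
            if monitor.condition(row):
                triggered.append(monitor.action(row))
    return triggered


# Made-up inventory data standing in for "all the data a user needs".
inventory = [
    {"sku": 101, "on_hand": 40, "reorder_point": 25},
    {"sku": 102, "on_hand": 5, "reorder_point": 20},
]

# The insight "stock below the reorder point is a problem" becomes a monitor.
low_stock = Monitor(
    name="low stock",
    condition=lambda r: r["on_hand"] < r["reorder_point"],
    action=lambda r: f"exception report: SKU {r['sku']} below reorder point",
)

reports = run_monitors(inventory, [low_stock])
print(reports)  # ['exception report: SKU 102 below reorder point']
```

In a real deployment the action would presumably start a workflow or send a notification rather than return a string; the sketch only shows the shape of the loop from data to insight to trigger.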

How real-time is operational intelligence data? Does the technology address unstructured as well as structured data?

It all depends on how the data is made available, but operational intelligence can be entirely real time if the organization's environment allows it -- and if the situation calls for it. Event-driven, service-oriented architectures (SOA) have been enabled by enterprise service bus (ESB) technologies for a while now. What is new is the ability to give that immediate power to the user.

Addressing unstructured data is vital. Probably the most exciting thing that we can now do with the semantic technology approach is to more practically connect together data from any source. This includes structured data from databases and information services, semi-structured data from spreadsheets, and now totally unstructured data mined from Web sources, e-mail, or documents. The operational intelligence capability is finding great application in customer intelligence and compliance solutions, but there are many other useful applications being built with it.

Speaking of the semantic technology approach, can you explain the basics of that technology?

It's complex, but in short, semantic Web technologies have their roots in the existing World Wide Web invented by Sir Tim Berners-Lee. He believed that we also needed a parallel data Web that would allow computers to assist us far more than they currently can in order to enable wide area information access and integration tasks.

Well over a decade ago, work began at the World Wide Web Consortium (W3C) to provide a public standards-based foundation for linking together disparate sources of data into an intelligent information fabric. Designed to handle the scale and diversity of the Web, semantic Web technologies are starting to provide enterprises with a revolutionary flexibility.

The concept of a semantic Web has been recognized by Gartner as a "top ten" disruptive technology, and organizations ranging from Facebook to Best Buy to Merck to the U.S. Department of Defense are embracing semantic Web technologies to tackle their most pressing information challenges.
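As a rough illustration of what linking disparate sources into one fabric can look like, the sketch below models data as subject-predicate-object statements, the general shape used by the W3C's RDF data model. It is a simplified stand-in written in plain Python, not an RDF library or any vendor's API; the identifiers (`customer:42`, `email:7`), the values, and the `match` helper are all invented for the example.

```python
from typing import List, Optional, Set, Tuple

# A statement is (subject, predicate, object). All data below is made up.
Triple = Tuple[str, str, str]

# Structured data, as it might come from a database.
from_database: Set[Triple] = {
    ("customer:42", "name", "Acme Corp"),
    ("customer:42", "region", "EMEA"),
}

# Semi-structured data, as it might come from a spreadsheet.
from_spreadsheet: Set[Triple] = {
    ("customer:42", "forecast", "120000"),
}

# Facts mined from unstructured text, such as an e-mail.
from_email: Set[Triple] = {
    ("customer:42", "mentioned_in", "email:7"),
    ("email:7", "subject", "Renewal question"),
}

# Because every source uses the same shape and the same identifiers,
# combining them is a plain set union.
graph = from_database | from_spreadsheet | from_email


def match(
    triples: Set[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> List[Triple]:
    """Return all triples matching the given parts; None acts as a wildcard."""
    return sorted(
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Everything known about one customer, regardless of where it came from.
facts = match(graph, s="customer:42")
for fact in facts:
    print(fact)
```

The point of the sketch is that no source-specific schema has to be reconciled before the data can be combined. Agreeing on shared identifiers is the hard part in practice, and the example assumes it away.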

What makes semantic technologies a good solution for data control and access?

Data spread across the enterprise in Excel spreadsheets can be a problem; data managers lose control, there's no single version of correct data, and so forth. How does operational intelligence address that?

It's a big problem for most organizations. Homegrown operational intelligence solutions almost invariably include an Excel component, whether for collecting and sharing data with colleagues and partners informally via e-mail, or for business and operations personnel integrating data in an ad hoc manner, or for the unbeatable flexibility in creating analytic models. For all of these, Excel is the business user's software tool of choice. Its low entry cost, and the fact that users can get started without having to wait for IT, have ensured its ubiquity.

IT groups, on the other hand, look for every opportunity to replace spreadsheet-centric systems with purpose-built BI and other analytic applications. They do that both to avoid multiple versions of data across the enterprise and to put themselves in a position to be able to properly support what often ends up being mission-critical applications. IT's problem, of course, is to find the resources and time to tackle every problem for which a homegrown solution has sprung up -- it's often called the "shadow IT" problem. Resources are always constrained, so IT groups can generally afford to take on only the most serious or highest-value problems.

That's why we have ongoing tension between what IT wants and what business users want. Operational intelligence solutions work to support the best of both worlds. Let users continue to use Excel for what it does well -- providing the flexibility and the quick iterative prototyping business users need. At the same time, securely store a single master copy of the data on a server linked to those user spreadsheets and the Web views that use them.

This server-side master copy of the data can be managed by IT, who can then back it up, validate it, restrict who can access it, and track what uses are being made of it. The problem of multiple disparate copies of the data disappears. The Excel worksheets become a portal to the linked data rather than out-of-control mini-silos for reference data.

This kind of technology is also ideal for organizations wanting to wean business users off an Excel-based shadow IT solution. It provides a simple way to migrate data off worksheets and into more formal purpose-built IT solutions.
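The server-side master copy described in this answer can be sketched as a small store that validates writes, restricts access, and records every use. This is a toy model under stated assumptions, not how Anzo works: the `MasterStore` class, the user names, the key, and the "no negative values" validation rule are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


class MasterStore:
    """Hypothetical master copy with access control and an audit trail."""

    def __init__(self, readers: Set[str], writers: Set[str]) -> None:
        self._data: Dict[str, float] = {}
        self._writers = set(writers)
        self._readers = set(readers) | self._writers  # writers may also read
        self.audit: List[Tuple[str, str, str]] = []   # (user, operation, key)

    def write(self, user: str, key: str, value: float) -> None:
        if user not in self._writers:
            raise PermissionError(f"{user} may not write {key}")
        if value < 0:
            # Example validation rule, invented for this sketch.
            raise ValueError("negative values are rejected")
        self._data[key] = value
        self.audit.append((user, "write", key))

    def read(self, user: str, key: str) -> float:
        if user not in self._readers:
            raise PermissionError(f"{user} may not read {key}")
        self.audit.append((user, "read", key))
        return self._data[key]


# Two spreadsheet users share one value instead of e-mailing copies around.
store = MasterStore(readers={"bob"}, writers={"alice"})
store.write("alice", "q3_budget", 250000.0)
bob_view = store.read("bob", "q3_budget")

print(bob_view)     # 250000.0
print(store.audit)  # [('alice', 'write', 'q3_budget'), ('bob', 'read', 'q3_budget')]
```

In this picture each worksheet holds a link to a key in the store rather than its own copy of the value, which is why there is only one version to back up and govern.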

Self-service BI is a great idea -- getting tools and data into the hands of a wide range of users -- but it has proven to be tough to execute. What are the problems with deeper user adoption, and how do you propose addressing them?

As you say, use of analytics tends to be concentrated at the top of the pyramid in most organizations. The main constituents served by current BI solutions are either relatively small numbers of executives examining periodic reports or the analysts with special skills who assist them.

There are a number of reasons for this, but I believe the main one is that data is often very hard to integrate and access. Most business users need a specialist to help them pull together a BI report. Until now, data has been stored in forms and models that are convenient and efficient for computers to access, not people. For example, look at the SQL query language. Its model for access is totally non-intuitive to most people, creating an enormous barrier between the user and the data they need. Even if your business user knows SQL, there's the issue of understanding what the data actually means, then picking out the needed data from all sorts of obscurely named tables and columns. No wonder most users need an expensive solution that can translate data from how it is stored to something that they can understand.

This is the same reason users often prefer spreadsheets. However, the latest semantic technologies are turning this problem on its head by providing storage or access models and data descriptions that reflect how domain experts or business users think about their own information.

If users are able to locate, query, and visualize their data far more easily, they are suddenly in control. This, in turn, enables lower levels in the organizational pyramid to be empowered by direct access to the organization's data, and to use it far more than was previously practical. Data ends up driving day-to-day operational decisions in the organization, which is the point in the first place, of course. We call these new uses of data Operational Intelligence.
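One way to picture a data description that reflects how business users think is a vocabulary layer that maps business terms onto obscurely named tables and columns. The sketch below uses Python's built-in `sqlite3` module. The table `t_cst_01`, its columns, the `VOCABULARY` mapping, and the `find` function are all invented for the example, and a plain lookup table is a much simpler technique than the semantic models discussed in the interview.

```python
import sqlite3

# Hypothetical physical schema with the kind of obscure names users struggle with.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_cst_01 (c_id INTEGER, c_nm TEXT, rgn_cd TEXT)")
conn.executemany(
    "INSERT INTO t_cst_01 VALUES (?, ?, ?)",
    [(1, "Acme Corp", "EMEA"), (2, "Globex", "APAC")],
)

# Hypothetical description layer: business terms mapped to physical names.
VOCABULARY = {
    "Customer": {
        "table": "t_cst_01",
        "fields": {"name": "c_nm", "region": "rgn_cd"},
    }
}


def find(concept, wanted, **filters):
    """Query by business terms; the SQL is generated from the vocabulary."""
    entry = VOCABULARY[concept]
    # Table and column names come only from VOCABULARY, never from user input;
    # filter values are passed as bound parameters.
    columns = ", ".join(entry["fields"][field] for field in wanted)
    sql = f"SELECT {columns} FROM {entry['table']}"
    if filters:
        conditions = " AND ".join(
            f"{entry['fields'][field]} = ?" for field in filters
        )
        sql += f" WHERE {conditions}"
    return conn.execute(sql, list(filters.values())).fetchall()


# The user asks for customers by region without knowing about t_cst_01 or rgn_cd.
emea_customers = find("Customer", ["name"], region="EMEA")
print(emea_customers)  # [('Acme Corp',)]
```

The user's request names only `Customer`, `name`, and `region`; the translation to the stored form happens in one place, where it can be maintained by whoever understands the schema.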

How does Cambridge Semantics address what we've discussed here?

Cambridge Semantics provides semantic data management solutions for the enterprise through our Anzo software product. We provide both business and IT users simplified access to, and manipulation of, their information. When it comes to access, Anzo can be used to securely pull together, store, and share structured data from databases or information services, semi-structured data from spreadsheets, and unstructured data from Web pages, e-mail, and documents.

When it comes to manipulating data, Anzo allows users to leverage a sophisticated plug-in to Microsoft Excel, to workflow and production rules engines, and to Web-based tools. All of this helps users quickly locate, slice, dice, collect, monitor, and graphically visualize data. The software is powered by the W3C semantic open data standards, ensuring rapid data integration and maximum reuse. The result is that our customers can quickly combine diverse data that crosses organizational departments in order to make better decisions and improve process efficiencies.
