Q&A: Analytics, Unstructured Data, and the Growth of Social Media

What are the common misperceptions about unstructured data, and how can this data be exploited for game-changing insights?

Where are companies making good use of predictive analytics -- especially with unstructured data? What are the most common misperceptions about unstructured data, and what role does social media play? For help understanding all the ramifications of exploring big data for insights, we turned to Alex Black, senior partner and head of strategy for CSC's Enterprise Intelligence Practice.

BI This Week: With all the emphasis on predictive analytics these days, where do you see companies making the most progress and why?

Alex Black: While predictive analytics has been much talked about, it has yet to become prevalent among most companies. A 2010 Economist Intelligence Unit survey found that less than 33 percent of companies are using predictive analytics, but 70 percent said they will in the next three years. We'll continue to see great progress in this area over the next year.

Right now we're seeing consumer product and services companies leading in behavioral models such as propensity and clustering. The leaders are combining value models to further identify their best prospects. For example, insurance companies have always had actuarial tables and continue to refine them, but a new emphasis has emerged focusing on customer value models such as lifetime value (LTV) that put the focus on the overall relationship. We've also seen B2B companies start to use lifetime value models, especially applied to prospective customers to determine the most valuable leads.
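To make the lifetime value idea concrete, here is a minimal sketch of how prospective customers might be ranked by LTV. It assumes a simple model in which each lead yields a fixed annual margin, is retained each year with a fixed probability, and future margins are discounted at a fixed rate. The function name, the lead names, and all figures are hypothetical illustrations, not anything described in the interview.

```python
def lifetime_value(annual_margin, retention_rate, discount_rate, years):
    """Expected discounted margin over a fixed horizon (simplified model)."""
    return sum(
        annual_margin * retention_rate ** t / (1 + discount_rate) ** t
        for t in range(years)
    )

# Hypothetical leads: (expected annual margin, expected yearly retention rate)
leads = {
    "lead_a": (1200.0, 0.80),
    "lead_b": (900.0, 0.95),
}

# Rank leads by 10-year LTV at an assumed 10 percent discount rate.
ranked = sorted(
    leads,
    key=lambda name: lifetime_value(*leads[name], 0.10, 10),
    reverse=True,
)
print(ranked)
```

Under these assumed numbers the lead with the lower annual margin but higher retention ranks first, which is the point of focusing on the overall relationship rather than on a single year's revenue.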

Sticking with analytics, how are companies addressing the issue of centralized vs. decentralized analytics organizations in light of efforts to focus more on the entire enterprise?

The mass convergence of knowledge workers, torrents of data, and high-powered tools is leading to decentralized development and use of analytics throughout organizations. This places stress on IT organizations that favor standardized tools and systems for cost and efficiency reasons. In the past, the most advanced analytics firms tended to have centralized capabilities where they could standardize processes, tools, and data strategy, but now these firms are being forced to put analytic horsepower closer to the point of decision making.

In addition, the rise of operational analytics -- for example, machine-to-machine information exchange such as smart meters and heavy equipment continuously streaming operational and maintenance data -- has made decentralized nodes of analytic expertise with centralized support the new hybrid model for success.

What are the most common sources of unstructured data, and what are some of the most common misconceptions about it?

Unstructured data has emerged as a huge topic lately, largely driven by the reams of data flowing through the Web, but unstructured data has been around a long time -- just not at the volumes we are seeing today.

Inside the firewalls of the company, examples of unstructured data include e-mail messages from customers, e-mail messages from employees, voice recordings of customers, and free-form comments written in customer-facing systems. These data types have historically been difficult to harvest for insights, but lately more companies are using text mining and semantic analysis tools to analyze these data sources.

The Web poses new challenges in terms of the sheer volume of data that can be analyzed, which is where the new emphasis on unstructured data has really caught the attention of organizations. Search engines help find results but don't necessarily develop insights from all of the Web traffic where a company's name or products are being discussed, reviewed, or (even worse) skewered. There are emerging tools that can handle the semantic analysis as well as the extraordinary volumes, and this is where I get excited in terms of the insights we are now able to develop and apply to our customer and product strategies.

What are some examples of companies mining unstructured data for insights that are game-changing?

One of the best-known examples happened a couple of years ago, when Procter & Gamble monitored Facebook and squelched a viral campaign against Pampers by a group of mothers. Because it was monitoring the conversation, the company was able to address the mothers' issues head-on and subdue the campaign. Of course, there are also the known unknowns -- cases of security threats that have been thwarted by mining huge amounts of unstructured data.

Another example I am familiar with is a company that mined unstructured data on the Web for product insights; by using sentiment analysis, it was able to identify and correct a packaging flaw that would have gone undetected for a much longer period had traditional focus groups and consumer panels been used for feedback.

There is a corollary risk worth mentioning: the fact that someone has a negative impression of your product or service and says so on the Web doesn't necessarily call for a response or concern. That's why sentiment analysis over a broad array of unstructured data is so valuable -- you need to look for patterns as opposed to the single "determined detractor."
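As a rough illustration of looking for patterns rather than reacting to one loud voice, the sketch below counts how many distinct authors are negative on average, so that repeated posts from a single person do not dominate. It assumes each mention has already been scored by some sentiment tool on a scale where values below zero are negative; the function name, the sample data, and the 50 percent threshold are all hypothetical.

```python
from collections import defaultdict

def negative_author_share(mentions):
    """Fraction of distinct authors whose average sentiment score is negative."""
    scores = defaultdict(list)
    for author, score in mentions:
        scores[author].append(score)
    if not scores:
        return 0.0
    negative = sum(1 for s in scores.values() if sum(s) / len(s) < 0)
    return negative / len(scores)

# Hypothetical pre-scored mentions: (author, sentiment score)
mentions = [
    ("ann", 0.6),
    ("bob", -0.9), ("bob", -0.8), ("bob", -0.7),  # one determined detractor
    ("cat", 0.4),
    ("dan", 0.1),
]

share = negative_author_share(mentions)
needs_attention = share >= 0.5  # assumed threshold for a broad pattern
print(share, needs_attention)
```

Here half of the raw mentions are negative, but they all come from one author, so only one in four authors is negative and the assumed threshold is not met.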

How is the emphasis on social media changing the dynamics of how IT works with the business to acquire, access, and analyze data?

This is a fascinating area of development. As in the early boom period of the dot-com explosion (not the collapse), the business folks have gone outside the ropes of IT to contract directly with social media, digital marketing, sentiment analysis, and Web analytics firms because they thought IT was too slow in evaluating and recommending these technologies for implementation. The fear of competitive disadvantage was the great motivator and continues to be today.

IT has been slow to react because it has such a huge investment in infrastructure that is not ready for big data and analytics. For instance, leading firms have evolved their infrastructure to include BI appliances, but this has generally been a slow process. I think IT has to continually test the new technologies emerging in this area and then be ready for the impending market consolidation.

In an environment of tight cost controls, where would you recommend investments be made in the areas of big data and analytics? Where do you think companies will get the biggest returns on their investments?

I think companies need to be thinking about search-based applications as the next wave of intelligence generation. Content analytics, information mash-ups, and enterprise search present tremendous opportunities for streamlining operations, getting to answers faster, and ultimately making better business decisions.