The Analytics of Web Data

New information sources demand improved analytics to learn more about customer behavior, employee and supplier relationships, competitors, and the market.

by Philip Howard

As more people than ever access information, buy, sell, comparison shop, and socialize via the Internet, businesses looking to better understand their customers, improve marketing and sales activities, and increase profitability need to incorporate many new streams of data into their analytic environments. How, for example, are Web and mobile marketing campaigns aimed at different demographic groups working? What are the benefits and drawbacks of making a change to an online store or corporate Web site? What types of promotions or incentives will encourage more shopping cart purchases, more game playing, improved brand loyalty, or better supplier relationships? In a Web 2.0 world, businesses require sophisticated analytic capabilities able to handle both the massive volume and diversity of data generated online.

Information derived from the Internet is different in some key ways from data that is generated in-house. To begin with, it is not “here” but “out there” somewhere. Moreover, users don’t necessarily have to access it from a PC or other computer system -- they may access and manipulate it from a mobile phone, an Xbox, or an iPad. This is where things can start to get complicated.

First, you need to be able to load the data in near real time, since the 24x7 nature of the Web demands up-to-the-minute insight. Second, you need connectors to all the types of data sources (anything and everything ranging from clickstream records and smart phone activity logs to Facebook pages and Twitter feeds) that might be required now or in the future. Finally, you need to be able to integrate this data with more conventional in-house sources such as your CRM, order/delivery, and sales force automation systems, as well as external data from aggregators such as Acxiom and Omniture.
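To make the integration step concrete, here is a minimal sketch in Python -- with entirely hypothetical field names, customer IDs, and records -- of joining a Web clickstream event to an in-house CRM profile on a shared customer identifier, producing one flat record suitable for loading into an analytic database:

```python
# Hypothetical clickstream events, as they might arrive from a Web log feed.
clickstream = [
    {"customer_id": "C001", "page": "/checkout", "ts": "2011-05-01T12:00:00"},
    {"customer_id": "C002", "page": "/products", "ts": "2011-05-01T12:00:05"},
]

# Hypothetical extract from an internal CRM system, keyed by customer ID.
crm = {
    "C001": {"segment": "premium", "region": "EMEA"},
    "C002": {"segment": "trial", "region": "NA"},
}

def integrate(events, crm_records):
    """Merge each click event with its CRM profile into one flat record."""
    for event in events:
        profile = crm_records.get(event["customer_id"], {})
        # Unknown customers simply pass through without CRM attributes.
        yield {**event, **profile}

unified = list(integrate(clickstream, crm))
print(unified[0])  # the click is now tied to a customer segment and region
```

In practice a data integration tool such as Talend Open Studio would perform this kind of join at scale, but the principle -- enriching anonymous Web events with in-house customer context -- is the same.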

At Bloor, we recently looked at the technical requirements for analyzing online data and published a white paper (Loading and Analyzing Web Data -- Considerations and Recommendations). Data relevant to social networking and media, online gaming, Web comparison sites, online advertising, and mobile environments (among other sources) requires moving beyond conventional “slice-and-dice” business intelligence. These rich sources of information demand low-latency loading capabilities, sophisticated analytics that can adapt to rapidly changing query requirements, and a high-performance database to host the data, which may be wholly Web-based, or may also include data from internal systems.

Here are some relevant use-case examples:

  • A Web comparison site dealing with financial products needs to leverage customer profiles to more finely tune marketing efforts
  • An online poker destination wants to maximize revenues by analyzing customer activity to encourage additional play; it also wants to avoid losses by quickly identifying potential fraud
  • An advertiser needs to merge click-through data with Web site demographics to determine the best possible match of content to its branding messages
  • A mobile provider wants up-to-the-minute insight into customer behavior in order to serve up location-specific offers

All of these use cases share some key challenges. The ability to quickly load, store, and analyze large volumes of data is critical. Data from multiple sources must be integrated and transformed into a consistent format. And queries must be easy to set up, run, and change as needed.
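The loading challenge is usually met by moving records in small, frequent micro-batches rather than one row at a time. The sketch below illustrates the pattern using Python's built-in SQLite module as a stand-in for an analytic database (the table name, columns, and events are hypothetical; a warehouse such as Infobright would accept similar bulk INSERTs through its MySQL-compatible interface):

```python
import sqlite3

# Stand-in for the analytic database; real deployments would connect
# to a dedicated warehouse rather than an in-memory SQLite instance.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (customer_id TEXT, page TEXT, ts TEXT)")

# Hypothetical micro-batch of click events; in production these would
# stream in continuously and be loaded every few seconds.
batch = [
    ("C001", "/checkout", "2011-05-01T12:00:00"),
    ("C002", "/products", "2011-05-01T12:00:05"),
]
conn.executemany("INSERT INTO clicks VALUES (?, ?, ?)", batch)
conn.commit()

# Analysts can query the freshly loaded data immediately.
count = conn.execute("SELECT COUNT(*) FROM clicks").fetchone()[0]
print(count)  # 2
```

Batching amortizes the per-statement overhead while keeping latency low enough that queries reflect activity from moments ago rather than from last night's load.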

A selection of technology vendors has emerged to address the demand for fast and flexible Web analytics, including companies such as Infobright and Talend. Although by no means the only possible solutions for loading, storing, and analyzing Web-based data, Infobright Enterprise Edition (IEE), an analytic database, and Talend Open Studio, a data integration solution for ETL, offer promising approaches. Both are likely to appeal to the Web 2.0 community because they are open source vendors (though they are not alone in that) and also because, in the case of Infobright, the warehouse is MySQL compatible.

This is significant because many Web 2.0 companies embraced the open source community when they started out and, in particular, because many of them opted to use MySQL as their database engine. However, MySQL runs out of steam as an analytic warehouse after a few hundred gigabytes. With the success that Web 2.0 companies are now enjoying, and the associated increase in data volumes, most of these businesses already hold more data than that -- and those that don't today soon will. In other words, any Web 2.0 organization that has based its business intelligence and analytics on MySQL is going to have to start thinking seriously about moving to another platform, at least for part of its processing.

The world of analytics is changing, especially as the Internet now offers a wealth of insight into customer behavior, employee and supplier relationships, competitors, and the market. New technologies and approaches are required to get the information needed to make better business decisions.

Philip Howard is research director for Bloor Research Ltd. and focuses on data management. He started his career in the computer industry way back in 1973 and has worked as a systems analyst, programmer, and salesperson, as well as in marketing and product management, for a variety of companies including GEC Marconi, GPT, Philips Data Systems, Raytheon, and NCR. You can contact the author at
