Advanced Analytics Set to Soar
By 2012, fully 85 percent of organizations will be practicing advanced analytics thanks to several converging trends.
Talk to any business intelligence (BI) or data warehousing (DW) vendor for any length of time and at some point they're going to bring up analytics.
We don't mean just plain vanilla analytics, either. BI and DW players are increasingly talking about advanced analytics. Netezza Inc., for example, is prepping a big "advanced analytics" push for next year; at its Partners user conference last month, Teradata Corp. talked up advanced analytics in tandem with analytics powerhouse SAS Institute Inc. Meanwhile, IBM Corp. -- which acquired analytics superstar SPSS Inc. in late July -- this year announced both an analytics-focused services initiative and a "Smart Analytics" black-box appliance. Big Blue and others clearly have analytics on their mind.
According to TDWI Research, the research arm of The Data Warehousing Institute, nearly 40 percent of shops are currently practicing advanced analytics. That's just the tip of the iceberg, however. By 2012, says TDWI research analyst and veteran industry watcher Philip Russom, fully 85 percent of organizations will be practicing advanced analytics.
The reason? Call it a case of multiple, converging trends, Russom explains.
Advanced analytics involves the use of extremely complex (often SQL-driven) queries or predictive analytic technologies. In this respect, Russom and other experts say, it transcends the data warehouse-driven reporting and OLAP practices that delimit the scope of traditional analytics.
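To make the query-based flavor concrete, here is a minimal sketch of an SQL-driven analysis that characterizes a recent business event: customers whose spending dropped sharply after a cutoff date. The `orders` table, its columns, the sample rows, and the 50 percent threshold are all assumptions made up for illustration; none of them come from the TDWI report.

```python
import sqlite3

def find_declining_customers(rows, split_date):
    """Return ids of customers whose spend on or after split_date
    is less than half of their spend before it."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per order, dates stored as ISO strings.
    conn.execute(
        "CREATE TABLE orders (customer_id TEXT, order_date TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    query = """
        SELECT customer_id
        FROM (
            SELECT customer_id,
                   SUM(CASE WHEN order_date <  ? THEN amount ELSE 0 END) AS before_amt,
                   SUM(CASE WHEN order_date >= ? THEN amount ELSE 0 END) AS after_amt
            FROM orders
            GROUP BY customer_id
        )
        WHERE before_amt > 0 AND after_amt < 0.5 * before_amt
        ORDER BY customer_id
    """
    result = [row[0] for row in conn.execute(query, (split_date, split_date))]
    conn.close()
    return result

# Made-up sample data for illustration only.
sample = [
    ("acme", "2009-08-10", 1000.0),
    ("acme", "2009-10-05", 200.0),
    ("bolt", "2009-08-15", 500.0),
    ("bolt", "2009-10-12", 600.0),
    ("core", "2009-08-20", 300.0),
]
print(find_declining_customers(sample, "2009-10-01"))  # ['acme', 'core']
```

A canned report would typically show the totals; the query above goes a step further and defines the event of interest (a steep drop in spend) directly in SQL.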
"The use of advanced analytics is driven up by organizations' need to understand constantly changing business environments -- as seen in the current recession and the resulting market turmoil -- as well as to discover opportunities for cost reductions and new sales targets," writes Russom in TDWI's Checklist Report: Data Requirements for Advanced Analytics.
"To meet these business goals, organizations are stepping up their use of two forms of advanced analytics: query-based analytics (which relies on complex SQL statements to define recent business events) and predictive analytics (which uses data mining and statistical methods to anticipate future events)."
The rub, Russom stresses, is that advanced analytics isn't a turnkey enterprise.
"Organizations will face challenges as they move into advanced analytics. Many don't understand that reporting and analytics are different practices, often with different data requirements," he writes. "Many have designed a data warehouse to fulfill the requirements of reporting and online analytic processing (OLAP), and they will soon need to expand the warehouse -- or complement it with analytic databases -- to fulfill the data requirements of advanced analytics, whether query-based or predictive."
Practitioners must grapple with several other issues.
For example, Russom explains, although most shops have experience with data integration or data quality, as well as data modeling (the latter of which can make or break the success of any predictive analytic practice), "they don't know how to adjust these data management practices to fit the needs of advanced analytics."
That's why Russom advocates a nine-step approach to advanced analytics. First, he says, would-be practitioners need to identify how (and why) they plan to use advanced analytic technologies. In other words, don't just do advanced analytics for the sake of doing advanced analytics. It sounds like a no-brainer, but in a business and IT culture in which a keeping-up-with-the-herd mentality predominates, it's a legitimate concern. How many shops rushed out to do service-enablement -- or, at least, spent considerable time and energy talking about doing service-enablement -- simply because it was greatly hyped?
Russom champions the use of advanced analytics to discover existing relationships, anticipate the future, and adapt to change.
These aren't just three common applications of advanced analytics, he stresses: they're three goals that are also clearly linked with ROI and business value.
That being said, he stresses, shops shouldn't expect to pursue these goals on the cheap. "These goals are worth pursuing from a business standpoint, but they require specialized analytic tools and analytic databases from a technology standpoint. This means that organizations new to advanced analytics will need to reach beyond their current reporting and data warehouse infrastructures."
Second, shops must be prepared to scale up their data integration practices to handle large (or extremely large) data volumes. This is why many DI and DW players -- companies including Hewlett-Packard Co. (HP), IBM Corp., Informatica Corp., Netezza Inc., Oracle Corp., SAS Institute Inc., Teradata Corp., and a bevy of analytic database players -- have glommed on to advanced analytics. (Players such as Aster Data Systems Inc., Greenplum Software Inc. and ParAccel Inc., along with Teradata, tout fast-loading options which they claim are designed for Big Data analytic workloads.)
"Many analytic databases regularly begin an analytic cycle with multiple terabytes. Hence, whether the data is heading into an EDW or a standalone analytic database, data loading must scale up to handle large data volumes that are loaded very quickly," Russom explains. "Likewise, large data extracts from operational systems must be as non-intrusive as possible."
Third, adopters must learn to distinguish between reporting -- long the mainstay of traditional data warehousing -- and analytics.
"Predictive analytics -- which includes techniques for data mining and forecasting -- is far more exploratory and forward-looking than reporting and OLAP," he writes. "The value of predictive analytics is the discovery of unknown facts and relationships, the confirmation of known or suspected relationships, and the leverage of those relationships for better decision-making."
Predictive analytics differs even from OLAP, which "is usually implemented as a form of parameterized reporting," Russom continues. "In such [OLAP] implementations, the available parameters limit the breadth of the analysis, and the analysis cannot be broadened without technical personnel developing more parameters."
Similarly, adopters must be able to distinguish between data warehouses, data marts, and analytic databases. Shops that have standardized on an enterprise data warehouse (EDW) should be fine, Russom says: "[A]n EDW can handle both query-intense and predictive-scoring workloads, plus it can manage the low-level, detailed data that advanced analytics often requires." Not all shops have an EDW, at least according to Russom's (and TDWI's) understanding. That means they'll have to think seriously about augmenting their existing DW deployments with a dedicated analytic complement.
"[O]rganizations with a warehouse focused on reporting and OLAP will need to extend or complement it with a separate analytic database to support an analytic workload and appropriate data -- if they are to provide the right data in the right condition that advanced analytics requires," he argues.
Russom offers other common-sense suggestions. For example, he urges, adopters must design a data warehouse architecture that's able to accommodate analytics. This often requires decisions: namely, should analytic data be stored in the EDW itself or in an external analytic "sandbox"? And what advantages -- outside of an ability to more adroitly process analytic data in the DBMS itself -- does the use of in-memory analytic technology confer?
Decisions, decisions, says Russom. Similarly, shops must take the necessary steps to prepare their data for advanced analytics; this involves formatting data such that it can be consumed by a range of analytic technologies, including traditional OLAP tools, query-based analytic tools (chiefly, SQL-driven), and -- of course -- predictive analytic tools.
This last class is perhaps the most challenging, Russom indicates, because it "demand[s] a very specific data structure, typically denormalized." Elsewhere, he adds, predictive analytic tools use "multiple algorithms, each with a unique data requirement … [and] most algorithms are optimized to run fast and accurately with a flat record structure, so data flattening may be required."
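As a rough illustration of what "data flattening" can look like, the sketch below joins a normalized customer table with its order rows to produce one flat, denormalized record per customer. The field names (`customer_id`, `region`, `amount`) and the sample rows are hypothetical, and real predictive tools may expect quite different layouts.

```python
from collections import defaultdict

def flatten_customers(customers, orders):
    """Build one flat record per customer from normalized inputs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    # Roll the child (order) rows up to the customer level.
    for order in orders:
        totals[order["customer_id"]] += order["amount"]
        counts[order["customer_id"]] += 1
    flat = []
    for customer in customers:
        cid = customer["customer_id"]
        flat.append({
            "customer_id": cid,
            "region": customer["region"],
            "order_count": counts[cid],   # 0 if the customer has no orders
            "total_spend": totals[cid],   # 0.0 if the customer has no orders
        })
    return flat

# Made-up sample data for illustration only.
customers = [
    {"customer_id": "acme", "region": "east"},
    {"customer_id": "bolt", "region": "west"},
]
orders = [
    {"customer_id": "acme", "amount": 120.0},
    {"customer_id": "acme", "amount": 80.0},
]
for record in flatten_customers(customers, orders):
    print(record)
```

Note that rolling orders up into counts and totals discards row-level detail, so in practice the choice of which columns to carry into the flat record matters a great deal.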
It's a lot like a juggling act. After all, in the process of prepping data so it can be consumed by a wide variety of analytic technologies, adopters must be careful to preserve as much detail as possible.
"Even more important [than the size of the data set] are the details within raw source data, because much of the clustering and relationship definitions produced by advanced analytics are based on those details," Russom says.
Similarly, shops should focus on improving data after they work with it -- not before. It sounds paradoxical, Russom concedes, but there's an undeniable logic to it. "[I]mprovements to the data may occur only after business analysts have worked with the analytic data set. These tasks … are risky if done too early, for fear of losing the data details that discovery-oriented analytics depends on."
Finally, Russom urges, adopters should also think about applying the products of their advanced analytic practices to existing -- and notionally separate -- BI and DW activities. "[T]he early discovery phases of advanced analytics … often lead to later phases where the analytics becomes part of daily business intelligence … activities," he concludes. "For instance, a business analyst may mine a data set in an ad hoc manner to understand a new customer behavior, then develop predictive models that are scored on a recurring basis to anticipate the new behavior so it can be acted on appropriately."
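The hand-off Russom describes, from ad hoc discovery to a model that is scored on a recurring basis, might look something like the sketch below. It assumes a simple logistic model whose weights were already fitted during the discovery phase; the feature names, weights, bias, and threshold are invented for illustration and are not taken from the report.

```python
import math

def score(weights, bias, features):
    """Logistic score in (0, 1) for a single record."""
    z = bias + sum(weights[name] * features[name] for name in weights)
    return 1.0 / (1.0 + math.exp(-z))

def score_batch(weights, bias, batch, threshold=0.5):
    """Score every record in a batch and return the ones at or above threshold."""
    flagged = []
    for record in batch:
        p = score(weights, bias, record["features"])
        if p >= threshold:
            flagged.append((record["customer_id"], round(p, 3)))
    return flagged

# Hypothetical model produced by an earlier discovery phase.
weights = {"late_payments": 1.0, "support_calls": 0.5}
bias = -2.0

# Hypothetical batch, e.g. one night's extract; the same call would run each cycle.
batch = [
    {"customer_id": "a", "features": {"late_payments": 3, "support_calls": 2}},
    {"customer_id": "b", "features": {"late_payments": 0, "support_calls": 1}},
]
print(score_batch(weights, bias, batch))  # [('a', 0.881)]
```

Running `score_batch` on each new extract is what turns a one-off finding into part of the daily BI routine.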