Q&A: The Biggest Big Data Trends BI Professionals Need to Understand
Big data is changing the competitive dynamics of the BI industry. We explore how everything from sensor data to social media to mobile BI is changing the very nature of big data.
Big data is growing even bigger, and not just from the traditional increase in data transactions you’re familiar with. Everything from sensor data to social media and mobile BI is changing the very nature of big data.
To learn more about current big data trends and what BI professionals should be paying attention to, we spoke to Marc Demarest, the CEO and a principal in Noumenal, Inc., an international management consulting firm based in the Pacific Northwest and the UK. Marc is presenting the keynote address, Big Data through the Looking Glass, with Mark Madsen at the TDWI World Conference in San Diego, July 29 - August 3, 2012.
BI This Week: More than a few commentators have suggested there's some "irrational exuberance" associated with big data. Do you think that's the case? Where? Why?
Marc Demarest: I certainly think the supply side of the market is, in aggregate, making unqualified claims about particular technologies, the breadth of their applicability, and more generally about how the big data revolution undoes everything we think we know about data warehousing, but I don’t sense any irrational exuberance on the demand side of the market. Rather, just the reverse is true. IT organizations in many cases seem to be hesitant to accept that the revolution is real, and that its impact will be broad-based and nearly ubiquitous.
I attribute that hesitation -- perhaps wrongly -- to a number of factors: the “conventional” BI backlog many organizations have to deal with, the obvious overpositioning of the supply side, the “explore and you’ll find something” idea, and the lack of exemplary case studies for big data outside social media segments and usage models.
Mostly, though, I think the coupling of big data with social media has done a disservice to the quality of the discussion about the impact of big data on our existing architectures, methods, and practices, for the simple reason that I doubt very much whether social media data has broad applicability outside vertical markets like consumer packaged goods and brand-centric consumer durables.
What kinds of businesses and markets benefit most immediately from big data initiatives?
I have trouble finding a market segment where big data and big analytics aren’t applicable, to be frank. There are the obvious suspects -- CPG, telco, markets where consumer brand reputation really matters from a revenue or market valuation perspective -- but elsewhere it’s also relatively easy to find opportunities to exploit big data. Most companies with important voice response systems -- financial services, airlines, companies with complex post-sales support models -- are, in general, missing opportunities to mine electronic transcripts of their call traffic.
Many organizations that could materially benefit from integrating public records data -- well beyond what they may get from credit bureaus or background screening operations -- are not making use of that big data, either because they don’t know it exists or because the public records data suppliers have a short-sighted “by the drink” pricing model that no large-scale consumer has pushed back on, yet. Any organization that’s deploying branded handheld applications to iOS and Android -- I have a dozen on my phone -- is able, now, to make use of geolocation data from those apps, to understand a lot more about how, where, and why those apps are used, but I don’t see companies taking advantage of those “dwell graph analysis” opportunities the way they should.
You've made a lot of noise about "the sensor revolution" as part of the Big Data Revolution. Why do you think that's important to the typical IT organization's big data strategy?
From my vantage point, sensor-based data is rising as an object of interest in every vertical market I can think of. Any organization that extracts value from natural resources, has a waste stream of any kind, is involved in a complex supply or delivery chain, makes a product that requires anything that looks like “field service,” is implicated in public safety or personal identity, or markets and sells to consumers has the opportunity to make use of a rising sea of sensor-related data in their BI environment today.
Outside of some “early adopter” markets such as natural resources, manufacturing, agriculture, healthcare, transportation/logistics, and retail, that aggressive push into sensor data analysis isn’t happening, and I think that’s because of a small number of related factors.
First, I suspect most BI teams are ignorant of the role sensing and actuating technologies are playing in the lives of their employers; sensor analysis work is being done outside the control of the BI team because it’s not seen as a “BI problem.” That’s a strategic mistake for BI teams, as ignoring any “shadow IT” project is, but that will change over the next 24 months.
Second, the vast majority of BI initiatives I see are wholly insular: my data, my consumers. The BI team is instrumenting its own machine rather than reaching outside for market signaling data of various sorts. BI has historically been insular and uninterested in market signals, outside the verticals for which market signals are the most important data sets.
Third, sensing-and-actuating technology is foreign to BI folks; these sensors are often complexes of hardware, chemicals, even (in the case of manufactured materials) nanotechnology and microfluidic technology -- not domains we’re traditionally involved in or familiar with.
Finally, there’s a fairly large BI project backlog in the industry that we’re all aware of and no one likes to talk about. It’s hard to bite off a new and fundamentally dissimilar data source, with new and dissimilar analytics associated with it, when we’re still having trouble getting the corporate performance dashboard stable and functional.
Will big data change the competitive dynamics in the BI industry, or are we going to get our big data technology from the same set of suppliers dominating the BI market today?
There are definitely two distinct schools of thought here. Some people -- and I am one of them -- believe that the franchise players we traditionally use today (Oracle, Microsoft, IBM, SAP, etc.) will do what is necessary to maintain their franchise, up to and including acquiring promising best-of-breed big data players. Take, for example, the columnar data store: SAP has one in Sybase IQ (and a different acceleration story with HANA, but never mind that); Microsoft has added columnar store technology to SQL Server 2012; Oracle offers columnar store technology.
What was a best-of-breed appliance technology play is now standard fare from the franchise technology suppliers.
The other camp points out -- and it’s a good point -- that the big data technologies are entering the market with an economic model (price/margin model) that the franchise players cannot stomach. Of the major franchise RDBMS players, only Microsoft’s revenue stream is likely to grow appreciably (with SQL Server 2012), and that will be largely at the expense of Oracle. For SQL Server 2012, there are rumors that its pricing will be similar to Oracle’s per-core pricing model, which would mean Microsoft is on the hunt for margin as well, and therefore equally vulnerable to replacement by a new set of technologies that are driven by a different price/margin model. It’s a hard one to forecast, I think.
On the one hand, the franchise players have absorbed every major innovation in the BI market, to date; on the other hand, they are clearly margin junkies who can’t imagine selling an enterprisewide license of anything for a (relatively speaking) few thousand dollars, given the expense of their sales, service, and G&A.
What's your perspective on the "no up-front design required" position many big data vendors take?
Up-front design -- in the sense of pre-defined schema into which we engineer source data -- moves the cost of data legibility into the ETL/ELT stream. Data pooling models like those associated with Hadoop shift data legibility costs into the analytical stream: the data’s all there, and it was cheap to persist it, but the analytical code (regardless of what it’s written in) has to shoulder the burden of normalization, commensurability, comparability -- all the things that classic schema design does.
You don’t avoid these problems; you can’t avoid them. All you can do is choose where you pay for them. When you’re working with (for example) the sensor streams from a 100-bed hospital, and the environmental data (temperature, ambient air flow, air pressure, etc.) come in according to a different clock tick frequency than patient data (blood pressure, pulse, etc.), and the medical personnel sensor data (location, dwell time, etc.) come in at yet another, different tick frequency, you have a commensurability problem, period.
Your choices are what they have always been: (a) fix it in the “source system” (normalize all the sensor networks’ reporting frequency); (b) fix it on load (and if you do that, why write the data to files unless your RDBMS can’t handle the data volumes?); or (c) fix it at analysis time (and deal with code complexity). Other factors -- like the need to analyze, dispatch, or route events in real time -- may necessitate solving the commensurability problem after you persist the data. In that sense, “no up-front design required” is accurate enough. We just need to be sure we don’t misread the claim as “no design required,” which would be inaccurate.
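To make option (c) concrete, here is a minimal Python sketch (an editorial illustration, not something from the interview) of fixing commensurability at analysis time. It resamples three streams that report at different tick frequencies onto one shared time grid by carrying the last observation forward. The function name `align_streams`, the stream names, the sample readings, and the 30-second grid are all hypothetical, and carrying the last value forward is only one possible policy; interpolation or windowed averages would be equally valid choices.

```python
from bisect import bisect_right


def align_streams(streams, start, end, step):
    """Resample each stream onto a shared time grid.

    streams: dict mapping a stream name to a list of
             (timestamp_seconds, value) pairs sorted by timestamp.
    Returns one row per grid tick holding the most recent reading from
    each stream at or before that tick (None if nothing has arrived yet).
    """
    # Pre-extract the timestamps so each lookup is a binary search.
    times = {name: [t for t, _ in readings] for name, readings in streams.items()}
    rows = []
    t = start
    while t <= end:
        row = {"t": t}
        for name, readings in streams.items():
            i = bisect_right(times[name], t)  # count of readings at or before t
            row[name] = readings[i - 1][1] if i > 0 else None
        rows.append(row)
        t += step
    return rows


# Hypothetical readings: each stream ticks at its own frequency.
streams = {
    "room_temp_c": [(0, 21.0), (60, 21.5), (120, 21.4)],            # every 60 s
    "pulse_bpm": [(0, 72), (15, 75), (30, 74), (45, 78),
                  (60, 80), (75, 79), (90, 77)],                    # every 15 s
    "nurse_zone": [(10, "ward A"), (50, "ward B"), (100, "ward A")],  # irregular
}

rows = align_streams(streams, start=0, end=120, step=30)
for row in rows:
    print(row)
```

The code complexity Demarest mentions shows up even here: someone has to decide what the grid is, what a stale reading means, and what to do before a stream's first tick. Those are design decisions whether they live in a schema or in analytical code.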
Is social media data -- Twitter streams and the like -- useful for most organizations?
Not in my opinion. Social media data is a computational linguistics problem, at this point -- and that problem is made worse by the foreshortened ways in which people communicate when the medium limits the size of communication “chunks.” I suppose there’s some benefit to monitoring rising and declining trends for companies in certain markets, but that work is done (probably better than most can do it) by companies like Lithium; one is better off in most cases subscribing to analytical services rather than processing the data directly.
There are exceptions -- the richer communications in social CRM environments such as GetSatisfaction, for example, where one can actually analyze gestural data, see how often and under what circumstances users of products or services are reporting problems or suggesting enhancements, and monitor how effective a self-supporting community is. However, I’m not aware that those sorts of social media environments make access to their raw data easy or convenient, and they are probably -- when all is said and done -- in a better position to provide us with reports and analytics than we are to define and implement those reports and analytics for ourselves.
Is there some necessary connection between mobile BI and big data or are those two trends unrelated?
The two are related, but perhaps not in the way people usually imagine. Most mobile BI platforms are sensor arrays. They are sources of data, and potentially rich sources of data, particularly geolocation data (longitude, latitude, altitude, speed, and direction). They’re also sinks for BI data: sinks with haptic interfaces and small screens that can be geo-located and therefore provisioned with data that matters “at the moment.” I think mobile platforms get a lot of attention as sinks for BI data; I am not sure that as much attention is paid, practically speaking, to those platforms as geolocate-able devices, in the sense that (a) the mobile device’s context could serve as a cue for what sort of analytical data would be interesting to the user or (b) the mobile device’s context and location are, in and of themselves, interesting for analytical purposes.
What big data company is personally most interesting to you at the moment?
Splunk is very interesting to me. They say “machine data” and I hear “sensor data,” perhaps wrongly. They talk about computing and network equipment, but I hear “sensor arrays.” In any case, I see Splunk as being a test of the hypothesis that the new breed of big data vendors is creating insurmountable economic barriers for the franchise players. If Splunk can continue to grow well, remain independent, and build its own ecosystem, that’ll be a pretty clear sign that the competitive dynamics are changing.
I watch MicroStrategy fairly closely. They’re on to something with their Visual Insight offering.
At present, I’m very excited about Microsoft SQL Server 2012. It’s enterprise-scalable and, in the right configuration, very high-performance in query-intensive environments. Organizations that have taken “SQL Server can’t scale” to the bank, architecturally, will be in for a significant surprise.
I’m waiting patiently for big data consulting firms that aren’t technologically biased to emerge from the mists.