Big Data or Big Boondoggle?

Is Big Data everything it’s cracked up to be or is the Big Data value proposition based on fears of an undiscovered insight in our data? Our storage analyst, Jon Toigo, looks at why Big Data isn’t for everyone.

Like many of my political stripe, I suppose, I was more or less hoping that the combination of the recessionary economy and rising fuel prices would engender greater sanity when it came to buying SUVs. Here in Central and South Florida, where our biggest mountain scales to roughly 30 feet above sea level, the appearance of a gas-guzzling SUV in the traffic lane ahead of me seems like overkill.

Although there are some legitimate uses for these behemoths -- hauling a lot of kids to soccer practice or perhaps towing a big boat on the weekend -- that might justify their purchase, I usually see them piloted by one person driving alone who seems completely detached from the practical matters of cost or carbon while texting on his smartphone. Moreover, the vehicle is usually black (a strange choice given our hot and humid climate), usually factory pristine inside and out (no evidence of either rugged off-road activity or kid damage), and usually not equipped with a trailer hitch (suggesting no hauling or towing use). My conclusion: this is just a car used to haul its owner back and forth to work every day.

I can't help but be a bit judgmental when I read that, with the recovery’s onset, the top-selling Detroit auto was the SUV. Some models are gone (like Hummers), and later iterations of that vehicle were just Chevy Tahoes with a different body style -- for which consumers paid an extra $40K. Really? For what?

I suppose that's why I haven't become an automotive tycoon. I haven't mastered the art of selling what folks want to buy, whether they need it or not.

The same goes for my view of technology fads. I wonder why folks are spending so much money on server virtualization to consolidate file servers, something you can do readily (and at far less expense) using a traditional file system.

I wonder why IDC's chart on worldwide external storage deployments in late 2011 shows 21 exabytes of disk arrays, with half of the disks being used to back up the data on the other half. The "tape is dead, use disk for everything" mantra is absurd when you think of the energy consumed to hold on to a lot of data that no one ever accesses, not to mention the failure rate of disk, the questionable efficacy of WAN-based replication of data stored on de-duplicating virtual tape appliances, and the 100x mark-up on commodity components represented by most value-add arrays.

Don't get me started about "clouds" -- the latest iteration of an outsourcing meme that we called service bureau computing in the 1980s and application service providers in the 1990s.

Now, a new technology discussion, focusing on the airy idea of Big Data, has started to churn my stomach. Don't get me wrong: there are times when we want to use massive amounts of data to discover patterns and relationships. Jeff Jonas at IBM is a creative and humorous man, very smart and razor-focused on Big Data. His groundbreaking work in this field shows what can be done to tackle a problem -- such as detecting a few instances of voter fraud hidden in a giant haystack of voter registrations.

Of course, I understand the value described by a fellow at the National Security Agency who talked about the need to surface multiple databases all at once to find terrorists. Tracking these people may require concurrent access to airline reservation systems, passport control databases, vehicle rental records, sales records of potential explosive device components, and the like. The search might leverage several miles of disk drives to present data for analysis, though Jonas told me a while back that he prefers to surface all of his data on Flash SSDs. Both are, I believe, good illustrations of the possibilities of Big Data technology.

These are complex problems, and good candidates for Big Data approaches. I get that. They are also problems that we seek to resolve by asking very specific questions of our data. Each of these Big Data analysis applications has a point, a thesis, or (at the very least) a well-defined hypothesis to test. Each has generally identified what data sets to include and cross reference. In most cases, the data itself has been "normalized" to some degree to enable such cross-tabbing (though Jonas doesn't view such normalization as especially necessary or wise in some cases). What each example is not is a rudderless adventure into an unknown sea of data undertaken just to see what comes up; instead, each is a structured undertaking with a mission.

What gives me pause is the latest marketing from major enterprise vendors, which suggests that Big Data is some sort of killer app and that everybody who is anybody had better build a massive scale-out storage infrastructure to support it in the future. Simply put, I am not seeing the use cases that would justify the expense associated with building a Big Data platform inside most companies today. Absent some sort of real-world, return-on-investment-oriented value case, I wonder why any company would sanction the investment in all of the spinning rust or Flash SSD capacity to host this research capability.

Today, economic and political reality is in flux, creating a baseline level of worry for every businessperson I know. The economy is increasingly influenced by global realities beyond our control. One financial services firm is running ads that tell us that our success will be determined by our ability to understand the relationship between sawgrass in South America and rice production in Southeast Asia so we can understand aircraft sales in Europe and the price of gasoline in the U.S. Without the omniscience of their analysis, the local grocery store will no longer be able to compete with the grocer that just opened up down the street.

This is followed by another commercial, from a technology firm, stating that their storage and server gear is required to analyze the purchasing patterns of consumers within a 30-mile radius in order for Joe's Bakery to increase sales of a particular variety of cupcake on Wednesdays. Without Big Data, Joe may be less profitable and the local community a few pounds less overweight.

It seems that a lot of the Big Data value proposition is preying on fears that there is an undiscovered country in our data. There may very well be, but is spending the money, time, and effort on building a capability to find out actually worth a few more cupcake orders?

I realize that the storage industry is a bit down on its luck. Last year, I was advised by an analyst friend to stop beating up on the storage vendors because their profits worldwide had fallen from about $30B to a paltry $27B. I am reasonably sure that the future profitability of the storage world would be better assured by selling what consumers really need rather than hyping a killer app to get them to purchase wares that may deliver no discernible business value in the future.

What I would like to see is the appearance of analytical Big Data service providers that can be hired as needed to pursue well-defined projects that interrogate our data. I would rather buy this capability than build it, given the questionable return on the massive investment that constructing and maintaining such a capability on premises would entail.

But what do I know? I don't drive a black SUV here in Florida.
