In-Depth

Q&A: Managing Exploding Data Volumes

Stephen Brobst says Teradata’s vision of “extreme data warehousing” is about a lot more than just an attention-grabbing marketing term

For almost as long as data warehouses have been posing intractable growth problems, NCR Corp. subsidiary Teradata has been the recognized leader in the large-volume, or high-end, data warehousing space. These days, Teradata remains the undisputed champ in high-end data warehousing, but NCR's most profitable subsidiary faces renewed competition from a new breed of competitor, the data warehousing appliance vendors, as well as from IBM Corp. and Oracle Corp. at the high end.

We spoke with Stephen Brobst, chief technology officer of Teradata, about contemporary trends in data warehousing. In spite of its NCR hardware underpinnings, Brobst says, Teradata Warehouse is much more than a data warehousing appliance, and he downplays the likelihood of a Teradata-branded appliance at any point in the near future. Moreover, he thinks there's plenty of upside to the data warehousing market, with data volumes predicted to balloon into the stratosphere over the next few years.

That, says Brobst, is why Teradata’s conception of “extreme data warehousing” is a lot more than just a marketing term.

You’ve stressed that, unlike Netezza and DATAllegro, you’re not a data warehousing appliance vendor. Now to my mind, there’s no controversy here—you aren’t, strictly speaking, an appliance company. Is there a reason you want to make this distinction clear?

Netezza in the marketplace is really positioned as a data mart appliance. I guess that’s the word they tend to use. [T]hey take a specific analytic application and they load up the data for that application, but it’s really a fairly special-purpose deployment.

I would not put us in the category of an appliance; [Teradata Warehouse is] really a general-purpose enterprise solution. We don't focus on data mart deployment, but rather on enterprise information management. But they [Netezza] seem to want to associate themselves with us, as if they compete with us, maybe because they think it helps make them look better.

You say that you’re more of a “general purpose” solution—but what do you mean by that? As far as I know, they [Netezza, DATAllegro] might do well in specific verticals, but they’re not marketing industry-specific versions of their products.

I mean that we're more of a general-purpose data warehouse, whereas their focus is more on data marts for specific applications. If I am a large health care organization, I will use [Teradata] for fraud detection, I will use it for financial analytics, all using that same single source of truth.

Netezza is really focused on pulling off data for a specific analytic application and structuring the data for that application, and they've got a really aggressively priced solution, or appliance, for these kinds of niche applications. The issue is that if you have a whole bunch of these applications, you end up buying a whole bunch of appliances. Our solution is to store the data once for all of these applications.

That said, have you ever thought of marketing a line of Teradata-branded data warehousing appliances? I mean, you’re halfway there, as it is, with your NCR hardware underpinnings.

We have looked at marketing Teradata-branded appliances. The thing that happens is that if you go out with a solution that's a data mart, you end up encouraging people to make these kinds of point decisions, which look attractive at a point in time, but the long-run TCO is actually worse, because you end up having to replicate the data for each new application.

We put the R&D and the design into [Teradata Warehouse] to make it scalable. In the Netezza model, their trick, if you will, their secret sauce, is some innovation that they've done in the I/O controller, so it's very much a hardware-based implementation technique, and we believe that's not a sustainable business model.

So who’s your biggest competitor, if not Netezza? Oracle?

To the contrary, we don't even see [Oracle] at the high end. It's IBM at the high end. Over the last 18 to 24 months, we have consolidated hundreds of Oracle data marts into enterprise data warehouses, so they're actually our prey.

Rightly or wrongly, there’s a perception in the market that Teradata is a high-end play with a high-end price tag. Have you made any effort to market more effectively to the mid-market or to the small and medium enterprise?

We certainly are interested in the mid-market, but the small enterprise is not really our focus. In reality, most of our customers are at less than a terabyte, so our pricing strategy [is that] we price right on top of Sun and Oracle. On acquisition price, we're priced better than IBM, and we're priced right on top of, at the same level as, Oracle. Our goal is not to beat Oracle on price, but to beat them in terms of feature parity and lower total cost of ownership.

I wanted to talk with you about the high-end of the high-end data warehousing space itself, because it seems that every year, the goalposts move again. Three years ago, 1 TB might have been the high-end entry point, but my sense is that it’s much greater today.

It depends on what you mean by "high end." A lot of times people define the entry point in terms of how much storage, so high-end, typically, in our case, would have to be over 1 TB, [though] more than 50 percent of our customers start at less than 1 TB. [W]e [have] some customers who are considered high-end even at a few hundred gigabytes, but they're doing really sophisticated things. If you take somebody like Travelocity, they're not the biggest customer on the block, but they're doing some very exciting things in terms of analytics.

One of our competitors likes to trumpet the example of a European telco customer that has 20-some TB in its data warehouse, but when you look at what they're doing [with this data], you probably wouldn't call it high-end. Basically, you're only allowed to run queries that specify the phone number and the date of the entries you're looking for. Yeah, they built this big monster database, but if they can't run lots of interesting queries, then who cares?

I think you’ve posted 12 consecutive quarters, or something to that effect, of revenue growth, which is really something, seeing as how we just shrugged off an economic downturn about 18 months ago. What’s your take on the health of the data warehousing market, and do you think that you, and perhaps some of your competitors, will continue to see such encouraging growth?

I'll have to check on those numbers, the 12 consecutive quarters, because I don't know about that; it sounds right, though. We've had five consecutive quarters of record growth, five quarters of everything being better than the previous quarter. We're the growth engine inside of NCR.

But how do I view the market? I think the market is coming to us. In the next seven years, enterprises will be managing 30 times the data that they are managing today. In a report that was put out by Berkeley, they looked at data growth and demonstrated that there will be more data created in the next two years than in the previous 40,000 years. This means that what people used to think of as a big data warehouse is today something very small. Part of it is driven by the economics: we're driving more and more price/performance by tracking Moore's law with the technology. Every year, we're delivering 20 percent better performance to the marketplace.

Now I have a question about this, because Eric Rogge over at Ventana [Research] suggests that improvements in processing power, memory speeds, and storage performance aren’t keeping pace with the explosion of data growth. Do you think this is potentially a problem?

If you believe the Berkeley numbers, which say data roughly doubles every 18 months, and you believe Moore's law, which says processing speed roughly doubles every 18 months, then there isn't a problem. Storage capacities hadn't been doubling at quite the same pace, but about two years ago there was a dramatic change in the storage market, where vendors started driving more density into storage, and in reality it was happening even before that.

I’ve heard you use the term “extreme data warehousing” to differentiate your strategy from your competitors. Now this smacks of marketing-speak, so I wondered if you could talk a bit about the nuts and bolts of this vision?

What we mean by extreme data warehousing is taking data warehousing an order of magnitude beyond where it is today. If you look at data volumes, we predict that what people today consider to be the detail data will no longer be the detail data in the future. If you are in the telecommunications business, the detail data is typically the call detail records, one record for each call your customer makes. Ten years ago, if you told people you were going to put detail data in your data warehouse, they'd have said, "You're out of your mind!" The storage costs, the processing, the memory: it would all be prohibitively expensive.

We believe the detail data will not be the call records. It will be sub-atomic, down to the packet level, the actual packets over the network. If you look at FedEx, the detail used to be the package, but now, if I send a package from LA to New York, it's going to go through 12 different scans on the way. So the detail is no longer the package; it's all of those scans.

So when we talk about extreme data warehousing, we talk about extreme volumes of data and extreme numbers of users.
