Q&A: Data Warehousing and the Appliance Model

Dataupia's Foster Hinshaw discusses the rapidly evolving state of the data warehousing market and the benefits and drawbacks of the appliance model

Nearly eight years ago, Foster Hinshaw co-founded Netezza Inc., the company that helped make the data warehousing appliance a household name in data warehousing circles. Hinshaw and Netezza succeeded with an appliance model that, while not new (predecessors RedBrick and WhiteCross, along with Teradata, had already laid the groundwork), was new enough to upset the status quo. Last year, Hinshaw again rattled the status quo, announcing a new company -- Dataupia -- and a new spin on the DW appliance: abstraction.

Unlike existing appliances, Dataupia plugs right into or sits right underneath an organization's existing RDBMS assets. As far as a DBA or data warehouse architect is concerned, Dataupia claims, it isn't even there.

We caught up with Hinshaw to talk about Dataupia's growth, the rapidly evolving state of the data warehousing market, and the pluses and minuses of the appliance model. We also sought an answer to a long-simmering question: is the DW appliance an application-specific data mart or isn't it?

TDWI: In the data warehousing appliance market, as in just about everywhere else, the proof is in the pudding. Dataupia officially launched last May, so you've been live for about nine months now. What kind of customer traction can you point to? I know that with some of your competitors -- or even with Netezza, in your case -- it took a while to ramp up to actually touting customer testimonials. Is that the case here, too?

Hinshaw: We have a number of paying customers, which is wonderful. We have a lot of traction in the space, so the combination makes me feel good about where we are. We also have a number of other [customers] that are in various stages.

We have one customer [editor's note: Foster asked that we not reveal its name] where the users are able to do queries they never could have before. We give them more depth. They don't have to rely only on aggregates. They get all the detail records. Second, we're storing seven years' worth of data, so that means that the users have a heckuva lot more data in order to either look at the customers in general to do analytics or to look at specific customers for analyzing churn.

Seven years of data. We used to talk about maybe looking at a few months or a year or two, at most, of historical data, but now you're talking -- in this case -- about a lot of historical data. Is it in the warehouse, and can users run queries against it in real time?

Yes. That's just what they're doing, in fact.

Is that an advantage of the data warehouse appliance model, this ability to -- I assume inexpensively, or comparatively inexpensively -- host several years of data and make it available for rapid querying by users? Or is that more kind of an evolving status quo -- sort of where data warehousing itself is heading?

I think it's absolutely an advantage [of the data warehouse appliance]. Because of the affordability of our solution, it changes some of the things that you can do with your business and allows you to get more granular details, maybe more toward that one-to-one understanding of your customers. So you can tell historically over the last several holiday seasons what they've done for each holiday season, and that enables you to do really what you couldn't do in the past.

As the users get on to the system, they're doing types of queries [and] types of analytics they never thought about doing before. That's what I love. When you see customers trying to do stuff and they say "Wow!"

We had a customer with an analytic query that took them two weeks to get results back. We got that down to -- in the press release, we said two hours, but it was actually 10 minutes. That felt a little braggy, so we said two hours, but in reality it was 10 minutes.

I've heard grumbling that customers aren't too keen about having still another database system to manage. They'd much rather make do with the database systems they already have. The appliance model means that they have another RDBMS that they have to plug into the mix. Now I know your architecture is a little different -- you promise to plug right into Oracle or DB2 or SQL Server -- but I'm wondering if this has been a problem for you, too.

Well, that's just it: We run under Oracle, DB2, SQL Server. Actually, we run on top of Oracle, DB2, and SQL Server.

So you kind of abstract your appliance, so to speak? You're sort of a supercharger for Oracle, DB2, or SQL Server?

Personally, I don't like the word supercharger. With Dataupia, it really is Oracle on massively parallel architecture [MPP], rather than Oracle on SMP.

Everyone knows that when you get to large data, you need MPP. Oracle the company doesn't offer that natively on MPP, so what we've really done is enabled Oracle to run on MPP or on top of an MPP architecture. It's an interesting distinction, because … we figured out a way to get Oracle to run on top of us. They're not running on us, but on top of us. That's not just a technical distinction.

It's not just Oracle, of course, right? It's the same with DB2 and with SQL Server, too?

Yes, that's right. We have a connector for each of those, and we connect into the federation layer. All of these vendors have developed federation layers really during the past five or 10 years to hook into external table sources and other objects. We are a table source for Oracle or for DB2 or for SQL Server.

Dataupia and its competitors like to tout multi-terabyte data warehouse configurations -- sometimes tens or even hundreds of TB configurations. With your existing architecture, is there any practical limit to how high you can scale?

Theoretically, we're going to hit an inflection point somewhere around 4-8 PB. You might think I'm joking [about configurations that large], but I'm really not. We're just waiting for the customers. We actually have some interesting opportunities, although -- of course -- none of that size yet.

I know that with Netezza you had sort of an interesting architectural approach. Netezza uses those snippet processing units, which are little PowerPC engines. That gives them a formidable density. Now with Dataupia, you're using off-the-shelf hardware, right? Blades or rack-mounted servers? So is that any kind of limit -- i.e., how many servers you can cluster together -- to your ability to scale?

That's right. We use 2U blades. Off-the-shelf, white box, boring stuff. It's in the commodity chain, so we get commercial-grade equipment and we can follow Moore's law up and down very easily, so a node to us is 2 TB, and you can rack and stack [nodes] on up.
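[Editor's note: as a rough, back-of-the-envelope illustration of what those figures imply, the sketch below combines the 2 TB-per-node figure with the 4-8 PB range mentioned earlier. The decimal conversion (1 PB = 1,000 TB) and the helper name nodes_needed are our own assumptions, and the arithmetic ignores raw-versus-usable capacity, redundancy, and any change in per-node capacity over time. It is not a Dataupia sizing tool.]

```python
import math

# Assumption: decimal units, 1 PB = 1,000 TB.
TB_PER_PB = 1000
# Per-node capacity quoted in the interview.
NODE_CAPACITY_TB = 2


def nodes_needed(total_pb, node_tb=NODE_CAPACITY_TB):
    """Hypothetical helper: nodes required to hold total_pb petabytes, rounded up."""
    total_tb = total_pb * TB_PER_PB
    return math.ceil(total_tb / node_tb)


for pb in (4, 8):
    print(f"{pb} PB -> {nodes_needed(pb)} nodes")
```

Under those assumptions, the 4-8 PB range works out to roughly 2,000 to 4,000 nodes.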

Speaking of hardware, there's another school of thought, touted by a few of your competitors, that customers don't necessarily want the hardware. I'm thinking of companies such as ParAccel or Kognitio, for example. They'll both sell you the hardware, to be sure, but they're also willing to sell you just their data warehousing software. I believe representatives from both companies have said that some customers, maybe a sizeable portion of customers, just want to deploy their data warehousing software on top of their existing assets. So my question for you is: what do you think of this? Have you thought about making Dataupia's special sauce available as a software-only configuration?

That's not something we're interested in. I think if this [data warehouse appliance] is going to work, it has to pass what I call the TiVo test, and that says [that] you can go out and get the components of a computer and get a motherboard and something else and a disk drive and put it together and then figure out what [software] you need [to run on it] in order to do video recording. And you can make yourself a DVR if you want. And that's fun.

In today's world, wouldn't you much rather prefer to buy a TiVo, though? You put it there and you configure the channels. You personalize it. Personalizing is when you set it up for your particular shop, [for] things that are particular to your environment. Configuration is when you're tweaking the operating system, the file system, the middleware.

If you want to be in that situation, I'm not going to dissuade you. I think the more mature customers much prefer to have a single unit that they can have to plug in, [where] both the power and the networking are already there.

Some folks -- and I count Teradata among them -- have charged that the term "data warehousing appliance" is a misnomer, if only because most such appliances are actually deployed as application-specific data marts. Now you've been doing this for a long time, Foster, so do you know whether there's traditionally been some truth to that? Assuming that it was once true, is it changing at all? Are you seeing more of your appliances deployed as multi-application data warehouses?

This is one of the things where I once was passionately into what the enterprise data warehouse was, what a data mart is, and so on. I think from a mechanical standpoint, technically we work great as a data warehouse, a data mart, what have you. I think the controlling factor after just being in the market long enough is that the reality is that if you take [a company like] American Express with 11 silos, each of those silos buys on their own. So to get all 11 together to buy one machine that encompasses the entire enterprise, that's a huge undertaking. If you were to do it, you've calcified the organization, because now you can't get innovation in any of those silos. The reality is that while it sounds good to have a single machine, in reality it doesn't provide the nimbleness that you need.

You'd agree that segmentation or siloing of data is a problem in and of itself, right? That it's something customers are trying to eliminate -- in many cases by building these [enterprise data warehouse] systems?

I do very much believe that you need a view so that the whole organization can see your data. That doesn't mean that you build a grandiose machine that takes you 10 years to implement and then when it finally gets implemented doesn't do what you envisioned 10 years ago.

With Dataupia, we can put a very serious data warehousing machine up and running in no time at all, whether it's an enterprise data warehouse, whether it's a mart, whether it's just a data warehouse -- it really becomes an academic discussion. I know it's great marketing, but if you talk to 10 people, you get 10 different definitions of what's a data warehouse or a data mart.
