In-Depth
Analytic Database Pioneer Netezza Riding High
Analytic database pioneer Netezza Inc. is riding high. Late this summer, it took a long-awaited commodity plunge, trading in a quasi-proprietary architecture (it leveraged not-quite-commodity PowerPC chips from IBM Corp.) for Intel-based systems.
Recently, it enjoyed free publicity when Oracle Corp. chief Larry Ellison again dropped its name during Oracle's Exadata v2 launch. (Ellison had famously invoked Netezza's name during last year's coming-out party for Exadata v1, nee Database Machine.)
With new appliance configurations on tap and an "advanced analytics" push in the works, Netezza officials seem positively ebullient.
That free plug from Oracle certainly didn't hurt. "We love it when they mention us. Oracle spent the last year validating and endorsing Netezza's big ideas, both the idea of a data warehouse appliance and also the idea of doing pre-processing [in the storage nodes]. This [Exadata version 2] announcement just underscored that," says vice-president of marketing Tim Young.
Last September, when Oracle announced Exadata v1, Netezza's corporate line was ringing off the hook, prompted -- in some cases -- by folks who'd never before heard of the company, claims Young. At the time, he concedes, Netezza probably didn't have all of the answers that disaffected Oracle customers wanted to hear. This time around, Young says, the company is in a better overall position.
"If a customer had said to us last year, 'We're a global company and we're interested in having a single version of the truth and we want our data centers in Tokyo, London, New York, and San Francisco to be synchronized, how can you help us?', our response would have been: Buy the biggest machine that you can afford from Netezza and connect it to everyone and hope that the response [time] is going to be okay," Young acknowledges.
"Now we're looking at a more sophisticated response to that question, which would embrace things like federation between the systems, replication, and a middleware layer that achieves better compatibility with Oracle."
If you don't remember Netezza delivering on any of this, it's because "none of that is being delivered today," Young concedes, "but it's in the works. We're developing some of this in conjunction with our partners. It's kind of like a mixed approach. It's a combination of using partner stuff but also developing our own."
For example, Young says, Netezza is developing its federation technology in tandem with an unspecified partner, whom he firmly declines to name. Netezza's enhanced support for Oracle -- which Young suggests could appear in Q1 of next year -- is being developed in house.
"One of the problems we do run into is if an Oracle [customer] has a lot of procedures developed in PL/SQL, then that is a problem for a Netezza customer because Netezza doesn't understand PL/SQL and doesn't have an ability to store those PL/SQL procedures," he explains. "We're developing a native PL/SQL translation layer. This isn't going to be a silver bullet. There will still be a couple of limitations. PL/SQL is basically a cursor-based structure. It allows you to open a cursor into the database and do some operations and note the procedural framework and then close that cursor again. Because we don't support that concept, there will be certain scenarios where we can't translate PL/SQL."
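The cursor-versus-set distinction Young describes can be sketched in a few lines. This is only an illustration, not Netezza's translation layer: Python's built-in sqlite3 module stands in for both databases, and the `orders` table, its columns, and the 10 percent discount rule are hypothetical. The first function walks rows one at a time in the style of a PL/SQL cursor loop; the second expresses the same logic as a single set-based statement, the form a translator would presumably have to produce for an engine that does not support cursors.

```python
import sqlite3


def build_db():
    """Create an in-memory database with a hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, amount_cents INTEGER, discount_cents INTEGER)"
    )
    conn.executemany(
        "INSERT INTO orders (amount_cents, discount_cents) VALUES (?, 0)",
        [(5000,), (15000,), (30000,)],
    )
    conn.commit()
    return conn


def discount_with_cursor(conn):
    """Cursor style: open a cursor, process each row procedurally, close it."""
    cur = conn.cursor()
    cur.execute("SELECT id, amount_cents FROM orders")
    rows = cur.fetchall()
    for order_id, amount in rows:
        if amount > 10000:
            conn.execute(
                "UPDATE orders SET discount_cents = ? WHERE id = ?",
                (amount // 10, order_id),
            )
    cur.close()
    conn.commit()


def discount_set_based(conn):
    """Set-based style: one statement applied to every qualifying row."""
    conn.execute(
        "UPDATE orders SET discount_cents = amount_cents / 10 "
        "WHERE amount_cents > 10000"
    )
    conn.commit()


def discounts(conn):
    """Return the discount column in id order."""
    query = "SELECT discount_cents FROM orders ORDER BY id"
    return [row[0] for row in conn.execute(query)]
```

Both functions leave the table in the same state here because the per-row logic is simple. A loop whose behavior depends on rows processed earlier has no such direct set-based equivalent, which is one plausible reading of the "certain scenarios" Young says cannot be translated.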
Netezza is also working to burnish its analytic bona-fides, Young continues.
"One of the other big initiatives that we have going on at the moment is to look at a better support framework for advanced analytics. Basically what that means is allowing us to execute … languages other than SQL. Once we start supporting other languages within Netezza itself -- including languages that are very analytic-oriented, such as the open source language R -- this basically opens Netezza up to support or implement all sorts of different models natively," he comments.
One such API, of course, is the increasingly ubiquitous MapReduce. But this approach, of which MapReduce is but one planned component, is not without its attendant pitfalls, Young explains. The reality is that Netezza has to tread carefully, lest the specter of its analytic aspirations alarm an important partner.
"One of the things we've been struggling with is as soon as we support something like the analytic language R, potentially that moves us from being a database execution platform to kind of being an advanced analytics platform. We have to be very careful that we don't see ourselves as becoming a tool for writing advanced analytics applications," he says.
"We simply see us as being the engine for executing advanced analytics. So it was interesting at our user conference that we had the CTO from SAS who was talking about some of the porting that SAS is doing to effectively port their own Scorecard Accelerator to run on Netezza so that Netezza natively within the kernel can execute the SAS commands without having to move data outside into a SAS environment."
The upshot: Netezza knows its place.
"We see clearly that vendors like SAS are the front-end toolsets that people are using to define their advanced analytic applications, which would just simply run faster natively inside Netezza," Young avers. "That is not going to change and, frankly, we don't want that to change. We'll stick with what we do best and we'll happily let [companies like] SAS do what they do best."
SAS shouldn't be too alarmed. What Young describes sounds similar to an initiative that SAS itself kicked off nearly two years ago, first with Teradata Corp., but notionally with any database partner willing to implement native (kernel-level) support for its analytics. Of course, according to Young, SAS is just one of several languages or APIs Netezza expects to accommodate.
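The idea of running a model inside the database rather than pulling data out to it can be illustrated with a small sketch. It assumes nothing about how SAS or Netezza actually implement this: Python's built-in sqlite3 module stands in for the database, `sqlite3.Connection.create_function` stands in for native analytic support, and the `customers` table and the `score` formula are hypothetical. In sqlite3 the function still runs in the host Python process, so this shows only the shape of the two approaches, not the performance benefit.

```python
import sqlite3


def score(balance, late_payments):
    """Hypothetical scorecard formula; higher is better."""
    return 600 + balance // 100 - 40 * late_payments


def build_db():
    """Create an in-memory database and register score() as a SQL function."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers ("
        "id INTEGER PRIMARY KEY, balance INTEGER, late_payments INTEGER)"
    )
    conn.executemany(
        "INSERT INTO customers (balance, late_payments) VALUES (?, ?)",
        [(5000, 0), (1200, 3), (20000, 1)],
    )
    # Make the scoring function callable from SQL with two arguments.
    conn.create_function("score", 2, score)
    return conn


def score_outside(conn):
    """Extract every row to the client, then score it there."""
    rows = conn.execute(
        "SELECT id, balance, late_payments FROM customers ORDER BY id"
    ).fetchall()
    return [(cid, score(balance, late)) for cid, balance, late in rows]


def score_inside(conn):
    """Let the query engine call the scoring function; only results come back."""
    return conn.execute(
        "SELECT id, score(balance, late_payments) FROM customers ORDER BY id"
    ).fetchall()
```

Both paths return the same scores. The difference is what crosses the boundary: the first moves every input column out of the database, while the second returns only the ids and results.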
Young says Netezza plans to introduce what might be called "fat" and "lean" configurations of its new TwinFin appliances.
"One of the great advantages that we have with the new [commodity] architecture is that it allows us to be much more flexible with our packaging. For example, we will be announcing a high-capacity version of our product. Right now, a [TwinFin system] basically consists of four rows of storage devices and two rows of blades in a typical cabinet. If we pull one of the rows of blades out and replace that with storage, basically we significantly increase the capacity of the machine," he explains. "Of course, it's a lot slower, but if somebody's looking for a big, cheap data warehousing solution, we can now deliver that."
Netezza also plans to introduce a "lean" TwinFin system that packs an extra 4 TB of system memory. "We will be announcing an ultra-fast machine which is not an [in-]memory machine [but which] is more of a memory-enhanced machine. We would have the benefits of the persistent store, and we would have a significant amount of memory for caching and that sort of thing. That provides us with a new option for a different kind of configuration of machine," Young explains.
"The other kind of configuration is that we have the spare slot where we can insert additional servers and dedicate those [servers] to specific applications and then deliver application-[specific] appliances. The opportunities are endless. We're going deeper in terms of advanced analytics, higher in terms of incorporating and preloading applications, and wider in terms of a family of products."