In-Depth

Bigger is Better in Newest Database Niche

Why are hardware and database behemoths focusing so much attention on a segment that -- just 18 months ago -- was a relatively sleepy niche?

What's the big to-do about large databases? In the last month, both Oracle Corp. and Microsoft Corp. touted their largest and most scalable data warehousing (DW) systems to date: Oracle's "Database Machine" and Microsoft's still-gestating Project Madison. Also last month, high-end data warehousing specialist Teradata Corp. -- newly independent after last year's spin-off from NCR Corp. -- unveiled a "Petabyte Club" at its user conference.

Meanwhile, both HP and IBM Corp. market enormous database systems of their own: HP's Neoview and Big Blue's Balanced Configuration Warehouses, respectively. Sun Microsystems Inc., for its part, has a prominent hardware partnership with very large data warehousing (VLDW) specialist Greenplum Inc.

This isn't just tweaked or optimized database software. With the exception of Microsoft -- which (as part of Project Madison) plans to develop 50 TB or greater database configurations in tandem with hardware partners HP, Dell Computer Corp., Unisys Corp., and others -- all of these vendors are offering pre-configured VLDW systems. HP, IBM, Oracle, and Sun (with Greenplum, and in other contexts) all claim that they can scale from hundreds of terabytes on up to the petabyte range; Teradata -- long the king of the VLDW segment -- says it can now support DW configurations of up to 50 PB, or 50,000 TB.

Why are major hardware and database vendors focusing so much attention on a segment that about 18 months ago was a relatively sleepy niche?

For starters, both IBM and Oracle have long been competitive in the VLDW segment: the key difference is that both vendors are now targeting that space with pre-configured, pre-optimized DW systems, consisting of both DW software (and associated applications or utilities) and hardware.

Why they're doing so is a particularly thorny question. Oracle and Microsoft, for example, say that their new VLDW offerings flesh out their flagship RDBMSes with parallel capabilities (massively parallel processing, or MPP, in VLDW-speak).

This is true -- to a degree. Neither Oracle's Database Machine nor Microsoft's as-yet-undelivered Project Madison is "classically" MPP, à la Teradata. Critics, however, say that the major RDBMS systems tend to be optimized for online transaction processing (OLTP) -- and not, as a rule, for data warehousing workloads. Industry veteran Mark Madsen, a principal with data warehousing consultancy Third Nature, notes that -- right now, anyway -- most DW footprints clock in at less than 1 TB.

The salient question, he argues, is why so many customers, particularly those running conventional RDBMS products from IBM, Microsoft, and Oracle, are having performance issues -- even at the low end.

"[I]f data warehouses are generally 1 TB or less, why are we having performance problems?" he asks. "To me, that simple question says something about current databases, the state of knowledge in the market, and vendors." As Madsen sees it, at least 80 percent of shops could probably run on standard DB2, SQL Server, or Oracle -- provided they're able to pay for talented DBAs and speedy hardware.

Randy Lea, vice-president of product and services marketing with Teradata, is more critical. "Oracle, DB2, Informix, Sybase: our competitors have always tried to position themselves as a database for everything. For both OLTP and data warehousing. Quite frankly, the architectures are very different," he argues. "Oracle [and] SQL Server were architected for transaction processing, that's why to a certain degree [Oracle and Microsoft are] trying to add parallelism to an architecture that's transactional-related. That's where our tradition is."

There's another wrinkle here. Few dispute that Oracle or DB2 -- and, to a lesser extent, SQL Server -- can scale to support very large configurations. Oracle, for example, touts several very large DW implementations: large-scale data warehousing specialist Winter Corp., which used to publish a DW Top 10, once simultaneously listed four Oracle DW configurations -- including Yahoo (at 100 TB) and AT&T (at 94 TB). There's a caveat, of course: the Winter data stems from 2005, the last year for which that firm published Top 10 information. Last year, Winter also profiled a pair of customers in the financial and retail sectors running Oracle warehouses of larger than 50 TB. Elsewhere, the aging DW Top 10 lists a few DB2 entries: a 49 TB cluster at KT-IT Group and a near-20 TB configuration at an unnamed customer site. One SQL Server entry -- a 19.5 TB monster built by Unisys for UPSS -- also made the list.

Scalability Is Key

Again, the issue isn't that Oracle or DB2 (or even SQL Server) can't scale. What's at issue is just how easily -- and how inexpensively -- they can do as much. As data management guru Mark Madsen put it, for most applications, off-the-shelf RDBMSes should get the job done -- it's just a question of hiring knowledgeable, savvy DBAs and deploying speedy hardware.

As John O'Brien, chief technology officer with data warehouse appliance specialist Dataupia Inc., concedes, Oracle -- particularly with its Real Application Clusters (RAC) option -- is a highly scalable, highly available RDBMS. The trick, according to O'Brien, is to deploy and manage it in the first place.

RAC, he argues, is an extremely complex proposition. "I know a lot of high-end Oracle guys and I know a lot of high-end Oracle shops. They bought into [RAC] because it lets them scale to these bigger configurations. Properly maintained, RAC will do that. But these [shops], in a lot of cases, two to three years later, they're still struggling trying to figure out how to get RAC running at 6 nodes, or at 8 nodes. The more [nodes] you add, the more complex it gets," he argues.

O'Brien and Dataupia both tout an all-in-one appliance -- which they market, moreover, as a "bolt-on" accelerator for Oracle and SQL Server databases (i.e., Dataupia doesn't replace these RDBMSes, but "supercharges" their analytic workloads) -- precisely to customers who balk at the cost or complexity of scaling COTS RDBMS systems to support VLDW configurations.

That cost and complexity, however, is precisely why Oracle, Microsoft, IBM, HP, and even Sybase (which announced its own "Analytic Appliance" -- running on top of System p hardware from Big Blue -- earlier this year) have launched all-in-one, pre-configured database systems. O'Brien argues that the existence of such offerings basically validates the position of Dataupia and its competitors, which include Teradata, Netezza Corp., Kognitio, ParAccel Inc., Vertica Inc., and others.

"The question for customers is, 'Do I want to go with something that's truly a simplified, incrementally scalable appliance-like solution like Dataupia, or do I want to go with something that either needs lots of configuration and tuning, in the case of [vanilla Oracle], or [with Oracle's Optimized Warehouse or Database Machine systems] is sort of canned and sort of optimized for particular workloads, but isn't overall as flexible as a dedicated appliance."

Oracle officials, for their part, don't exactly see it that way. Willie Hardie, vice-president of database marketing for Oracle, rejects the notion that his company's RDBMS is somehow better suited for OLTP than for analytic workloads.

"The Oracle database is proven to be the fastest database out there for both transactional systems and data warehousing systems, across all scales, from small to extremely large systems," said Hardie, in an interview this summer. "You ask any Oracle customer out there and they'll all give you the same answer: Oracle is the fastest database out there on the market right now."

Indeed, in unveiling the Oracle Database Machine last month, chief executive Larry Ellison promoted the product -- which runs Oracle on both its database server (shared-everything, query aggregator) and Exadata (shared-nothing, distributed storage) nodes -- as the fastest, most scalable, and most highly available database system on the market today. Ellison touted a performance increase of 10 to 70 times over vanilla Oracle running on standard hardware configurations.
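
The division of labor described here, in which shared-nothing storage nodes each work on their own slice of the data while a database tier combines the results, can be sketched in a few lines of Python. This is a toy illustration of the general scatter-gather idea only, not Oracle's (or any vendor's) implementation: the function names, the modulo-based partitioning, and the thread pool standing in for separate nodes are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_rows(rows, node_count):
    """Spread (customer_id, amount) rows across nodes by key (shared-nothing).

    Hypothetical helper: modulo partitioning is an assumption for illustration.
    """
    partitions = [[] for _ in range(node_count)]
    for customer_id, amount in rows:
        partitions[customer_id % node_count].append((customer_id, amount))
    return partitions


def local_sum(partition, min_amount):
    """Each storage node filters and aggregates only the rows it owns."""
    return sum(amount for _, amount in partition if amount >= min_amount)


def scatter_gather_sum(rows, node_count, min_amount):
    """Aggregator tier: fan the query out to every node, then combine partials."""
    partitions = partition_rows(rows, node_count)
    # Threads stand in for independent nodes; a real system would use a network.
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(lambda p: local_sum(p, min_amount), partitions))
    return sum(partials)


# Sample data: 100 customers, each with an amount of 10 times the customer id.
rows = [(customer_id, customer_id * 10) for customer_id in range(1, 101)]
total = scatter_gather_sum(rows, node_count=4, min_amount=500)
print(total)
```

Because each node returns only a small partial result, the aggregator moves far less data than it would if it pulled every row to a single server. That is the basic appeal of pushing work down to the storage tier.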

HP has several irons warming in the high-end database fires. Its Neoview offering, for example, runs its own NonStop database and operating system software; HP is partnering with both Oracle and Microsoft to promote supplementary VLDW offerings, too. The company sees the issue as one of being (mostly) all things to (mostly) all customers.

"I think we're different, with all of our partnerships and the capabilities to offer [our partners'] products and our own products in the data warehouse," says Rich Ghiossi, director of business intelligence for HP Software. "The issue is that [customers] have different requirements. If they're already an Oracle or a SQL Server [shop], we want to help them use the solutions they're comfortable with.

"These [pre-configured appliances or appliance configurations] offer an opportunity for us to give them the absolute best Oracle experience or the absolute best SQL Server experience that's pre-configured and optimized for HP's world-class hardware. A lot of [customers] don't want to buy the software separately. They don't want to buy the hardware separately. They don't want to have to configure all of that separately," Ghiossi explains.
