Tech Talk: Big Data Meets Big Density

Forget Big Data: can today's -- or tomorrow's -- DW workloads take advantage of Big Density?

Thanks to Moore's Law, the future is looking ever denser, ever more crowded, and ever more complex, especially from a data warehousing (DW) point of view.

Moore's Law basically states that compute capacity -- understood as a function of transistor density -- reliably doubles every 24 months. Thanks to Gordon Moore and his eponymous Law, BI and DW practitioners are newly conversant with terms such as "core density," "memory bandwidth," "I/O bandwidth," and "multi-threading." It's likewise because of Moore's Law that a few vestiges of Old Tech -- symmetric multiprocessing (SMP) and non-uniform memory access (NUMA) -- now enjoy fresh currency in BI and DW circles.

It's not that these are new terms. It's that they didn't use to matter, not in DW circles, at least.

That's because DW has a love affair with massively parallel processing (MPP), which provides a means to predictably scale and effectively distribute DW workloads. It didn't happen overnight -- after all, developing and optimizing an MPP engine is a decidedly non-trivial enterprise -- but over time, the high end of the DW market effectively selected in favor of MPP.

Today's MPP data warehouse systems are increasingly SMP in aspect, however.

Thanks to trends in processor development, even the smallest MPP nodes are stuffed with CPU horsepower: the smallest single-chip blades or servers, which are typically configured as individual nodes in an MPP cluster, can pack six, eight, 10, 12, or even 16 cores, depending on which chip is used. An MPP system that might once have been populated with two or four processors is now stuffed with 12, 16, 24, or 32. That's Big Density.

Big Density Background

First, a brief recap of recent trends in microprocessor design.

Eight years ago, Moore's Law looked to be on the ropes. Intel was confronted with the failure of an architectural strategy (dubbed "NetBurst") that yoked improvements in performance to improvements in clock speed. You couldn't have one without the other, and Intel's Pentium 4 architecture was specifically designed to excel at higher clock speeds, or frequencies.

The trouble was that Intel couldn't scale Pentium 4 much beyond 4 GHz, at least using then-state-of-the-art fabrication technologies. Because of the NetBurst design, Intel couldn't just engineer a beefier Pentium 4 -- i.e., one with more transistors -- and run it at the same (or lower) clock speeds, either.

What's more, in its commitment to NetBurst and Pentium 4, Intel had ceded the performance and scalability crowns to rival Advanced Micro Devices (AMD) Inc., which emphasized efficiency (expressed as the ability to execute more instructions per clock cycle) in the design of its Hammer architecture. AMD also committed to a commodity 64-bit strategy years before Intel, with the result that Intel was forced to play catch-up in the 64-bit arena, too.

Intel discontinued Pentium 4 and committed to an entirely new architecture: Core. At the same time, Intel and AMD both committed to a new design strategy: the consolidation of multiple processor cores into a single chip package. It makes sense: the engine of Moore's Law isn't improvements in processor performance -- that's its consequence -- but predictably periodic improvements in miniaturization. A decade ago, Intel and AMD exploited Moore's Law to design ever-larger uni-processors, or large uni-processors with ever-larger caches. The post-Pentium 4 turn was to pack more processor cores into the same amount of space.

Which brings us to the present -- to Big Density. If you want to buy a data warehouse appliance from -- for example -- DW specialist Kognitio, the smallest system you're going to get includes two chip sockets. In the old days, each socket would have accommodated a single uni-processor chip. Each server (or "appliance") would have had two chips, each with one processor.

This is basic SMP: distributing -- i.e., scaling -- a workload over multiple processors.

In Kognitio's case, a typical server comes equipped with two AMD Opteron chips, each with 16 processor cores. That makes 32 cores per server. Even half a decade ago, 32 cores was pushing the limits of high-end SMP in the x86/x64 server space. Now it's part of the standard package.

Or is it? After all, one reason the DW market selected against high-end SMP and in favor of MPP is that MPP offers better price/performance. A decade ago (and more) vendors such as Sun Microsystems Inc. and Oracle Corp. -- to cite just one (particularly ironic) pairing -- trumpeted very large data warehouses running on enormous RISC/Unix NUMA rigs costing millions of dollars. NUMA's another one of those Old Tech terms that's enjoying a kind of revived currency, thanks largely to SAP. Hardware vendors -- along with ISVs, such as Oracle -- embraced NUMA because it scaled more effectively in multiprocessor configurations than did SMP.

That's the rub. Conventional SMP scalability was at one time such a problem that Oracle spent millions of dollars porting its namesake database to NUMA. SMP was -- and to a degree, still is, according to its detractors -- a game of rapidly diminishing returns: a workload that scales at nearly 100 percent over four processors -- i.e., four processors are able to do almost twice as much work as two -- might scale at 83 percent across eight processors.

In other words, when you add more processors, workload performance doesn't increase linearly; if you add too many processors, performance gains become negligible. SMP scalability is extremely software-sensitive: applications (or workloads) must be carefully tuned or optimized to scale effectively in large SMP configurations. Ditto for NUMA, which is even more sensitive to software issues.
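The article doesn't name it, but Amdahl's law is the classic way to model that curve. Here is a minimal sketch in Python -- the 97 percent "parallel fraction" is a hypothetical figure, chosen only because it happens to reproduce the 83 percent number cited above, not a measurement of any particular database:

# Illustrative sketch only: Amdahl's law is one common way to model the
# diminishing returns described above. The 0.97 parallel fraction is hypothetical.

def amdahl_speedup(processors: int, parallel_fraction: float) -> float:
    """Speedup over one processor when only part of the work parallelizes."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)

def scaling_efficiency(processors: int, parallel_fraction: float) -> float:
    """Fraction of ideal, linear speedup actually achieved."""
    return amdahl_speedup(processors, parallel_fraction) / processors

for p in (2, 4, 8, 16, 32):
    print(f"{p:2d} processors: {scaling_efficiency(p, 0.97):.0%} of linear scaling")

Run it and efficiency slides from roughly 97 percent at two processors to about 83 percent at eight and barely half at 32 -- the "rapidly diminishing returns" in question.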

If you solve the software problem, you still have to grapple with a host of other problems. Hence the salience of terms such as I/O bandwidth and memory bandwidth. Both are expressions of the limitations of storage and memory performance. Both function as bottlenecks that constrain SMP scalability. Both are mitigated, to a degree, by MPP: theoretically, if either becomes a problem on a single node (or, generally speaking, across an MPP cluster), you can just add more capacity. That's the beauty of MPP: it's designed to scale almost linearly. When you add an extra node or nodes, you reap a corresponding gain in workload processing performance.
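The contrast is easy to caricature in code. In the toy model below, every figure is hypothetical; the point is only that SMP sockets contend for one shared memory and I/O subsystem, while each MPP node brings its own:

# Toy model, not a benchmark: all numbers are hypothetical.

def smp_throughput(sockets: int, per_socket: float, shared_bandwidth: float) -> float:
    """Throughput is capped by the shared memory/I/O backplane."""
    return min(sockets * per_socket, shared_bandwidth)

def mpp_throughput(nodes: int, per_node: float, overhead: float = 0.05) -> float:
    """Each node adds its own capacity; only some interconnect overhead is lost."""
    return nodes * per_node * (1.0 - overhead)

for n in (2, 4, 8, 16):
    print(n, smp_throughput(n, 10.0, shared_bandwidth=60.0), mpp_throughput(n, 10.0))

Past six sockets the SMP curve goes flat while the MPP curve keeps climbing -- which is, in miniature, the argument the DW market found persuasive.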

Which brings us back to Kognitio. It could have used Intel's Xeon chips, but according to CTO Roger Gaskell, Kognitio's WX2 database is able to scale effectively over many cores -- and, more to the point, it actually needs all of them: given a choice between Intel's faster Xeon chips and AMD's denser Opterons, Kognitio opted for Opteron. In other words, claims Gaskell, WX2 is able to scale very effectively in large SMP configurations.

"We're using predominantly AMD [chips], because we get more physical cores. [At the] end of last year, AMD gave us samples of 16 core processors [in place of] 12 core [processors]. We immediately [saw] benefits," says Gaskell. "We were designed from the ground up to be parallel. We parallelize every aspect of every query, we use every single CPU on every single server for every single query that runs. All of the cores are equally involved."

Kognitio isn't the only one. Analytic database upstart Calpont Inc. makes a similar claim for its InfiniDB product. SAP, for its part, says it uses NUMA in its HANA appliance to support extremely large multiprocessor configurations -- in its case, of up to 80 processors in a single NUMA image.

Big Density Doubts

Of course, not everybody is comfortable with Big Density. "The law of diminishing returns applies with SMP," argues Randolph Pullen, a former technical architect with analytic database pioneer Greenplum Software Inc. and founder of DeepCloud, an Australian firm that markets a parallel data warehouse based on the VectorWise X100 engine. "Clever shared-memory and I/O systems only delay the inevitable. Eventually the backplane -- or crossbar switch -- clogs up."

Pullen says he isn't blind to the attractiveness of SMP. MPP is hard. It takes time, money, and considerable resources to develop and optimize an MPP engine. (It took Microsoft Corp., for example, nearly three years to introduce an MPP version of SQL Server -- even after it purchased MPP technology from the former DATAllegro Corp.) It's likewise more difficult -- compared, at least, to conventional software development -- to program for MPP platforms. "[W]riting programs in the parallel paradigm is quite different and difficult," Pullen continues. "Conversely, the attraction of SMP has always been the simplicity and familiarity of the programming model."

The issue -- even if you agree with Pullen's point of view -- is that DW vendors are pushing ever-larger SMP configurations on MPP database buyers. Whether today's (or tomorrow's) DW workloads can exploit all of this capacity -- and representatives from Kognitio, Calpont, SAP, and other vendors claim that they can -- may be beside the point. Whether this capacity is hamstrung by architectural shortcomings is likewise beside the point.

The capacity is there. Server vendors have built it. What's more, even Pullen concedes that increasing density does have its advantages. "DeepCloud certainly [can] use and benefit from multicore chips. Our node engine ... loves multicore chips and our MPI stack is very happy to treat each core as a separate node," he comments, stressing that -- density benefits and software scalability improvements aside -- hardware vendors need to engineer similarly scalable I/O.

"In [an] I/O-heavy application, there is no advantage to dividing an eight-core system into [two nodes of four cores each] because the available I/O is halved and increasing [a system's] I/O capacity is essential to improving [its] performance," he concludes.

Mark Madsen, a principal with DW consultancy Third Nature and a veteran of SMP and NUMA trailblazer Sequent Computer Systems Inc., thinks the SMP angle isn't nearly as risible as some detractors claim. A little over a decade ago, he points out, the fastest data warehouse systems in the world weren't MPP platforms. They were super-sized SMP or NUMA systems marketed by Sun, IBM Corp., and Hewlett-Packard Co. (HP). These systems also had super-sized price tags, Madsen concedes, but they demonstrated that SMP can scale -- and that the software, hardware, and architectural problems that combine to limit SMP scalability can, up to a point, be solved.

If an MPP database can effectively scale up (across an SMP configuration) on a single node, cluster-wide performance will be the better for it.

"I would argue that increased core counts plus more memory plus [storage] mean that each node gets a lot bigger. As each node gets bigger, both MPP performance and SMP performance are important," he explains, noting that an MPP database which scales poorly on a four-way system won't scale any better on an eight-, 12-, or 16-way system. "So SMP performance on a node is as important as MPP performance, even for a native MPP database."
