ParAccel Touts Columnar Analytic Database
ParAccel’s analytic database can be used on its own or dropped right into place alongside your existing SQL Server assets.
Speed, especially vastly improved query performance, is the Holy Grail of the high-end data warehousing segment. It doesn’t matter that today’s enterprise data warehouses are orders of magnitude faster than their predecessors, nor that, in just the last 30 months, query performance (as recorded by the Transaction Processing Performance Council’s TPC-H benchmark, among others) has exploded. No, some customers can’t ever have enough speed.
For a long time, high-end query performance was the staple special sauce of Teradata, the soon-to-be-erstwhile subsidiary of NCR Corp.
Over the last five years, however, a number of vendors—including data warehouse appliance pioneer Netezza Inc., relational database stalwart Sybase Inc., and Unix king Sun Microsystems Inc., along with several others—ventured into the large-volume, high-performance data warehousing segment, promising improved performance and mind-boggling scalability.
More recently, this segment has grudgingly accommodated a number of new aspirants, including next-generation appliance vendor Dataupia Corp., analytic data warehouse specialist InfoBright Inc. (which applies "rough set" mathematics to analytic query issues), and—most recently—ParAccel Inc., a columnar database specialist that, with the launch of its ParAccel Analytic Database, vaulted to first place in the TPC’s TPC-H decision support benchmark.
More precisely, ParAccel vaulted to first in the 100 GB, 300 GB, 1 TB, and 3 TB segments of the TPC-H rankings, beating out—in both price and performance—familiar players Hewlett-Packard Co. and Dell Computer Corp.
Talk about making a splash. ParAccel officially launched its flagship product at the TDWI World Conference, held late last month in Orlando. Officials describe the ParAccel Analytic Database as a high-speed CDBMS (columnar database management system) that accelerates processing for demanding query-intensive BI and data warehousing applications.
ParAccel is designed to be implemented on its own (i.e., as a standalone analytic DBMS) and can plug right into an existing data mart or operational data store. That’s what ParAccel calls the "Maverick" implementation scheme. There’s also an "Amigo" implementation whereby the ParAccel Analytic Database acts as a kind of drop-in acceleration platform for SQL Server (right now) and Oracle (forthcoming) RDBMS environments.
In this configuration, officials say, ParAccel plugs right into an existing SQL Server RDBMS instance and provides high-performance query routing, synchronization, and syntax offloading services.
Like rival Dataupia—which launched earlier this year with Netezza veteran Foster Hinshaw at the helm—ParAccel’s executive team is also pedigreed: founder and CTO Barry Zane is a Netezza veteran, too. ParAccel Analytic Database is a different animal than Netezza’s RDBMS-based DW appliances, however: it’s a columnar database that’s capable of running entirely in-memory.
That gives it a competitive leg-up over Netezza and other DW appliance vendors, argues Kim Stanick, vice-president of marketing with ParAccel.
The In-Memory Advantage
"Given the fact that we are both in-memory capable and we also have a disk-based capability, people who want to run extremely fast systems can benefit," she indicates. The timing is right, too, Stanick and other ParAccel officials argue. Thanks to a number of drivers, including in particular the push for real-time or near-real-time analysis of operational data, they believe ParAccel’s in-memory columnar database technology will generate quite a bit of buzz.
"If you’re looking for the really, really high-performance, all-in-memory scenario, if you’re looking to support real-time or near-real-time (you’re typically going to have a smaller set of data; you’re not going to be looking across a big history), it really makes sense for you to run in-memory," Stanick points out.
ParAccel’s in-memory capabilities give it a clear performance advantage, she continues, but its columnar design amplifies that edge. "We take the core data, the data itself [i.e., straight from the source repositories]. We don’t require indexes or summaries or aggregation tables. That’s the advantage of columnar and compressed data: we can get really great performance just against the raw data using whatever schema you give us," Stanick argues. "You don’t have to build a data warehouse-compressed schema if you don’t want to."
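To make the "columnar and compressed" idea concrete, here is a minimal Python sketch. It is not ParAccel’s implementation: the sample call records, the column names, and the choice of run-length encoding are all assumptions made for illustration. It shows how storing a table column by column lets an aggregate read only the column it needs, and how a repetitive column shrinks under a simple encoding.

```python
from itertools import groupby

# Hypothetical call records stored row by row (illustrative data only).
rows = [
    ("MX", 120, 0.05),
    ("MX", 300, 0.05),
    ("BR", 45, 0.08),
    ("BR", 600, 0.08),
    ("BR", 30, 0.08),
]


def to_columns(rows):
    """Pivot row tuples into one list per (assumed) column name."""
    country, seconds, rate = zip(*rows)
    return {
        "country": list(country),
        "seconds": list(seconds),
        "rate": list(rate),
    }


def rle_encode(values):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


columns = to_columns(rows)

# A repetitive column collapses to a handful of runs.
encoded_country = rle_encode(columns["country"])

# An aggregate over one column never touches the other columns.
total_seconds = sum(columns["seconds"])
```

Real columnar engines use more sophisticated encodings and execution strategies; the point here is only that the raw data, laid out by column, can be scanned and compressed without building separate indexes or summary tables.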
Moreover, ParAccel officials argue, a large data warehouse footprint isn’t necessarily an impediment to running in-memory. "Most analysts say it’s about a 4:1 [core] data to blown-out ratio, so we only require you to load the core data; we’re saving you 4:1 there, and if you add compression on top of that—which is also about a 4:1 [reduction factor]—you now can compress 1 TB of data down into 125 GB of memory," she indicates.
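The sizing arithmetic in that quote can be sketched in a few lines of Python. The function names and the 1 TB = 1,000 GB convention are assumptions for illustration, and the ratios are simply the quoted ones taken at face value. The quote leaves some room for interpretation about which starting figure each 4:1 ratio applies to, so the sketch takes the ratios as explicit arguments rather than asserting a single answer.

```python
def reduced_size_gb(size_gb, *ratios):
    """Apply successive N:1 reduction ratios to a starting size in GB."""
    for ratio in ratios:
        if ratio <= 0:
            raise ValueError("reduction ratios must be positive")
        size_gb /= ratio
    return size_gb


def fits_in_memory(size_gb, memory_gb):
    """Return True if the reduced data set fits in the given memory budget."""
    return size_gb <= memory_gb


ONE_TB_IN_GB = 1000  # assumed decimal convention

# Reading 1: 1 TB is the "blown-out" warehouse; drop to core data, then compress.
core_then_compressed = reduced_size_gb(ONE_TB_IN_GB, 4, 4)

# Reading 2: 1 TB is already core data; only compression applies.
compressed_only = reduced_size_gb(ONE_TB_IN_GB, 4)
```

Either way, the result is a back-of-the-envelope estimate: actual compression depends heavily on the data, so a helper like `fits_in_memory` is best used with measured ratios rather than rules of thumb.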
"What we use as a rule of thumb when you’re going to run in-memory is that about 40 GB of user data will fit into about 16 GB of memory, which is a very standard server size."
Its in-memory value proposition notwithstanding, ParAccel’s drop-in-place Amigo configuration will likely resonate with customers, too. Like rival Dataupia, which promises to work more or less out-of-the-box with a customer’s existing RDBMS assets, ParAccel Amigo is designed to complement existing SQL Server implementations (Oracle support is promised for next year).
"This allows you to grow the system and provide the ‘queryability’ with the syntax coverage of your native SQL environment, so you don’t have to rewrite your applications," Stanick explains. "One cluster can actually mirror multiple databases of record. You can get economies of scale and scale out as needed. The real point is that you offload the systems that are struggling with performance—you offload the heavy, complex, ugly queries so that those systems can do what they’re designed to do."
This helps accelerate vanilla SQL Server performance, too, according to Stanick: "SQL Server is a very nice operational database … and what clients don’t realize is that they’re actually causing themselves a lot of pain by running these [complex] queries on it."
ParAccel touted two prominent launch customers. One—telecommunications specialist LatiNode, which provides least-cost-routing services for calls placed to Latin America—used a Maverick implementation of ParAccel Analytic Database to cut its processing time from 60 hours to 2.5 minutes, topping out at six (mostly commodity) Sunfire 4100 servers. LatiNode plans to deploy its production implementation on top of HP DL380 systems.
"They took a look at that and decided that they didn’t really need to be able to run the query that fast, so they actually scaled it back a bit," Stanick says.
She declined to identify ParAccel’s other prominent client—although she did describe it as a Fortune 500 information services provider for the legal profession. That company tapped a ParAccel Amigo implementation to run its queries an average of 50 times faster, Stanick notes. From this customer’s perspective, implementing ParAccel Amigo made a lot more sense than building a complementary data warehouse from scratch.
"They basically said [that] for us to turn around and have to build a whole ETL process, a data warehouse schema, all of the data warehouse project and design, just to get slightly faster performance—it wasn’t worth it," she says. "With Amigo, they can literally bring in their operational system … and [plug right into] that system. It’s the best of both worlds. [Their operational system is] tuned for OLTP, but it’s also tuned for decision support [with ParAccel], so the apps aren’t built specifically for a platform any longer."