In-Depth

Analytic Performance Key is the Network, Not the Database

Aster Data Systems says new analytic platform can scale to handle large data volumes for analytic BI

Data is the lifeblood of a data warehouse, but when you have too much data, the problems begin. How can you process so much data and still get the response time users demand?

The secret, according to Aster Data Systems (http://www.asterdata.com), is in the network. "The bottleneck is shifting to the network," Mayank Bawa, CEO and co-founder of Aster, told BI This Week. "We've already optimized disks for their best performance, but the landscape has changed. IT is scaling with clusters of processors. Data isn't just in one place any more -- it's spread throughout many locations. Now we have to optimize how we route queries to the data and spread the data so processors can handle it."

If data doubles every nine months, as the company estimates, then it's a problem worth paying attention to.
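To put that estimate in concrete terms, the short Python sketch below computes how quickly a data set grows if it doubles every nine months. The function name and the 10 TB starting volume are illustrative assumptions, not figures from Aster.

```python
def growth_factor(months: float, doubling_period_months: float = 9.0) -> float:
    """Return the multiple by which data grows after `months`,
    assuming it doubles every `doubling_period_months`."""
    return 2.0 ** (months / doubling_period_months)


if __name__ == "__main__":
    start_tb = 10.0  # hypothetical starting volume, in terabytes
    for months in (9, 18, 36):
        size_tb = start_tb * growth_factor(months)
        print(f"after {months} months: {size_tb:.0f} TB")
```

At that rate, three years means four doublings, or a 16-fold increase in volume.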

Aster's solution is the Aster nCluster. It addresses data growth by "transforming off-the-shelf, commodity hardware into a powerful, self-managing, and scalable analytic database," according to Bawa.

The company says MySpace runs Aster nCluster on more than 100 nodes, which gives the social networking company the ability to load millions of rows per second.

To illustrate its ability to scale, Aster says MySpace adds two terabytes of data daily to its 100-node cluster, which holds hundreds of terabytes.

The nCluster database software, which sits on each node and makes all connected nodes in the cluster appear as a single node, is geared for analytics and replaces (rather than acts as a front-end to) an organization's current database (such as Oracle, DB2, or SQL Server). It uses algorithms and processes that control where and how data is partitioned; it also balances and replicates data and "optimizes query processing across clusters to make sure everything runs fast, even when the network is running slowly. In effect, Aster connects processors to enable the kind of large-scale data analysis many BI practitioners are conducting today -- all with less-expensive [commodity] hardware," Bawa said.
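The article does not describe Aster's partitioning or replication algorithms, so the following is only a minimal Python sketch of the general idea: hash each row key to a primary node and keep copies on the next nodes in sequence. The function names (`stable_hash`, `place`), the MD5-based hash, and the replica count are all assumptions made for illustration.

```python
import hashlib
from typing import List


def stable_hash(key: str) -> int:
    """Hash a key the same way on every run (unlike the built-in hash())."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def place(key: str, num_nodes: int, replicas: int = 2) -> List[int]:
    """Return the node ids that store `key`: the primary first,
    followed by replica copies on the next nodes in sequence."""
    if not 1 <= replicas <= num_nodes:
        raise ValueError("replicas must be between 1 and num_nodes")
    primary = stable_hash(key) % num_nodes
    return [(primary + i) % num_nodes for i in range(replicas)]


if __name__ == "__main__":
    num_nodes = 4
    primaries = [0] * num_nodes
    # Count how many of 10,000 hypothetical keys land on each primary node.
    for i in range(10_000):
        primaries[place(f"user-{i}", num_nodes)[0]] += 1
    print(primaries)
```

Because the hash spreads keys roughly evenly, each node ends up with a similar share of the rows, and every row survives the loss of any single node.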

"Traditional databases figure out how to store and index the data," Bawa added. "We manage the data as well -- we go where the data is placed. Our software sits on top of the raw hardware and figures out where the data should be placed. Other algorithms handle queries when they come in; our map knows where the data is and says, 'How do I route the queries so they hit the nodes with the data we need? How do we shuffle the data efficiently so every processor has every bit of data it needs?' Then we send the answers back to the end user.
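The routing idea Bawa describes can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Aster's implementation: the `TinyCluster` class, its CRC32-based ownership rule, and the (user_id, event_type) row shape are all hypothetical. A lookup for one user goes only to the node that owns that user's rows, while an aggregate is computed locally on every node so that only small partial results have to cross the network.

```python
import zlib
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, str]  # (user_id, event_type)


class TinyCluster:
    """Toy in-memory cluster: each node is just a list of rows."""

    def __init__(self, num_nodes: int) -> None:
        self.nodes: List[List[Row]] = [[] for _ in range(num_nodes)]

    def owner(self, user_id: str) -> int:
        """The 'map': which node holds rows for this user."""
        return zlib.crc32(user_id.encode("utf-8")) % len(self.nodes)

    def load(self, rows: Iterable[Row]) -> None:
        for row in rows:
            self.nodes[self.owner(row[0])].append(row)

    def events_for_user(self, user_id: str) -> List[Row]:
        """Point query: routed to the single node that owns the key."""
        node = self.nodes[self.owner(user_id)]
        return [row for row in node if row[0] == user_id]

    def count_by_event(self) -> Dict[str, int]:
        """Aggregate query: every node counts its own rows,
        then the small partial counts are merged."""
        total: Counter = Counter()
        for node in self.nodes:
            total.update(Counter(event for _, event in node))
        return dict(total)


if __name__ == "__main__":
    cluster = TinyCluster(num_nodes=3)
    cluster.load([("ann", "login"), ("bob", "login"), ("ann", "post")])
    print(cluster.events_for_user("ann"))
    print(cluster.count_by_event())
```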

"Traditional relational database technology back in the 1970s was designed with a single server in mind, so the traditional paradigm is to optimize for the disk. Now with distribution -- not just 5 nodes but 500 nodes -- so you need a layer that can process data across this environment. You can add more storage and you can add more memory, but that's not going to work. You have to focus on the network."

The company claims that management of such large systems isn't significantly more complicated than managing smaller systems. In fact, its architecture reduces some chores, such as adding new nodes (which is certain to happen as data volumes grow). Bawa says the software can use its management console to provision new nodes quickly: loading the operating system, partitioning the disks, running hardware tests, moving data so it's load balanced across all nodes, and rebuilding the index in as little as 20 minutes with no downtime to the rest of the system. "Imagine doing that with Oracle; it takes many weeks of planning and execution," Jack Norris, VP of marketing at Aster, said.
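How nCluster rebalances data internally is not described here. As a rough illustration of the "moving data so it's load balanced" step, the Python sketch below assumes data is split into a fixed set of numbered partitions and moves just enough of them onto a newly added node to even out the load. The `add_node` function, the node names, and the partition counts are hypothetical.

```python
from typing import Dict, List


def add_node(assignment: Dict[str, List[int]], new_node: str) -> int:
    """Add `new_node` and move partitions onto it, one at a time from the
    currently fullest node, until it holds its fair share.
    Mutates `assignment` and returns the number of partitions moved."""
    if new_node in assignment:
        raise ValueError(f"{new_node} is already in the cluster")
    assignment[new_node] = []
    total = sum(len(parts) for parts in assignment.values())
    target = total // len(assignment)  # fair share, rounded down
    moved = 0
    while len(assignment[new_node]) < target:
        donor = max(
            (name for name in assignment if name != new_node),
            key=lambda name: len(assignment[name]),
        )
        assignment[new_node].append(assignment[donor].pop())
        moved += 1
    return moved


if __name__ == "__main__":
    # Four hypothetical nodes holding 30 partitions each.
    cluster = {f"node{i}": list(range(i * 30, (i + 1) * 30)) for i in range(4)}
    moved = add_node(cluster, "node4")
    print(moved, {name: len(parts) for name, parts in cluster.items()})
```

In this sketch only the partitions destined for the new node are moved; the rest of the data stays where it is, which is one reason an expansion of this kind need not interrupt the other nodes.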

nCluster is priced based on the amount of data you manage.

About the Author

James E. Powell is the former editorial director of Enterprise Strategies (esj.com).
