Q&A: In-memory Databases Promise Faster Results
Recent technology advances are making in-memory databases more affordable, faster, and generally more feasible for a range of businesses. Kognitio's CEO explains the technology and how it is changing.
As the price of computer memory continues to fall, in-memory databases are moving within the reach of more business budgets. In addition, advances in architecture with 64-bit computing allow access to much larger memory spaces – and hence larger deployments.
In this interview, we talk with John Thompson, CEO of U.S. operations for U.K.-based Kognitio, which offers software called WX2 -- a relational, scalable analytical database -- that can be run on any industry-standard hardware running Linux. Thompson explains what in-memory databases are, and the rapid advances that have been made recently in the technology.
BI This Week: What do you mean by the term "in-memory database"?
John Thompson: Most databases reside on disk, so retrieving data means issuing a call to read from disk-resident files. Those extra I/O cycles increase response time. In contrast, in-memory databases load data directly into memory (typically RAM), which cuts query response times. This matters because the nature of both transactional and analytic BI has evolved: companies can no longer afford to wait days or weeks to format and then retrieve answers.
Speed increases are not gained simply by using the medium of memory itself. Because in-memory databases distribute data evenly across all the nodes that make up the database, every node is equally involved in satisfying a user's analytical query, which yields further performance benefits. Most parallel analytical databases work in the same way, but they use disk rather than memory for redistributed and temporary data; such a database therefore performs more disk-to-disk movements of data, an inherently much slower operation.
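The distribution scheme Thompson describes can be sketched in a few lines: hash-partition rows evenly across node-local memory, run a partial aggregation on each node, then merge the partials. This is an illustrative sketch of the general technique, not Kognitio's implementation; the function names and the fixed node count are assumptions for the example.

```python
# Illustrative sketch of even data distribution plus parallel partial
# aggregation -- the general pattern, not any vendor's actual code.
from collections import defaultdict

NUM_NODES = 4  # assumed cluster size for the example

def partition(rows, key):
    """Hash-partition rows evenly across per-node in-memory buffers."""
    nodes = [[] for _ in range(NUM_NODES)]
    for row in rows:
        nodes[hash(row[key]) % NUM_NODES].append(row)
    return nodes

def local_aggregate(rows, key, value):
    """Each node computes a partial SUM over its own in-memory slice."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value]
    return totals

def parallel_sum(rows, key, value):
    """Merge the per-node partials into the final answer."""
    partials = [local_aggregate(node_rows, key, value)
                for node_rows in partition(rows, key)]
    merged = defaultdict(float)
    for p in partials:
        for k, v in p.items():
            merged[k] += v
    return dict(merged)

sales = [{"region": "east", "amount": 10.0},
         {"region": "west", "amount": 5.0},
         {"region": "east", "amount": 2.5}]
result = parallel_sum(sales, "region", "amount")
# result == {"east": 12.5, "west": 5.0}
```

Because each node holds its slice in memory, the redistribution and temporary results never touch disk, which is the advantage Thompson contrasts with disk-based parallel databases.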
The technical capacities of in-memory databases have grown over time, so where they fit nicely into 64K of memory early on, today such databases can work over dozens of terabytes. In fact, we've just concluded a pilot project that integrated a 40TB implementation in-memory.
How has the technology progressed over the past few years?
The technology used for in-memory databases continues to develop and is tracking Moore's Law of doubling performance and halving costs. It's fair to say that the price of both disk and memory is continuing to fall, putting in-memory databases within the reach of more business budgets.
Where memory costs may have been prohibitive in the past, and where a CPU could address only so much memory in a 32-bit architecture, things have definitely changed for the better with the advent of low-cost, 64-bit server computing. A much larger memory space can now be addressed, allowing deployments that are no longer confined to smaller installations. That, in turn, brings increased query performance to a far broader range of firms, displacing the need for data aggregates, partitioning, indexing, and cubes.
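The jump in addressable memory is easy to quantify: a 32-bit pointer can address at most 2^32 bytes, while a 64-bit pointer can in theory address 2^64 bytes. The arithmetic below is a simple illustration, independent of any particular product.

```python
# Addressable memory under 32-bit vs 64-bit pointers (illustrative arithmetic).
GIB = 2**30  # one gibibyte

addr_32 = 2**32  # hard ceiling a 32-bit address space imposes
addr_64 = 2**64  # theoretical 64-bit limit, far beyond any current RAM size

print(addr_32 // GIB)    # 4  (GiB -- why 32-bit databases topped out early)
print(addr_64 // 2**60)  # 16 (EiB -- headroom for multi-terabyte in-memory sets)
```

In practice, operating systems and hardware expose less than the full 64-bit range, but even so the ceiling moves from a few gigabytes to well past the dozens-of-terabytes deployments discussed here.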
What is an in-memory database best for? For example, does it excel at analytics? When is it simply not an appropriate solution, perhaps because of high cost?
Both transactional and analytic BI users can benefit from in-memory implementation. The faster response times we're seeing enable more "deep diving" into huge amounts of data.
I'm not sure there's an inappropriate use of in-memory, although some vendors might be charging prices that make it financially undesirable. In fact, we believe that the use of in-memory databases will grow significantly over the next 24 to 48 months because of increased power and reduced pricing, to the point where in-memory databases will become the standard for corporate BI usage.
Of course, not everyone is going to use in-memory databases. Elementary users don't need the power or speed; for them, implementing in-memory would be akin to using a howitzer to take out a gnat.
What kind of speeds are we seeing with in-memory? How much lower might response times go?
We have to be realistic about this: in-memory response times can reach sub-second return times right now, so it's really not a question of how much lower the response times may go. Rather, it's a question of how large the databases will grow, making rapid in-memory analytics even more desirable. Referring back to the 40TB implementation we just completed, that's essentially the first or second generation of current capabilities. How rapidly do we scale up? How quickly will we reach a one petabyte in-memory database?
Storage capacity will have to continue to grow to keep pace. We're already seeing instances of companies that import multiple terabytes of information every day; they previously analyzed the data and simply threw it away because they had no place to store it. The growth of lower-priced commodity hardware, however, is enabling businesses to retain that data. It has also allowed them to start a new line of business: analyzing the information and selling the results to others.
We see this business model -- generally with companies in which data is their lifeblood -- as fueling the growth and further development of in-memory databases, and the technology that allows them to perform at maximum speed.
Let's talk about return on investment. What makes in-memory databases financially viable? Where does it save money and how quickly does it do so?
The viability and ROI come from reduced query response times and from the ability of businesses to implement tiered storage architectures. By that we mean that a user can tailor a database to meet the performance demands of the business.
If we're talking about large data sets that aren't accessed frequently, they can be stored on disk. Data that needs to be accessed or queried frequently, in contrast, can reside in memory, ensuring the best performance possible -- but again, these scenarios depend on requirements. Suffice it to say that in-memory databases exist today that are flexible enough to offer performance when it's needed, at an affordable price.
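The tiered approach described above can be caricatured as a two-level store that promotes frequently read keys into memory. The class name, the promotion threshold, and the dictionary standing in for disk are all illustrative assumptions for the sketch, not features of WX2.

```python
# Toy sketch of tiered storage: hot keys are promoted into an in-memory tier,
# cold data stays in a (simulated) disk-backed tier. Illustrative only.
class TieredStore:
    def __init__(self, promote_after=2):
        self.memory = {}   # hot tier: fast in-memory access
        self.disk = {}     # cold tier: stands in for disk-resident data
        self.hits = {}     # per-key access counts
        self.promote_after = promote_after  # assumed promotion threshold

    def put(self, key, value):
        self.disk[key] = value  # new data lands on disk by default

    def get(self, key):
        if key in self.memory:  # hot path: served straight from RAM
            return self.memory[key]
        value = self.disk[key]  # cold path: simulated disk read
        self.hits[key] = self.hits.get(key, 0) + 1
        if self.hits[key] >= self.promote_after:
            self.memory[key] = value  # promote frequently queried data
        return value

store = TieredStore()
store.put("q3_sales", 1_250_000)
store.get("q3_sales")  # first read comes from the cold tier
store.get("q3_sales")  # second read triggers promotion to memory
```

A real tiered architecture would also demote cold data and track access recency, but the core trade-off -- paying for RAM only where query frequency justifies it -- is the one that drives the ROI argument here.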
How does Kognitio fit into what we've talked about here?
Kognitio's WX2 analytical database was built from the beginning as an in-memory database, to take advantage of the speed available at any given point. WX2, currently in its seventh generation, is capable of in-memory implementations of dozens of terabytes, enabling companies to obtain responses to complex analytic queries in seconds or less in many cases. Even for the most complex queries, WX2 routinely returns answers in minutes; some competing databases have not been able to handle these queries at all.