In-Depth

Illuminate's Correlation Database Accelerates, Expands BI Queries

CDBMS stores data values, not records or tables, to reduce analytic response time by increasing query flexibility

As data warehouses grow from gigabytes to terabytes, getting information into (and out of) such data stores starts to take its toll on performance. Data queries and analysis return results more slowly, and table-oriented scans for information take considerably longer, discouraging users to probe deeper into their data and ask questions that can reveal new customer relationships or trends.

Illuminate Solutions (http://www.i-lluminate.com) says their correlation database management system (CDBMS) can help build data-driven data warehouses, data marts, and analytics applications. It has begun selling its database engine and tool suite (including iCorrelate, a data exploration tool, and iAnalyze for dashboards/light analytics and mapping) in North America this month.

Using iLuminate, the company's correlation DBMS, all data is indexed -- every value in every field -- and unique data values are stored once and only once, no matter how often the value appears and regardless of its original location, all in one giant data store. Null or missing values are not stored; they're implied by the lack of a value in the index (what illuminate calls the data dictionary). The company calls this technique a value-based storage (VBS) structure, and says it can unlock the "true value within an organization's data with 100 percent ad hoc query flexibility."

The advantages: there's no need for database pre-design or upfront requirements definition; a data-driven logical scheme is automatically built as data is indexed during the loading process. In fact, because all information is indexed, the scheme includes all possible data interrelationships (or correlations). New data sources can be added on the fly. No physical schema is needed.

Typically indexing such large amounts of data takes its toll in two ways: it slows data loading and it takes more data storage. Joseph Foley, illuminate's chief architect, says that load process speed is comparable to traditional RDBMSes when total time "from start of load to availability to users" is compared. It's designed to use as much RAM as the O/S can provide; the company says that two gigabytes of RAM provides optimized access to up to 100 million records; four GB is sufficient to access 1 billion records efficiently. The VBS makes every value directly addressable, speeding access because no table scan or column scan is required to return data.

Storage needs are reduced the more data values are repeated because the iLuminate database doesn't store duplicate data, just an index to the value. Foley says that with 100 million rows, a CDBMS is about the same size as the raw data file; as the amount of data grows, a CDBMS' storage need is smaller relative to the original raw data.

Foley notes that traditional data warehouse vendors are focused on data optimization and query performance. "They want to make things better, faster, and cheaper," he notes, "but they're stuck in a Catch-22. Building for flexibility harms performance; building for performance harms flexibility.

Furthermore, such optimization doesn't represent all potential relationships -- just those that are designed in." iLuminate, he says, provides a fast "time to answer" for users and lets them perform incremental queries (drill-down, drill-up, drill-across) at will. "Pre-defined data structures restrict the range of questions an end-user can ask, so users aren't free to create the queries they really want answered," he told BI This Week.

Data analysis tools are more than adequate to provide analysis and visualization results, but they lack the flexibility (and thus the greatest value to the organization), Foley says. In addition, if you want to investigate events that occurred on a particular date, a traditional database can return all orders placed on that date, for example. The CDBMS can easily find all events for that date, be it an order date, invoice date, refund date, cancellation date -- you name it. If it's a transaction date of some time that you track, the iLuminate CDBMS can find it.

Sales by region or month are typical queries in SQL-based analytical systems, but such systems can't answer more complex questions, such as "How many customers buying product A are no longer buying product B that they purchased last year?" Such complex queries may require new data structures or relationships (such as a new foreign-key relationship) so the SQL language query can be written to supply the answer. The problem: such requests must be queued up within an already overburdened IT department). It's the "unexpected ad hoc questions" that the iLuminate CDBMS can tackle easily, Foley maintains. "We also make it possible to give you unrestricted multidimensional OLAP and ad hoc queries."

iCorrelate helps users get answers quickly by guiding them through an organization's accumulated knowledge. It provides a high-level view of all data sources and helps users form questions and returns the answers.

The iLuminate database can be used by any tool that supports ODBC connections, including tools from Cognos, Business Objects, MicroStrategy, SAS, SPSS, and Microsoft, among others.

"Another big plus is that database administrators no longer have to focus on hardware performance; they can, instead, spend time helping users get the greatest benefit from the system. They can be come coaches to the organization."

Raw data is loaded and immediately profiled by the iLuminate built-in data dictionary; users can run queries against the dictionary to find data-quality issues.

The iLuminate CDBMS engine is priced by the millions of records loaded, beginning under $45,000, with additional concurrent user licenses available.

Must Read Articles