Classify and Move: Getting Stored Data in Order

The B-tree is an endangered species... thank goodness.

One irony of this industry is that analysts' product classifications don't always fit, especially for data classification tools. It is a problem that Kate Mitchell, a board member of CopperEye, knows all too well.

CopperEye is a UK-based firm making inroads into the U.S. market. Mitchell, based in Stamford, CT, is part of the company's advance guard. She has been laboring to correct the impression that analysts' classifications have created about what CopperEye does for a living. Some analysts have placed CopperEye's technology in the generic "search tools" category. The technology does perform search, but Mitchell says that label misses the point. Search is an enabling technology for what CopperEye is really trying to do: help businesses organize, retain, and retrieve their data using a much smaller hardware footprint, with far less complexity, and at dramatically lower cost than alternatives such as a database or data warehouse.

Not all data, though. Mitchell is quick to point out that CopperEye targets structured data that would normally live in a relational database; text indexing is not its game. The product discovers, parses, and indexes databases and their structured by-products (logs, extracts, transactions, and other "flat files") so that information can be retrieved on demand. As early adopters in the telecom field can attest, it does this very quickly.

"We do more than search," Mitchell argues. "We can capture tens of thousands of transactions per second." And CopperEye does it without the hassles and limitations of conventional B-tree indexing.

A B-tree is a tree data structure, commonly used in databases, that keeps data sorted and supports searches, insertions, and deletions in logarithmic time. It is notorious for bogging down under heavy insert workloads. Every new record must be written into the correct leaf page, and full pages must be split. Once the index outgrows memory, inserts with scattered keys (phone numbers, say) turn into random disk I/O. CopperEye's patented technology has demonstrated considerably better performance than B-tree indexing under extreme high-volume conditions, such as processing cell phone call records.
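To make that definition concrete, here is a minimal sketch of a textbook (CLRS-style) B-tree in Python. It is illustrative only and says nothing about how CopperEye's patented alternative works. The names `BTree` and `Node` and the minimum-degree parameter `t` are our own. The sketch shows why lookups are cheap: each node holds a sorted run of keys, so a search does one binary search per level. It also shows where insert cost comes from. Full nodes are split on the way down, so a single insert can rewrite several nodes.

```python
from bisect import bisect_left, insort
from dataclasses import dataclass, field
import random


@dataclass
class Node:
    keys: list = field(default_factory=list)      # sorted keys held in this node
    children: list = field(default_factory=list)  # empty for a leaf

    @property
    def leaf(self) -> bool:
        return not self.children


class BTree:
    """Textbook B-tree of minimum degree t.

    Every node except the root holds between t-1 and 2t-1 sorted keys, so the
    height stays O(log_t n) and search/insert visit O(log_t n) nodes.
    Duplicate keys are allowed and are routed to the left of an equal key.
    """

    def __init__(self, t: int = 2) -> None:
        if t < 2:
            raise ValueError("minimum degree t must be >= 2")
        self.t = t
        self.root = Node()

    def search(self, key) -> bool:
        node = self.root
        while True:
            i = bisect_left(node.keys, key)
            if i < len(node.keys) and node.keys[i] == key:
                return True
            if node.leaf:
                return False
            node = node.children[i]  # the only subtree that can hold key

    def insert(self, key) -> None:
        if len(self.root.keys) == 2 * self.t - 1:
            # Full root: split it. This is the only way the tree grows taller.
            self.root = Node(children=[self.root])
            self._split_child(self.root, 0)
        node = self.root
        while not node.leaf:
            i = bisect_left(node.keys, key)
            if len(node.children[i].keys) == 2 * self.t - 1:
                # Split full children on the way down so the target leaf has room.
                self._split_child(node, i)
                if key > node.keys[i]:
                    i += 1
            node = node.children[i]
        insort(node.keys, key)

    def _split_child(self, parent: Node, i: int) -> None:
        """Split the full child parent.children[i]; its median key moves up."""
        t = self.t
        child = parent.children[i]
        right = Node(keys=child.keys[t:], children=child.children[t:])
        parent.keys.insert(i, child.keys[t - 1])
        parent.children.insert(i + 1, right)
        child.keys = child.keys[: t - 1]
        child.children = child.children[:t]

    def items(self):
        """Yield every key in ascending order (in-order traversal)."""
        def walk(node):
            if node.leaf:
                yield from node.keys
                return
            for i, key in enumerate(node.keys):
                yield from walk(node.children[i])
                yield key
            yield from walk(node.children[-1])
        return walk(self.root)

    def height(self) -> int:
        """Number of levels; in a B-tree all leaves sit at the same depth."""
        levels, node = 1, self.root
        while not node.leaf:
            levels, node = levels + 1, node.children[0]
        return levels


if __name__ == "__main__":
    tree = BTree(t=3)
    keys = list(range(10_000))
    random.shuffle(keys)
    for k in keys:
        tree.insert(k)
    print(tree.search(4242), tree.search(-1))  # True False
    print("levels:", tree.height())            # at most 8 levels for 10,000 keys when t=3
```

In a real database, each node is typically a disk page holding hundreds of keys, so `t` is large and the tree is only a few levels deep. The logarithmic bound holds either way. What hurts under a firehose of inserts is the page reads, writes, and splits behind each one.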

Since its founding in 2000, CopperEye has largely let its customers drive product development. Orange, a large European telco, was an early adopter. It had been dissatisfied with the performance of an approach that required call records to be loaded into an Oracle database for analysis and then flushed from the database after use. With CopperEye, Orange could query flat-file records with the same SQL language but without the import/flush cycle, saving considerable time and money.
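The Orange example turns on analyzing flat files in place rather than loading them into a database first. The sketch below illustrates that general idea with a deliberately simple, swapped-in technique: a sorted byte-offset index over a flat file. It is not CopperEye's method and offers no SQL, only key-range lookups. The function names `build_offset_index` and `range_query`, the record layout (timestamp, caller, callee, seconds), and the sample data are all hypothetical. The file is assumed to be sorted by its first field, as call records appended in timestamp order would be.

```python
import bisect
import os
import tempfile


def build_offset_index(path: str, sep: str = ",") -> tuple:
    """Scan a flat file once, recording each line's key (first field) and byte offset.

    Assumes the file is already sorted by that key, e.g. hypothetical call
    records appended in order with ISO-8601 timestamps as the first field.
    """
    keys, offsets = [], []
    with open(path, "rb") as f:
        while True:
            pos = f.tell()
            line = f.readline()
            if not line:
                break
            keys.append(line.decode("utf-8").split(sep, 1)[0])
            offsets.append(pos)
    return keys, offsets


def range_query(path: str, keys: list, offsets: list, lo: str, hi: str):
    """Yield records with lo <= key <= hi, reading only that slice of the file."""
    start = bisect.bisect_left(keys, lo)
    end = bisect.bisect_right(keys, hi)
    if start >= end:
        return
    with open(path, "rb") as f:
        f.seek(offsets[start])  # jump straight to the first match; no full scan
        for _ in range(end - start):
            yield f.readline().decode("utf-8").rstrip("\r\n")


if __name__ == "__main__":
    # Hypothetical call-detail records: timestamp, caller, callee, seconds.
    rows = [
        "2006-05-01T09:58:12,07700900001,07700900002,62",
        "2006-05-01T10:01:40,07700900003,07700900004,15",
        "2006-05-01T10:03:05,07700900001,07700900005,240",
        "2006-05-01T10:07:59,07700900006,07700900002,9",
    ]
    fd, path = tempfile.mkstemp(suffix=".csv")
    with os.fdopen(fd, "w", encoding="utf-8", newline="\n") as f:
        f.write("\n".join(rows) + "\n")
    keys, offsets = build_offset_index(path)
    # prints the 10:01:40 and 10:03:05 records
    for rec in range_query(path, keys, offsets, "2006-05-01T10:00", "2006-05-01T10:05"):
        print(rec)
    os.remove(path)
```

The index here is just two parallel lists held in memory. The records themselves never leave the flat file, which is the point of contrast with an import/flush workflow.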

The product's simple search front end gave industry analysts the impression that CopperEye was a search tool, and it was promptly lumped in with other search engines, muddying CopperEye's marketing message for more than a year. In 2004, the company entered a joint venture with IBM to develop indexing solutions for Informix based on its DataBlade technology. More recently, it has partnered with Sun Microsystems to add value around that company's telecom solutions.

Mitchell says that, while these activities helped embed CopperEye inside other vendors' solutions, the company also wanted to build simple, reliable, mainstream products on its core technology that could help businesses sort out their database junk drawers. Greenwich is the name of that product set. Mitchell says it "secularizes" the core software development kit (SDK) technology and is designed to "help companies cost-effectively capture, quickly store, and instantly access selective transactions from tens or hundreds of terabytes, whether the data is seconds or decades old." The "secularization" she refers to might better be termed "productization": the CopperEye SDK has become the core of a user-friendly solution that provides a relational file system for transaction data on a small Tier 1 Unix or Linux server footprint. (A port to Microsoft Windows is in the works, too.)

The whiteboard sketches of the Greenwich product roadmap say it all. The next major enhancement will introduce an appliance that instantly collects transactions, logs, and other structured data, according to user-defined policy rules, from any database that supports the Open Database Connectivity (ODBC) standard. The Greenwich server will then place the data in a flat-file archive (XML files on an open storage platform such as Sun's Thumper), where it will be indexed and searchable with standard SQL queries.
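The roadmap describes a pipeline rather than an algorithm: select rows by policy, write them to XML files, and index the archive so SQL can find them. Because that appliance was still on the whiteboard, what follows is only a hypothetical sketch of such a pipeline, not Greenwich's design. It uses Python's built-in `sqlite3` module as a stand-in for an ODBC source. A real deployment would go through an ODBC driver exposed via the Python DB-API, such as pyodbc. A second SQLite table serves as the SQL-searchable index. All names (`RetentionPolicy`, `archive_batch`, `fetch_archived`, `archive_index`, the `calls` table) are invented for illustration.

```python
import os
import sqlite3
import tempfile
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class RetentionPolicy:
    """Hypothetical user-defined rule: archive rows whose timestamp is before `cutoff`."""
    table: str
    time_column: str
    cutoff: str  # ISO-8601 text, so lexical order matches chronological order


def archive_batch(src, idx, policy, archive_dir, batch_name) -> int:
    """Copy policy-selected rows from `src` into one XML file and index them in `idx`."""
    # Table and column names come from trusted policy config, never from end users.
    cur = src.execute(
        f"SELECT * FROM {policy.table} WHERE {policy.time_column} < ?", (policy.cutoff,)
    )
    columns = [d[0] for d in cur.description]
    rows = cur.fetchall()
    root = ET.Element("batch", name=batch_name)
    for row in rows:
        rec = ET.SubElement(root, "record")
        for col, val in zip(columns, row):
            ET.SubElement(rec, col).text = str(val)  # assumes column names are valid XML tags
    path = os.path.join(archive_dir, batch_name + ".xml")
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)
    # One index row per archived record: timestamp -> (file, position within file).
    t = columns.index(policy.time_column)
    idx.executemany(
        "INSERT INTO archive_index (ts, file, pos) VALUES (?, ?, ?)",
        [(row[t], path, pos) for pos, row in enumerate(rows)],
    )
    idx.commit()
    return len(rows)


def fetch_archived(idx, ts_from: str, ts_to: str) -> list:
    """Locate records with plain SQL against the index, then read them from XML."""
    hits = idx.execute(
        "SELECT file, pos FROM archive_index WHERE ts BETWEEN ? AND ? ORDER BY ts",
        (ts_from, ts_to),
    ).fetchall()
    parsed = {}  # parse each XML file at most once per call
    out = []
    for file, pos in hits:
        if file not in parsed:
            parsed[file] = ET.parse(file).getroot().findall("record")
        out.append({elem.tag: elem.text for elem in parsed[file][pos]})
    return out


if __name__ == "__main__":
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE calls (id INTEGER, ts TEXT, caller TEXT, secs INTEGER)")
    src.executemany("INSERT INTO calls VALUES (?, ?, ?, ?)", [
        (1, "2006-01-10T08:00:00", "07700900001", 62),
        (2, "2006-03-02T12:30:00", "07700900002", 15),
        (3, "2006-06-20T17:45:00", "07700900003", 240),
    ])
    idx = sqlite3.connect(":memory:")
    idx.execute("CREATE TABLE archive_index (ts TEXT, file TEXT, pos INTEGER)")
    idx.execute("CREATE INDEX archive_ts ON archive_index (ts)")
    with tempfile.TemporaryDirectory() as d:
        policy = RetentionPolicy("calls", "ts", "2006-04-01")
        print(archive_batch(src, idx, policy, d, "batch-0001"))  # 2
        print(fetch_archived(idx, "2006-03-01", "2006-03-31"))
```

Keeping the index separate from the XML files means the archive stays in an open, human-readable format. The SQL layer only answers "which file, which record."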

Follow-on capabilities will include predefined, certified hooks into a range of business intelligence products, databases, and other business applications, providing what Mitchell calls a "Write Once Database Solution" affordable to the masses. This horizontal product will nicely complement the OEM and specialty implementations of the core technology that are already finding their way into the telecom and government sectors.

In truth, fighting data sprawl must have as much value as fighting cell phone fraud and international terrorism. If your thinking agrees with mine, give CopperEye a look. Even if it doesn't, feel free to share your thoughts about this column:

About the Author

Jon William Toigo is chairman of The Data Management Institute, the CEO of data management consulting and research firm Toigo Partners International, as well as a contributing editor to Enterprise Systems and its Storage Strategies columnist. Mr. Toigo is the author of 14 books, including Disaster Recovery Planning, 3rd Edition, and The Holy Grail of Network Storage Management, both from Prentice Hall.