Christmas Comes Early to Data-Management Vendors

Vendors, full of glee over new federal rules for electronic evidence, may be overstating the need for their products.

To hear vendors of data archiving, search and retrieval, data classification, and content indexing software tell it, Santa has come early this year. A recent set of amendments to the Federal Rules of Civil Procedure has, they say, lit a fire under every company to take immediate action: to sort through its storage junk drawers and to impose more management discipline on how it retains and preserves electronic data in a highly searchable form.

However, by all accounts in the legal press, the change made at the start of December to the Federal Rules of Civil Procedure, which govern civil proceedings (including discovery) in the United States District Courts, is a comparatively minor one. The amendment simply expands the definition of discoverable evidence, which has persisted since the rules were first adopted in the 1930s, to include electronically stored information (ESI), giving it the same standing as good, old-fashioned paper records.

Contrary to the hoopla in vendor marketing spiels, the amended rules—as yet untested by an actual case—state that counsel can ask for production of all ESI that the other party may use to support its claims or defenses. Lawyers are entitled to “all reasonably accessible data,” which is basically limited to data that does not require “an undue burden” to collect or review. Active data in databases or file systems would qualify under this rule; in some cases, so would offline data or data in storage if it is indexed and easily retrievable.

Under the same amendment, parties are not required to produce ESI that is not reasonably accessible because of undue burden or cost. This broad category could include unlabeled backup media, old data that is no longer machine-readable using current-generation software or hardware, and "deleted" data that might persist in some fragmented form on a piece of media but that could not be retrieved without extensive forensics. In theory, a party may need to disclose an inventory of such data, together with an explanation of why producing it is not possible or practicable, in response to a discovery request.

There are laws in certain states that modify these e-discovery rules for state courts, but the new federal rules the vendors keep talking about are simply not as radical or revolutionary as everyone says. That hasn’t stopped Larry Cormier, senior vice president of corporate marketing at Scentric, in Alpharetta, GA, from being delighted by the amendment. He says that the new rules contribute to a growing demand for the “universal data classification” technology promised by his product.

“The December 1 amendment finally establishes that digital data is a record for legal purposes. Digital data ought to be managed as records on an ongoing basis,” says Cormier.

"Ought to be" is the reason I agreed to cover Scentric in this column. Cormier stops short of joining the chorus of vendors claiming that the Federal Rules change mandates or requires data management. To him, the most exciting thing about the rule is that it applies to everyone, not only to a specific industry segment, such as health care (HIPAA) or brokerage (SEC rules), or to publicly traded companies (SOX).

The Rules amendment that has everyone in the industry so ebullient does not mandate ongoing data classification or file, e-mail, or IM indexing and archiving. Rather, the new rules kick in only after notice has been served of an impending legal action. According to the lawyers I've consulted, the argument many vendors are making, that you must manage all data in an archive or index because you never know when a lawsuit might happen, is simply incorrect. No such mandate exists in the amendment.

As a general rule, only once litigation begins (or, in many cases, is threatened) is a company barred from intentionally destroying potential evidence. Under the amendment as it currently stands, a safe-harbor provision bars sanctions, absent exceptional circumstances, for the unintentional loss of ESI as the result of the "routine, good-faith operation of an electronic information system."

Universal Data Classification

What vendors are hoping for is legal precedent that will define what "routine, good-faith operation" means. They hope the definition will embrace the latest technologies for data classification, indexing, and archiving, hardwiring into companies' day-to-day operations a requirement to implement data-management schemes.

Such a definition—with its inherent mandate—does NOT exist under any realistic interpretation of the Federal Rules today. That hasn’t stopped the industry from leveraging the announcement of the amendment to create fear, uncertainty, and doubt in the minds of consumers.

Scentric hasn't contributed to the hype thus far. The company has been spinning its own story, trying desperately to differentiate itself in a market where concepts are airy and terminology slippery in the extreme. Its product, Destiny, claims to provide "universal data classification." Cormier took pains to explain what this means and how it differs from the "classification engine" claims in the marketing of products from Kazeon, Index Engines, FAST, and others.

"Kazeon [is integrated] with the Google unified search engine, too," Cormier notes, after citing his company's just-announced partnership with Google. "We think that, while interesting, this is just an extension of search functionality that might be good for electronic discovery in unstructured files, but it isn't data classification."

Nor, he says, do the enterprise search technologies provided by FAST or Index Engines contribute any real data-classification capabilities, which he defines as “providing synchronization of metadata with content searching—not just looking for words in combination, but also the owner of the file, its location in the storage infrastructure, its creation date, and other metadata attributes.”

Search, if I understand him correctly, applies classification criteria after data is stored and assumes knowledge and skills on the part of the human being operating the search engine. By contrast, data classification applies criteria to data at the time it is stored (either to primary or retention storage) in order to create a business process-centric, enterprise-wide metadata catalog.

With data classification done the Scentric way, using its Destiny product, classification schemas (or “taxonomies” in Cormier's parlance) are developed before data is written. He says that Scentric has begun creating standardized taxonomies for vertical industries and specific regulatory mandates, and that over 80 of these templates are available for free with his software.
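To make the distinction between search and write-time classification concrete, here is a minimal sketch, assuming a taxonomy made of rules that combine the metadata attributes Cormier lists (file owner, storage location, creation date) with content keywords. Every name here (`FileRecord`, `TaxonomyRule`, `classify`, `store`, the categories) is hypothetical and illustrative only; it is not Scentric's Destiny API or its actual method. The point is simply that the rules exist before data is written and are applied as each file is stored, so the catalog is populated without anyone running a search.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from datetime import date


@dataclass
class FileRecord:
    """Hypothetical record describing a file at the moment it is written."""
    path: str
    owner: str
    created: date
    text: str


@dataclass
class TaxonomyRule:
    """Hypothetical taxonomy category: metadata conditions plus content keywords."""
    category: str
    owners: set[str] = field(default_factory=set)    # empty set means "any owner"
    path_prefix: str = ""                             # storage-location condition
    created_after: date | None = None                 # creation-date condition
    keywords: set[str] = field(default_factory=set)   # empty set means "any content"

    def matches(self, rec: FileRecord) -> bool:
        if self.owners and rec.owner not in self.owners:
            return False
        if not rec.path.startswith(self.path_prefix):
            return False
        if self.created_after and rec.created <= self.created_after:
            return False
        # Tokenize content so keyword tests ignore case and punctuation.
        words = set(re.findall(r"[a-z0-9]+", rec.text.lower()))
        return not self.keywords or bool(self.keywords & words)


def classify(rec: FileRecord, taxonomy: list[TaxonomyRule]) -> list[str]:
    """Return every category whose rule matches the record's metadata and content."""
    return [rule.category for rule in taxonomy if rule.matches(rec)]


# Hypothetical metadata catalog: file path -> categories assigned at write time.
catalog: dict[str, list[str]] = {}


def store(rec: FileRecord, taxonomy: list[TaxonomyRule]) -> None:
    # Classification happens as the file is stored, not when someone later searches.
    catalog[rec.path] = classify(rec, taxonomy)


# Illustrative taxonomy, defined before any data is written.
taxonomy = [
    TaxonomyRule("finance-merger", owners={"jdoe"}, path_prefix="/finance/",
                 keywords={"merger"}),
    TaxonomyRule("recent", created_after=date(2006, 12, 1)),
]

store(FileRecord("/finance/q4.doc", "jdoe", date(2006, 12, 5),
                 "Merger terms attached."), taxonomy)
print(catalog)  # {'/finance/q4.doc': ['finance-merger', 'recent']}
```

A keyword-only search engine would see just the word "merger"; the rule above also depends on who owns the file, where it lives, and when it was created, which is the "synchronization of metadata with content searching" Cormier describes.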

Destiny installs as software on a generic Windows server. It sits "out of band" (not in the data path itself), where it builds one or more metadata catalogs of what gets stored to disk. As a rule of thumb, each catalog maintains pointers to between 10 and 20 terabytes of data. You deploy another server and software instance for each additional 20 TB of data, but users see a single, unified catalog.
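The sizing rule of thumb works out to simple arithmetic: divide the data under management by the capacity one catalog instance covers and round up. The sketch below assumes the 10 to 20 TB per-instance figure quoted above; the function name `estimate_catalog_instances` is hypothetical, and real deployments would depend on factors the column does not cover.

```python
import math


def estimate_catalog_instances(total_tb: float, tb_per_instance: float = 20.0) -> int:
    """Hypothetical estimate of catalog servers needed, using the 10-20 TB rule of thumb.

    Rounds up because a partial load still needs its own server and software instance.
    """
    if total_tb <= 0:
        return 0
    return math.ceil(total_tb / tb_per_instance)


# 150 TB at the optimistic 20 TB per instance vs. the conservative 10 TB.
print(estimate_catalog_instances(150))      # 8
print(estimate_catalog_instances(150, 10))  # 15
```

Planning at the conservative end of the range roughly doubles the server count, which is worth knowing before budgeting the hardware.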

According to Cormier, the software installs in two to three hours, and Scentric bundles in three days of implementation services, including design, consulting, and training. Cormier says that this is more than sufficient to get Destiny operational in most customer environments.

Scentric has promised to refer me to customers who are using the product today to classify their data, but to be honest, I continue to bristle at its claims of "universal data classification." The metadata that file systems currently provide is insufficient to classify data to any truly useful level of granularity. However, some sort of segregation of the data junk drawer, even a low-level one keyed to file owner, is certainly a start.

I look forward to chatting about this more with Scentric Destiny users. If you are one, or would like to comment on this piece, please send your e-mail to
