Why We Need a Data Management Summit
The thought of a vendor association dictating a schema for how companies should characterize their data is a frightening one.
Recently, the Storage Networking Industry Association (SNIA) announced that it was forming a vendor group to come up with a classification schema that could be applied universally to name data and facilitate information lifecycle management (ILM). The coverage in Computerworld a couple of weeks back may simply have reflected a slow news day at the magazine, since SNIA forums tend to devolve into long-winded coffee klatches that produce little of real use.
For example, SNIA’s effort a year ago to develop quasi-standards for continuous data protection resulted in several “birds of a feather” meetings among prominent storage vendors that, in the end, produced general agreement on a definition of the word “continuous.” Major accomplishment, that.
Still, as I read the article, the thought of a vendor association dictating a schema for how companies should characterize their data raised the hair on the back of my neck. I heard a little voice say, “Be afraid. Be very afraid.”
I have already begun to see vendors introducing their own “foo” into characterizations of storage platform tiers, that is, the generic classes of storage devices to which data will be targeted as part of an ILM scheme. Consider the slide that appears in virtually every ILM presentation offered by little vendors seeking to partner with EMC, the goliath of Hopkinton, MA. The slide, which summarizes storage targets for data management, offers the term “primary storage” (usually illustrated by an EMC Symmetrix, IBM Shark, or HDS Lightning array), followed by “secondary” or “near-line storage” (an EMC Clariion, an LSI Logic array rebranded by IBM, or an HP/Compaq StorageWorks array), followed by “reference data storage” (pictured with SATA arrays), followed by… Suddenly, a creepy little picture appears in the architectural model labeled “Centera.”
Centera? Since when has Centera become a class of storage unto itself? This is a proprietary EMC platform featuring a proprietary controller that tags data with a hash so it can be tracked over time, but only by other platforms with the same proprietary controller (that is, more Centera). Centera is classic vendor lock-in: a Japanese mortgage that requires you and three generations of your kids to pay off the note. How the heck did that become a generic category of storage target?
That’s what happens when vendors start creating categories for things: they invariably find ways to define their own products as a separate class. Take HDS calling TagmaStore a “paradigm shift” in Big Iron arrays. It may have some extra bells and whistles, but in the final analysis it’s just bigger iron, soon to be rivaled by bigger iron from HDS’s competitors.
But back to SNIA on data classification. You put a bunch of vendors into a room and ask them to come up with a generic data naming schema and it is virtually guaranteed that they will come up with some pretty self-serving data class definitions. I wouldn’t be surprised to see classes like these:
- Data that you want to hide from recording industry investigators (e.g., bootleg MP3 files). This special category needs RAID arrays on wheels so they can be rolled into a closet quickly when the RIAA guys show up at the door looking for illegal copies of Britney Spears videos. Per this classification, it is best to connect the arrays using a Fibre Channel fabric so they can’t even be seen by SRM packages.
- Data that you want to plausibly argue you deleted by accident (e.g., Enron financial records). This special category should be put on virtual-volume software with a reputation for abending when you resize the volume, an event that tends to delete all file handles on the existing volume and, with them, the data, irretrievably.
- Data that you should place in a Big Iron array (e.g., all structured and quasi-structured data). This special category comes with its target hosting platform already defined, which is kind of neat, plus a Gartner-sanctioned differentiator indicating whether monolithic or modular big iron should be used.
- Data that needs to be accessed by 10 or more people (e.g., data produced in any company with more than 10 people). This category was dreamed up to help small- to medium-sized businesses pick “strategic” hardware, instead of small storage devices with a smaller profit margin.
- Data that requires n+10 redundancy (e.g., all data). This special category requires point-in-time mirror splitting, plus nightly, weekly, and monthly backup to tape, so that the vendor can sell you more capacity on a regular basis.
The list could go on—and doubtless will—since SNIA has so many smart vendor marketing guys working the problem.
What we need is a Data Management Summit (which we are planning for May 2005 as part of Networld+Interop in Las Vegas) where end users can compare notes on the data classification schemes they have homegrown in their own shops. By comparing a few dozen of these, we would likely see commonalities emerge that could become the basis of a REAL data classification schema.
I doubt that the three-letter-acronym vendors want us to do this. As you begin to analyze the categories of data you are producing, you inevitably discover that you have been storing your data on the wrong platforms all along. Such a realization might push consumers to start purpose-building storage instead of buying whatever a vendor tells them to buy.
So maybe that is what’s behind SNIA’s next big thing? Your comments are welcome at [email protected]
About the Author
Jon William Toigo is chairman of The Data Management Institute, the CEO of data management consulting and research firm Toigo Partners International, as well as a contributing editor to Enterprise Systems and its Storage Strategies columnist. Mr. Toigo is the author of 14 books, including Disaster Recovery Planning, 3rd Edition, and The Holy Grail of Network Storage Management, both from Prentice Hall.