Three Spring Projects for Storage Managers

Spring cleaning is a reminder to get busy with long-delayed projects, including several projects we ought to be initiating today to improve our storage infrastructure for the rest of the year.

Though it may have gone unnoticed, especially with the recent unseasonably cold weather in the Northeast and Midwest, spring officially sprang on March 19 (or 20, if you are particular about the astronomical significance of the equinox). In Florida, where I live, the weather has already settled into a sort of warm mugginess typical of the season. Air conditioners are running full blast, partly for comfort, but also to screen out the pollen in the air. It won’t be long before you smell orange blossoms everywhere.

Traditionally, the advent of spring triggers a variety of activities: planting flowers and vegetables, cleaning out the clutter from months of indoor activity, swapping out winter clothes for lighter-weight attire, just to name a few. For fitness club operators, spring almost always brings a surge in traffic as the desperate rush to shed their winter flab in time for swimsuit season.

This column, however, isn’t about seasonal change or social behavior. It’s about data: how it is provisioned with infrastructure, how it is managed once written, and how it is protected throughout its useful life. Spring simply provides a metaphorical context for several projects we ought to be initiating today to improve our storage infrastructure for the rest of the year.

Project 1: Clean Out Your Storage “Gutters”

Just as leaves and pine needles tend to find their way into roof gutters during the fall, becoming an ugly sludge of biomass that blocks rain runoff and causes roof problems when the April showers begin in earnest, chances are very good that storage infrastructure pathways are gummed up after six months or more of standard operating procedure. It's surprising how all of those small changes of equipment and people over the last few months have altered the reality behind the grand design envisioned when you first deployed those NAS arrays or created volumes and zones in your FC fabric or IP SAN.

Think about it. If you have been busy migrating storage between different boxes during the winter—to allocate additional capacity to finance and accounting, for example, as they performed their end-of-year accounting and auditing—then chances are that you have a lot of inefficient volumes and port assignments in your infrastructure today.

Why not take some time to review your data transport plumbing: the networks that interconnect end users and application servers to NAS arrays and IP storage targets, the channel fabric connections to storage arrays, and the servers that access them? If inefficiencies present themselves, you should know that it won’t take much effort to reorganize things a bit. Doing so might just help defer the need to purchase new switches and arrays. I suggest you look into Computer Associates’ SAN Designer for some tools to help you do the job.

Project 2: Clean Your Closet

The 2005 fiscal year is over for most companies, so there may well be some temporary data—multiple iterations or versions of reports and perhaps even interim data marts—that you can dump to tape or delete altogether. Clearing the clutter and de-duplicating your data sets can breathe new life into applications and help to buy back disk space that could be put to more productive use.
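To get a rough idea of how much of that clutter is outright duplication, a short script can group files by content. What follows is a minimal sketch in Python, not a substitute for a commercial de-duplication or archiving product. The function names (file_digest, find_duplicates) are my own illustrative choices, and the approach (compare sizes first, then SHA-256 digests) is just one reasonable way to do it.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root whose contents are identical."""
    # Pass 1: bucket regular files by size; a file with a unique size
    # cannot have a duplicate, so it never needs to be hashed.
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only the files that share a size with another file.
    by_digest = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue
        for path in paths:
            by_digest[(size, file_digest(path))].append(path)

    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Point find_duplicates at a report share or a scratch volume and review the groups it returns before deleting anything; the script only reports candidates and leaves the decision to you.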

For database archiving, look to GridTools, Princeton Softech, or OuterBay (just acquired by HP), or to your own database management software, which usually provides archiving utilities that most folks don’t even know about. Carve out the data that is infrequently or never referenced, keeping its logical structures intact, then migrate this archival warehouse to cheap SATA arrays (if you must keep it “nearly” online) or off to tape.

Similarly, data from “electronic content management” systems (also known as document management systems) can be archived and redeployed to “ghetto RAID” or tape. Most of the document management systems out there, ranging from monsters such as Documentum and Filenet to smaller products that I like, such as Xenysys (which, by the way, has been beating behemoths like Documentum recently in several key bid contests), have archive tools that are pretty good. Use these tools to clear out the old data and make room for the new, without buying a lot of extra space.

E-mail archive systems are a dime a dozen these days. Computer Associates’ Message Manager (formerly iLumin) and over 100 others can be installed reasonably quickly to automate the archiving of Exchange mail or Notes messaging systems. It’s far more legal than my preferred pre-SOX-days alternative: declare a virus attack and just delete all of your e-mail. I used to love doing that every year—clearing out my mail backlog and buying back a lot of wasted disk space in the process. Alas, it is no longer legal.

User files don’t cooperate readily with some sort of intelligent archive scheme, so you may want to try something different. Set up a global namespace (essentially some virtual file folders in cyberspace) and direct users to store all their files going forward to a “folder” that you designate. This is the beginning of data management applied to files. It isn’t foolproof, but at least it helps to get data sorted into some sort of category or classification system—even if it is by user or by department or workgroup.

Simple HSM software can then be applied to the data to migrate it off your most expensive disk based on simple characteristics such as date last accessed or date last modified. It isn’t perfect, but it works. Check out Arkivio or Caminosoft, or one of the many other distributed storage HSM plays out there. You can find one for nearly every size of infrastructure and wallet.
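The selection rule behind that kind of migration is easy to sketch. The Python below is an illustration only, not how Arkivio, Caminosoft, or any other product works: it walks a directory tree and lists files whose last-access (or last-modified) timestamp is older than a threshold. The function name migration_candidates and its parameters are hypothetical. Note also that last-access times are only as trustworthy as the file system's settings; volumes mounted with options such as noatime will not update them.

```python
import os
import time

SECONDS_PER_DAY = 86400


def migration_candidates(root, max_idle_days, now=None, use_atime=True):
    """Yield (path, idle_days) for files idle longer than max_idle_days.

    use_atime=True ranks by date last accessed; False uses date last modified.
    """
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * SECONDS_PER_DAY
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            stamp = st.st_atime if use_atime else st.st_mtime
            if stamp < cutoff:
                yield path, (now - stamp) / SECONDS_PER_DAY


if __name__ == "__main__":
    # Example: list files under the current directory untouched for 180 days.
    for path, idle in migration_candidates(".", 180):
        print(f"{idle:7.0f} days idle  {path}")
```

The output is a candidate list, nothing more. Moving the files to cheaper disk or tape, and leaving behind whatever stub or pointer your users need, is the part the HSM product handles.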

Project 3: Plant a Data Garden

While there is no Farmers’ Almanac to help you know exactly when to begin, your data garden doesn’t really need it. Start by walking through your data center. Get a feel for where the storage subsystems have been deployed in their cabinets or racks. Look at the switches, compression or encryption appliances, and all of the other storage-related gizmos that you have deployed tactically over the year(s). Consider which equipment generates the most heat and where it might be better placed in your physical plant to help dissipate that heat (near an AC return for those units that exhale a lot of BTUs). For very dense racks, consider either moving equipment around for better airflow or (worst case) investing in some heat exchangers (yes, they are available for distributed storage components at $54K a pop from Hewlett-Packard). Heat is the chief enemy of disk storage and helps to drive up your electricity costs as well.

Also, while data doesn’t need light to grow, you shouldn’t keep it hidden away in a dusty closet either. Take inventory and move storage arrays into clear view. You may be surprised where you discover storage capacity you didn’t know you had.

Make a list of all the tools you are using to monitor the status of your storage infrastructure: everything from point software that came with a specific box to software utilities you purchased, downloaded from the Web, or wrote yourself. When your list is complete, shop around for an alternative to the quiver-of-arrows approach you have been using. There are several storage resource management (SRM) tools on the market today but few takers for most of them. When you think about it, a single tool suite is preferable to a conglomeration of scripts, provided the tool offers the same or better functionality at an acceptable cost.

The availability of solid SRM tools, such as Tek-Tools Storage Profiler, CA BrightStor, CommVault, and others I’ve seen in that crowded market, is not the same as saying that there are good suites for comprehensive application- or data-facing storage management. Real data-facing storage management suites must deliver best-of-breed SRM utilities for managing infrastructure (including, for example, discovery and monitoring services, virtualization services, global namespace services, and so on), but they must also deliver capabilities for understanding application requirements by “listening” to the data they produce, for automating policy creation and implementation, and for extracting value from infrastructure in the form of consistent provisioning and protection services. That kind of storage management remains the holy grail today and is not delivered by any single vendor.

I doubt that the storage industry is able to build the data-facing components of the storage management stack. The Storage Networking Industry Association (SNIA) has been working on it for years, if you read their marketing missives, though their statements are confusing. ILM, according to one paper from the group, will have to wait until all of the plumbing issues of storage are worked out (aka, until FC SANs actually work in a heterogeneous deployment). This gives the industry a long time before it can actually deliver the goods on any sort of data-facing management utilities. Keep an eye on open-source efforts, such as, which is trying to tackle the problem head on as an end-user initiative.

These spring projects should keep you busy until summer, which, in Florida, and other Gulf Coast states, signals the start of hurricane season. We dread the thought that another Katrina, Rita, or Wilma will come along this year and undo all of our spring planting. But that’s a column for another time.

Your gardening tips are welcome at