In-Depth
The Big Gulp Theory
Who does the second-guessing is not always who pays the piper.
Chicago was enjoying very moderate weather when I visited there recently, one fellow observing that it was compensation for a winter deep freeze that had persisted well beyond the normal seasonal calendar. Despite the pleasant temperature, however, in the mind of one IT manager, the “winter of his discontent” had not given way to “a grand and glorious summer”—at least not in his shop.
Following the late-2004 merger of his company (a large consumer products manufacturer) with a much larger competitor, the IT manager (who preferred not to be named here for reasons that will become evident later; we’ll call him Ed) found himself tasked with implementing a new strategy focused on storage and server consolidation. Specifically, he was charged with combining and relocating two data centers into a colocation facility, which he described as a “bunker,” near Chicago.
The plan was simple. All data from the combined companies would be hosted in a Fibre Channel fabric, providing the means to scale capacity over time as the volume of data grew, and to enable a common backup infrastructure. A one-time field engineer for a technology vendor and a veteran of many fabric deployments, Ed was confident that he could do the job with a minimum of operational interruptions for his company.
His current storage infrastructure included array products from seven different vendors. Despite this heterogeneity, he believed that he could build a stable infrastructure. “The secret,” he said, “is to use the most generic setups in each component of the SAN. The most generic switch functionality, the most generic array controller configurations.” That way, the proprietary functions built onto the arrays from different vendors wouldn’t get in the way of their interoperability in the same fabric. “Keep it simple” was his mantra.
In total, he needed to consolidate 22 TB of storage in the FC fabric and to back up about 16 TB of this amount using tape. Problems began to arise when decisions were made for him about the gear that would be deployed at the new data center.
He reports that his strategy included the acquisition of a new array to accommodate data expansion. His preference was for an HDS Lightning array, a product with which he had enjoyed considerable success in the past. Among other features, the array’s capability to virtualize its own ports, enabling 190-plus logical connections to each physical port, was a big advantage. It simplified the connection of servers to storage and the zoning of the fabric itself.
However, the plan was compromised by someone higher than his manager on the company's organization chart, someone who was convinced by the HDS sales team that he needed to “supersize” his array by deploying TagmaStore instead of Lightning. Their pitch: Why settle for a large drink when you can have a Big Gulp for just a little more money?
“There’s nothing wrong with TagmaStore,” Ed said, “and I’m sure it’s a fine product when you have applications that need it. But ours don’t. And spending more money on more capacity and functionality than we need or could ever possibly use makes no sense to me.” Truer words were never spoken, but no one in management wanted to listen.
Ed explained that the real problem needing to be addressed was the poor design of applications that had been developed internally at his company to handle numerous phases of order-taking, manufacturing, and order fulfillment. Created by developers with no understanding of the realities of hardware capabilities, these applications forced the selection of high-end servers and storage because of their wasteful, excessive consumption of memory and storage capacity.
Frustrated, Ed insisted that the money spent on TagmaStore (and on the inevitable training regimen his staff would need to configure, manage, and use the new product, with its complex crossbar switch technology) could be much better spent on a code overhaul. Fixing the applications would then allow him to economize on storage purchases by reducing resource requirements, and it would go a long way toward limiting the company’s storage expense in the future.
His words fell on deaf ears. TagmaStore was installed at the bunker, and Ed turned his attention to migrating data to the new array and to the others that had been deployed to the fabric at the new facility. That’s when the next issue developed.
Ed believed that data should not be transferred to the new infrastructure without first making a comprehensive, validated backup, a sober precaution. He performed the system-wide backup on the eve of the scheduled migration and, to his chagrin, discovered that backups of most Microsoft servers had failed.
“Backups of Exchange and Microsoft file-server data didn’t complete successfully with the software we were using,” he said. (This was his primary reason for attending the event where I was speaking in Chicago, by the way.) “I’m a UNIX guy and I’m used to writing backup scripts that simply execute without thinking about them. The Microsoft environment seems to have a lot of issues with backups that I don’t fully understand. From what we can see, no backup vendor has a comprehensive capability that spans the multiple versions of UNIX that we use, and also [offers] Microsoft support.”
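In spirit, the pre-migration gate Ed wanted is simple to express. The sketch below is purely illustrative and is not his actual tooling: the host names, commands, and script paths are placeholders I have invented, and a real shop would invoke whatever backup software it already runs. The point is only that every job gets executed and checked before anyone schedules the truck.

    #!/usr/bin/env python3
    """Minimal sketch of a pre-migration backup gate: run every backup
    job, then refuse to bless the move if any of them failed.
    Host names, commands, and paths are placeholders, not Ed's setup."""
    import subprocess
    import sys

    # Hypothetical job list: label -> command that runs (or triggers) the backup.
    BACKUP_JOBS = {
        "unix-db01": ["ssh", "unix-db01", "/usr/local/bin/nightly_backup.sh"],
        "unix-app02": ["ssh", "unix-app02", "/usr/local/bin/nightly_backup.sh"],
        "win-exch01": ["ssh", "win-exch01", "run_exchange_backup.cmd"],
    }

    def run_job(label, cmd):
        """Run one backup job and report whether it exited cleanly."""
        result = subprocess.run(cmd, capture_output=True, text=True)
        ok = result.returncode == 0
        status = "OK" if ok else f"FAILED (exit {result.returncode})"
        print(f"{label}: {status}")
        return ok

    def main():
        failures = [label for label, cmd in BACKUP_JOBS.items()
                    if not run_job(label, cmd)]
        if failures:
            print("Incomplete backup; do not proceed with the move:",
                  ", ".join(failures))
            sys.exit(1)
        print("All backup jobs completed; safe to schedule the migration.")

    if __name__ == "__main__":
        main()

Nothing in it is sophisticated; the discipline lies in treating a failed job as a stop sign rather than a footnote.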
Bottom line: Ed was instructed by management to do the data center move without the safeguard of a complete backup; they were willing to take the risk of what might develop into a full-blown disaster. “One guy told me, ‘Hey, the disks are RAID 5. Just yank them out and drive them over to the new site.’” Clearly, in Ed’s view, they didn’t understand the risk they were inviting.
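For a rough sense of why “the disks are RAID 5” is thin comfort during a physical move, consider a back-of-the-envelope calculation. A RAID 5 group tolerates exactly one drive failure, so losing data requires two or more drives in the same group to be damaged or mis-seated in transit. The group size and per-drive risk figures below are purely illustrative assumptions, not numbers from Ed’s shop.

    def p_raid5_group_loss(n_drives, p_drive):
        """Probability that a RAID 5 group loses data in transit, i.e.
        that two or more of its drives fail, assuming each drive is
        independently damaged or mis-seated with probability p_drive."""
        p_none = (1 - p_drive) ** n_drives
        p_one = n_drives * p_drive * (1 - p_drive) ** (n_drives - 1)
        return 1 - (p_none + p_one)

    # Illustrative only: 8-drive groups, 1% chance per drive of trouble
    # during the move. Roughly a 0.27% chance of losing any one group.
    print(f"{p_raid5_group_loss(8, 0.01):.2%}")

Multiply that across every group behind 22 TB of storage, and add whatever was written after the last good backup, and the exposure being shrugged off stops looking negligible.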
Frustrated once again, he did what he was told. Fortunately, no data was lost in the piecemeal transfer, but he was left with a bad taste in his mouth … and probably with an interest in pursuing employment elsewhere.
His case underscored for me not only the gaps in technology—such as the vicissitudes of heterogeneous SANs and the limitations of backup products—but also much deeper issues of technology decision-making and risk management. At a time when many IT professionals are willing to abdicate their responsibilities to purpose-build infrastructure based on application requirements and to outsource their acquisition decision-making process to vendors, Ed is an anachronism.
During the dinner event in Chicago, he was the only one at my table who insisted that the application should determine the infrastructure choice. For this view, he endured a bit of chiding from other IT managers at the table, who insisted that such thinking was out of step with current realities in business IT. To a one, they argued that buying a Big Gulp was better than buying a right-sized drink of storage, regardless of price, especially if it saved them the time and hassle involved in analyzing and characterizing the actual storage requirements of the applications that create and use data.
Such analysis is hard work, they observed. Undertaken properly, it requires time and resources that are in short supply, and that assumes anyone even has a methodology for doing the analysis correctly. Besides, no one ever gets fired for buying IBM, HDS, or EMC, one manager noted.
The not-so-subtle message was clear: these fellows weren’t spending their own money, just the money of their employers and company stockholders. Ed was becoming too invested in his own calculus of technology’s business value, and such nobility of purpose was bound to make his job impossible.
As for the need to protect the data asset during such error-prone events as data center relocation, the group was mostly silent. Ed received no suggestions regarding backup products that would produce guaranteed results in a Microsoft environment. Everyone shared their war stories, but few solutions emerged. His disgruntlement with having been overruled and ordered to make the move without a comprehensive backup met with sympathetic shrugs.
The common wisdom of the group seemed to be that, in an imperfect world where non-technical senior management routinely usurps the authority of its on-staff technology advisors, whether by purchasing hardware “solutions” presented to them by vendor sales teams (and already reviewed and rejected by IT management) or by overruling IT’s procedures and strategies on a whim, one needs to pick one’s battles carefully.
Stupid choices are made every day by senior managers, said one IT professional, especially given their preference for hardware over software: “You can sell a new DMX array from EMC; it’s a big, pretty, shiny box. But try to sell management on Computer Associates' BrightStor or AppIQ’s storage management software products. It is a lot more difficult to get buy-in on intangible things like software.”
Still, you need to wonder: who would have taken the blame if important data had been lost in the data center relocation? Would it have been the senior manager who ordered it against the advice of his IT manager, or would it have been Ed? Hopefully, Ed observes, he will never need to learn the answer.
Your comments are welcome: [email protected].