In-Depth
Allocation versus Utilization
Storage efficiency is about more than allocating bits to disk; it must also account for how the stored data is actually used.
According to my good friend and long-time storage smart guy Randy Chalfant, who lately hails from StorageTek, there is an unfortunate confusion in some people’s minds between storage allocation and storage utilization. Here’s the boggle.
Q: How much of your storage is being utilized efficiently?
A: We have about 80 percent of our storage capacity allocated to applications X, Y and Z.
Q: That isn’t what I asked. How much storage are you utilizing efficiently?
A: Like I said, our capacity utilization is about 80 percent. That's pretty efficient considering what some analysts say about typical utilization: 40 percent in UNIX and Microsoft environments, 20 to 30 percent in Linux environments. Heck, I'm even beating average mainframe DASD utilization levels of about 60 percent.
That type of exchange occurs almost daily between storage solution providers and their customers. Part of the problem is semantics: allocation and utilization are used interchangeably in common parlance, even though they mean different things. The deeper issue, though, is the modifier: efficient.
Efficiency in storage utilization means more than simply using up most of your capacity before you buy more gear. That is more properly termed "allocation efficiency," since we allocate capacity to meet specific application storage requirements.
Yet we are all being told that this type of efficiency, allocation efficiency, is the Holy Grail, something that will be achieved once all of the development efforts in storage management and virtualization technology finally come to fruition.
Allocation efficiency is also the key component of Fibre Channel SAN vendor hype: they tell us that simply by placing all of our direct-attached storage in a SAN, we can allocate it more efficiently to the applications that need it. That's an interesting proposition, but it doesn't work unless all array LUN maps are built using the smallest common denominator (the disk drive), and it doesn't work unless we do away with our expensive controllers and let some third-party virtualization product build our virtual volumes.
LUN carving and splicing technology does not yet exist, except within the narrow context of "basic disks" in Microsoft Windows 2000 environments, according to Redmond guy Mark Licata, who adds that similar functionality will be coming in the next release of .NET. Mark sent an e-mail pointing me to links describing splicing technology in Microsoft environments.
He wrote, “If the storage hardware platform (controller array) can add LBNs to a LUN, we can use DISKPART to grow the NTFS partition on the fly. We have back-ported this tool to Windows2000 also. (See http://www.microsoft.com/downloads/details.aspx?displaylang=en&FamilyID=0FD9788A-5D64-4F57-949F-EF62DE7AB1AE.) [Additionally, the link below] is to [a] description of DISKPART Command-Line Utility for XP Home/Professional/64Bit Edition, it's in the box. (See http://support.microsoft.com/default.aspx?scid=kb;en-us;Q300415).”
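For readers who haven't used the tool, here is a minimal sketch of the sequence an administrator might run once the array has added blocks to the LUN. The volume number is hypothetical, and the exact steps will vary with your configuration:

    C:\> diskpart
    DISKPART> list volume
    DISKPART> select volume 1
    DISKPART> extend
    DISKPART> exit

The extend command appends the newly available contiguous space to the selected NTFS volume in place, which is what makes the "on the fly" growth Licata describes possible.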
Even if Microsoft is on the cutting edge of LUN splicing and carving, allocation efficiency, in the final analysis, isn't the real problem in storage. As Randy Chalfant boldly submits, the real problem is efficient utilization. Utilization efficiency refers to how the data being stored is actually being used, and it represents an order of efficiency above and beyond allocation efficiency.
Chalfant correctly observes that even the most efficiently allocated storage is very inefficiently utilized. This has to do with the likelihood that data, once stored, will ever be re-referenced.
Data, once stored to disk, has a tendency to be forgotten. Its likelihood of being re-referenced drops by 50 percent in just three days, and after a month it is almost certain that the data will never be referenced again. Yet it sits on disk drives forever, where it reduces the utilization efficiency of disk to something like 30 percent.
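To put rough numbers on that decay, here is a back-of-the-envelope sketch in Python. The exponential model with a three-day half-life is my own assumption for illustration, not Chalfant's data, but it is consistent with the figures above:

    # Back-of-the-envelope sketch: assume re-reference likelihood decays
    # exponentially, losing half its value every three days (an illustrative
    # assumption, not a measured model).
    HALF_LIFE_DAYS = 3.0

    def rereference_likelihood(days_since_stored):
        """Fraction of the original re-reference likelihood that remains."""
        return 0.5 ** (days_since_stored / HALF_LIFE_DAYS)

    for days in (3, 7, 30):
        print(f"After {days:2d} days: {rereference_likelihood(days):.1%}")

    # After  3 days: 50.0%
    # After  7 days: 19.8%
    # After 30 days: 0.1%

By the one-month mark the odds of another read are down around a tenth of a percent, which squares with the observation that the data is, for all practical purposes, never touched again.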
Think about this the next time Mr. Big Array Vendor comes to your shop and argues that you are buying capacity at about $1.50 per gigabyte. The vendor usually wants you to buy the terabyte you need, plus enough additional capacity for seven rotating mirrors that can be used to recover the whole terabyte of primary data at any time, on any day of the week. At the end of the day, you are paying for space to hold eight copies of virtually the same data, while your storage utilization efficiency steadily declines.
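Run the vendor's own numbers (1 TB of primary data, seven rotating mirrors, $1.50 per gigabyte, and the roughly 30 percent utilization figure above) and the picture looks like this; the arithmetic below is a rough sketch, not anyone's actual price list:

    # Rough cost sketch using the column's numbers: 1 TB of primary data,
    # seven rotating mirror copies, $1.50 per gigabyte, and roughly 30
    # percent of the stored data ever being re-referenced.
    PRIMARY_GB = 1024        # the terabyte you actually need
    COPIES = 8               # primary copy plus seven rotating mirrors
    COST_PER_GB = 1.50       # the vendor's quoted price
    UTILIZATION = 0.30       # share of stored data that is ever used again

    total_gb = PRIMARY_GB * COPIES
    total_cost = total_gb * COST_PER_GB
    cost_per_useful_gb = total_cost / (PRIMARY_GB * UTILIZATION)

    print(f"Capacity purchased: {total_gb:,} GB")
    print(f"Total cost: ${total_cost:,.2f}")
    print(f"Cost per actively used GB: ${cost_per_useful_gb:.2f}")

    # Capacity purchased: 8,192 GB
    # Total cost: $12,288.00
    # Cost per actively used GB: $40.00

That $1.50-per-gigabyte headline price works out to something closer to $40 per gigabyte of data that anyone will ever read again.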
Randy says that StorageTek is cooking up some sort of solution to this issue. Watch this space.
About the Author
Jon William Toigo is chairman of The Data Management Institute, the CEO of data management consulting and research firm Toigo Partners International, as well as a contributing editor to Enterprise Systems and its Storage Strategies columnist. Mr. Toigo is the author of 14 books, including Disaster Recovery Planning, 3rd Edition, and The Holy Grail of Network Storage Management, both from Prentice Hall.