Scaling Storage in a Dot-com World
Special Report
Storage: Again with the storage. Such is the bittersweet mantra mouthed by many dot-com IT administrators these days. In an e-commerce space in which storage capacities are increasing faster than Federal Reserve-mandated interest rates, IT managers face the unenviable task of planning a storage infrastructure capable of scaling with the topsy-turvy pace of e-business.
As a rule, storage vendors are quite bullish on the long-term prospects of the information economy. Storage giant EMC Corp. (www.emc.com), for example, cites a recent study by the Marcar Management Institute of America Inc. (www.marcar.com) that pegs the current size of the global information economy at $3.5 trillion; by 2010, the Marcar report projects, the global information economy will have nearly tripled to $10.2 trillion.
According to Robert Gray, a research manager for storage systems with market research firm IDC (www.idc.com), storage vendors have good reason to be bullish. In Gray’s account, the mechanism that is today driving the global information economy -- the Internet -- is enabled largely by virtue of persistent storage.
"There is no Internet without persistent storage, which provides nothing less than the content of the information economy," Gray explains. "The Internet is only interesting if you can get access to content, and content always resides on persistent storage."
What this means for dot-com storage managers, most analysts agree, is that such problems of scale are only likely to get bigger.
IBasis Inc. (www.ibasis.net), a provider of IP telephony services, seems at once to embody both the agony and the ecstasy of today’s dot-com enterprise. IBasis enjoys robust growth, counts over 350,000 customer voice mail boxes, and has a lucrative telephony contract in place with MCI WorldCom Inc. (www.wcom.com). The company also has a storage infrastructure that has grown in the space of a few months from zero to 38 TB.
Search engine and Internet portal site Excite Inc. (www.excite.com) has rolled out more than 110 TB of EMC-based storage in just under 20 months, and free e-mail service Critical Path Inc. (www.criticalpath.net) has deployed 80 TB of storage in a six-month span. Not to be outdone, Driveway Corp. (www.driveway.com), a site that provides free online storage and other services for end users, rolled out approximately 40 TB of storage within the space of a single month. The dot-com space abounds with zero-to-multi-TB success stories, and all indications are that the trend will only continue.
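To put those rollouts on a common footing, the short Python sketch below converts each quoted figure into an average monthly growth rate. The numbers come from the paragraph above; rounding Excite's "just under 20 months" up to 20 and treating its "more than 110 TB" as exactly 110 are simplifying assumptions, so the Excite rate is only a rough lower bound.

```python
# Deployment figures quoted in the article: (terabytes deployed, months taken).
# Excite's window is rounded to 20 months and its total to 110 TB (assumptions).
deployments = {
    "Excite": (110, 20),
    "Critical Path": (80, 6),
    "Driveway": (40, 1),
}


def monthly_growth_tb(terabytes: float, months: float) -> float:
    """Average terabytes added per month over the deployment window."""
    if months <= 0:
        raise ValueError("months must be positive")
    return terabytes / months


rates = {
    name: monthly_growth_tb(tb, months)
    for name, (tb, months) in deployments.items()
}

for name, rate in rates.items():
    print(f"{name}: about {rate:.1f} TB per month")
```

Averages like these smooth over the bursts in which storage is actually installed, so they are better read as a planning yardstick than as a description of how any of the three companies grew.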
Modularizing Storage
For ventures like iBasis or Critical Path, scaling an internal storage infrastructure to meet the requirements of the dot-com lifestyle is not easy.
As far as Ajay Joseph, director of network architecture at iBasis, is concerned, the key to developing a scalable storage architecture is modularity: One has to be able to plug a predetermined amount of storage into an environment without affecting system or application processes -- and also be able to remove it just as easily.
"It’s difficult to find out bandwidth profiles and user profiles, so we’ve kind of been in the midst of being conservative and too far out in terms of our sizing requirements for this particular project," Joseph acknowledges. "But the important thing is that we’ve designed the network so that it’s designed for redundancy and it’s been designed for a cell size of 175,000 users, so it’s very modular so that you can plug storage into it and take storage out of it as needed."
As IDC’s Gray points out, most storage vendors today are indeed thinking modularly.
"All storage today is modular," he explains, noting that in the case of organizations that experience fast-paced storage growth, vendors such as IBM Corp. (www.ibm.com) or EMC will sometimes "build" surplus storage into an environment. In this model, Gray explains, storage can be "turned on" -- and customers can be billed accordingly for storage capacity increases -- as demand dictates.
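The cell-based sizing that Joseph describes lends itself to simple arithmetic. The Python sketch below is an illustration only, not iBasis' actual planning tool: it takes the 175,000-user cell size from his quote and computes how many whole cells a given user count would require. Treating each voice mail box as one user is an assumption made for the example.

```python
CELL_SIZE_USERS = 175_000  # cell size quoted by Joseph


def cells_needed(users: int, cell_size: int = CELL_SIZE_USERS) -> int:
    """Number of whole cells required to serve the given user count."""
    if users < 0:
        raise ValueError("users must not be negative")
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    # Ceiling division in integer arithmetic: a partly filled cell
    # still has to be deployed as a whole unit.
    return -(-users // cell_size)


# Assumption: one user per voice mail box.
print(cells_needed(350_000))
print(cells_needed(350_001))
```

Under that assumption, the 350,000 mailboxes cited earlier fill exactly two cells, and the next subscriber triggers a third. That step function is the practical meaning of modularity here: capacity is added a cell at a time rather than a disk at a time.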
NAS
For much of its life, network attached storage (NAS) has toiled in the shadows of its bigger, ostensibly beefier cousin: the SAN. SANs are a key tool that storage managers have at their disposal in the race to architect scalable storage infrastructures, but in a dot-com space in which both modularity and plug-and-play expandability are key, dot-com ventures are giving NAS solutions a longer, harder look.
NAS defines an architecture in which a storage subsystem can effectively be plugged into a network. It’s that simple. NAS devices typically feature embedded operating systems that provide network- and file system-level support, which means that clients and servers of just about any ilk can transparently read from and write to NAS devices on a network. And because it can support a variety of network topologies and interfaces, such as 100 Mbps Ethernet or even Gigabit Ethernet, NAS can provide a fast, efficient and modular way to easily add storage to an environment.
IBasis, for example, selected a Symmetrix NAS solution from EMC in place of the latter company’s Connectrix solution. Citing concerns about manageability in the SAN space, iBasis’ Joseph says that a NAS solution also served to better complement the modular approach that his company was taking to storage.
"Given the amount of data that we actually anticipate from each of the hosts going to the EMC boxes we decided not to go with Connectrix," Joseph explains. "It kind of makes a lot of sense to have this type of configuration, because it fits in better with the overall modularity of our design."
Storage Vendors to the Rescue
Perhaps because they recognize that storage capacity planning and management is an altogether different enterprise in the dot-com space, storage vendors are increasingly becoming involved.
EMC, for example, features a branding program -- dubbed "EMC Proven" -- which certifies that e-commerce sites built on top of an EMC storage framework have also invested in best practices for enterprise storage. According to EMC vice president of product marketing Mark Vargo, acceptance into the program signifies that a company has implemented an IT infrastructure that can support the operational needs of an Internet-based business or service.
"If they’re EMC-proven, chances are that they’re not going to have the information infrastructure let them down," Vargo indicates, noting that EMC has already enrolled about 100 customers in the EMC-proven program. "When you’re EMC-proven, it’s a sign that you’ve taken professional services very seriously, that you’ve taken design services very seriously."
And to help dot-coms make sense of the sometimes bewildering array of storage options and solutions shipping today, many vendors also provide consulting and integration services that can help organizations design a scalable storage infrastructure. Such services demonstrate value by combining any of a number of different technologies, including SANs, NAS devices and conventional storage subsystems, into a scalable storage infrastructure.
One such vendor, NAS specialist MTI Technology Corp. (www.mti.com), offers a complete set of infrastructure design, backup-and-restore, storage management and storage consulting services for both enterprise and dot-com customers. Kevin Liebl, vice president of marketing at MTI, says his company’s consulting and integration services can help organizations avoid the pitfalls that all too often accompany capacity planning and storage management in the dot-com space.
"The biggest mistake that most customers make is that they don’t learn to think strategically about storage," Liebl explains. "They’ve got to look at open architectures and make sure that these are flexible, industry standard architectures so that they don’t box themselves into a corner, and that’s what we help them to do."
Distributing the Load
Perhaps the best way to ensure that your storage environment can scale to meet capacity requirements is to distribute it. Akamai Technologies Inc. (www.akamai.com) rocketed to both fame and a near-record-setting initial public offering on the basis of such an idea last year.
The gist of Akamai’s approach involves intelligently caching content on a location-specific basis: Content with broad-based appeal is disseminated to servers distributed on a global basis, whereas content with limited appeal is distributed regionally. Intelligent caching technologies aim to make it easier for customers or Web surfers to access information at the same time that they shift the load from an organization’s centralized Web presence to literally hundreds of caching servers distributed globally.
"There’s a large suite of algorithms designed to get the content as close as possible, as efficiently as possible, to the end user, so the end user can always get a copy of the content they want from our servers nearby," explained Akamai chief scientist Frank Thomson Leighton.
Intelligent content distribution was given a real shot in the arm with Cisco Systems Inc.’s (www.cisco.com) late-March acquisition of SightPath Inc. (www.sightpath.com), a provider of appliances for creating intelligent content delivery networks.
As IDC’s Gray sees it, this type of approach for distributing content smartly parallels the way that commodities like books and magazines are today distributed on both a regional and a national basis.
"We have this complicated infrastructure already in place for books and magazines that replicates stuff with mainstream appeal and gets more focused stuff near where it’s wanted," he explains. "You’re seeing the same type of movement in the dot-com space with distributed caching, and Cisco’s movement into the market only serves to underscore that point."
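The placement rule described in this section, global distribution for content with broad appeal and regional distribution for everything else, can be sketched in a few lines of Python. This is a toy illustration, not Akamai's or Cisco's algorithm: the function name, the region labels, the request counts and the 50 percent threshold for "broad appeal" are all hypothetical.

```python
def cache_regions(requests_by_region, all_regions, broad_share=0.5):
    """Pick the regions that should hold a cached copy of one content item.

    requests_by_region: hypothetical request counts keyed by region name.
    broad_share: assumed fraction of regions that must request the item
                 before it counts as having broad appeal.
    """
    known = set(all_regions)
    # Regions where the item has actually been requested.
    active = {
        region
        for region, count in requests_by_region.items()
        if count > 0 and region in known
    }
    if known and len(active) >= broad_share * len(known):
        return known   # broad appeal: replicate everywhere
    return active      # limited appeal: cache only where it is wanted


regions = ["north-america", "europe", "asia", "south-america"]
headline = {"north-america": 900, "europe": 700, "asia": 400}
local_ad = {"europe": 35}

print(sorted(cache_regions(headline, regions)))
print(sorted(cache_regions(local_ad, regions)))
```

In this example the headline is requested in three of four regions and is therefore copied to all of them, while the local advertisement stays on the European servers. A production system would weigh far more than request counts, but the split between replicated and regional content is the same one Gray compares to book and magazine distribution.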
Distributing the Database Tier
As e-commerce solutions evolve and become more sophisticated, some industry watchers suggest that dot-coms are going to have to rethink the very way that applications and relational database management systems (RDBMS) work in the first place.
The issue of faster, more scalable storage notwithstanding, Steven Anderson, CEO of Viathan Corp. (www.viathan.com), says an RDBMS, in particular, is designed primarily for use in an enterprise environment whose data storage and retrieval practices -- centered as they are primarily around relational data -- more often than not bear little resemblance to the nonrelational data generated by most dot-coms.
Viathan produces Leviathan, an intelligent load-balancing software technology for dot-com ventures’ database tier.
"First of all, the type of data that these dot-coms are collecting is not defined as relational data, because you rarely do the complex queries [and other such trademarks of enterprise RDBMS]," Anderson comments. "The best way to describe a relational database product is that it’s like a Swiss Army knife -- a thing that tries to be a fit for all environments."
Enter Leviathan, a technology that Anderson says "productizes" a lot of the custom work that dot-coms have done in adapting RDBMS platforms for e-commerce environments. Leviathan provides a robust API that allows developers to write applications to a single virtual database that can consist of any number of component nodes or clusters with any amount of attached storage. As far as an application is concerned, however, it’s accessing a single database image -- no matter the physical location of the component database nodes or clusters.
"This is an API that’s specifically targeted to the stuff that they need as Web application developers," Anderson concludes. "It’s not the full ODBC having to wade through 3,000 plus commands, and it’s not the Swiss Army knife approach of the traditional relational database, so their time to market is greatly reduced."
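The idea of a single database image spread across many nodes can be illustrated with a small Python sketch. It is emphatically not Leviathan's API, which the article does not document: the class name, the method names and the hash-based routing are assumptions chosen for the example, and each "node" is just an in-memory dictionary standing in for a real database server.

```python
import hashlib


class VirtualDatabase:
    """Hypothetical single-image facade over several database nodes."""

    def __init__(self, node_names):
        if not node_names:
            raise ValueError("at least one node is required")
        # Each node is modeled as a dict; a real node would be a server.
        self._nodes = {name: {} for name in node_names}
        self._order = sorted(self._nodes)

    def _node_for(self, key: str) -> str:
        """Map a key to a node with a stable hash (assumed routing scheme)."""
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._order)
        return self._order[index]

    def put(self, key: str, value) -> None:
        self._nodes[self._node_for(key)][key] = value

    def get(self, key: str, default=None):
        return self._nodes[self._node_for(key)].get(key, default)


db = VirtualDatabase(["node-a", "node-b", "node-c"])
db.put("user:42", {"name": "example"})
print(db.get("user:42"))
```

The application calls only `put` and `get` and never learns which node holds a record, which is the property the article attributes to a virtual database. One caveat of this particular scheme is that simple modulo hashing reassigns most keys when a node is added or removed, so a real product would need a more careful placement strategy.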
[Sidebar]
Rethinking the Persistence of Storage
Viathan Corp.’s (www.viathan.com) Leviathan is an evolutionary approach that tackles the problem of dot-com storage on the application and database levels. Solid-state storage, on the other hand, challenges existing assumptions about both the role and capacity of persistent storage.
For its part, solid-state storage defines a method of storing data in a dedicated subsystem composed of physical RAM. Solid-state storage subsystems commonly include a battery backup and a fixed disk subsystem to which the solid-state storage device can dump its data in the event of a prolonged power failure or other system downtime.
Solid-state storage is not the same thing as an in-memory database, which attempts to cache the contents of a database in the physical memory of the server that hosts the RDBMS. Rather, solid-state storage is deployed on a dedicated subsystem that is both operating system and RDBMS independent.
Compared with their conventional brethren, solid-state storage subsystems are both more expensive and considerably smaller: It's unlikely that solid-state storage subsystems will be available in the terabyte range any time soon. Why, then, might solid-state storage be considered a scalable alternative for the dot-com venture? The answer is simple: speed. Solid-state storage is faster than disk-bound storage and, in I/O intensive environments, can often dramatically improve overall performance.
"For some tasks, like e-mail or Web content delivery, you’ve got a small percentage of data causing most of the I/O," explains Mike Casey, vice president of marketing with Solid Data Systems Inc. (www.soliddata.com), a provider of solid-state storage solutions. "You can take that 2 to 3 percent of the data and move it off of the mechanical disk and put it on something really fast. A lot of times you’ll see system performance as a whole dramatically ramp-up, too."
This was the case with Critical Path, which implemented a Solid Data Excellerator 800 storage subsystem to help it deal with a crippling SMTP message queue bottleneck. Critical Path’s storage engineers reasoned that if they could hold the contents of the SMTP message queue in a solid-state storage subsystem, the repeated reads and writes that left the queue hopelessly backed up on a conventional storage subsystem would no longer be a bottleneck. The result, Critical Path claims, is an eightfold increase in performance.
As Solid Data’s Casey positions it, solid-state storage isn’t so much a replacement for conventional storage as a scalable complement to it.
"You put the bulk of your data -- that is, 95 percent or more of your data -- on hard drives in a cached disk array, but you move the really active stuff to solid state, and then you have the best of both worlds," Casey concludes.
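Casey's observation that a small share of the data causes most of the I/O suggests a simple way to choose what moves to solid state. The Python sketch below is a hedged illustration, not Solid Data's method: the function names and the sample I/O counts are invented, and the 3 percent cutoff is applied to the number of items rather than to bytes, which is a simplification.

```python
def hot_set(io_counts, fraction=0.03):
    """Return the busiest items, limited to `fraction` of all items."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not io_counts:
        return []
    limit = max(1, round(len(io_counts) * fraction))
    ranked = sorted(io_counts, key=io_counts.get, reverse=True)
    return ranked[:limit]


def io_share(io_counts, items):
    """Fraction of all I/O operations that the given items account for."""
    total = sum(io_counts.values())
    if total == 0:
        return 0.0
    return sum(io_counts[item] for item in items) / total


# Hypothetical workload: 100 files, three of which dominate the I/O.
counts = {f"file{i}": 1 for i in range(100)}
counts.update({"file0": 500, "file1": 300, "file2": 200})

hot = hot_set(counts)
print(hot, f"{io_share(counts, hot):.0%} of I/O")
```

In this made-up workload, three files out of 100 carry roughly nine-tenths of the I/O, so moving only those to the faster tier relieves most of the pressure on the disk array. Real access patterns shift over time, which means the hot set would need to be measured and recomputed periodically.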