Data Warehousing and SANs: Not Quite Ready for Prime Time
Ask any storage vendor to describe a suitable scenario for the deployment of a Storage Area Network (SAN) and most will answer almost automatically, "Data warehousing." Data warehouses are conceived as huge repositories of information updated frequently via large-scale movements of data from online transaction systems to some centralized data storage platform. Ultimately, this repository is to be shared among a growing cadre of end users across the company, who will dig, pan and sift through the bits using data mining tools on a quest for non-intuitable relationships that may guide business decisions.
One can almost hear the checklist items click off in the vendor’s mind: "Hmm. Large-scale data movements. A dynamically scaling storage repository. Sharing access among countless end users. This looks like a clear-cut case for a SAN."
SANs might, indeed, eventually provide a suitable storage infrastructure for applications such as data warehousing. A SAN comprises a "back-end network" – separate from the production LAN and used exclusively for I/O data traffic. In theory, it is ideally suited for any application involving large-scale data transfers that, if handled across the production LAN, could negatively impact the performance of other applications that transact their business through the same fixed bandwidth.
Moreover, SANs promise to deliver a scalable storage platform. Need more storage space? Simply add more resources to the SAN. Whether the new storage components consist of a just-a-bunch-of-disks array from Joe’s JBODs or a Symmetrix platform from EMC Corporation makes little difference. The SAN will simply present an additional resource for use by applications that require it.
Finally, SANs will deliver the ultimate data sharing solution. Open SANs, once perfected, will enable heterogeneous hosts to access data universally, in accordance with a SAN operating system (as opposed to a network or server operating system’s file systems).
As a side benefit, the entirety of the SAN infrastructure will be highly manageable. A couple of competent IT staffers, assisted by automated storage management tools, will be able to manage the storage infrastructure and groom the data that resides there. Total cost of ownership will drop precipitously – well below the current one dollar per megabyte, per year estimated storage management cost promulgated by industry analysts.
It sounds almost too good to be true. And, for now, it is.
The current primordial generation of SAN technology fails to deliver on just about any of the promises of future open SANs. This has not stopped marketers from hyping the technology as a panacea that will cure all storage management woes within corporate IT.
John Snyder, Director of the Data Management Practice for reseller/integrator Champion Computer Corporation, understands the limitations of current SAN technology. While enthusiastic about the potential of SANs, Snyder provides a sobering analysis of the current capabilities of SANs and where they play.
SANs: Not Child’s Play
Champion, IBM’s number one business partner in serial storage architecture (SSA) sales for many years, earned special distinction in 1999 for selling the vendor’s first SAN solution. The customer was Tutor Time Learning Systems, a leader in the childcare and preschool learning systems industry. Tutor Time turned to Cisco Systems, Champion and IBM to build a scalable infrastructure to facilitate the company’s fast growth, says Snyder. Delivering the storage component of the solution, a SAN based on IBM storage hardware, was hardly child’s play, Snyder notes.
Snyder recalls that SANs were all the rage in 1999’s industry trade press. Their promise of ubiquitous data access and enhanced storage management with reduced cost ignited the enthusiasm of IT practitioners and vendors alike.
From the beginning, SANs were spoken of almost synonymously with Fibre Channel, a 10-year-old gigabit-speed interconnect and protocol that was "rediscovered" by the industry and introduced to the storage market in late 1998. According to SAN advocates, by connecting storage arrays and tape products to a "back end," Fibre Channel-based storage network, a storage "pool" or "utility" could be created. If universal connectivity was supported by this storage pool for all servers and workstations, and if the storage pool itself could be made intelligent and self-managing, a raft of storage-related problems could be easily solved. For example, through intelligent SAN switching, tape backup data transfers could occur through specified ports of the SAN switch, while other data movements occurred through other ports, without causing interruption or conflict. The problem of shrinking back-up windows would just go away.
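The backup-window argument comes down to simple arithmetic, which the following Python sketch works through. The figures are illustrative assumptions, not numbers from Snyder or any vendor: a hypothetical 500 GB nightly backup, the nominal 100 MB per second of a gigabit Fibre Channel link, and the theoretical 12.5 MB per second peak of a 100 Mbit/s production LAN. Real throughput on either path would be lower.

```python
def transfer_hours(data_gb, link_mb_per_s, utilization=1.0):
    """Hours needed to move data_gb over a link at the given sustained rate."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    megabytes = data_gb * 1000          # decimal units: 1 GB = 1000 MB
    seconds = megabytes / (link_mb_per_s * utilization)
    return seconds / 3600


# All three figures below are assumptions chosen for illustration.
WAREHOUSE_GB = 500          # hypothetical nightly backup size
FIBRE_CHANNEL_MB_S = 100    # nominal rate of a gigabit Fibre Channel link
FAST_ETHERNET_MB_S = 12.5   # 100 Mbit/s LAN, theoretical peak

san_hours = transfer_hours(WAREHOUSE_GB, FIBRE_CHANNEL_MB_S)
lan_hours = transfer_hours(WAREHOUSE_GB, FAST_ETHERNET_MB_S)
print(f"dedicated Fibre Channel port: {san_hours:.1f} h")
print(f"shared 100 Mbit/s LAN:        {lan_hours:.1f} h")
```

Lowering the utilization argument, for example to 0.5 for a LAN that is also carrying production traffic, shows how quickly the window stretches when backup has to share bandwidth.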
In addition, SANs solve the downtime and maintenance problems associated with server-attached storage. Based on his past experience as a Director of IT Systems for a medical insurance company, Snyder says, "This total cost of ownership impact, this storage management impact is one of the [most compelling arguments] for a SAN. Just compare the cost of storage management today with the cost of a SAN, and you can usually make a pretty convincing case for SANs."
However, Snyder is quick to temper his enthusiasm for SANs with a reality check. He notes that current SANs are homogeneous and "work only in either an all-UNIX or all-NT environment." Moreover, SANs still lack the service management utilities they need to realize their full potential for reducing storage management costs; Snyder believes those utilities are a year to 18 months away.
These facts haven’t stopped IBM or other vendors – from industry market share leader, EMC, to lesser-known, but equally aggressive players, such as XioTech Corporation – from jumping on the SAN bandwagon. In late 1998, EMC advanced its own view of the SAN in the form of Enterprise Storage Networks (ESN) and formed an alliance of hardware and software companies dedicated to providing EMC-branded SANs into the enterprise and open systems space.
According to Don Swatik, Vice President of Product Management for EMC, the company’s strategy is to provide less visioneering and more engineering. "The [open systems] world is gravitating to the model of centralized storage that existed in the mainframe world for many years, in order to realize economies of scale. ESN is the next generation. It enables storage to be extended beyond the 13-meter limit imposed by SCSI to the 1- to 10-kilometer distances made possible by Fibre Channel."
Phil Soran, President and CEO of XioTech Corporation, whose acquisition by Seagate Corporation was announced in late 1999, acknowledges, "EMC is kind of open, but at a different level of economics." He notes that the future of SANs is to provide universal compatibility within the switched SAN fabric for a broad range of storage products from a broad range of vendors. Until such products are available, he is content to portray his intelligent storage arrays as "SANs in a box."
"Inside our arrays, we provide a Fibre Channel switch, RAID controller, logical volume manager and data management/data mover tools that provide virtualized, centralized storage. We can carve up pieces of physical drives to provide a logical volume image to each server."
Eric Herzog, Vice President of Marketing for Mylex Corporation, a manufacturer of storage controllers that was acquired by IBM in 1999, acknowledges that SANs have some significant ground to cover before open platform data sharing is realized. External RAID arrays are part of the trend forward.
"The real world is about improving controller I/O. The greater the expense of [tethering storage to] servers, the more that companies try to place storage outside of the server itself. We are seeing increased movement toward externalization of storage, even in the Intel space. This, in turn, is driving improvements in the features and functions of controllers: more host/drive connections, mirroring cache memory, cache coherency and SAN mapping. Fibre Channel and SANs provide the means to connect more storage and to increase the distances between storage and servers, increasing implementation flexibility."
What continues to be missing from SANs, says Herzog, is application awareness. SANs need automatic mechanisms that will provide the right kind of storage (RAID level, physical block layout) to optimize application performance, particularly in the case of data warehouses and other database-based apps.
Sue Smith, Director of Marketing at CrosStor Software (formerly Programmed Logic Corporation), agrees with Herzog. She adds that a storage-centric operating system is required before open SANs can happen. Multi-purpose server operating systems are too complex and administration-intensive, Smith contends, and operating system ownership of disk volumes must be broken before file sharing (as opposed to disk sharing) can be realized. "If sharing does not occur on the file level, then it isn’t a SAN."
Smith says that the big news of 1999 was not SANs at all: "1999 – and, perhaps, 2000 as well – will be known as the year of the NAS device." By NAS, Smith refers to Network Attached Storage devices: disk arrays equipped with "thin server" operating systems that attach directly to a company’s IP network. NAS products from vendors, such as Auspex Systems, Network Appliance and Procom Technology, would seem to run counter to the trends in storage centralization represented by SANs and large, external disk arrays, but Smith argues that this is not true.
Smith, whose company makes NAS thin operating systems, claims that NAS devices address immediate needs for plug-and-play file storage space, but also set the stage for SANs. "NAS devices are easy to deploy and scale very well. They have a huge, near-term market. By 2001 or 2002, SANs may come into greater use. When this happens, NAS devices may play a role in managing SANs."
Smith says that current-generation NAS OS products from CrosStor provide storage-optimized operating system environments enhanced with support for file system protocols, such as the UNIX Network File System (NFS) and Microsoft’s Common Internet File System (CIFS). "We just strip out the stuff you don’t need in a multi-purpose operating system and add extensions, such as SnapShot, Remote Mirroring and Volume Mirroring capability."
The initial advantage of NAS device deployment may be quick-and-easy "elbow room" for exponential data growth. However, within the next 18 months, says Smith, thin storage servers may evolve into something much more significant. They may serve as access and control points for SANs.
"Hybrid NAS/SAN servers will handle the metadata traffic associated with storage I/O requests. The actual data transfers will occur within the high speed SAN network itself," says Smith. NAS products already have the requisite operating system features and functions to support access to shared data. By making them gateways for SANs, many of the inherent management and control problems that are inhibiting current SAN development are resolved, Smith argues.
She says that enthusiasm for the approach is building among RAID vendors, who are planning to add CrosStor’s SAN OS directly to their external controllers. In contrast to this approach, several other companies, including Mercury Computer Systems and MountainGate Imaging Systems, have fielded first generation SAN OS products running as Ethernet-based applications to provide file access controls in a shared storage environment. Collectively, these approaches comprise an "out-of-band" SAN management mechanism, requiring that a second connection be made to SAN devices (Ethernet) in addition to the primary SAN network connection (Fibre Channel).
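The split Smith describes, with metadata and access control on one path and bulk data on another, can be sketched in a few lines of Python. This is a toy model under stated assumptions, not CrosStor’s design or any vendor’s product: the class names MetadataServer and SharedDisk, the block map and the simple write lock are all hypothetical, and the "Ethernet" and "Fibre Channel" paths are represented only by which object a host talks to.

```python
class MetadataServer:
    """Hypothetical stand-in for the hybrid NAS/SAN server (the Ethernet path)."""

    def __init__(self):
        self._files = {}   # file name -> ordered list of block numbers
        self._locks = {}   # file name -> host currently holding the write lock

    def register(self, name, blocks):
        self._files[name] = list(blocks)

    def open_for_read(self, name):
        # Only the block map crosses this path, never the file contents.
        return list(self._files[name])

    def open_for_write(self, host, name):
        holder = self._locks.get(name)
        if holder is not None and holder != host:
            raise PermissionError(f"{name} is locked by {holder}")
        self._locks[name] = host
        return list(self._files[name])

    def close(self, host, name):
        if self._locks.get(name) == host:
            del self._locks[name]


class SharedDisk:
    """Hypothetical stand-in for storage reached directly over the SAN."""

    def __init__(self):
        self._blocks = {}

    def write_block(self, number, data):
        self._blocks[number] = data

    def read_block(self, number):
        return self._blocks[number]


def read_file(metadata, disk, name):
    """Ask the metadata server where the file lives, then read the disk directly."""
    blocks = metadata.open_for_read(name)
    return b"".join(disk.read_block(n) for n in blocks)


disk = SharedDisk()
disk.write_block(7, b"ware")
disk.write_block(3, b"house")
mds = MetadataServer()
mds.register("fact_table", [7, 3])
contents = read_file(mds, disk, "fact_table")
```

The point of the arrangement is that the metadata server handles only small requests (who may open which file, and where its blocks are), so it does not become a bottleneck for the large transfers a data warehouse generates.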
The alternative to "out-of-band" management is "in-band" management: manage SANs via the same switched fabric used to create them. This is the objective of the Fibre Channel Community’s Storage Network Management Working Group, which has, thus far, failed to produce an "in-band" management solution. The Storage Networking Industry Association (SNIA) has listed SAN management as one of the key missing components that will need to be addressed before SANs will be ready for prime time in the corporate world.
Technology hurdles aside, vendors see another challenge to SANs: media hype. At a recent IBM conference, business partners heard IT analysts remark on the importance of channels to the selling of SANs. Analysts claimed that the revenue projections for Storage Area Networks in the coming year were contingent upon channel sales performance. It was agreed that vendors alone either (a) lack the skills and knowledge in their direct sales force to articulate the SAN value proposition and to define SAN solutions effectively to customers, and/or (b) lack the time required to make a SAN sale. Channel partners, resellers and integrators, presumably enjoy much more "face time" with the customer and may combine the knowledge of networks and storage required to present cogent and persuasive information about SANs.
At the same time, many resellers have voiced concern about referring to SANs as SANs at all. So much press has surrounded the Storage Area Network – including verbal conflicts between vendors of different SAN concepts and products – that some customers are afraid to become early adopters of the storage architecture. Those who are willing to embrace SANs are often reluctant to invest in the products of any vendor that does not seem to have a clear "building block" approach to the future that will protect their SAN investment. This is lacking from the proprietary SAN solutions offered by most vendors today.
For this reason, many vendors – as well as large consulting and integration firms – have backed away from calling a SAN a SAN. Instead, in the words of one integrator, "We will go into a customer account to solve a problem. If the solution is a SAN, we will use it. Not because it is a SAN, but because it solves a particular set of problems."
CrosStor’s Smith gets the final word on SANs in 1999: "The first generation SANs appeal to companies with centralized management requirements, big capacity requirements and homogeneous server access requirements [such as audio and video post-production]. Next generation SANs will need to move from specific, application-oriented products to a much wider market of general purpose storage platforms." She, like just about everyone else in the storage industry, believes that the year of the SAN is still a couple of years away.
Back to Data Warehousing
The current state of the SAN hardly qualifies it as a panacea technology for data warehousing applications. Numerous analysts confirm that, at present, SANs are not moving much beyond the pilot stage in most businesses. Indeed, many issues need to be resolved before most companies can make investment-protected technology acquisitions in the SAN arena.
Some of the questions that remain go to the core of current thinking about SANs – namely, their use of Fibre Channel as the SAN backbone or interconnect. IDC’s Vice President for Storage Research, John McArthur, wonders whether Fibre Channel will be able to become a de facto SAN interconnect standard in the face of challenges from other network technologies. "Look at the way that the bandwidth of Ethernet has advanced over the last decade. You have to question why companies would choose Fibre Channel for a backend storage network, rather than the gigabit Ethernet that they already have deployed everywhere else."
McArthur’s question is quietly echoed in many corners of the industry. Sources at chipmaker Broadcom Corporation indicate that LAN switch makers are pursuing their own designs for SANs, based on gigabit Ethernet switching and I/O-optimized IP. This strategy, apparently, also makes sense to several leading vendors, including 3Com Corporation and Adaptec, who dropped off the Fibre Channel bandwagon early on.
With such core issues as the SAN interconnect still in doubt, one doesn’t require data mining tools to discover that a correlation exists between slow SAN deployments and still-evolving technologies. Once jelled, however, SANs may provide a perfect fit for data warehousing storage.
About the Author: Jon William Toigo is an independent consultant and author of The Holy Grail of Data Storage Management (Prentice Hall). He maintains a storage-focused Web site at www.stormgt.org, and can be contacted at firstname.lastname@example.org.