In-Depth

Demystifying SANs and NAS: Which Storage Architecture Is Better for You?

Storage is Hot! Not as hot as the Internet, perhaps, at least in public perception. But the explosion of the Internet, the accompanying e-business revolution and the expectation that "everything" should be online all the time are driving an unprecedented demand for storage. One only needs to look at the growth in the market capitalization of companies like EMC, or ask data center managers how much new storage capacity they will bring online this year to understand the role of storage as part of the infrastructure of the Internet and e-commerce.

Of course, as demand rises, technological innovation surely follows. As technological innovation accelerates, user confusion over the number of products and technologies available increases as well. The storage market is no exception to this rule.

One of the primary areas of innovation in storage today is in the application of networking technology to storage connectivity. This has given rise to two new, and commonly confused, storage topologies: Network Attached Storage (NAS) and Storage Area Networks (SAN). Many people don't understand the difference between NAS and SAN. Some think of them as competing technologies; journalists occasionally depict a battle between NAS and SAN for storage technology hegemony, while investment analysts muse over whether NAS vendors or SAN providers are a better investment choice. Users often ask which they should choose as their storage architecture.

The truth is a little less dramatic, but for IT managers, even more exciting. NAS and SANs are both products of the merger of storage and networking technologies. Far from being competitive, they are in fact complementary technologies that productively coexist in many data centers and are even starting, in some ways, to converge. Taken together, they represent the future of storage: storage networking.

SAN Described

A SAN is, quite simply, a network dedicated to storage. More precisely, the Technical Dictionary published by the Storage Networking Industry Association (SNIA) defines a Storage Area Network as: "A network whose primary purpose is the transfer of data between computer systems and storage elements and among storage elements. A SAN consists of a communication infrastructure, which provides physical connections, and a management layer, which organizes the connections, storage elements and computer systems so that data transfer is secure and robust."

Unlike the traditional direct-attach storage model, a SAN attaches storage devices to servers in a networked fashion, using hubs, switches, routers and bridges to build the topology. Both the systems and the storage devices can, in theory, be heterogeneous in nature, though interoperability concerns limit some customers to building homogeneous SANs. Although the network could conceivably be built with any networking technology, Fibre Channel has emerged as the technology of choice for SANs.

SANs provide a number of advantages over direct-attached storage. They provide any-to-any connectivity between servers and storage devices, which makes it possible for multiple servers to share storage resources and thus enables IT managers to consolidate storage on a few large storage platforms. They also provide any-to-any connectivity between the storage devices themselves, opening the way for direct movement of data between storage devices and vastly improving the efficiency of data-movement processes such as backup or replication. The use of Fibre Channel, or almost any other networking technology proposed for SANs, enables longer connectivity distances and higher performance than are currently possible with SCSI technology. Over time, SAN technology will ease the task of centralized storage management, and drive the adoption of remote management and data protection strategies, storage consolidation, system clustering and cross-platform data sharing.
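The difference between direct attachment and any-to-any connectivity can be illustrated with a toy connectivity graph. The Python sketch below is illustrative only: the device names (server1, array1, tape1, switch) are hypothetical, and real fabrics involve zoning, addressing and protocol details that are ignored here. It simply computes which devices can reach which others under each topology.

```python
from collections import deque

def reachable(links, start):
    """Return every node reachable from start over undirected links."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return seen

# Direct attach: each server is cabled to its own storage device.
direct_attach = [("server1", "array1"), ("server2", "array2")]

# SAN: servers and storage devices all plug into a shared switch (hypothetical names).
san = [
    ("server1", "switch"), ("server2", "switch"),
    ("array1", "switch"), ("array2", "switch"), ("tape1", "switch"),
]

def storage_seen_by(links, node):
    """List the storage devices (arrays and tape) that a node can reach."""
    return sorted(n for n in reachable(links, node) if n.startswith(("array", "tape")))
```

In the direct-attach layout, storage_seen_by(direct_attach, "server1") returns only array1. In the SAN layout the same server reaches both arrays and the tape device, and array1 has a path to tape1 through the switch, which is what makes device-to-device backup plausible.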

The SAN market is made up of the vendors of Fibre Channel interconnect technology, as well as the vendors of the systems and storage devices that attach to the network. The Fibre Channel vendors are primarily new, relatively small companies, such as Brocade, Vixel, Gadzoox and Crossroads. The storage companies are the same ones that have been providing direct-attach storage for years, such as EMC, Hitachi, Sun, HP and Compaq, and it is no exaggeration to say that every storage company is involved in the SAN market.

NAS Described

NAS, on the other hand, describes file storage attached to a network. The SNIA's Technical Dictionary defines Network Attached Storage as: "A term used to refer to storage elements that connect to a network and provide file access services to computer systems. A NAS Storage Element consists of an engine, which implements the file services, and one or more devices, on which data is stored. NAS elements may be attached to any type of network.

When attached to SANs, NAS elements may be considered to be members of the SAS (SAN Attached Storage) class of storage elements. A class of systems that provide file services to host computers. A host system that uses network attached storage uses a file system device driver to access data using file access protocols, such as NFS or CIFS. NAS systems interpret these commands and perform the internal file and device I/O operations necessary to execute them."

Note that the SNIA's definition says that a NAS system may be connected to any type of network. This is an important future consideration. Today, however, NAS systems are generally connected to a local area network (LAN).

In common usage, a NAS system is a special-purpose device that is designed to serve files to clients over a LAN. The clients request access to files using standard Network File System (NFS) or Common Internet File System (CIFS) commands. NAS devices typically contain embedded processors hosting a specialized operating system, or microkernel, and a highly optimized file system, both designed to enable the NAS device to serve up files to clients with very high performance. Because they can serve multiple heterogeneous clients, NAS devices provide a form of heterogeneous data sharing.
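The practical distinction is that a NAS client asks for a file by name and lets the NAS device's file system locate the data, whereas a host on a SAN typically addresses raw blocks on a storage device. The Python sketch below mimics that difference on a local disk. It is only an analogy: an ordinary file stands in for a SAN storage device, a temporary directory stands in for a NAS share, and the 512-byte block size and all file names are assumptions made for illustration. No NFS, CIFS or SCSI traffic is involved.

```python
import os
import tempfile

BLOCK_SIZE = 512  # assumed block size, for illustration only

def read_block(device_path, block_number, block_size=BLOCK_SIZE):
    """Block-level access: the caller addresses raw blocks by number."""
    with open(device_path, "rb") as dev:
        dev.seek(block_number * block_size)
        return dev.read(block_size)

def read_file(share_root, relative_name):
    """File-level access: the caller names a file; the file system finds the blocks."""
    with open(os.path.join(share_root, relative_name), "rb") as f:
        return f.read()

def demo():
    with tempfile.TemporaryDirectory() as root:
        # Stand-in for a SAN storage device: a flat image made of two blocks.
        lun = os.path.join(root, "lun0.img")
        with open(lun, "wb") as dev:
            dev.write(b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE)
        # Stand-in for a file on a NAS share.
        with open(os.path.join(root, "report.txt"), "wb") as f:
            f.write(b"quarterly numbers")
        return read_block(lun, 1), read_file(root, "report.txt")
```

With block access, the host must know how blocks map to files, which is why it runs its own file system or database on top. With file access, that knowledge lives in the NAS device, which is what lets heterogeneous clients share the same files.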

Although the attributes of specific NAS products vary, NAS vendors generally attempt to adhere to the "appliance" model of computing. That is, NAS devices are designed to do one thing - file serving - and to do it very well. Moreover, they are typically designed to be very simple to install and configure. The storage they provide is often housed within the device's enclosure, though some NAS devices allow for the attachment of external storage.

The NAS market was pioneered by companies like Network Appliance and Auspex, which provide NAS systems for workgroup and enterprise customers. As the NAS market has grown, new vendors, such as Connex and CDS, are attempting to stake out niches in the mid-range and low end, while system and storage vendors, such as HP, Sun and EMC have also entered the market.

Can SAN and NAS Coexist?

So, SAN describes a networked storage topology, and NAS describes a highly optimized network file server. The questions IT managers ask, then, typically come down to some variation of the following:

• Can SAN and NAS be used together, or must I choose to base my infrastructure on one or the other?

• When do I choose which technology?

The first question arises because, just as NAS provides high-performance shared access to (file system) data, one of the promises of SAN is also to provide high-performance storage and data sharing. The good news is that the choice between SAN and NAS is not an either/or decision. SAN topologies and NAS devices do, in fact, peacefully coexist in many data centers. For example, a SAN in the data center may network together database and application servers with a number of large storage devices on which their data resides, while one or more NAS devices are attached to the LAN providing file access to clients.

The choice of which technology to use is driven mainly by the requirement being addressed, and partly by timing. If the requirement is to provide shared file access to a number of clients, NAS is generally the answer. NAS devices meet this need today with great efficiency. Because NAS systems are built on existing LAN and file system protocols, NAS technology is relatively mature in comparison with SANs. While a few SAN file-sharing solutions exist, they are generally aimed at specialized markets, such as video editing. Generalized SAN file-sharing solutions will probably require a distributed SAN file system, which could be years away from appearing and maturing.

On the other hand, many IT managers are grappling with the need to consolidate data used by large databases or applications, such as Microsoft Exchange, onto a small number of shared storage platforms to improve centralized management. Or, they want to take advantage of device-to-device data movement for applications such as backup or data replication. In this case, SAN topologies can provide unique capabilities to address these requirements.

Will SAN and NAS Converge?

While SAN and NAS today are similar, but distinct, technologies, over time the lines between them are likely to blur. In fact, this process is already beginning.

This technology convergence will likely take two forms. The first, which is already underway, is the use by NAS systems of SAN infrastructures for their back-end storage. We noted above that while the storage capacity of many NAS systems is contained within the NAS device's enclosure, some NAS devices allow for the attachment of external storage. In fact, many NAS systems now have Fibre Channel ports, which allow them to connect into a SAN and enable the NAS file system to reside on a SAN device.

In many respects, this gives the IT administrator the best of both worlds. Clients requiring file access still get the performance benefit of a highly optimized file server. The IT manager can take advantage of the efficiencies of storage consolidation by placing the NAS file system on a shared SAN storage device. And the IT staff benefits from the plug-and-play features of NAS setup and administration.

The second avenue of potential technology convergence is somewhat more speculative. We noted above that the SNIA definition of NAS specifically allows for a NAS device to be connected to any type of network, including a SAN. For this to be meaningful, the SAN would have to be capable of carrying file traffic in addition to the block protocols, like SCSI, that it typically carries today.

This is, in fact, possible. Fibre Channel, for instance, is capable of carrying both SCSI and IP traffic simultaneously. This capability is occasionally exploited today, but mainly for the transmission of management commands to a device via IP. It is relatively rare for clients to use Fibre Channel as the interconnect for accessing file servers. While theoretically possible, few people advocate the use of Fibre Channel as a generalized messaging network technology.

Conclusion

We have seen that SAN and NAS technologies offer the IT manager distinct and complementary capabilities. SAN topologies offer the ability to consolidate storage and improve data protection and storage management processes with a dedicated, high-performance storage network.

NAS systems offer high-performance, low-administration file serving and file sharing for heterogeneous systems. Used together, they provide a potent one-two punch for addressing data center requirements.

About the Author: Scott McIntyre is the Business Line Manager for all data protection at Legato Systems Inc.

AT&T's Sorting Technology for Decision Support System

You've probably heard the story of the three blindfolded men trying to describe an elephant. One touches the elephant's tail, another its leg, and the third its trunk, and each comes away with a totally different idea of what an elephant is.

It's an apt analogy to describe how different departments in a company can view the same customer, depending on which part they touch. AT&T - with more than 90 million customers using a variety of services and packages that include long distance, cellular and Internet access - is a good case in point. A customer may make hundreds of international calls every month, use wireless services, spend hours on the Internet as a WorldNet subscriber, and have an AT&T calling card. Understanding the whole customer is a question of understanding the sum of those parts.

Processing Data for Decision Support

At AT&T Consumer Services, the Decision Support Systems group's basic function is to step back and get the total picture of the customer. "We match customer data with the various products," says AT&T programmer Dick Ehrnman. Such matching requires processing massive amounts of data from multiple sources.

"We're a big data warehouse," says Ehrnman. "We house and maintain information that interests the marketing systems group. They go in and take the information they want, and we make sure it's there when they want it." The data is culled from AT&T's massive provisioning, customer care and billing systems. It's also pulled from the AT&T switches - giant computers that process calls. In many cases, the data goes through several iterations before it gets to the Decision Support group, so formatting and sorting the information and putting it together in a way that's comprehensible can be a challenge.

To accomplish this, AT&T relies on SyncSort MVS, a high-performance sorting and data staging tool from Syncsort Inc. SyncSort works with other data warehouse components to accelerate warehousing tasks. Running on AT&T's Amdahl and IBM series 9000 machines, SyncSort speeds up such data staging chores as gathering data from legacy systems and formatting, converting and aggregating this information. This preprocessing accelerates database loads and indexing.

Merging Information

"Much of what I use SyncSort for is merging information from multiple files, where I have a record coming from one feeder and another from another system," says Ehrnman. "I know the information I need - a name and address, for example - is on different positions on different source system records, so I run it through SyncSort."

Because many business units use different customer identifiers, SyncSort helps put together the whole customer picture. "The long distance people responsible for monitoring and maintaining information on long distance customers may not always need other customer relationships, such as wireless service. But, other departments are interested in both," Ehrnman explains. SyncSort can merge input files with different record links.
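The kind of merge Ehrnman describes, where the same fields sit at different positions in records from different feeder systems, can be sketched in a few lines of Python. This is not SyncSort syntax and not AT&T's actual job: the two record layouts, the field names and the byte offsets are hypothetical, chosen only to show the idea of pulling fields out by position, normalizing them and merging the sorted streams on a common key.

```python
import heapq

# Hypothetical fixed-width layouts: field name -> (start, end) offsets.
LAYOUT_BILLING = {"cust_id": (0, 6), "name": (6, 18), "address": (18, 38)}
LAYOUT_CARE = {"name": (0, 12), "cust_id": (12, 18), "address": (18, 38)}

def extract(record, layout):
    """Pull each field out of a fixed-width record by its position."""
    return {field: record[start:end].strip() for field, (start, end) in layout.items()}

def normalized(records, layout):
    """Convert one feed to a common shape, sorted by customer ID."""
    rows = (extract(record, layout) for record in records)
    return sorted(rows, key=lambda row: row["cust_id"])

def merge_feeds(billing_records, care_records):
    """Merge two already-normalized, sorted feeds into one ordered stream."""
    billing = normalized(billing_records, LAYOUT_BILLING)
    care = normalized(care_records, LAYOUT_CARE)
    return list(heapq.merge(billing, care, key=lambda row: row["cust_id"]))
```

Given one billing record and one customer care record laid out as above, merge_feeds returns a single list ordered by customer ID, with name and address in the same place in every row regardless of which system the record came from.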

When you're working with 170 million records - 31 gigabytes of data - sorting is a massive job that AT&T prefers to do outside of the DBMS. Using a dedicated sort utility helps Ehrnman make data extractions in minutes that would normally take hours. "It's really fast," he says, explaining that SyncSort only reads the bytes that are needed, while ignoring the rest.

The software also saves considerable time in read/write processing because it selects and reformats the data in the same operation. There is no file size limit, so it's not necessary to divide data into chunks that the system can handle. Ehrnman finds SyncSort's summarizing function particularly useful because "deduping," or eliminating duplicate records, is performed frequently during data staging.
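As a rough illustration of these staging steps, the Python sketch below selects and reformats records in a single pass, reading only the bytes it needs, and then sorts and dedupes on a key. It is a conceptual analogy rather than a description of how SyncSort works internally, and the record layout (a customer ID in the first six characters, a name in the next twelve) is an assumption made for the example.

```python
from itertools import groupby

# Hypothetical fixed-width layout: customer ID in positions 0-5, name in 6-17.
KEY = slice(0, 6)
NAME = slice(6, 18)

def select_and_reformat(records, wanted_prefix):
    """Filter and reshape in one pass, touching only the needed bytes."""
    for record in records:
        if record[KEY].startswith(wanted_prefix):
            yield record[KEY] + "|" + record[NAME].strip()

def sort_and_dedupe(rows):
    """Sort by customer ID, then keep the first row seen for each ID."""
    ordered = sorted(rows, key=lambda row: row[:6])
    return [next(group) for _, group in groupby(ordered, key=lambda row: row[:6])]
```

Chaining the two, as in sort_and_dedupe(select_and_reformat(records, "1")), avoids writing an intermediate file between the selection and the sort. Because Python's sort is stable, the row kept for each customer ID is the first one that appeared in the input.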

With the enactment of Telecommunications Reform, the future promises continued change and fiercer competition. As it plays on a larger field, AT&T's success will increasingly depend on its ability to understand the whole customer. By mining the data that resides in multiple systems across its business units, AT&T will be able to see the whole elephant in the telecommunications jungle.

