Storage Architecture
There is a problem with data: It just seems to accumulate faster and faster despite the best storage management efforts. New enterprise applications such as customer relationship management (CRM), enterprise resource planning (ERP), and e-commerce are all contributing to the explosive growth of data. Web sites are using click-stream analysis tools to collect Web hits at the mouse-click level, and then streaming this data into decision support databases that allow marketers to see -- almost in real time -- how users are interacting with dynamically generated Web sites.
Let’s not leave out the unstructured files tossed over the wall to hundreds or thousands of file servers: word processing documents, spreadsheets, PowerPoint presentations, screen cams, and engineering diagrams are ballooning in both size and quantity. As a result, IDC estimates storage device sales will more than triple between 2001 and 2003.
Not long ago, you could put up an NT or Novell file server and let users get at the data through the network. But as the user load grows, a general-purpose operating system accessing disks through a SCSI bus can bog down. More and more IT shops are finding that a single server becomes a bottleneck, unable to handle the expanding volume of data or the increased number of incoming data requests.
The symptoms are easy to spot. Users call to complain about inadequate performance. Routine monitoring reveals that the I/O subsystem is close to peak throughput. Disk write time, average queue length -- especially average write queue length -- and other disk counters are all up, indicating the disk subsystem is overloaded. Backups take longer than they used to. Spare disk space is rapidly shrinking. And a larger portion of server CPU time is spent waiting for I/O to complete.
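To make that diagnosis concrete, here is a minimal monitoring sketch in Python. It is illustrative only: it assumes the third-party psutil library is installed, and the 80 percent saturation threshold is an arbitrary rule of thumb, not a vendor guideline.

import time
import psutil  # third-party library; assumed installed (pip install psutil)

SAMPLE_SECONDS = 10

# Take two snapshots of the kernel's cumulative per-disk I/O counters.
before = psutil.disk_io_counters(perdisk=True)
time.sleep(SAMPLE_SECONDS)
after = psutil.disk_io_counters(perdisk=True)

for disk, now in after.items():
    then = before[disk]
    write_mb = (now.write_bytes - then.write_bytes) / 1e6
    # write_time is cumulative milliseconds spent on writes. Spending most
    # of the sample interval writing is a rough sign of saturation (the
    # ratio can exceed 100 percent when queued requests overlap).
    busy = (now.write_time - then.write_time) / (SAMPLE_SECONDS * 1000)
    flag = "  <-- overloaded?" if busy > 0.8 else ""
    print(f"{disk}: {write_mb:.1f} MB written, write-busy {busy:.0%}{flag}")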
Once a system has reached this state, the system administrator must decide how to reconfigure it to maximize database and file system performance. Solving the problem means deploying a fast, scalable storage architecture to capture, manage, and control all the data. Enter the storage area network (SAN).
A SAN does three things to ease the storage management problem. First, it is a dedicated, intelligent device that can service data requests from multiple requestors. A dedicated server can usually handle these requests faster than a general-purpose operating system that is not optimized for data management. It also offloads disk I/O management from the requestors' processors, improving overall performance.
Second, most -- but not all -- modern SANs connect the storage devices to the SAN server over fiber optic cable. This significantly cuts I/O response time and boosts throughput. The fiber can also be extended off-site, allowing storage management to take place in a centralized, secure data facility.
Third, the members of the SAN can use the fiber network to communicate with each other. This lets the SAN move chunks of files between devices and still keep track of their location when a request comes in (a simple sketch of that bookkeeping follows below). SANs can also be linked together to provide disaster recovery through real-time data replication.
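The bookkeeping behind that third point can be sketched as a small data structure. The Python sketch below is purely illustrative -- the class, chunk, and device names are all invented, and a real SAN keeps this catalog in controller firmware rather than application code -- but it shows how a chunk can migrate or be replicated while requests still resolve to the right place.

class ChunkCatalog:
    """Hypothetical catalog mapping each chunk to the device holding it."""

    def __init__(self):
        self._primary = {}    # chunk_id -> device currently holding the chunk
        self._replicas = {}   # chunk_id -> set of remote replica devices

    def place(self, chunk_id, device):
        self._primary[chunk_id] = device
        self._replicas.setdefault(chunk_id, set())

    def migrate(self, chunk_id, new_device):
        # Move the chunk; later lookups resolve to the new device.
        self._primary[chunk_id] = new_device

    def replicate(self, chunk_id, remote_device):
        # Record a real-time copy on another SAN for disaster recovery.
        self._replicas[chunk_id].add(remote_device)

    def locate(self, chunk_id):
        # Called when a request comes in: where does this chunk live now?
        return self._primary[chunk_id]

catalog = ChunkCatalog()
catalog.place("db01:block-4096", "array-A")
catalog.replicate("db01:block-4096", "remote-site-B")
catalog.migrate("db01:block-4096", "array-C")
print(catalog.locate("db01:block-4096"))  # prints: array-C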
What are the issues you need to think about when considering a SAN? Most of the solutions are proprietary, and there are no widely accepted interoperability standards. The Storage Networking Industry Association (www.snia.org) is working on standards, but none have been finalized yet.
A SAN doesn’t eliminate the problem of managing backup and recovery. Despite advances in automation, the storage management process still requires significant manual oversight. In fact, some organizations spend 60 percent or more of their IT budget on storage management.
Another consideration is security. Since the SAN environment is relatively new, security has not received much attention. If someone gains unauthorized access to a file server, they have access to the data on that server, but not to the data on other servers. If the SAN is hacked, the intruder has access to all the data stored on the network.
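One way administrators can shrink that exposure is to restrict which hosts may address which volumes, so a single compromised host no longer implies access to everything on the SAN. The Python sketch below is conceptual only -- production SANs enforce this in switch zoning or controller firmware, and every host and volume name here is invented for illustration.

# Conceptual sketch of per-host volume masking. All identifiers are
# hypothetical; real enforcement happens in the SAN fabric, not in Python.
ALLOWED_VOLUMES = {
    "web-server-01": {"vol-web"},
    "db-server-01": {"vol-db", "vol-db-logs"},
}

def authorize(host, volume):
    # A host may only address volumes it has been explicitly masked onto.
    return volume in ALLOWED_VOLUMES.get(host, set())

assert authorize("db-server-01", "vol-db")
assert not authorize("web-server-01", "vol-db")  # Web host can't see DB data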
The good news is you’ll be able to deploy a scalable architecture that can handle the largest data loads. The bad news is you’ll have to invest heavily in technology and personnel to feed the beast and keep it running smoothly. Considering the alternative of poor response time or unavailable data, this may be a price worth paying.

--Robert Craig is vice president of strategic marketing at Viador Inc. (Burlington, Mass.), and a former director at the Hurwitz Group Inc. Contact him at robert.craig@viador.com.