"Shark" Attack: IBM’s Disk Subsystem Preys on the UNIX/NT Market

On July 27, 1999, IBM announced its intention to recapture a major segment of the disk storage market. From a high of 85 percent of the mainframe storage market ten years ago, IBM has seen its share drop to 20 percent. During that time the mainframe storage market has grown, but cost per megabyte has also dropped, eliminating revenue growth. Meanwhile, the NT market has grown at an astounding rate. To regain share of the mainframe market and garner a significant piece of the NT and UNIX markets, IBM has entered the fray with a disk subsystem, appropriately code-named "Shark" and formally called the IBM 2105 Enterprise Storage Server (ESS).

The ESS is available in two models: the 2105-E10 and the 2105-E20. The E10 utilizes a single-phase power supply. Although it can be installed in office environments, IBM recommends raised-floor installations to ensure adequate cooling. The E10 server frame can contain from 420GB to 1.7TB of RAID-5 capacity.

The E20 utilizes a three-phase power supply, and requires a raised floor environment. The E20 primary frame can contain from 420GB to 3.7TB. The optional E20 Expansion Enclosure provides total subsystem capacity of up to 11.2TB.

Both models contain a storage server frame that houses the Host Adapters (HAs), two independent Cluster Processor Complexes (clusters), and some number of 8-pack disk drive module (DDM) groups. Each cluster consists of a four-way RISC SMP, 3GB of cache, 192MB of NVS with a seven-day battery, and four Disk Adapters (DAs).

ESS Explained

The ESS has a cache-centric architecture, with all I/O operations going through the cluster cache. For write operations, one copy of the data is written to the cache in one cluster, and a second copy is written to the NVS in the other cluster for integrity protection. The cluster cache is managed using adaptive record, partial-track, sequential, and full-track staging algorithms. The NVS is managed using a least recently used (LRU) algorithm.
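The two-copy fast-write path described above can be sketched as follows. This is a minimal illustrative model, not IBM's implementation; all names and structures are hypothetical:

```python
# Sketch of the ESS fast-write path: a write is staged in one cluster's
# cache and mirrored into the *other* cluster's battery-backed NVS, so a
# single cluster failure cannot lose committed data. Illustrative only.

def fast_write(clusters, owner, track, data):
    """clusters: list of two dicts, each with 'cache' and 'nvs' stores."""
    other = 1 - owner
    clusters[owner]['cache'][track] = data   # working copy in cache
    clusters[other]['nvs'][track] = data     # second copy in the peer NVS
    # The write is acknowledged to the host here, before destage to disk.

clusters = [{'cache': {}, 'nvs': {}}, {'cache': {}, 'nvs': {}}]
fast_write(clusters, 0, 'vol1:trk9', b'payload')
print('vol1:trk9' in clusters[1]['nvs'])  # True
```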

The ESS also incorporates a proprietary Express Update algorithm that aids large-block operations. IBM claims these algorithms can efficiently handle write operations of 32K or larger. By treating such writes as a single RAID-5 full-stripe write, rather than a series of individual large-block writes, parity can be computed from the new data alone, minimizing the number of disk operations.

IBM claims that the efficiency of the Express Update algorithm in writing to disk allows the ESS to get by with a relatively small NVS: by writing a full stripe, the RAID-5 write penalty is avoided. Although this is true, it does not address applications with heavy random-write content.
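The arithmetic behind the full-stripe claim can be illustrated with standard RAID-5 operation counts. The function names and the 7D+1P example below are illustrative; this is generic RAID-5 math, not IBM's proprietary Express Update algorithm:

```python
# Why full-stripe writes reduce RAID-5 parity overhead, using a
# hypothetical 7D+1P array as an example.

def small_write_ops(strips_updated: int) -> int:
    """Read-modify-write: each strip costs read old data, read old
    parity, write new data, write new parity = 4 disk operations."""
    return 4 * strips_updated

def full_stripe_write_ops(data_strips: int) -> int:
    """Full-stripe write: parity is computed from the new data alone,
    so no reads are needed -- just N data writes plus 1 parity write."""
    return data_strips + 1

# Updating all 7 data strips of a 7D+1P stripe:
print(small_write_ops(7))        # 28 disk operations via read-modify-write
print(full_stripe_write_ops(7))  # 8 disk operations as one full stripe
```

The eight-operation full stripe destages faster than 28 read-modify-write operations, which is why IBM argues a smaller NVS suffices for large sequential writes; random small writes still pay the four-operation penalty.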

The ESS contains sixteen slots for Host Adapters (HA). Each connects to both clusters to enable either cluster to handle I/Os from any HA. All of these interfaces (including the disk adapters) operate independently, and are connected via PCI buses to one or both of the clusters via a switching technology. These switching cards provide cross-connection between the two clusters. In the event of a cluster failure, the switching hardware will automatically reconfigure the machine so that the host and disk adapters are connected to the other cluster.

The initial HA options include ESCON and SCSI. Each ESCON and SCSI adapter includes two ports, providing up to 32 host connections. Fibre Channel support is initially available via the IBM SAN Data Gateway (IBM 2108 Model G07), which provides FC attachment through the SCSI ports. IBM stated that it is previewing plans to support native Fibre Channel with one port per adapter and up to 16 ports per ESS, but did not provide an availability date.

The Disk Adapters (DAs) operate in pairs, one in each cluster. The disk arrays (or ranks) are attached to the DAs via 160MB per second SSA loops, each with two read and two write data links that operate simultaneously. The arrays can be configured as RAID-5 or as non-RAID (JBOD); IBM states that the JBOD option might be used when the server operating system or application software performs mirroring.

The RAID-5 architecture within the ESS offers two array configuration options for the "8-pack" disk groups: 6D+1P+1S (six data, one parity, one spare) and 7D+1P (seven data, one parity). The ESS configuration rules require that the first two arrays on a loop be 6D+1P+1S groups; the remaining arrays can be either type. The spares are global: they can be used by all arrays within the loop, including arrays in multiple cages if the loop is configured that way. Two spares enable a loop to sustain two disk device failures and still provide access to the data.

Each SSA link operates at 40MB per second, so with two read and two write links, each loop provides a total bandwidth of 160MB per second. Each DA supports two loops, providing 320MB per second per adapter. With four pairs of DAs, the ESS has a total back-end bandwidth capability of 1,280MB per second.
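These figures can be verified with a quick back-of-the-envelope calculation from the link, loop, and adapter counts stated above:

```python
# Back-of-the-envelope check of the ESS back-end bandwidth figures.

LINK_MBPS = 40       # each SSA link runs at 40MB/s
LINKS_PER_LOOP = 4   # two read + two write links per loop
LOOPS_PER_DA = 2     # each disk adapter supports two loops
DA_PAIRS = 4         # four disk adapter pairs in the subsystem

loop_mbps = LINK_MBPS * LINKS_PER_LOOP   # 160MB/s per loop
da_mbps = loop_mbps * LOOPS_PER_DA       # 320MB/s per adapter
backend_mbps = da_mbps * DA_PAIRS        # 1,280MB/s total back end

print(loop_mbps, da_mbps, backend_mbps)  # 160 320 1280
```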

Host Platform Support

Mainframe support includes the OS/390, VM/ESA, TPF, and VSE/ESA operating systems. The ESS can be defined as up to sixteen 3990 control units, with each supporting 256 logical 3390 and/or 3380 volumes. Custom Volumes may also be defined in cylinder increments.

Open system support includes the RS/6000 and SP2 running AIX, most of the major UNIX platforms, IBM Netfinity and other Intel-based PC servers running Windows NT and Novell NetWare, and the AS/400 running OS/400.

A set of logical devices is associated with a logical subsystem, and a single device adapter and a single cluster manage a logical subsystem at any given time. Each Fixed Block (FB) logical subsystem supports up to 256 logical volumes. AIX and Windows NT environments can use the IBM Data Path Optimizer to distribute activity across the host’s SCSI adapters and for path failover.

All standard configuration options include maximum cache (6GB) and NVS (384MB), and eight SSA device adapter cards. The options are categorized by IBM as ultra-high performance (9GB DDMs only), high-performance (18GB DDMs only) and capacity (36GB DDMs only) configurations.

ESS EX Performance for MVS

The ESS EX Performance Package consists of Parallel Access Volumes (PAV), Multiple Allegiance, and Priority I/O Queuing, and became available on September 24, 1999. It is designed to improve performance by minimizing I/O queuing and pend time at the device level.

The Parallel Access Volume (PAV) option allows multiple UCBs (a base and aliases) to be assigned to each logical volume. This allows multiple I/O operations from a single S/390 server to access the same logical volume at the same time, reducing IOSQ time. PAV options include Standard and Dynamic.

With the Standard PAV option, alias reassignment is done manually through the StorWatch ESS Specialist. Dynamic PAV support is enabled when the OS/390 Workload Manager (WLM) is running in goal mode; WLM then automatically manages the assignment of alias addresses.

Working in conjunction with PAVs, the Multiple Allegiance option allows multiple S/390 servers to perform multiple, concurrent I/O operations to the same logical volume.

With the Priority I/O Queuing option, the ESS uses information provided by the OS/390 Workload Manager to manage the sequence in which I/Os are processed. The system workload is tracked to see whether the various workloads are meeting their predefined goals. If goals are not being met and the IOSQ time for an associated PAV is high, the WLM can move aliases to it. Aliases are moved from a less important device, as defined by service class, to the problem PAV. Another procedure is to move an alias from a device with a low IOSQ time to a device with a high IOSQ time.
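The alias-movement idea can be sketched as a toy rebalancing step. The data structures and function below are hypothetical simplifications, not WLM's actual algorithm or interfaces:

```python
# Toy model of dynamic alias movement: shift one alias UCB from the
# volume with the lowest I/O queueing to the one with the highest,
# so the busy volume can sustain more concurrent I/Os.

def rebalance(devices):
    """devices: dict of volume -> {'aliases': int, 'iosq_ms': float}."""
    donor = min(devices, key=lambda v: devices[v]['iosq_ms'])
    receiver = max(devices, key=lambda v: devices[v]['iosq_ms'])
    if donor != receiver and devices[donor]['aliases'] > 0:
        devices[donor]['aliases'] -= 1      # take an alias from the idle volume
        devices[receiver]['aliases'] += 1   # give it to the queued-up volume
    return devices

vols = {
    'PROD01': {'aliases': 1, 'iosq_ms': 12.0},  # high queueing
    'TEST01': {'aliases': 3, 'iosq_ms': 0.2},   # mostly idle
}
print(rebalance(vols)['PROD01']['aliases'])  # 2
```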

Other Performance Enhancements

A new track command reduces protocol overhead and connect time; essentially, more data can be written with fewer commands. PPRC on the ESS can now extend the distance between subsystems from 43km to 103km via IBM 9036 Optical Multiplexers.

S/390 workloads tend to be cache-friendly and take advantage of cache-centric architectures. Open system workloads are often very cache-unfriendly and result in a high degree of back-end disk activity. The back-end architecture of the ESS, with its 160MB per second disk loops, should have a competitive advantage in open systems environments. However, users are cautioned to exercise care when mixing S/390 and open system or NT workloads on any subsystem, due to the tendency of the open system workloads to "flush" the cache.

The ESS will offer four software options for duplicating data:

• FlashCopy will enable a point-in-time volume level copy for SCSI and S/390 servers.

• Concurrent Copy does point-in-time volume and data set level copies for S/390 servers only.

• Peer-to-Peer Remote Copy (PPRC) is a synchronous copy for SCSI and S/390 servers.

• Extended Remote Copy (XRC) maintains asynchronous copies over extended distances for S/390 servers.

FlashCopy will provide a point-in-time copy of a volume. The process can be initiated via DFSMSdss for OS/390, or via a Web interface to the StorWatch ESS Specialist Copy Services for open systems and Windows NT platforms. Once a relationship has been established between the target and the source, a background task copies tracks from the source to the target. If an application issues a read request for a track that has not yet been copied, the data is read from the source. Before an uncopied track on the source can be updated, it must first be copied to the target. FlashCopy does not support resynchronization capabilities like those of the EMC TimeFinder and HDS ShadowImage products.
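The copy-on-write behavior described above can be modeled in a few lines. This is a simplified, hypothetical sketch that tracks data at whole-track granularity; real FlashCopy operates on CKD/FB tracks inside the subsystem:

```python
# Sketch of FlashCopy semantics: reads of uncopied target tracks are
# served from the source, and a source track is copied to the target
# before the source copy can be overwritten.

class FlashCopy:
    def __init__(self, source):
        self.source = source          # track number -> data
        self.target = {}              # tracks copied so far
        self.pending = set(source)    # tracks not yet copied

    def read_target(self, track):
        # Uncopied tracks are read from the source volume.
        return self.target.get(track, self.source[track])

    def write_source(self, track, data):
        # Copy the old track to the target before updating the source.
        if track in self.pending:
            self.target[track] = self.source[track]
            self.pending.discard(track)
        self.source[track] = data

fc = FlashCopy({0: 'old-A', 1: 'old-B'})
fc.write_source(0, 'new-A')
print(fc.read_target(0))  # 'old-A' -- point-in-time image preserved
print(fc.source[0])       # 'new-A'
```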

Storage Management Options

The ESS includes several management tools that are part of the StorWatch family of products. They are accessed via the ESSnet private network, a required feature that can also provide a gateway to a customer's LAN for remote access and management. ESSnet includes an IBM workstation and monitor, an external Ethernet hub for connections to the ESS, and a modem and modem expander to allow communication between the ESS and IBM for service.

The ESS Specialist software comes standard with the ESS. It allows an authorized user to monitor error logs, view and modify the configuration, modify and view communication resource settings (i.e., e-mail addresses and telephone numbers), and authorize user access. The user can view the external connection between a host and an ESS port, the internal connection of SCSI ports to the clusters, and how storage space is allocated to FB and CKD volumes.

The ESS Expert is an optional software product for managing the ESS storage resources. It provides functions such as asset management, capacity management and performance management.

The optional ESS Copy Services product provides a Web-based interface for managing PPRC and FlashCopy functions. It provides panels to view and define volumes as PPRC source or target volumes, establish relationships between all of the volumes of one logical control unit and those of another, display the current status of paths between connected control units, manage defined tasks, add to or save the existing configuration, and display the error log.

It Has Some Teeth

With this announcement, IBM has taken a major step forward in its effort to regain a leadership position in the mainframe and multiplatform storage market. The initial version of the product, with its limited open-system features and functions, is clearly targeted toward the mainframe market, with the major emphasis on performance. The ESS EX Performance Package will be attractive to some mainframe customers.

However, several software features, such as non-synchronous remote copy for open systems, will be required before this product can be considered a leader in the open-system and multiplatform marketplace. A major question is how quickly IBM delivers on its promises of enhanced software features and functions.

It appears that the ESS has the potential to provide outstanding performance. Its bus bandwidth is also about equal to that of both the HDS 7700E and EMC Symmetrix. The ESS advantage is in the SSA back-end. This may be very significant in open system environments, which tend to have cache-unfriendly workloads.

Although IBM claims that the ESS can destage data out of the cache faster than it arrives, Evaluator Group has some concerns with the cache implementation. One issue is the size of the read/write cache: the ESS offers twice the disk capacity of competitive subsystems but only 37 percent of their cache size (6GB vs. 16GB). The other concern is the potential performance impact of high write activity, due to the relatively small NVS (192MB per cluster). This is significantly less than that offered by HDS (up to 11.2GB) and EMC (up to 12.8GB), and could be a limiting factor in cache-unfriendly environments, such as open systems and especially mixed S/390 and open system environments.

Another issue that must be taken into account is availability, in light of the fact that the ESS is a brand-new and relatively unproven architecture. As with any new product, especially one as complex as the ESS, it will take time for the subsystem microcode to stabilize. Non-disruptive microcode update capability is critical at this stage and should allow for patches as well as total replacements.

IBM has stated its intention to offer a virtual architecture in a future release. Although this will enable the ESS to emulate many of the attributes of the IBM RVA, we question the wisdom of this decision. Experience with StorageTek's Iceberg and the RVA has shown relatively slow time-to-market with new function compared to the competition. We believe this is due primarily to the complexity of the code associated with the log-structured file implementation of virtualization. As we look into the next century, being quick to market will be a major factor in successfully capturing market share.

About the Author: Dick Bannister is a Senior Partner at the Evaluator Group Inc. (Denver), an industry analyst organization focused upon storage products. He can be reached at (303) 221-7867, or via e-mail at dick@evaluatorgroup.com.