Improving Storage RAS - The Case for Migrating to RAID-6 Arrays
RAID-6 overcomes data-loss problems of RAID-5 arrays, but common hardware implementations are still needed to maintain performance.
Storage data is growing very rapidly, driven by reference data, multi-media applications, large databases, and legal and regulatory requirements. It is estimated that storage data is growing at an annual rate of 50 to 70 percent worldwide. In addition to capacity growth, demand for data reliability, integrity, and availability continues to rise.
Disk storage is common in "active" storage (e.g., databases) and is increasingly being used for "passive" (backup and archival) storage. Disk storage is typically organized as RAID (Redundant Array of Independent Disks) arrays and RAID-5 arrays are most widely used. Due to increasing capacity needs, RAID arrays are growing in number and capacity of drives. Large RAID-5 arrays with high capacity disk drives are susceptible to unrecoverable data losses that compromise data integrity and reliability. Migration to emerging RAID-6 arrays substantially reduces the likelihood of data losses.
This article gives a brief overview of RAID-5 arrays and discusses the factors contributing to data losses. It presents a brief overview of emerging RAID-6 technology and examines how migration to RAID-6 arrays improves data reliability and integrity.
Disk-based storage is typically provided through RAID arrays to improve data access performance and reliability. Data is spread across all the disks in the array. RAID arrays provide data integrity and reliability by storing redundant information (aka check data) that is generated from the actual data. The check data is used to recreate the data blocks in case of errors in accessing data blocks.
There are several RAID array configurations possible. Among conventional RAID configurations, RAID-5 is the most common and most sophisticated. RAID-5 provides a good level of data reliability and availability at lower cost. RAID storage controllers, including hardware and software, are optimized for RAID-5 performance.
Notable key terms that characterize the reliability of RAID configurations include:
MTBF (Mean Time Between Failures): The average time between failures of a hard drive or a RAID array, measured in hours. The collective MTBF of a RAID array is lower than the MTBF of its individual drives, since any one of several drives can fail
MTTDL (Mean Time To Data Loss): The expected time until a user data loss occurs in a RAID array, measured in hours. The higher, the better
MTTR (Mean Time To Repair): The time required, following a failure, to restore an array to its normal failure-tolerant mode of operation. This includes disk replacement and RAID array rebuild time
Figure 1 shows a RAID-5 array with four disks. Check data consists of XOR parity blocks that are rotated among all disks in the array.
RAID-5 can tolerate the failure of any single disk. During a read operation, if a disk block is unreadable, the request can still be fulfilled by XORing the parity block with the data blocks in the same stripe on the remaining disks.
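The XOR recovery just described can be sketched in a few lines of Python. The block contents and sizes below are invented for illustration; real arrays operate on large, fixed-size stripe units.

```python
from functools import reduce

def xor_blocks(*blocks):
    """XOR corresponding bytes of equal-sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, byte_tuple)
                 for byte_tuple in zip(*blocks))

# Three data blocks in one stripe (toy 4-byte blocks on hypothetical disks)
d0 = b"\x01\x02\x03\x04"
d1 = b"\x10\x20\x30\x40"
d2 = b"\xaa\xbb\xcc\xdd"
parity = xor_blocks(d0, d1, d2)        # written to the stripe's parity disk

# If the disk holding d1 becomes unreadable, rebuild it from the survivors
recovered = xor_blocks(d0, d2, parity)
assert recovered == d1
```

Because XOR is its own inverse, any single missing block in the stripe can be regenerated the same way, regardless of whether it was a data block or the parity block.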
RAID-5 Data Recovery and Data Loss
In a RAID-5 array, if any one disk fails completely, the array is said to be operating in degraded mode. Each block of data on the failed disk has to be reconstructed using the corresponding data blocks on the surviving "good" disks and the distributed parity information. In degraded mode, the performance of the array is much slower because the RAID controller has to perform additional processing and disk accesses for each read/write operation.
When a disk fails, a standby drive is brought on-line or a good drive is installed in place of the failed drive, and the array is rebuilt. During the rebuild operation, the data blocks for the newly installed drive are recalculated using the data and parity blocks on the rest of the disks: each block of every good drive is read, and the corresponding recalculated block is written to the replacement drive.
In an array with a large number of high-capacity disks, the rebuild process can take several hours. During the rebuild, the RAID controller must also fulfill normal read/write requests, further impacting RAID performance.
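A back-of-envelope calculation shows why rebuilds take hours. The drive capacity and sustained rebuild rate below are illustrative assumptions, not figures from any particular product:

```python
# Back-of-envelope rebuild time (illustrative numbers)
capacity_gbytes = 400           # one replacement drive to fill
rebuild_rate_mb_s = 50          # assumed sustained rebuild throughput
rebuild_seconds = capacity_gbytes * 1000 / rebuild_rate_mb_s
rebuild_hours = rebuild_seconds / 3600
print(round(rebuild_hours, 1))  # best case; contention with normal
                                # host I/O stretches this considerably
```

Even this best case of a few hours assumes the controller can dedicate its full bandwidth to the rebuild; under a normal host workload, rebuild windows of many hours are common.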
RAID-5 arrays are vulnerable to unrecoverable read errors leading to data loss in degraded and rebuilding modes. Some key factors contributing to data losses are:
Media Defects: Hard drives are susceptible to media defects. Some defects are present in the media at the factory and are corrected during manufacturing by mapping defective blocks to reserved alternate blocks. During normal use, gradual media degradation produces additional errors; these so-called grown defects increase with the age of the drive and are detected during read operations. Drives can tolerate a limited number of grown defects before failing. Background auto-scrub functions performed by a RAID controller or the disks themselves will find and fix many -- but not all -- of the grown defects. There is still a chance of encountering a grown defect at an inopportune time, for example, during a RAID-5 rebuild operation (see Figure 2).
Figure 2: Data Loss in RAID-5 Array Due to Failing Disk with Latent Defects
Mechanical failures: Hard drives are susceptible to mechanical failures due to worn mechanics and head skips. Desktop-class drives are more susceptible to mechanical failures than enterprise-class drives. Age and wear of mechanical components can also compromise the tolerances of an older drive, causing existing defects that were once recoverable to become unrecoverable.
Environmental: Hard drives can experience sudden failures due to environmental conditions such as physical jostling of drives and fan failure. These conditions generally impact multiple drives.
RAID-5 can recover from any of these failure modes in isolation. But it is the combination of a mechanical disk failure and discovery of a media defect on a surviving disk during the subsequent rebuild where RAID-5 fails to recover user data.
To illustrate, consider a RAID-5 array of six 400-GByte SATA disks. A typical unrecovered read error rate for this class of drive is 1 in 10^14 bits, on the order of one unrecovered read error for every 10,000 GBytes read.
Rebuilding such an array involves reading 2000 GBytes (5 x 400 GBytes), which means roughly one unrecoverable read error can be expected for every five rebuilds. Thus, the MTTDL of this RAID-5 array is about 8,335 hours; lower than the MTBF of a single disk, which is about 10,000 hours for this class of drive. The bottom line: a RAID-5 array with many high-capacity disks is more likely to encounter a data loss!
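The probability that a rebuild hits an unrecoverable read error can be checked directly. Note the article's one-in-five figure comes from rounding 10^14 bits to 10,000 GBytes; exact unit conversion gives a slightly lower, but similar, probability:

```python
# Sanity check of the rebuild-failure probability in the example above
ure_per_bit = 1e-14                 # unrecoverable read errors per bit read
bits_read = 5 * 400e9 * 8           # five surviving 400-GByte drives
p_failed_rebuild = ure_per_bit * bits_read  # expected UREs per rebuild
print(round(p_failed_rebuild, 2))   # about 0.16: roughly one bad
                                    # rebuild in six, with exact units
```

Either way, the conclusion holds: a meaningful fraction of rebuilds in an array of this size will encounter an unrecoverable block, and each such event is a data loss under RAID-5.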
The MTTDL decreases with more and/or bigger disks. The impact of such an unrecoverable data loss increases with the criticality of the data being stored. For instance, in a database application, data loss in a single data block may not be tolerable and could be catastrophic. In contrast, storage applications where multimedia files are stored may tolerate some failures.
Migrating to RAID-6 Arrays
One can greatly reduce the likelihood of unrecoverable data loss in a RAID array by migrating to a RAID-6 architecture. The RAID-6 configuration builds on RAID-5 and can tolerate the simultaneous failure of any two disks. In this configuration, check data equivalent in capacity to two disks is maintained, meaning the capacity consumed for check data in RAID-6 is twice that of RAID-5.
During a RAID rebuild following the failure of a single disk, even if an unrecoverable data block is encountered, the block can be recovered using the second redundant block of check information, preventing a data loss. Using the same example given above, with the same class of disks configured as a RAID-6 array of 2000-GByte capacity, it can be shown that the MTTDL is about 150,000 hours, an improvement by a factor of 18. The MTTDL improvement is even more dramatic in larger arrays.
RAID-6 is an emerging RAID technology with several implementations. Two examples are two-dimensional redundancy and MDS (Maximum Distance Separable) coding.
The architecture of two-dimensional redundancy based RAID-6 array is shown in Figure 3. In brief, there are two types of XOR parity blocks generated: Horizontal XOR parity and Diagonal XOR parity. The Horizontal XOR parity and Diagonal XOR parity for a given range of data (called a stride) is located on separate disks. The resulting pattern is rotated for each successive stride in the array to avoid accessing any one disk for every write operation.
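The two parity chains can be illustrated with a toy stride. The layout below is a simplified assumption (integers stand in for data blocks, and the diagonals wrap), loosely modeled on published two-dimensional schemes such as EVENODD and RAID-DP rather than on any specific product:

```python
from functools import reduce
from operator import xor

# Toy stride: 3 rows of blocks across 3 data disks
stride = [[1, 2, 3],     # row 0: one block from each data disk
          [4, 5, 6],     # row 1
          [7, 8, 9]]     # row 2
n = 3

# Horizontal parity: XOR across the disks in each row
h_parity = [reduce(xor, row) for row in stride]

# Diagonal parity: XOR one block per disk along each wrapping diagonal
d_parity = [reduce(xor, (stride[r][(r + k) % n] for r in range(n)))
            for k in range(n)]

# A single lost block can be repaired from either parity chain; here
# stride[1][2] is rebuilt from its row and the horizontal parity:
repaired = h_parity[1] ^ stride[1][0] ^ stride[1][1]
assert repaired == stride[1][2]
```

Because every data block sits on one horizontal and one diagonal chain, losing two disks still leaves each missing block reachable through a chain whose other members survive, and the rebuild alternates between the two parity types until the stride is complete.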
The other method uses MDS codes, which are well known in the data communications field. A special class of MDS codes (Reed-Solomon codes) is already used inside disk drives to protect individual data blocks. MDS coding is based on the mathematical concept of Galois field arithmetic, and two check data blocks are generated using the coding method. MDS encoding and decoding are mathematically intensive, requiring hardware acceleration to maintain performance. In addition to hardware support, firmware and application-level support is required to fully implement RAID-6. Several IOP silicon vendors, HBA vendors, and external storage vendors are beginning to look closely at RAID-6, and commercial products supporting RAID-6 are expected to be available soon.
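As a concrete sketch of the MDS approach, the widely used P+Q formulation of RAID-6 (implemented, for example, in the Linux md driver) keeps one plain XOR syndrome P and one Reed-Solomon syndrome Q computed over GF(2^8); together they allow any two lost blocks to be rebuilt. The single-byte-per-disk example below is a minimal illustration, not a production implementation:

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11D)."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        b >>= 1
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1D
    return p

def gf_pow(a, n):
    r = 1
    for _ in range(n):
        r = gf_mul(r, a)
    return r

def gf_inv(a):
    return gf_pow(a, 254)          # a^(2^8 - 2) = a^-1 in GF(2^8)

# One byte per "disk": four data disks, generator g = 2
data = [0x11, 0x22, 0x33, 0x44]

p = q = 0
for i, d in enumerate(data):       # P = XOR of data, Q = sum of g^i * d_i
    p ^= d
    q ^= gf_mul(gf_pow(2, i), d)

# Lose disks 1 and 3; fold the survivors out of P and Q ...
i, j = 1, 3
p_rem, q_rem = p, q
for k, d in enumerate(data):
    if k not in (i, j):
        p_rem ^= d
        q_rem ^= gf_mul(gf_pow(2, k), d)

# ... leaving p_rem = d_i ^ d_j and q_rem = g^i*d_i ^ g^j*d_j,
# a two-unknown linear system over GF(2^8):
gi, gj = gf_pow(2, i), gf_pow(2, j)
d_j = gf_mul(q_rem ^ gf_mul(gi, p_rem), gf_inv(gi ^ gj))
d_i = p_rem ^ d_j
assert (d_i, d_j) == (data[i], data[j])
```

The inner loop of gf_mul -- a conditional XOR and a shift per bit -- is exactly the kind of operation that hardware accelerators and lookup-table engines implement far faster than general-purpose firmware, which is why hardware support matters for RAID-6 performance.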
RAID-5 is a well-known and commonly used configuration in storage applications. RAID-5 is susceptible to unrecoverable data loss errors. The propensity of data loss increases in large arrays with large-capacity drives.
In critical storage applications (such as databases), data losses cannot be tolerated. Migration to RAID-6 mitigates these issues at the cost of a modest capacity and performance overhead. By migrating to RAID-6, storage solutions can continue to scale up using inexpensive desktop-class hard drives.
RAID-6 is emerging as the next generation RAID technology, with several offerings from prominent vendors on the horizon. There are several approaches to RAID-6 implementation ranging from purely software based to hardware acceleration. Since RAID-6 is computationally intensive, software-based RAID-6 solutions do not perform well, driving the need for standard hardware solutions.