
Why Is RAID Dying a Slow Death?

While the technology is on the decline, here's why RAID is still sticking around as it slowly marches towards its ultimate demise.

Back when I started on my journey in the world of IT, RAID -- Redundant Array of Independent Disks -- was considered the state-of-the-art gold standard, and we paid dearly for servers with RAID adapters to meet our data protection needs.

My, how times change!

Today, RAID is still in use, but a survey of the storage landscape -- particularly among emerging technologies -- reveals that it is either treated as a secondary data protection mechanism or, for some products, doesn't appear on the spec sheet at all. Why is this once-hot technology now considered a mortal sin in some storage circles? There are two primary reasons.

First and foremost, one of the most common RAID levels -- RAID 5 -- began to show serious weaknesses as disk sizes continued to grow. Today, there are disks on the market that hold a whopping 5 TB, which is massive by the standards of the era in which RAID was born. Back then, RAID adapters could rebuild the comparatively small disks of the day quickly: when a disk in an array failed, it didn't take long to rebuild it onto a replacement. However, as disk capacity increased, so did the time it takes to rebuild a failed disk. The problem: during a rebuild, the whole array is under additional stress as data and parity are read from every surviving disk to reconstruct the lost one. As a result, the likelihood of a double-disk fault increases. A double-disk fault is a situation in which a second disk fails while a rebuild is underway for another one. Because RAID 5 can survive the loss of only a single disk, a double-disk fault results in the loss of all data in the array.
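To put rough numbers on the rebuild-window argument, here is a minimal Python sketch. It assumes a constant rebuild rate and independent disk failures at a constant rate derived from an annualized failure rate (AFR). The function names, the 50 MB/s rebuild rate, the 3% AFR, and the eight-disk RAID 5 set are all hypothetical figures chosen for illustration, not measurements; real rebuilds can be much slower when the array is also serving production I/O.

```python
import math

HOURS_PER_YEAR = 8760

def rebuild_hours(capacity_tb: float, rebuild_mb_per_s: float) -> float:
    """Hours to rebuild a replacement disk at a constant (assumed) rebuild rate."""
    capacity_mb = capacity_tb * 1_000_000  # decimal units: 1 TB = 10^6 MB
    return capacity_mb / rebuild_mb_per_s / 3600

def double_fault_probability(surviving_disks: int, afr: float, hours: float) -> float:
    """Chance that at least one surviving disk fails during the rebuild window.

    Simplified model: failures are independent, with a constant hourly
    failure rate derived from the annualized failure rate (AFR).
    """
    rate_per_hour = -math.log(1 - afr) / HOURS_PER_YEAR
    return 1 - math.exp(-surviving_disks * rate_per_hour * hours)

# Hypothetical figures: 50 MB/s rebuild rate, 3% AFR, 8-disk RAID 5 set
# (so 7 disks must keep working while the 8th is rebuilt).
for capacity_tb in (0.3, 5.0):
    hours = rebuild_hours(capacity_tb, rebuild_mb_per_s=50)
    p = double_fault_probability(surviving_disks=7, afr=0.03, hours=hours)
    print(f"{capacity_tb} TB disk: rebuild ~{hours:.1f} h, P(second failure) ~{p:.4%}")
```

Under these assumptions the rebuild window, and with it the chance of a second failure, grows roughly in proportion to disk capacity. Real-world failures are often correlated (same batch, same age, same rebuild stress), so a model like this is likely optimistic.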

Further, it becomes more and more likely that, during the rebuild, the organization will encounter an unrecoverable read error. Unfortunately, disks aren't perfect, and they will have flaws. Every disk comes with a metric known as the Bit Error Rate (BER), which gives customers general guidance about the disk's reliability: roughly how many bits are read, on average, for each bit that cannot be read back. There are three common Bit Error Rates used for disks:

  • Cheap, low cost: Bit Error Rate = 1 in 10^14 bits, or roughly 12.5 TB read
  • Moderate quality, moderate cost: Bit Error Rate = 1 in 10^15 bits, or roughly 125 TB read
  • High quality, higher cost: Bit Error Rate = 1 in 10^16 bits, or roughly 1.25 PB read
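The capacity figures in the list come from dividing the number of bits by eight. A quick Python check, using decimal units (1 TB = 10^12 bytes) as drive makers do; the function name is illustrative:

```python
BITS_PER_BYTE = 8

def bytes_per_error(exponent: int) -> float:
    """Bytes read, on average, per unrecoverable bit error at a BER of 1 in 10^exponent."""
    return 10**exponent / BITS_PER_BYTE

for exponent in (14, 15, 16):
    tb = bytes_per_error(exponent) / 10**12  # convert bytes to decimal terabytes
    print(f"1 in 10^{exponent} bits -> {tb:,.1f} TB")
```

This prints 12.5 TB, 125.0 TB, and 1,250.0 TB (1.25 PB) for the three classes.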

It's overly simplistic, but think of it this way: pretend that you will hit the BER exactly at the levels described above. So, as soon as that cheap SATA disk in an array has read 12.5 TB, it experiences an unrecoverable read error. Bear in mind that even a 1 TB disk can easily pass the 12.5 TB mark during its lifetime as new data replaces old. A rebuild makes this worse: with RAID 5, every surviving disk must be read in full, so rebuilding an array of five 4 TB disks means reading 16 TB of data, more than the 12.5 TB rating of a cheap disk.
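A slightly less simplistic way to look at the same numbers is to treat the BER as an independent per-bit probability and compute the chance of at least one unrecoverable read error while reading a given amount of data. The minimal Python sketch below does that. The independence assumption, the function name, and the example rebuild (four surviving 4 TB disks read in full) are illustrative; real drive errors are not truly independent bit events.

```python
import math

def p_unrecoverable_read_error(tb_read: float, ber_exponent: int) -> float:
    """Probability of at least one URE while reading tb_read decimal terabytes.

    Simplified model: each bit fails independently with probability 10^-ber_exponent.
    """
    bits_read = tb_read * 10**12 * 8
    per_bit = 10.0 ** -ber_exponent
    # 1 - (1 - p)^n, computed with log1p/expm1 to stay accurate for tiny p
    return -math.expm1(bits_read * math.log1p(-per_bit))

# Hypothetical RAID 5 rebuild: four surviving 4 TB disks must be read in full.
surviving_data_tb = 4 * 4.0
for exponent in (14, 15, 16):
    p = p_unrecoverable_read_error(surviving_data_tb, exponent)
    print(f"BER 1 in 10^{exponent}: P(URE during rebuild) ~ {p:.1%}")
```

Because the probability depends on the total amount of data read, it climbs quickly as disks, and therefore rebuilds, get bigger.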

To combat these issues, general guidance has been to move to RAID 6, which offers two disks' worth of parity protection, or RAID 10, which stripes data across mirrored pairs of disks. Both are viable, but RAID 6 introduces its own performance challenges. Every small write to a RAID 6 array requires six separate I/O operations: the old data and both old parity blocks must be read, and then the new data and both updated parity blocks must be written. In addition, there will come a point when even RAID 6's dual-parity scheme is rendered useless by ever-increasing disk sizes.
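The write penalty translates directly into lost throughput. Assuming a purely random small-write workload, each front-end write costs the array a fixed number of back-end I/Os (commonly cited as 2 for RAID 10, 4 for RAID 5, and 6 for RAID 6), and each read costs one. The front-end IOPS an array can sustain is then its raw IOPS divided by the average back-end cost per request. The function name, the 150 IOPS per disk, the eight-disk array, and the 30% write mix below are hypothetical.

```python
# Back-end I/Os per front-end random (small) write for each RAID level.
WRITE_PENALTY = {"RAID 10": 2, "RAID 5": 4, "RAID 6": 6}

def effective_iops(disks: int, iops_per_disk: float, write_fraction: float, level: str) -> float:
    """Front-end IOPS the array can sustain for a random read/write mix.

    A front-end workload of F IOPS generates F * (reads + writes * penalty)
    back-end I/Os, which cannot exceed the disks' combined raw IOPS.
    """
    raw_iops = disks * iops_per_disk
    read_fraction = 1.0 - write_fraction
    backend_cost_per_request = read_fraction + write_fraction * WRITE_PENALTY[level]
    return raw_iops / backend_cost_per_request

# Hypothetical array: 8 disks at 150 IOPS each, 30% of requests are writes.
for level in WRITE_PENALTY:
    print(f"{level}: ~{effective_iops(8, 150, 0.30, level):.0f} IOPS")
```

Full-stripe writes, write-back caches, and controller optimizations can reduce the real penalty, so treat these figures as worst-case estimates.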

It will take a long time to say goodbye to RAID, and in some cases it will always be with us, but growing challenges around common RAID levels are pushing storage makers toward alternative data protection mechanisms.

 

About the Author

Scott D. Lowe is the founder and managing consultant of The 1610 Group, a strategic and tactical IT consulting firm based in the Midwest. Scott has been in the IT field for close to 20 years and spent 10 of those years filling the CIO role for various organizations. He has also authored or co-authored four books and created 10 video training courses for TrainSignal.

