All About Data Protection Part 6 – Nested and Combination RAID

[Figure: RAID 61]

While the basic RAID levels addressed some of our concerns about disk drive performance and reliability, they are really just the most basic building blocks of data protection. Users, and vendors, needing RAID sets bigger, or more resilient, than the standard RAID levels can deliver have built some amazing combinations to address their particular problems.

The wide variety of combination and compound RAID systems has led to some significant online argument about how various RAID levels can and should be combined. The truth, as usual, is more complicated than most hobbyist bloggers understand.

Nested RAID

Nested RAID is the sequential application of two or more RAID technologies, most commonly RAID1, RAID5, or RAID6 for resiliency combined with RAID0 data striping for capacity and performance. This nesting typically pairs hardware, or disk-driver, RAID, like Intel’s RST technology, providing the resilience, with an operating system volume manager spanning or concatenating those RAID volumes together.

Nested RAID used to be an important part of the enterprise storage admin’s toolkit. I made use of it myself the first time I encountered an EMC Symmetrix. This being the dark ages before Fibre Channel, I had to connect my Windows servers to the Sym with 68-pin high-voltage differential SCSI cables. That led to me letting the magic blue smoke out of two Adaptec SCSI cards while learning the difference between HVD and the single-ended SCSI that Windows servers came with.

That old Sym used a proprietary RAID scheme EMC called RAID-S. Rather than striping data across multiple drives and presenting a 27GB LUN, RAID-S allowed hosts to directly address the three 9GB data drives in each stripe while still calculating and storing parity data on the fourth drive of the set. Since the Sym was full of 9GB drives, all it could present was 9GB LUNs. To get the 40GB I needed to set up Exchange, I used the Windows volume manager to stripe the M: (for mail) drive across five of those 9GB LUNs.

As storage arrays added more storage virtualization features like combination RAID (see below), dynamic volume resizing, and thin provisioning, nested RAID, like other host storage virtualization features, has been fading out, though the limited functionality of most cloud storage options may see host storage virtualization succeed in the cloud in ways it never did in the data center. Nested RAID remains a useful tool when dealing with basic RAID and storage systems with limited LUN virtualization features.

The Great RAID10 vs RAID01 Debate

Throughout the disk RAID era, mirrored, striped RAIDsets were the destination of choice for applications that needed a lot of small I/Os per second, like OLTP databases, from homebrewers to Lehman Brothers.

For some reason, a great deal of virtual blog ink has been spilled debating the supposed advantages of RAID01, sometimes called RAID 0+1, which creates a pair of stripe sets and mirrors all data between them, versus RAID10, sometimes RAID 1+0, which creates mirror sets and then stripes data across them.

The Most Basic Nested RAID 1+0 and 0+1

The long and the short of it is that RAID10 and RAID01 behave and perform just about the same under normal conditions. For very basic implementations of RAID1 and RAID0, the RAID10 implementation will be slightly more resilient.

When one of the X drives in a basic, or shall we say stupid, RAID01 implementation fails, the entire stripe set that drive is a member of, that is X/2 drives, goes offline with it. A failure of any of the X/2 drives in the surviving stripe set will then result in catastrophic data loss.

In the RAID10 set each drive is mirrored to a doppelganger. Once a drive has failed, the array can still survive the failure of another drive as long as it’s not the one that contains the same data as the failed drive.

This means the RAID01 set has a probability of data loss after a single drive failure of:

P(DataLoss) = (X/2) × P(DriveFailure)

or, for a 10-drive set:

P(DataLoss) = 5 × P(DriveFailure)

where for a RAID10 set of any size:

P(DataLoss) = P(DriveFailure)
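To make the difference concrete, here is a minimal Python sketch of my own (not any vendor’s code) that estimates, for a naive RAID01 layout and a RAID10 layout, the chance that a second random drive failure loses data once one drive is already dead:

import random

def second_failure_loss(drives, layout, trials=100_000):
    """Estimate the chance that a second random drive failure causes data
    loss, given one drive has already failed, for a naive RAID01 or RAID10
    layout with an even number of drives."""
    losses = 0
    for _ in range(trials):
        first = random.randrange(drives)
        second = random.choice([d for d in range(drives) if d != first])
        if layout == "raid01":
            # Drives 0..X/2-1 form stripe set A, the rest stripe set B.
            # Losing one drive takes its whole stripe set offline, so any
            # second failure in the other stripe set loses data.
            losses += (first < drives // 2) != (second < drives // 2)
        else:  # raid10: drives are paired into mirrors (0,1), (2,3), ...
            # Data is lost only if the second failure hits the mirror
            # partner of the already-failed drive.
            losses += second == (first ^ 1)
    return losses / trials

for x in (4, 10, 20):
    print(x, "drives  RAID01:", round(second_failure_loss(x, "raid01"), 3),
          " RAID10:", round(second_failure_loss(x, "raid10"), 3))

For a 10-drive set this comes out to roughly 0.56 for RAID01 versus roughly 0.11 for RAID10, the same 5-to-1 (that is, X/2-to-1) advantage the formulas above describe.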

Of course, enterprise storage arrays use a single data placement engine that can deliver the resilience of RAID10 in multiple ways.

RAID50, RAID60 and Other Common Combos

While it’s theoretically possible to build a parity-based RAIDset of arbitrary size, the I/O amplification on small writes and other complexities we’ll explore in a later episode make smaller RAIDsets more desirable for many applications. RAID50 and RAID60 stripe data across multiple RAID5 and RAID6 parity sets to provide enhanced capacity and performance.
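As a rough sketch of the capacity math, assuming the conventional layout where each RAID5 group gives up one drive’s worth of space to parity (the function name and drive counts here are just illustrative):

def raid50_usable(total_drives, drives_per_group, drive_tb):
    """Usable capacity and best-case failure tolerance for a RAID50 built
    from equal-sized RAID5 groups, each losing one drive to parity."""
    groups = total_drives // drives_per_group
    usable_tb = groups * (drives_per_group - 1) * drive_tb
    # Each RAID5 group survives one failure, so the best case is one
    # failure per group; a second failure in the same group loses data.
    best_case_failures = groups
    return usable_tb, best_case_failures

# Two 6-drive RAID5 groups of 8TB drives: 80TB usable, and up to two
# drive failures survivable if they land in different groups.
print(raid50_usable(12, 6, 8.0))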

RAID on RAID 11, 51, 61

Nested RAID can also be used to provide higher levels of resiliency by mirroring data, rather than striping it, across multiple RAIDsets. This is most commonly used by single-controller storage arrays to provide resiliency in the event of a controller failure. These systems, including most first-generation virtual storage appliances (VSAs), synchronously replicate data between two controllers, each of which maintains a RAID5 or RAID6 set.

Combination RAID

Nested RAID systems implement two or more layers of RAID that are independent of each other, typically provided by different products, like a hardware RAID controller and a VSA that mirrors two RAIDsets. Combination RAID systems manage data placement holistically, combining parity and striping, for example, into a single data placement algorithm. Combination systems can make up for the failings at one layer, such as a RAID5 set only surviving one drive failure, by stitching together the drives from both sets of a RAID51 mirror to survive even when neither RAID5 set could survive on its own. Most enterprise storage systems today actually use some sort of combination RAID.

Enterprise RAID10 is just about the simplest case of combination RAID. Rather than keeping track of the intermediate abstractions of RAID0 and RAID1 sets, the system simply builds a map recording that LBAs 0–X of LUN Y are stored on drives 43 and 44 starting at LBA Z. With this sort of direct mapping, the system will continue to deliver data unless both drives holding some set of data go offline or otherwise fail to respond with the requested data.
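Here is a toy Python sketch of what such a placement map might look like; the extent sizes, drive numbers, and helper names are all made up for illustration, not taken from any real array:

from dataclasses import dataclass

@dataclass
class Extent:
    lun: int
    lun_start_lba: int
    length: int
    drives: tuple          # the mirrored drive pair holding this extent
    drive_start_lba: int

# LBAs 0-X of LUN 7 live on drives 43 and 44 starting at some drive LBA Z,
# the next extent on a different pair, and so on.
placement = [
    Extent(7, 0,         1_048_576, (43, 44), 2_097_152),
    Extent(7, 1_048_576, 1_048_576, (12, 13), 0),
]

def locate(lun, lba):
    """Translate a LUN LBA into the mirrored drive pair and on-drive LBA."""
    for ext in placement:
        if ext.lun == lun and ext.lun_start_lba <= lba < ext.lun_start_lba + ext.length:
            return ext.drives, ext.drive_start_lba + (lba - ext.lun_start_lba)
    raise KeyError("unmapped LBA")

print(locate(7, 5000))    # -> ((43, 44), 2102152)

A read or write against an extent only fails when both drives in that extent’s pair are unavailable, which is exactly the RAID10-equivalent resilience described above, without the system ever materializing separate RAID0 or RAID1 layers.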