
The Data Protection Diaries Part 3 – Parity RAID


In this installment of As the Disk Drive Turns, we’ll explore RAID levels two through five and the math(s) they use to protect data with less overhead than mirroring.

RAID2 and RAID3 – Bits and Bytes, Not Blocks

RAID2 stripes data across multiple drives at the bit level using Hamming codes. RAID3 uses parity, but at the byte rather than block level. Both require that the disk drive spindles be synchronized, and both have fallen out of favor. Once LBA (Logical Block Addressing) replaced HTS (Head, Track, Sector) addressing and disk drives got track buffers, bit- and byte-wise RAID just didn’t make sense any more.

The only RAID2/3 implementation I can remember seeing was a Compaq prototype for what became the IDA (Integrated Drive Array) card in the first real x86 server, the Compaq SystemPro. That machine required special drives from Conner Peripherals, in which Compaq was a major investor.

RAID4 – A Dedicated Parity Drive

Ninety percent of the time when storage folks talk about single parity RAID, we simplify the conversation by talking about data drives and parity drives, which is exactly how RAID4 operates.

A RAID4 system with n drives writes data strips to n-1 drives and writes the XOR (exclusive or) of all the data strips to the remaining drive as parity.

RAID4 Data Layout
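
For the curious, here’s a minimal Python sketch of the XOR math, nothing vendor-specific, just an illustration of how a controller can rebuild any one lost strip from the survivors and the parity:

```python
from functools import reduce

def xor_parity(strips):
    """XOR equal-length strips together byte by byte to produce a parity strip."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))

# A toy 3D+P example; real strips would be tens of kilobytes, not four bytes.
data_strips = [b"\x11\x22\x33\x44", b"\xaa\xbb\xcc\xdd", b"\x01\x02\x03\x04"]
parity = xor_parity(data_strips)

# If the drive holding strip 1 dies, XORing the survivors with the parity rebuilds it.
rebuilt = xor_parity([data_strips[0], data_strips[2], parity])
assert rebuilt == data_strips[1]
```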

Protection overhead for RAID4 systems ranges from 25% for a four-drive system with three data drives and one parity drive (3D+P) to as little as 6% for a 15D+P arrangement. That means any given set of drives will hold at least half again as much data with single parity RAID as it would in a mirrored set, and at 15D+P almost twice as much.
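
The back-of-the-envelope math, sketched here in Python purely for illustration:

```python
def single_parity_overhead(data_drives):
    """Fraction of raw capacity consumed by the single parity drive."""
    return 1 / (data_drives + 1)

for d in (3, 7, 15):
    overhead = single_parity_overhead(d)   # 3D+P -> 0.25, 15D+P -> 0.0625
    vs_mirror = (1 - overhead) / 0.5       # mirroring leaves only 50% usable
    print(f"{d}D+P: {overhead:.1%} overhead, {vs_mirror:.2f}x the usable space of mirroring")
```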

RAID4 systems provide good read performance as the system can read data from all the data strips independently.

RAID4 systems bottleneck on the parity drive when challenged with many small writes. That’s because every write, regardless of how small, requires updating the parity drive. RAID5’s rotating parity is designed to address this.

As long as writes to the RAID4 set are the same size as, or a multiple of, a full RAID stripe, the dedicated parity drive sees the same amount of I/O it would in a RAID5 system. As log-structured data layouts such as NetApp’s WAFL and HPE/Nimble’s CASL, which only write in full stripes, have become more popular, RAID4 has seen a resurgence.

RAID4 performs one I/O per read, n I/Os (one per drive) for a full-stripe write, and n+1 I/Os (n-1 reads plus two writes) for a write smaller than a strip, to accommodate the read-modify-write cycle.

A RAID4 set can theoretically deliver n-1 times the IOPS of a single drive for small reads, with a similar increase in throughput for larger I/O sizes. In the worst case, small write patterns can reduce the system to the performance of the parity drive.
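
Putting rough numbers on that (my own simplification, using the read-modify-write path described above, a hypothetical 7D+P set, an illustrative 150 IOPS per drive, and no controller cache at all):

```python
def raid4_physical_ios(n_drives, op):
    """Approximate physical I/Os per host request on an n-drive RAID4 set."""
    if op == "read":
        return 1                      # one strip from one data drive
    if op == "full_stripe_write":
        return n_drives               # one write per drive, parity included
    if op == "small_write":
        return (n_drives - 1) + 2     # read the data strips, write new data + parity
    raise ValueError(op)

drive_iops = 150                      # roughly a 10K RPM disk; purely illustrative
n = 8                                 # a hypothetical 7D+P set
print("small-read ceiling:", (n - 1) * drive_iops, "IOPS")
print("small-write cost  :", raid4_physical_ios(n, "small_write"), "physical I/Os each")
```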

RAID5 – Shake Your Booty, Rotate The Parity

Like RAID4, RAID5 uses XOR to calculate a parity strip from n-1 data strips (where n is the total number of drives in the set).

The difference is that rather than storing all the parity strips on a single drive and maximizing I/O for that drive, a RAID5 system rotates which drive gets the parity strip with each stripe, so each drive holds the parity data for every nth stripe.

RAID5 Rotating Parity – hashed strips are parity
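
A few lines of Python make the rotation obvious (this is one common scheme, with parity starting on the last drive and shifting left each stripe; real controllers vary in the details):

```python
def raid5_parity_map(n_drives, n_stripes):
    """Return the drive index holding the parity strip for each stripe."""
    return [(n_drives - 1 - stripe) % n_drives for stripe in range(n_stripes)]

for stripe, p in enumerate(raid5_parity_map(4, 4)):
    print(f"stripe {stripe}:", " ".join("P" if d == p else "D" for d in range(4)))
# stripe 0: D D D P
# stripe 1: D D P D
# stripe 2: D P D D
# stripe 3: P D D D
```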

As you would expect, RAID5 has many of the same advantages and disadvantages as RAID4 since both use a single parity strip per stripe to protect data.

Because RAID5 systems rotate the parity strips, they do a better job of equalizing I/O across all the drives of a set, eliminating the bottleneck at the dedicated parity drive.

RAID5 has, and not without good reason, gotten a bad reputation for write I/O performance, with organizations like BAARF (Battle Against Any Raid Five) and DBAs everywhere demanding that their OLTP and other applications that generate large numbers of small writes be hosted on mirrored disks.

As with RAID4, as long as writes to the set are the same size as, or a multiple of, a full RAID stripe, a RAID5 set can perform n-1 times the number of IOPS any of its constituent drives could.

The problem with parity is that there is substantial I/O amplification when the data an application writes is smaller than a full stripe.

The worst case is when the data being written is smaller than the single data strip written to each device in the RAIDset.

Let’s look at the case that led to the creation of BAARF: database servers that make large numbers of random 4KB and 8KB writes to a RAID5 set that stores data in 64KB strips.

In order to write that 8KB, the RAID controller, or software (from now on I’m just going to write controller, and you SDS folks should please understand I’m including you too; I’m just not cluttering up my writing being politically correct), has to perform several tasks, which the sketch after this list tallies up:

  1. Read the 64KB strip from each of the n-1 data drives to collect all the data in the stripe
  2. Insert the new 8KB of data into the memory buffer holding the data from step 1 and recalculate the parity for the stripe
  3. Write the new data and parity strips
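
Tallying that up (a back-of-the-envelope sketch in Python, assuming a hypothetical 8-drive, 7D+P set and no write-back cache to soften the blow):

```python
def write_ios(n_drives, strip_kb, write_kb):
    """Physical I/Os for one host write using the read-modify-write cycle above."""
    stripe_kb = strip_kb * (n_drives - 1)   # data capacity of one full stripe
    if write_kb % stripe_kb == 0:           # full-stripe writes need no reads
        return n_drives * (write_kb // stripe_kb)
    return (n_drives - 1) + 2               # read the data strips, write data + parity

# The BAARF case: 8KB database writes into 64KB strips on an 8-drive (7D+P) set.
print(write_ios(8, 64, 8))      # 9 physical I/Os, most of them 64KB, for one 8KB write
print(write_ios(8, 64, 448))    # 8 I/Os for a 448KB full-stripe write
```

Nine disk I/Os, most of them 64KB, to commit a single 8KB page goes a long way toward explaining the bad blood.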

Add in that the poor members of BAARF were probably running SQL Server on Windows Server 2003, which created its logical drives starting at sector 63, leaving them misaligned with the 64KB strips. That misalignment makes the RAID system run through the whole read-modify-write process for two stripes, doubling yet again the amount of I/O the poor disks have to do. No wonder these guys hate RAID5.

The exact amount of I/O amplification depends on the number of drives in the set and the size of the I/O (a 128KB write would eliminate the need to read two of the strips), but the read of the existing data must complete before the new data can be written.

Small write latency is therefore the read latency of the slowest drive being read plus the write latency of the slowest of the drives being written to.

This being the storage business, once RAID5 became SOP (Standard Operating Procedure) someone had to announce an even higher level of protection, and double parity, now known as RAID6, was born. But that’s a story for another blog post.