All About Data Protection Part 2¾ – A Few Words On Parity

Parity

Unfortunately, parity is one of those words that means different things depending on the context. To make things worse, we IT folks talk about double parity, a concept that would make our favorite mathematician, Rachel Traylor Ph.D., blow her top.

Strictly speaking, parity is a special case of a forward error correction code, one that adds a single check bit to some set of data bits. This makes double parity a bit of a misnomer, though it is too useful a term to discard.

My first experience with parity came with the RS-232 serial ports that connected 1970s microcomputers to terminals, printers and just about anything else they could talk to. A data word on an RS-232 link had a start bit, 5-8 data bits, an optional parity bit and a stop bit.

When setting up the connection to any device, both ends had to agree on odd or even parity. If they selected even parity, the parity bit would be set to 1 if the data bits contained an odd number of 1s and cleared to 0 if they contained an even number, so the total count of 1s in the word, parity bit included, was always even. Odd parity would, of course, be exactly the reverse.

For RAID, and all of its more distributed descendants, parity isn’t based on the sum of the data bits but on the Boolean exclusive-or (XOR, ⊕) operation, which returns a 1 if the two bits it’s comparing are different (either 0 and 1 or 1 and 0) and a 0 if the two bits are the same (1 and 1 or 0 and 0). The excessively mathematical among you will recognize XOR as modulo 2 addition. When you serially XOR multiple values, the result is a 1 when there is an odd number of 1s in the string and a 0 when the number of 1s is even, just like the old serial port’s parity calculation.

For example:

      11010010
    ⊕ 01101011
    = 10111001
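The example above can be reproduced in a few lines of Python. This is only a minimal sketch: the helper name `xor_fold` is hypothetical, made up here for illustration. It also checks the claim that XORing a string of bits together gives 1 exactly when the string holds an odd number of 1s.

```python
from functools import reduce
from operator import xor

a = 0b11010010
b = 0b01101011

# Bitwise XOR of the two example bytes.
result = a ^ b
print(format(result, "08b"))  # 10111001


def xor_fold(bits):
    """XOR a sequence of 0/1 values together (hypothetical helper)."""
    return reduce(xor, bits, 0)


# Split the first byte into its individual bits.
bits_of_a = [(a >> i) & 1 for i in range(8)]

# Folding XOR across the bits matches "count of 1s modulo 2".
print(xor_fold(bits_of_a) == sum(bits_of_a) % 2)  # True
```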

For a 4D+P data protection set the parity strip in each stripe P is calculated as:

P = A ⊕ B ⊕ C ⊕ D

where A, B, C, and D are the four data strips.
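As a rough illustration of that calculation, the sketch below XORs four strips together byte by byte. The function name `parity_strip` and the two-byte strip contents are assumptions invented for the example; real strips are far larger, and real implementations work on wide machine words rather than one byte at a time.

```python
def parity_strip(strips):
    """Return the byte-wise XOR of equal-length strips (hypothetical helper)."""
    length = len(strips[0])
    if any(len(s) != length for s in strips):
        raise ValueError("all strips must be the same length")
    parity = bytearray(length)  # starts as all zeros
    for strip in strips:
        for i, byte in enumerate(strip):
            parity[i] ^= byte
    return bytes(parity)


# Four made-up data strips of two bytes each.
A = bytes([0xD2, 0x00])
B = bytes([0x6B, 0xFF])
C = bytes([0x0F, 0x55])
D = bytes([0xF0, 0xAA])

P = parity_strip([A, B, C, D])
print(P.hex())  # 4600
```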

The value of any data strip can be calculated by XORing the remaining data strips with the parity strip. That is:

A = P ⊕ B ⊕ C ⊕ D

B = A ⊕ P ⊕ C ⊕ D

C = A ⊕ B ⊕ P ⊕ D

   and

D = A ⊕ B ⊕ C ⊕ P
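The four identities above can be checked with a short sketch that "loses" each data strip in turn and rebuilds it from the survivors plus parity. The helper `xor_strips` and the strip contents are hypothetical, chosen only for illustration, and the code assumes every strip is the same length.

```python
from functools import reduce


def xor_strips(strips):
    """Byte-wise XOR of equal-length byte strings (hypothetical helper)."""
    # zip(*strips) yields one tuple per byte position across all strips.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*strips))


# Made-up contents for the four data strips of a 4D+P stripe.
data = {
    "A": b"\xd2\x10\x7f",
    "B": b"\x6b\x20\x80",
    "C": b"\x0f\x30\x01",
    "D": b"\xf0\x40\xfe",
}
parity = xor_strips(list(data.values()))

# Pretend each strip is lost in turn and rebuild it from the rest.
for lost in data:
    survivors = [strip for name, strip in data.items() if name != lost]
    rebuilt = xor_strips(survivors + [parity])
    print(lost, "rebuilt correctly:", rebuilt == data[lost])
```

This works because XORing any value with itself gives zero, so every surviving strip cancels out of the parity and only the missing strip remains.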

Back in the 1990s, server CPUs were pretty anemic, delivering at best 1/1000th the horsepower of today’s multicore 64-bit Xeons, and didn’t implement XOR as a native instruction. Intel’s i960 RISC processor did have an XOR primitive and was, for that reason, the processor of choice for RAID controller vendors like Mylex and AMI.

The ability to replace missing data makes XOR parity an erasure code, whether it is implemented across multiple nodes in an HCI solution or in old-fashioned RAID.