All About Data Protection – Part One: Where Did RAID Come From?

Just about all modern data protection schemes divide data into blocks of some arbitrary size. Then they either duplicate each block across multiple storage devices or split it into portions, which we’ll call strips, and store those strips in stripes across multiple storage devices, with one or more additional strips in each stripe containing parity or other erasure-code data.
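To make the strip-and-stripe idea concrete, here is a minimal Python sketch of the simplest case: one parity strip computed with XOR. The function names, the four-data-strip layout and the sample block are illustrative assumptions of mine, not any particular product's implementation; real systems use fixed strip sizes and often more elaborate erasure codes.

```python
from functools import reduce


def split_into_strips(block: bytes, n_data: int) -> list[bytes]:
    """Split a block into n_data equal-length strips, zero-padding the tail."""
    strip_len = -(-len(block) // n_data)  # ceiling division
    padded = block.ljust(strip_len * n_data, b"\x00")
    return [padded[i * strip_len:(i + 1) * strip_len] for i in range(n_data)]


def xor_strips(strips: list[bytes]) -> bytes:
    """Bytewise XOR of equal-length strips."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))


block = b"sleep well at night"          # hypothetical data block
data = split_into_strips(block, 4)       # four data strips
parity = xor_strips(data)                # one parity strip
stripe = data + [parity]                 # one strip per storage device

# Simulate losing the device that holds strip 2, then rebuild that strip
# by XORing every surviving strip in the stripe (data and parity alike).
lost = 2
survivors = [s for i, s in enumerate(stripe) if i != lost]
rebuilt = xor_strips(survivors)

print(rebuilt == data[lost])
```

Because XOR is its own inverse, the same function both generates the parity strip and regenerates any single missing strip. Surviving more than one lost strip per stripe takes a stronger code than plain XOR.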

Those are pretty simple concepts. And yet, there’s a lot of confused and confusing information on the Internet about how RAID and related technologies such as distributed erasure coding work.

So, being the storage industry’s old man yelling at clouds, I decided it was time to go back to the basics and explain the technologies that let us sleep at night.

I’ll start at the very beginning (a very good place to start, according to Oscar Hammerstein II) of my career, in the early 1980s.

While I’m sure that the mainframes and VAXes of the day had some form of disk mirroring, the S-100 bus microcomputers and PCs of my early career were lucky to have hard drives at all.

We were religious about making backups because the hard drives of the day had annual failure rates (AFR) of 20-60% (11,000-30,000 hour MTBFs). I remember writing an ISAM system in GWBASIC that wrote two copies of the data file to two separate disks.
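For anyone who wants to see how an MTBF figure turns into an annual failure rate, here is a rough Python sketch. It assumes a constant failure rate (the exponential model) and 8,760 powered-on hours per year; both are simplifying assumptions of mine, and the function name is made up for illustration.

```python
import math

HOURS_PER_YEAR = 8760  # assumes the drive spins 24x365


def afr_from_mtbf(mtbf_hours: float) -> float:
    """Chance a drive fails within one year, assuming a constant failure rate."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


for mtbf in (11_000, 30_000):
    print(f"{mtbf:>6} hour MTBF -> AFR of about {afr_from_mtbf(mtbf):.0%}")
```

Under those assumptions the two MTBF figures work out to roughly 55% and 25% per year, which lands in the same neighborhood as the range quoted above.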

The first big advance in data resiliency for small computers was the inclusion of disk mirroring in Novell’s NetWare, which Novell dubbed System Fault Tolerance Level II (SFT II). SFT I was defect re-mapping, a function disk drives have been handling transparently since IDE and SCSI replaced ST-506 and ESDI almost 30 years ago.

Back in the day, the disk controllers for ST-506 and first-generation SCSI disk drives were so stupid that they could only process one command at a time. Novell’s NetWare drew a distinction between drives on the same controller, which had to be served from a single queue, and drives on separate controllers, which could each process a command independently.

To avoid long seeks, NetWare would distribute disk I/Os across the two members of a mirrored pair on separate controllers, an arrangement that Novell, and later Microsoft, called disk duplexing.

Because NetWare used the file server’s RAM as a write-back cache, the distributed I/Os included both reads and asynchronous writes from the cache. To limit the risk of losing data still in the cache, NetWare would monitor the power-fail alert from its UPS and flush the cache when mains power failed.

About the same time I was installing mirrored disks in NetWare (1988), David Patterson, Garth Gibson and Randy Katz at Berkeley published “A Case for Redundant Arrays of Inexpensive Disks,” a paper that proposed using arrays of inexpensive 5 ¼” and/or 3 ½” disk drives to address the performance bottlenecks created by the high-performance hard drives of the day, such as the IBM 3380 they used as an example. While Patterson et al. didn’t invent any of the techniques they presented in their paper, the RAID levels they proposed have become the standard taxonomy for discussing RAID and related technologies.

The 3380 had four head positioners to deliver 120-200 IOPS from 7.5GB of capacity. Conner Peripherals’ CP3100 3 ½” drive of the day delivered only 20 IOPS, but at 100MB each you’d need 100 or so of them to give you 7.5GB of usable capacity. Together those drives would deliver 2,000 aggregate IOPS and still cost less than half as much as the 3380.
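The arithmetic behind that comparison is simple enough to check in a few lines of Python. The capacities and IOPS figures come straight from the paragraph above; the variable names are mine, and treating the gap between 75 drives and "100 or so" as room for redundancy is my assumption, not a figure from the paper.

```python
import math

SLED_CAPACITY_MB = 7_500       # IBM 3380
SLED_IOPS_RANGE = (120, 200)   # across its four head positioners
SMALL_DRIVE_MB = 100           # Conner CP3100
SMALL_DRIVE_IOPS = 20

# Drives needed just to match the raw capacity, with no redundancy at all.
data_drives = math.ceil(SLED_CAPACITY_MB / SMALL_DRIVE_MB)

# The "100 or so" figure; the extra drives are assumed to hold redundant data.
array_drives = 100
aggregate_iops = array_drives * SMALL_DRIVE_IOPS

print(f"data drives for capacity alone: {data_drives}")
print(f"aggregate IOPS from {array_drives} drives: {aggregate_iops}")
print(f"advantage over the 3380: {aggregate_iops / SLED_IOPS_RANGE[1]:.0f}x "
      f"to {aggregate_iops / SLED_IOPS_RANGE[0]:.0f}x")
```

Even against the 3380's best case, the array comes out about an order of magnitude ahead on IOPS, which is the whole performance argument for using many small drives.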

Of course, that Single Large Expensive Drive (SLED) was a lot more reliable than 100 of the little Conners, so RAID addressed the problem by storing redundant data to make the array fault tolerant. As the capacity and performance of small disk drives increased, the expensive 8” and 14” disk drives disappeared, a process Clayton Christensen used as an example of disruptive innovation in his seminal book The Innovator’s Dilemma.

By the early ’90s, some high-performance 5 ¼” disks were getting to cost a pretty penny. Some wiseacre (it could have been me, but I think it was Mickey Applebaum) made a joke about RAID and those drives not being inexpensive.

A few months later the manufacturers formed a RAID Industry Association and promptly changed the I in RAID from “inexpensive” to “independent.”

Well, that’s enough ancient history. Next time we’ll look at standard, and common if not quite standard, RAID schemes.