Failed Linux MD RAID devices. That’s what I got to deal with yesterday.
The ext3 file system produced scary errors and remounted the file system
as read-only. A quick look at /proc/mdstat showed the damage:

```
# cat /proc/mdstat
Personalities : [raid1] [raid5]
md1 : active raid5 sdi1 sdh1 sdg1 sdf1 sde1 sdd1 sdc1 sda1
      645136128 blocks level 5, 256k chunk, algorithm 2 [10/8] [U_UUUUUUU_]
```
That’s bad. A second hard drive had failed in a RAID 5 array with no spares. Our mission: get the data back as best we can.
This was a fileserver that was in use. I pulled emergency maintenance
and rebooted the server into single-user mode for safety. After the
reboot, /dev/md1 was listed as inactive: there were not enough working
devices to bring up the array. Obviously, it wasn’t mounted either.
This was exactly where I wanted to be. (You could also unmount the
broken file system and use
mdadm --stop /dev/md1 to stop the array.)
Next, use mdadm to force-assemble the array.

```
# mdadm --assemble --force /dev/md1
mdadm: forcing event count in /dev/sdb1 from X to Y
```
That should bring the array online. The event counter for /dev/sdb1
was the least out of date, and mdadm just fudged it to match the
surviving members. This means we have introduced corruption: once a
second device fails, it’s not long before the whole array fails, and
any writes from that window never reached /dev/sdb1. So, provided that
you bring back the second disk that failed (not the first, which failed
3 years ago, right?), you should introduce minimal corruption. However,
you now have a working array.
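Before forcing anything, you can check how far apart the members have drifted by reading the event counter from each superblock yourself. A small sketch, assuming the device names from this array; substitute your own members:

```shell
# Print the event counter from each member's superblock. The failed
# member with the highest count is the least out of date, and is the
# one mdadm will bump forward when you force-assemble.
for d in /dev/sd[a-i]1; do
    printf '%s ' "$d"
    mdadm --examine "$d" | awk '/Events/ {print $3}'
done
```

The bigger the gap between a failed member and the survivors, the more missed writes (and therefore corruption) reviving it will introduce.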
Next, we back up the array. Mount
/dev/md1 as a read-only file
system. Use rsync or another tool to copy off the data. Don’t try to
add more disks and rebuild the array before you back it up.
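The backup step, sketched as commands; the mount point and destination path are examples, not from the original:

```shell
# Mount the fragile array read-only so nothing can write to it,
# then copy everything off, preserving permissions, hard links,
# ACLs, and extended attributes.
mount -o ro /dev/md1 /mnt/recovery
rsync -aHAX /mnt/recovery/ /backup/md1-recovery/
umount /mnt/recovery
```

Only once the copy is verified should you think about replacing disks and letting the array rebuild.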
Our mission is accomplished. We have data.