Rebuilding Broken Software Raid 1 in Ubuntu

After a hard drive failed on one of our servers here at Bite Of Tech, I searched Google for the best way to replace the failed drive and re-assemble the RAID without rebooting the server. I found plenty of ways not to do this, so I figured I would share the correct way to re-assemble a software RAID 1 in Ubuntu Linux.

List the mounted filesystems (and the partitions or md devices behind them) on your Linux box:

df -h

List the status of your RAID on your Linux box:

cat /proc/mdstat

Results:

md1 : active raid1 sdc1[1] sda1[0]
      4194240 blocks [2/2] [_U]
md3 : active raid1 sdc3[0] sda3[1]
      970470016 blocks [2/2] [_U]

The “_” you see after the block count shows which of the drives is not “Up”. In this example the sdc1 partition and the sdc3 partition are down.
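
If you have more than a couple of arrays, picking out the “_” by eye gets tedious. Here is a minimal sketch that does the same check with awk. The function name list_degraded is my own invention, and it assumes the usual /proc/mdstat layout where the array name starts a line and the status brackets sit on the following "blocks" line.

```shell
# list_degraded: read /proc/mdstat-style text on stdin and print the name
# of every md array whose status brackets contain "_" (a missing member).
list_degraded() {
  awk '
    /^md[0-9]+ :/ { name = $1 }                   # remember the current array
    /blocks/ && /\[[U_]*_[U_]*\]/ { print name }  # status field has a "_"
  '
}

# Typical use on a live system:
#   list_degraded < /proc/mdstat
```

An array that shows [UU] prints nothing, so empty output means every mirror is healthy.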

First we need to stop the “smartd” service so it does not prevent us from removing the failed disk from the RAID array. The second command confirms that it is no longer running.

service smartd stop
ps -C smartd

Next we will mark the failed partition as faulty in its array. Note that the array member is the partition (sdc1), not the whole disk.

mdadm -f /dev/md1 /dev/sdc1

Now we will remove that partition from that array.

mdadm -r /dev/md1 /dev/sdc1

Repeat these steps for the additional raidsets in your array. After these steps you can remove the faulty drive from your server and replace it with the new drive. This drive needs to be of equal or greater size than the current working drive used in your raid array.
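
To avoid typos when repeating the fail and remove steps for every raidset, the commands can be generated from the mdstat output. This is only a sketch: print_removal_cmds is a hypothetical helper name, it prints the commands rather than running them so you can review them first, and it assumes that no other disk name begins with the name you pass in.

```shell
# print_removal_cmds DISK: read /proc/mdstat-style text on stdin and print
# (not run) the fail/remove commands for every member partition on DISK.
print_removal_cmds() {
  disk=$1
  awk -v disk="$disk" '
    /^md[0-9]+ :/ {
      for (i = 3; i <= NF; i++) {
        part = $i
        sub(/\[.*/, "", part)          # strip the "[N]" slot suffix and flags
        if (index(part, disk) == 1) {  # member name starts with the disk name
          printf "mdadm -f /dev/%s /dev/%s\n", $1, part
          printf "mdadm -r /dev/%s /dev/%s\n", $1, part
        }
      }
    }
  '
}

# Typical use on a live system:
#   print_removal_cmds sdc < /proc/mdstat
```

Once the printed list looks right, the lines can be pasted into a root shell one array at a time.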

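Let's make sure that the server has detected your new hard drive. The four numbers are the host, channel, ID and LUN of the new drive, so adjust them to match your system.

echo "scsi add-single-device 0 0 0 0" > /proc/scsi/scsi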

Once this is completed you may verify the drive is listed with this command:

cat /proc/scsi/scsi
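
Before going any further it may be worth confirming the size requirement mentioned above. The sketch below compares two byte counts. The function name big_enough is made up for this example, and the usage comment assumes sda is the surviving disk and sdc is the replacement.

```shell
# big_enough OLD_BYTES NEW_BYTES: succeed when the new disk is at least as
# large as the surviving one.
big_enough() {
  [ "$2" -ge "$1" ]
}

# Hypothetical use, assuming sda is the surviving disk and sdc the new one:
#   big_enough "$(blockdev --getsize64 /dev/sda)" "$(blockdev --getsize64 /dev/sdc)" \
#     && echo "new disk is large enough"
```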

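Now let's mirror the partition table from the working drive to the new drive that was added to the system. The source (first) device must be the working drive and the target (second) device must be the new one. In this example sda is the working drive and sdc is the replacement.

sfdisk -d /dev/sda | sfdisk /dev/sdc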

You can see if the drive begins to automatically repair the broken mirror by typing:

cat /proc/mdstat

If the drive is not in the array you can add the partitions manually for each raidset.

mdadm --add /dev/md1 /dev/sdc1
mdadm --add /dev/md3 /dev/sdc3

You should now be able to see the drives rebuilding with the cat /proc/mdstat command used above.

Results:

md3 : active raid1 sdc3[2] sda3[1]
      970470016 blocks [2/1] [_U]
[>....................] recovery = 1.8%
(17961600/970470016) finish=1700.4min speed=9335K/sec
md1 : active raid1 sdc1[1] sda1[0]
      4194240 blocks [2/2] [UU]
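
A rebuild of this size takes many hours, so a one-line summary per array is easier to check than the full output. This is a rough sketch, with rebuild_progress being a name I picked for illustration. It assumes the percentage appears on a line containing "recovery =" or "resync =", as in the results above.

```shell
# rebuild_progress: read /proc/mdstat-style text on stdin and print
# "<array> <percent>" for every array that is currently rebuilding.
rebuild_progress() {
  awk '
    /^md[0-9]+ :/ { name = $1 }          # remember the current array
    /(recovery|resync) *=/ {
      for (i = 1; i <= NF; i++)
        if ($i ~ /%$/) print name, $i    # the field ending in "%"
    }
  '
}

# Typical use on a live system:
#   rebuild_progress < /proc/mdstat
```

Arrays that have finished rebuilding no longer have a recovery line, so they drop out of the output.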

Now you will want to restart the smartd service:

service smartd start

About Brian Aldridge

I am a software developer and podcaster. Catch me weekly on Infection - The Survival Podcast at https://infectionpodcast.com
