After a hard drive failed on one of our servers here at Bite Of Tech, I searched Google for the best way to replace the failed drive and reassemble the RAID without rebooting the server. I found plenty of ways not to do this, so I figured I would share the correct way to reassemble a software RAID 1 in Ubuntu Linux.
List your partitions on your Linux box:
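A minimal sketch of this step, assuming `fdisk` is installed. `DO` and `list_partitions` are helper names made up for this example: `DO` defaults to `echo`, so the command is only printed, and setting `DO=sudo` beforehand runs it for real.

```shell
# DO is a hypothetical helper: "echo" (default) only prints the command,
# DO=sudo executes it with root privileges.
DO=${DO:-echo}

# List the partition tables of all disks, or of the disks given as arguments.
list_partitions() { $DO fdisk -l "$@"; }

list_partitions
```

Take note of which device is the healthy disk and which is the failed one; the rest of this post uses sda and sdc.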
Check the status of your RAID arrays with cat /proc/mdstat:
Results:
md1 : active raid1 sdc1[1] sda1[0]
      4194240 blocks [2/2] [_U]
md3 : active raid1 sdc3[0] sda3[1]
      970470016 blocks [2/2] [_U]
The “_” after the block count marks a mirror member that is not “Up” (a “U” means the member is up). In this example the sdc1 and sdc3 partitions are down.
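If you have many arrays, the same check can be scripted. The following is a rough sketch, not part of the original procedure: `degraded_arrays` is a name invented here, and it assumes the usual /proc/mdstat layout where the status brackets sit on the "blocks" line under each array name.

```shell
# Print the name of every md array whose status field contains a "_",
# i.e. at least one mirror member is not up.
# Optional argument: a file to read instead of /proc/mdstat.
degraded_arrays() {
  awk '/^md/ { name = $1 }
       /blocks/ && /\[[U_]*_[U_]*\]/ { print name }' "${1:-/proc/mdstat}"
}

# Only query the live system if the md driver exposes its status file.
if [ -r /proc/mdstat ]; then
  degraded_arrays
fi
```

An empty result means every array reports all of its members as up.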
First we need to stop the “smartd” service so it does not prevent you from removing the failed disk from the RAID array.
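A sketch of stopping the service, with two assumptions: the service is registered under the name `smartd` (some Ubuntu releases call it `smartmontools` instead), and the `service` wrapper is available. `DO`, `SMART_SERVICE` and `stop_smart` are names introduced only for this example; `DO` defaults to `echo`, so nothing is stopped until you set `DO=sudo`.

```shell
# DO is a hypothetical helper: "echo" (default) only prints the command,
# DO=sudo executes it with root privileges.
DO=${DO:-echo}

# Assumed service name; override with SMART_SERVICE=smartmontools if needed.
SMART_SERVICE=${SMART_SERVICE:-smartd}

stop_smart() { $DO service "$SMART_SERVICE" stop; }

stop_smart
```

The ps check that follows should then report no running smartd process.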
ps -C smartd
Next we will mark the failed partition as faulty in its array.
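As a sketch, using the array and partition names from the example output (adjust them to your system). `DO` and `mark_failed` are helper names made up here; `DO` defaults to `echo`, so the command is only printed until you set `DO=sudo`.

```shell
# DO is a hypothetical helper: "echo" (default) only prints the command,
# DO=sudo executes it with root privileges.
DO=${DO:-echo}

# Mark one member partition ($2) of an array ($1) as faulty.
mark_failed() { $DO mdadm --manage "$1" --fail "$2"; }

mark_failed /dev/md1 /dev/sdc1
```

Double-check the partition name before running this for real: failing the healthy member by mistake would leave the array with no working copy.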
Now we will remove that partition from the array.
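A matching sketch for the removal, again with the names from the example output. `DO` and `remove_member` are invented for illustration, and `DO` defaults to `echo` so the command is only printed.

```shell
# DO is a hypothetical helper: "echo" (default) only prints the command,
# DO=sudo executes it with root privileges.
DO=${DO:-echo}

# Remove a member partition ($2) from an array ($1).
# mdadm only removes members that are already failed or spare.
remove_member() { $DO mdadm --manage "$1" --remove "$2"; }

remove_member /dev/md1 /dev/sdc1
```

If mdadm refuses because the device is busy, the partition has most likely not been marked as faulty yet.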
Repeat these steps for every other array that uses the failed drive. After that you can remove the faulty drive from your server and replace it with the new one. The new drive must be at least as large as the working drive still in your RAID array.
Let's make sure that the server has detected your new hard drive.
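One common way to do this without a reboot is to ask the kernel to rescan a SCSI/SATA host adapter. This is a sketch under assumptions: the adapter name `host0` is a guess (look in /sys/class/scsi_host/ for the adapters on your machine), and `DO` and `rescan_host` are names made up for this example. `DO` defaults to `echo`, so the command is only printed until you set `DO=sudo`.

```shell
# DO is a hypothetical helper: "echo" (default) only prints the command,
# DO=sudo executes it with root privileges.
DO=${DO:-echo}

# Ask the kernel to rescan one SCSI host adapter ($1, e.g. host0).
# "- - -" is a wildcard for every channel, target and LUN on that adapter.
rescan_host() { $DO sh -c "echo '- - -' > /sys/class/scsi_host/$1/scan"; }

rescan_host host0
```

If you are not sure which adapter the new drive is attached to, rescanning each hostN entry in turn is harmless.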
Once this is completed you may verify the drive is listed with this command:
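A small sketch of such a check that needs no root access: it looks for the disk name in /proc/partitions. `disk_present` is a helper name invented here, and `sdc` is assumed to be the name the new drive receives, as in the example output.

```shell
# Succeed if a whole disk named $1 (e.g. sdc) is known to the kernel.
# Optional second argument: a file to read instead of /proc/partitions.
disk_present() {
  # -w matches whole words only, so "sdc" does not match "sdc1".
  grep -qw -- "$1" "${2:-/proc/partitions}"
}

if disk_present sdc; then
  echo "sdc is visible to the kernel"
else
  echo "sdc not detected yet"
fi
```

Running fdisk -l as root gives a fuller listing, including the disk size, which is worth comparing against the working drive.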
Now let's mirror the current partition table to the new drive that was added to the system.
You can check whether the array has started repairing the broken mirror automatically by running cat /proc/mdstat:
If the new drive's partitions are not in the arrays, you can add them manually, one array at a time.
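A sketch of copying the partition table with `sfdisk`, assuming sda is the healthy drive and sdc is the new, empty one. The direction matters: reversing the two names would overwrite the good drive's partition table. Older sfdisk versions do not understand GPT, so on GPT disks a different tool may be needed. `DO` and `copy_table` are helper names made up here; `DO` defaults to `echo`, so the command is only printed until you set `DO=sudo`.

```shell
# DO is a hypothetical helper: "echo" (default) only prints the command,
# DO=sudo executes it with root privileges.
DO=${DO:-echo}

# Dump the partition table of the healthy disk ($1) and write it to the
# new disk ($2). Everything on $2 is overwritten.
copy_table() { $DO sh -c "sfdisk -d $1 | sfdisk $2"; }

copy_table /dev/sda /dev/sdc
```

Afterwards the new drive should show the same partitions as the healthy one.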
mdadm --add /dev/md3 /dev/sdc3
You should now be able to see the drives rebuilding by running cat /proc/mdstat again.
Results:
md3 : active raid1 sdc3[2] sda3[1]
      970470016 blocks [2/1] [_U]
      [>....................] recovery = 1.8% (17961600/970470016) finish=1700.4min speed=9335K/sec
md1 : active raid1 sdc1[1] sda1[0]
      4194240 blocks [2/2] [UU]
Now you will want to restart the smartd service:
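A sketch of starting the service again, under the same assumptions as when stopping it: the service name `smartd` may be `smartmontools` on some Ubuntu releases, and `DO`, `SMART_SERVICE` and `start_smart` are names introduced only for this example. `DO` defaults to `echo`, so the command is only printed until you set `DO=sudo`.

```shell
# DO is a hypothetical helper: "echo" (default) only prints the command,
# DO=sudo executes it with root privileges.
DO=${DO:-echo}

# Assumed service name; override with SMART_SERVICE=smartmontools if needed.
SMART_SERVICE=${SMART_SERVICE:-smartd}

start_smart() { $DO service "$SMART_SERVICE" start; }

start_smart
```

Running ps -C smartd afterwards should list the smartd process again.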