Rebuilding Broken Software Raid 1 in Ubuntu

After a hard drive failed on one of our servers here at Bite Of Tech, I searched Google for the best way to replace the failed drive and reassemble the RAID without rebooting the server. I found plenty of ways not to do this, so I figured I would share the correct way to reassemble a software RAID 1 in Ubuntu Linux.

List the mounted filesystems and the partitions behind them on your Linux box:

df -h

List the status of your raid on your Linux box:

cat /proc/mdstat

Results:

md1 : active raid1 sdc1[1] sda1[0]
      4194240 blocks [2/1] [_U]
md3 : active raid1 sdc3[0] sda3[1]
      970470016 blocks [2/1] [_U]

The “_” in the status field at the end of each blocks line marks a member of the array that is not “Up”. In this example the sdc1 partition and the sdc3 partition are down.
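
If you would rather not read the status field by eye, the check can be scripted. This is only a minimal sketch, assuming the two-lines-per-array layout shown above; the function name degraded_arrays is made up for this example, and it reads /proc/mdstat unless you pass it another file.

```shell
# degraded_arrays [FILE]: print the name of every md array whose status
# field (for example [_U]) contains an underscore, meaning a member is down.
# FILE defaults to /proc/mdstat. The function name is illustrative only.
degraded_arrays() {
    awk '
        # Remember which array the following lines belong to.
        /^md[0-9]+ :/ { name = $1 }
        # A bracketed run of U and _ with at least one _ marks a degraded array.
        /\[[U_]*_[U_]*\]/ && name != "" { print name; name = "" }
    ' "${1:-/proc/mdstat}"
}
```

Calling degraded_arrays with no arguments on the server prints one array name per line, and prints nothing when every member is up.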

First, stop the “smartd” service so that it does not prevent you from removing the failed disk from the RAID array, then confirm that it is no longer running.

service smartd stop
ps -C smartd

Next, mark the failed drive's partition as faulty in its array. Note that the array is built from the partition (sdc1), not the whole disk (sdc).

mdadm /dev/md1 --fail /dev/sdc1

Now remove that partition from the array.

mdadm /dev/md1 --remove /dev/sdc1

Repeat these steps for every other array that uses the failed drive (in this example, sdc3 in md3). After that you can remove the faulty drive from your server and install the new one. The new drive must be at least as large as the working drive in your RAID array.
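
With more than one array on the failed disk, the fail and remove pair can be generated in a loop. The following is a rough sketch rather than a finished tool: retire_member and retire_all are hypothetical helper names, the array and partition pairs are the ones from the example output above, and nothing is executed unless you set RUN=1.

```shell
# retire_member ARRAY PARTITION: mark PARTITION faulty in ARRAY, then remove it.
# By default the commands are only printed; set RUN=1 to run them (needs root).
retire_member() {
    for action in --fail --remove; do
        if [ "${RUN:-0}" = "1" ]; then
            mdadm "$1" "$action" "$2" || return 1
        else
            echo "mdadm $1 $action $2"
        fi
    done
}

# Array/partition pairs assumed from the example; adjust them to your system.
retire_all() {
    retire_member /dev/md1 /dev/sdc1 &&
    retire_member /dev/md3 /dev/sdc3
}
```

Run retire_all once to review the printed commands, then run it again as RUN=1 retire_all when they look right.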

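Let's make sure that the server has detected your new hard drive. The four numbers are the host, channel, ID, and LUN of the new device, so adjust them to match where the drive is attached.

echo "scsi add-single-device 0 0 0 0" > /proc/scsi/scsi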

Once this is completed you may verify the drive is listed with this command:

cat /proc/scsi/scsi

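Now let's mirror the partition table from the working drive to the new drive that was added to the system. In this example the working drive is sda and the new drive is sdc. Double-check the device names before you run this, because it overwrites the partition table on the target drive.

sfdisk -d /dev/sda | sfdisk /dev/sdc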
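
Because getting the source and target the wrong way round would wipe the good drive's partition table, it can be worth wrapping this step in a small guard. This is a sketch under a few assumptions: clone_table is a hypothetical helper name, the dump file is written to the current directory, and the commands are only printed unless RUN=1 is set.

```shell
# clone_table SOURCE TARGET: copy SOURCE's partition table onto TARGET.
# Refuses missing or identical device names and keeps a dump of SOURCE's
# table in the current directory. Prints the commands unless RUN=1 is set.
clone_table() {
    src=$1 dst=$2
    if [ -z "$src" ] || [ -z "$dst" ] || [ "$src" = "$dst" ]; then
        echo "usage: clone_table SOURCE TARGET (must be different devices)" >&2
        return 1
    fi
    backup="$(basename "$src").sfdisk.dump"
    if [ "${RUN:-0}" = "1" ]; then
        # Dump first, then restore from the dump, so a copy of the table survives.
        sfdisk -d "$src" > "$backup" && sfdisk "$dst" < "$backup"
    else
        echo "sfdisk -d $src > $backup"
        echo "sfdisk $dst < $backup"
    fi
}
```

For the example in this post the call would be clone_table /dev/sda /dev/sdc, with the working drive always given first.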

You can check whether the array has automatically begun repairing the broken mirror by typing:

cat /proc/mdstat

If the new partitions have not been added to the arrays, you can add them manually for each array.

mdadm --add /dev/md1 /dev/sdc1
mdadm --add /dev/md3 /dev/sdc3

You should now be able to see the arrays rebuilding with the cat /proc/mdstat command used above.

Results:

md3 : active raid1 sdc3[2] sda3[1]
      970470016 blocks [2/1] [_U]
      [>....................]  recovery =  1.8% (17961600/970470016) finish=1700.4min speed=9335K/sec
md1 : active raid1 sdc1[1] sda1[0]
      4194240 blocks [2/2] [UU]
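
A rebuild of this size takes many hours, so you may want to pull just the progress figure out of the output. Here is a minimal sketch, assuming the recovery line looks like the one above; rebuild_progress is a made-up name and it reads /proc/mdstat unless given another file.

```shell
# rebuild_progress [FILE]: print "ARRAY PERCENT" for every array that is
# currently recovering or resyncing. FILE defaults to /proc/mdstat.
rebuild_progress() {
    awk '
        # Remember which array the following lines belong to.
        /^md[0-9]+ :/ { name = $1 }
        # On a progress line, print the field that ends in a percent sign.
        /(recovery|resync) *=/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) print name, $i
        }
    ' "${1:-/proc/mdstat}"
}
```

Something like watch -n 60 cat /proc/mdstat works just as well if you only want to keep an eye on it.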

Now you will want to restart the smartd service:

service smartd start

About Brian Aldridge