Replacing Your Boot Hard Disk


Boot Drive Failures

Drive failures happen all the time. There are a lot of ways for a drive to fail, and depending on how close it is to being dead you may or may not be able to recover your data. I have dealt with several disk failures recently, but all of them involved external FireWire or USB drives. This was the first time I was in the situation a) on a linux server, and b) on an internal disk that was also the boot drive.

Step 1 - Get Replacement Hardware

The first question that popped into my mind was whether or not I would be able to find the same size drive as the one that was failing. I was not, and it turned out to not be important to have the same size disk at all.

You can't go smaller, however. You have to have a hard drive that is at least as big as the hard drive that you are replacing. For simplicity's sake, you should also get a drive of the same architecture - if you have a SATA drive failing, replace it with another SATA drive. If it's SCSI, buy a SCSI drive. If it's PATA, you get the point. The architecture is important because you don't want to confuse the operating system and you don't want to be adding or testing out new hardware during the recovery process.
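Once both disks are attached (Step 3 onward), the "at least as big" rule can be checked from the command line instead of trusting the label on the box. This is only a sketch: the function name new_disk_fits is mine, and it assumes the old disk is /dev/sda and the new one /dev/sdb, which may not match your system.

```shell
# Succeeds when the replacement disk is at least as large as the old one.
# Both arguments are sizes in bytes.
new_disk_fits() {
    old_bytes=$1
    new_bytes=$2
    [ "$new_bytes" -ge "$old_bytes" ]
}

# On a live system the sizes can be read with blockdev (needs root), e.g.:
#   new_disk_fits "$(blockdev --getsize64 /dev/sda)" \
#                 "$(blockdev --getsize64 /dev/sdb)" \
#       && echo "new disk is big enough"
```

Comparing raw byte counts matters because two drives sold as the same size can differ by a few megabytes.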

Step 2 - Make a Bootable CD

Hopefully you can get this step done from a fully working computer. You don't want to try downloading and burning a CD from the system whose damaged drive you are replacing.

There's an open source project called System Rescue CD. It's a bootable linux cd that supports a variety of disk formats. My drives were formatted with exclusively XFS partitions and linux swap partitions. The System Rescue CD worked without modification for me. There is a lot of information on the site, and if you google for variations of System Rescue CD you will find plenty of other people with advice and experience similar to what I'm sharing with you right now.

You want to have some familiarity with linux moving forward. If you don't have any AND you are technical-minded, you may be able to fudge your way through things, but I would strongly recommend recruiting a friend with linux experience to help you out.

Step 3 - Install your new disk alongside your existing disk

With my machine, I had SATA hard disks. Unfortunately for me, the controller only had slots for two drives and I already had two disks in the system. I unplugged the secondary drive in order to get the new drive installed. This means I moved the power plug and data cables from the secondary disk in the system to the new disk I just picked up at the store.

With PATA drives, or IDE drives, you will have to concern yourself with Master and Slave settings on the disks. With SCSI there are other issues. I'm not going to get into the nitty gritty of installing hard disks here. I don't install disks with any regularity anymore and I don't want to give partial or outdated advice.

Going back to my experience, I was able to see that the motherboard had labels on the data cable connectors. One was labeled SATA 0, and the other SATA 1. I assumed that SATA 0 was my primary and bootable disk. That assumption was correct.

If you've never opened up a computer case to put in a hard disk, I would again strongly recommend recruiting a friend familiar with the task.

Step 4 - Boot up your new CD

Simply pop your System Rescue CD into the CD-ROM drive and turn on the system. You may have to go into your BIOS to enable booting from CD, but most machines are configured to boot from CD when a bootable CD is in the drive.

Some text will pop on the screen and you have to press enter to boot. Go ahead, press enter.

Before too long, you will be at a command prompt. During the boot process, the OS probed your system and identified, but did not mount, the hard disks on your system. In linux, your hard disks and partitions are identified in the /dev directory. If you have IDE (PATA) devices, they are hda, hdb, or something very similar. If you have SATA hard disks, like I did, they are sda, sdb, etc. In my case, I just had sda and sdb. Additionally, the partitions on each disk are numbered after the device name - so the first partition would be sda1, the second partition sda2, etc.
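If you aren't sure which names your disks and partitions ended up with, the kernel keeps a list in /proc/partitions. Here is a small sketch that prints that list as /dev paths. The function name list_block_devices is made up for this example, and the optional file argument is only there so it can be tried against a saved copy of the list.

```shell
# Print every disk and partition the kernel has detected, one per line.
# Reads /proc/partitions by default; another file can be passed instead.
list_block_devices() {
    # Skip the header and blank line, then print the name column as a /dev path.
    awk 'NR > 2 && NF == 4 { print "/dev/" $4 }' "${1:-/proc/partitions}"
}
```

Running list_block_devices with no arguments on my kind of setup would be expected to show entries such as /dev/sda, /dev/sda1, and /dev/sdb, but the exact names depend on your hardware.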

If you can, mount the partitions BEFORE you try to copy them. This way, if you are using a journaling file system, any pending operations have the opportunity to happen. After you mount the partitions, unmount them.

The mount and unmount commands are simple. Just create a directory and then mount the drive to that directory.

mkdir /olddisk_partition1
mount /dev/sda1 /olddisk_partition1

If it mounted, you can take a peek at the files and potentially try backing up a small set of important files to a USB drive. Remember, we haven't created any partitions on the new drive yet, so you can't write to that disk. If it didn't mount, don't stress out. Try mounting other partitions, read the mount man page to see if there are other options required for your specific filesystem, or google around for other people in your same situation. You won't be able to mount a swap partition and you will have to use other software to try mounting NTFS partitions.

To unmount the drive, simply type umount devicename, so for my previous example:

umount /dev/sda1
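If you have several partitions, the mount-then-unmount routine can be wrapped in a loop. This is a rough sketch, not something I ran during my own recovery: the function names and the /mnt/oldpart mount point are made up, and setting DRY_RUN=1 only prints the commands so you can read them before doing anything for real.

```shell
# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Mount each partition once and unmount it again so a journaling
# filesystem gets the chance to finish any pending operations.
cycle_partitions() {
    mnt=/mnt/oldpart    # hypothetical mount point
    run mkdir -p "$mnt"
    for part in "$@"; do
        # A failed mount (swap, damaged filesystem) is reported and skipped.
        if run mount "$part" "$mnt"; then
            run umount "$part"
        else
            echo "could not mount $part, skipping" >&2
        fi
    done
}
```

For example, DRY_RUN=1 cycle_partitions /dev/sda1 /dev/sda2 prints the mkdir, mount, and umount commands it would run for those two partitions.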

Step 5 - Copying your Data

Speaking from my own experience, I had damaged disks. You may be in the same boat. It seems that most data recovery tools out there help you out with damaged file systems, but not necessarily with damaged disks. I happened to find a gem that doesn't care if the disk is damaged, and as a bonus it will make several attempts at recovering data from damaged areas.

It works with good disks too, and it performs pretty quickly. For my drives it was working at around 30 Megabytes of data per second - a far cry from the SATA rating, but most disks don't even come close to the 150 MB/s or 300 MB/s SATA/SATA II spec unless you start working with high end disks and a high end RAID controller.
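To get a feel for how long a copy will take at that rate, a little arithmetic helps. The sketch below is a back-of-the-envelope estimate only: the function name estimate_minutes is mine, the 250 GB figure in the example is a hypothetical disk size rather than the size of my drives, and it assumes the rate stays constant, which it won't on a damaged disk.

```shell
# Rough copy time in whole minutes for a disk of SIZE_GB gigabytes
# at a sustained RATE_MB megabytes per second (1 GB taken as 1000 MB).
estimate_minutes() {
    size_gb=$1
    rate_mb=$2
    echo $(( size_gb * 1000 / rate_mb / 60 ))
}

# Example: estimate_minutes 250 30   (a hypothetical 250 GB disk at 30 MB/s)
```

For that hypothetical 250 GB disk at 30 MB/s, the estimate comes out to 138 minutes, a bit over two hours before any retries on bad areas.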

The software I'm talking about is free, open source, and it's already installed on the System Rescue CD. It's called ddrescue. There is another piece of software out there called dd_rescue designed to do the same exact thing. ddrescue is reportedly better according to the articles that I read, and since it's included on the CD, I had no reason to try them both.

The tar, dd, rsync, cpio, and cp utilities may work if you have a damage-free disk, but the problem is that they neither report progress, nor do they deal with read errors well. Maybe there are special command line switches or new versions out that do, but I'm not aware of them if that is the case.

ddrescue reports progress, and it can stop and resume sessions if you are logging. It copies all data from one disk to another. Actually, the terminology most commonly used is one "block device" to another, but disks are block devices and that's what we're dealing with. It does not copy files per se. It copies everything from one disk to another, bit by bit - although unlike dd, it reasonably identifies empty data areas and copies them as empty areas rather than bit by bit copying. Semantics for us, but this semantic is a big reason why ddrescue is a faster dd than dd.

If you are only interested in moving partitions, you have to have matching partitions already created on the destination disk. We're going to copy the entire disk though: partitions, data, master boot record, everything.

If you have a USB drive, plug it in and mount it so you can save the log file there. The log file lets you pause and resume your sessions, or come back later and try once again to grab the data that wouldn't come over.

ddrescue -r 2 /dev/sda /dev/sdb logfile

Don't trust me. Read the syntax. Read the man page. If you copy the empty disk to your old disk, ALL of your data is GONE. The -r flag tells ddrescue how many times to retry grabbing information before giving up. If you do have a device to log to, you may want to break it up into two operations - use the '-n' flag first to grab all the data you can as quickly as possible, then a second run with the '-r' flag to retry the damaged areas. If you don't specify a logfile, each new run starts over and overwrites the work of your previous run through - and like me you will figure out that the last hour and a half of your life was completely wasted. Also, if you have a very damaged disk, you may want to consider turning off system logging - either by killing syslog or some other method. The kernel error reporting seems to slow things down a good deal.
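The two-pass approach can be written down as a small wrapper. Treat this as a sketch under assumptions, not the one true way to run ddrescue: the function names are mine, /mnt/usb/rescue.log in the example is a hypothetical logfile path on a mounted USB drive, and DRY_RUN=1 only prints the commands so you can check source and destination before anything is written.

```shell
# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Copy SRC to DST in two passes, keeping state in LOG between them.
rescue_disk() {
    src=$1
    dst=$2
    log=$3
    # Refuse the most obvious mistake before touching anything.
    if [ "$src" = "$dst" ]; then
        echo "source and destination are the same device" >&2
        return 1
    fi
    # Pass 1: grab everything that reads cleanly, as quickly as possible.
    run ddrescue -n "$src" "$dst" "$log" || return 1
    # Pass 2: go back over the damaged areas, retrying them twice.
    run ddrescue -r 2 "$src" "$dst" "$log"
}
```

For example, DRY_RUN=1 rescue_disk /dev/sda /dev/sdb /mnt/usb/rescue.log prints both ddrescue commands without running them. The guard only catches identical names. It cannot tell you which disk is the old one, so you still have to check that yourself.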

Step 6 - Lather, Rinse, Repeat

If you are doing more than one disk, do the same thing with the new disk. Personally, I rebooted the machine and verified that I had all the right partitions on the first copy and mounted the partitions on my new drive to make sure it all worked. Then I shut the system down, unplugged the two drives I was working on, plugged in the next pair of drives to work with, and then repeated the process.

Step 7 - Repair the Copy

Now that you have all of your old data copied to your new drives, shut down your system, plug in just your new drives, and again boot into the System Rescue CD. Now that we have undamaged hard drives to work with, it's time to do any filesystem checking. I had xfs volumes, so for me, I just needed to run xfs_repair. If you have other linux file systems, fsck is probably the way to go. Essentially, you just want to find something similar to the Windows program chkdsk for your file system.

For xfs partitions, it was simply a matter of running:

xfs_repair /dev/sda2

I imagine fsck is similarly easy to run. Google around and find what you are supposed to be using and the exact syntax. It would be a surprise if the tool you needed wasn't already included on the System Rescue CD.
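As a starting point for picking the right tool, here is a sketch that maps a filesystem type to a check command and prints it rather than running it. I only used the xfs branch myself. The function name repair_command is made up, and the other branches are the usual candidates, so check the man pages before trusting them on your data.

```shell
# Print the check/repair command for a filesystem type and device.
# Nothing is executed; the command is only shown for review.
repair_command() {
    fstype=$1
    device=$2
    case "$fstype" in
        xfs)            echo "xfs_repair $device" ;;
        ext2|ext3|ext4) echo "e2fsck -f $device" ;;
        swap)           echo "# $device is swap, nothing to check" ;;
        *)              echo "fsck -t $fstype $device" ;;
    esac
}

# The type can be looked up with blkid (needs root), e.g.:
#   repair_command "$(blkid -o value -s TYPE /dev/sda2)" /dev/sda2
```

Printing the command instead of running it keeps you in the loop, which seems wise when the wrong tool on the wrong partition could make things worse.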

Step 8 - GParted

This time, we're going to use the GUI to either extend the existing partitions or create new ones to take advantage of the fact that your new drives are bigger than your old ones. If you have the same size drives, skip this step.

From the command line, run the following to get the GUI going:

startx

Once you are in the graphical environment, right click and select GParted. It's a partition editor, and for each drive on your system it will list all of the partitions. XFS allows you to grow partitions; other file systems may vary. It certainly is possible that you will have to add a new partition to take advantage of all your new space. GParted is fairly intuitive and it works well provided you don't have any partitions greater than 1 TB.

Step 9 - Happy Dance

After your partitions have been edited, it's time to shut down the system and boot it up again without the System Rescue CD. If you replaced more than one drive, ensure that your new drives are plugged in to the right connectors (drive 0 is connected as drive 0, drive 1 is connected as drive 1). Turn your system on and when the OS starts up, do a happy dance.
