Using ddrescue to recover a RAID disk with many bad blocks

When one drive in an array fails, other drives in the array often develop problems as well, since all of them have been subject to the same wear and environment. In some cases a drive the rebuild depends on has so many bad blocks that the rebuild appears to hang. One possible solution is to use the Linux tool ddrescue.

The basic process is to copy all readable blocks from the failing disk to another disk and then use the copy to rebuild the array. This is not guaranteed to recover all data, since some blocks will not be readable, but there is a chance that the unreadable blocks were not yet used for file storage.

Since the DDF (Disk Data Format) metadata pointer is in the very last block of the disk, it is critical that the destination disk be exactly the same size and geometry as the disk with bad blocks, which means you must use the same drive make and model.
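A quick way to confirm the size requirement before copying is to compare the exact byte counts of the two drives. The sketch below is a minimal example, assuming a Linux host with util-linux (for blockdev) and GNU coreutils (for stat -c); the function names dev_size and same_size are hypothetical. It checks total capacity only, so it does not prove that the drives are the same make and model.

```sh
#!/bin/sh
# Hypothetical helper: print the size in bytes of a block device,
# or of a regular file (handy for trying the check on test images).
dev_size() {
    if [ -b "$1" ]; then
        blockdev --getsize64 "$1"   # util-linux
    else
        stat -c %s "$1"             # GNU coreutils
    fi
}

# Hypothetical helper: succeed (status 0) only if $1 and $2 are exactly
# the same size; status 1 on a mismatch, 2 if a size could not be read.
same_size() {
    src_bytes=$(dev_size "$1") || return 2
    dst_bytes=$(dev_size "$2") || return 2
    if [ "$src_bytes" -eq "$dst_bytes" ]; then
        return 0
    fi
    echo "size mismatch: $1=$src_bytes bytes, $2=$dst_bytes bytes" >&2
    return 1
}
```

For example, same_size /dev/sda /dev/sdb && echo "sizes match" should print the message only when both drives report identical capacities.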

Some Linux distributions include ddrescue, but it is recommended that you download and use SystemRescueCd, a Linux system rescue disk available as a bootable CD-ROM or USB stick for administering or repairing your system. It includes all the needed tools.

SystemRescueCd can be downloaded from:

    http://www.sysresccd.org/MainPage

As of 10/27/2009 the ISO image is version 1.3.1 and is 238 MB in size.

The procedure is to record the drives' serial numbers (you need to know which drive is the source and which is the copy), connect both drives to the Linux host, and then boot from SystemRescueCd. Getting the drive order correct is critical: reversing it will overwrite your source disk with the blank contents of the new disk, and the array will be lost.

Once booted, "fdisk -l" should show the two drives, for example /dev/sda and /dev/sdb.

To determine which disk is the source, you can use smartctl from smartmontools; its output includes each drive's serial number.

    smartctl -a /dev/sda | more
    smartctl -a /dev/sdb | more


Another way to determine drive serial numbers is with hdparm.

    hdparm -I /dev/sda
    hdparm -I /dev/sdb


Another useful check is 'fdisk -l'. The new disk will be unpartitioned, while the source disk will probably have partitions, which fdisk will show.
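The same partition check can also be made without reading fdisk output, by looking in sysfs, where the Linux kernel creates a subdirectory containing a file named partition for each partition it has found on a disk. This is a minimal sketch; has_partitions is a hypothetical name, and SYSFS_ROOT is a hypothetical override that exists only so the function can be tried against a fake directory tree.

```sh
#!/bin/sh
# Hypothetical helper: succeed if disk $1 (a kernel name such as sdb)
# has at least one partition, judged by /sys/block/<disk>/<part>/partition.
# SYSFS_ROOT is a hypothetical test override; it defaults to /sys.
has_partitions() {
    for entry in "${SYSFS_ROOT:-/sys}/block/$1"/*/partition; do
        # If the glob matched nothing, $entry is the literal pattern
        # and the -e test fails.
        [ -e "$entry" ] && return 0
    done
    return 1
}
```

For example, has_partitions sdb should fail for a new, blank disk. As the text says, the source disk will only probably show partitions, so treat this as a supporting check; the serial numbers remain the deciding evidence.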

Note the serial numbers; they will tell you which drive is which. For this example, suppose /dev/sda is the source disk.
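Because reversing the drives destroys the source disk, it can be worth scripting the serial-number comparison rather than relying on reading it by eye. The sketch below assumes that smartctl -i prints a line beginning with "Serial Number:" (hdparm -I output contains a similar line, which the same parser also handles); serial_from_output and check_serial are hypothetical names.

```sh
#!/bin/sh
# Hypothetical helper: print the value of the first "Serial Number:" line
# on standard input, with surrounding whitespace removed.
serial_from_output() {
    awk '/Serial Number:/ { sub(/^[^:]*:[ \t]*/, ""); sub(/[ \t]*$/, ""); print; exit }'
}

# Hypothetical helper: succeed only if device $1 reports serial number $2.
check_serial() {
    actual=$(smartctl -i "$1" | serial_from_output)
    if [ -n "$actual" ] && [ "$actual" = "$2" ]; then
        return 0
    fi
    echo "$1: expected serial '$2', drive reports '${actual:-unknown}'" >&2
    return 1
}
```

For example, check_serial /dev/sda WD-SOURCE123 && check_serial /dev/sdb WD-TARGET456 && echo "drive order confirmed", where the two placeholder serials are replaced with the ones you recorded before connecting the drives.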

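Next, use ddrescue to copy the disk. GNU ddrescue will not write to a device (as opposed to a regular file) unless given the -f (--force) option, and the optional third argument names a mapfile (rescue.map here is an arbitrary name) that records progress so an interrupted copy can be resumed. The basic command is:

    ddrescue -f /dev/sda /dev/sdb rescue.map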

Ddrescue is designed for recovery, and its defaults can be used. It can, however, be told to retry unreadable areas many times (the -r option) in hopes of recovering more data, which can make recovery take considerably longer. Ddrescue has taken as long as 4 days to run on a failing 1 TB disk.
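One way to keep retries from dominating the run time is to use two passes against the same mapfile: a first pass that copies everything easily readable, then a second pass that returns only to the areas the mapfile still marks as bad. This follows the pattern of the examples in the GNU ddrescue manual. The sketch assumes a current GNU ddrescue, where -f permits writing to a device, -n skips the slow scraping phase and -r3 allows up to three retry passes; rescue_copy is a hypothetical name and the retry count is an arbitrary example.

```sh
#!/bin/sh
# Hypothetical two-pass wrapper around GNU ddrescue.
# $1 = source device, $2 = destination device, $3 = mapfile path.
rescue_copy() {
    src=$1
    dst=$2
    map=$3
    # Pass 1: copy the easily readable areas; -n skips scraping.
    ddrescue -f -n "$src" "$dst" "$map" || return
    # Pass 2: revisit only the areas the mapfile still marks as bad,
    # with up to three retry passes (-r3).
    ddrescue -f -r3 "$src" "$dst" "$map"
}
```

With /dev/sda as the source, rescue_copy /dev/sda /dev/sdb rescue.map would run both passes, skipping the second if the first exits with an error. Because progress is kept in the mapfile, rerunning the same command after an interruption should continue rather than start over, so keep the mapfile somewhere that survives a reboot.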

More detailed recovery procedures are available on the web, including in the GNU ddrescue manual.

----------------------------------

Another method is to use a different program named dd_rescue.

dd_rescue is simpler to run.

Home Page

    http://www.garloff.de/kurt/linux/ddrescue/

Usage…

    http://www.forensicswiki.org/wiki/Ddrescue

From the webpage above…

    Sample usage

    Here is a common dd_rescue command:

    UNIX/Linux

    $ dd_rescue /dev/hda myfile.img

Dd_rescue has few options and recovers to an image file. This can be a problem when recovering disks of 1 TB or larger: not only will you need 1 TB or more of free disk space to hold the image, you will then have to write the image file to another disk. If you have the disk space available, however, writing to a destination file is safer than a disk-to-disk copy with ddrescue, as there is no possibility of overwriting the source disk with the contents of a blank disk.
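For the image-file route, the two practical concerns are having enough free space for the image and writing the finished image back to a disk. The sketch below is a minimal example assuming GNU coreutils (stat -c, dd with conv=fsync), util-linux (blockdev) and a POSIX df; size_bytes, enough_space and restore_image are hypothetical names, and the paths in the usage note are placeholders.

```sh
#!/bin/sh
# Hypothetical helper: size in bytes of a block device or regular file.
size_bytes() {
    if [ -b "$1" ]; then blockdev --getsize64 "$1"; else stat -c %s "$1"; fi
}

# Hypothetical helper: succeed only if the filesystem holding directory $2
# has enough free space for an image of $1 (compared in 1 KiB units).
enough_space() {
    need=$(size_bytes "$1") || return 2
    need_kib=$(( (need + 1023) / 1024 ))
    # df -P -k: POSIX output format, column 4 is available 1024-byte blocks.
    avail_kib=$(df -P -k "$2" | awk 'NR == 2 { print $4 }') || return 2
    [ "$avail_kib" -ge "$need_kib" ]
}

# Hypothetical helper: write a finished image file $1 onto device $2,
# flushing it to the device before dd exits.
restore_image() {
    dd if="$1" of="$2" bs=1M conv=fsync
}
```

For example, enough_space /dev/hda /mnt/backup && dd_rescue /dev/hda /mnt/backup/myfile.img, followed by restore_image /mnt/backup/myfile.img /dev/sdb once the copy is complete. The restore step writes to a raw device, so the destination needs the same care as a direct ddrescue copy.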