Chapter 10. Managing Software RAIDs 6 and 10 with mdadm

Contents

10.1. Creating a RAID 6
10.2. Creating Nested RAID 10 Devices with mdadm
10.3. Creating a Complex RAID 10 with mdadm
10.4. Creating a Degraded RAID Array

This chapter describes how to create software RAID 6 and RAID 10 devices with the Multiple Devices Administration (mdadm(8)) tool. You can also use mdadm to create RAIDs 0, 1, 4, and 5. The mdadm tool provides the functionality of the legacy programs mdtools and raidtools.

10.1. Creating a RAID 6

10.1.1. Understanding RAID 6

RAID 6 is essentially an extension of RAID 5 that allows for additional fault tolerance by using a second independent distributed parity scheme (dual parity). Even if two of the hard disk drives fail during the data recovery process, the system continues to be operational, with no data loss.

RAID 6 provides very high fault tolerance: it handles the loss of any two devices, even simultaneously, without data loss. Accordingly, it requires N+2 drives to store N drives' worth of data, with a minimum of 4 devices.
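The N+1 and N+2 arithmetic can be checked with a small helper. This is a minimal sketch; the function name usable_devices is made up for illustration and is not part of mdadm.

```shell
# usable_devices LEVEL TOTAL
# Prints how many devices' worth of data an array of TOTAL devices holds.
# (Hypothetical helper for illustration; not an mdadm command.)
usable_devices() {
    level=$1
    total=$2
    case "$level" in
        5) min=3; parity=1 ;;   # N+1, minimum of 3
        6) min=4; parity=2 ;;   # N+2, minimum of 4
        *) echo "unsupported RAID level: $level" >&2; return 1 ;;
    esac
    if [ "$total" -lt "$min" ]; then
        echo "RAID $level needs at least $min devices" >&2
        return 1
    fi
    echo $((total - parity))
}

usable_devices 5 4   # prints 3
usable_devices 6 4   # prints 2
usable_devices 6 8   # prints 6
```

With four devices, RAID 6 stores two devices' worth of data, where RAID 5 would store three.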

The performance for RAID 6 is slightly lower but comparable to RAID 5 in normal mode and single disk failure mode. It is very slow in dual disk failure mode.

Table 10.1. Comparison of RAID 5 and RAID 6

Feature            RAID 5                              RAID 6
Number of devices  N+1, minimum of 3                   N+2, minimum of 4
Parity             Distributed, single                 Distributed, dual
Performance        Medium impact on write and rebuild  More impact on sequential write than RAID 5
Fault tolerance    Failure of one component device     Failure of two component devices


10.1.2. Creating a RAID 6

The procedure in this section creates a RAID 6 device /dev/md0 with four devices: /dev/sda1, /dev/sdb1, /dev/sdc1, and /dev/sdd1. Ensure that you modify the procedure to use your actual device nodes.

  1. Open a terminal console, then log in as the root user or equivalent.

  2. Create a RAID 6 device. At the command prompt, enter

    mdadm --create /dev/md0 --run --level=raid6 --chunk=128 --raid-devices=4 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
    

    The default chunk size is 64 KB.

  3. Create a file system on the RAID 6 device /dev/md0, such as a Reiser file system (reiserfs). For example, at the command prompt, enter

    mkfs.reiserfs /dev/md0
    

    Modify the command if you want to use a different file system.

  4. Edit the /etc/mdadm.conf file to add entries for the component devices and the RAID device /dev/md0.

  5. Edit the /etc/fstab file to add an entry for the RAID 6 device /dev/md0.

  6. Reboot the server.

    The RAID 6 device is mounted to /local.

  7. (Optional) Add a hot spare to service the RAID array. For example, at the command prompt enter:

    mdadm /dev/md0 -a /dev/sde1
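Steps 4 and 5 do not show the entries themselves. The following sketch prints commands for inspecting the new array and a candidate /etc/fstab line for review. The run and fstab_entry helpers are hypothetical, and the mount options (defaults) and the dump/pass fields (1 2) are assumptions, not values from this guide. Commands are only printed unless DRY_RUN=0 is set.

```shell
# run CMD...: print the command by default; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# fstab_entry DEVICE MOUNTPOINT FSTYPE: print one /etc/fstab line.
# "defaults" and "1 2" are assumed values; adjust them for your system.
fstab_entry() {
    printf '%s\t%s\t%s\tdefaults\t1 2\n' "$1" "$2" "$3"
}

run mdadm --detail /dev/md0    # array state and component devices
run cat /proc/mdstat           # progress of the initial resync
run mdadm --detail --scan      # prints an ARRAY line usable in /etc/mdadm.conf
fstab_entry /dev/md0 /local reiserfs
```

Review the printed ARRAY and fstab lines before you append them to the configuration files.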
    

10.2. Creating Nested RAID 10 Devices with mdadm

10.2.1. Understanding Nested RAID Devices

A nested RAID device consists of a RAID array that uses another RAID array as its basic element, instead of using physical disks. The goal of this configuration is to improve the performance and fault tolerance of the RAID.

Linux supports nesting of RAID 1 (mirroring) and RAID 0 (striping) arrays. Generally, this combination is referred to as RAID 10. To distinguish the order of the nesting, this document uses the following terminology:

  • RAID 1+0: RAID 1 (mirror) arrays are built first, then combined to form a RAID 0 (stripe) array.

  • RAID 0+1: RAID 0 (stripe) arrays are built first, then combined to form a RAID 1 (mirror) array.

The following table describes the advantages and disadvantages of RAID 10 nesting as 1+0 versus 0+1. It assumes that the storage objects you use reside on different disks, each with a dedicated I/O capability.

Table 10.2. Nested RAID Levels

RAID Level

Description

Performance and Fault Tolerance

10 (1+0)

RAID 0 (stripe) built with RAID 1 (mirror) arrays

RAID 1+0 provides high levels of I/O performance, data redundancy, and disk fault tolerance. Because each member device in the RAID 0 is mirrored individually, multiple disk failures can be tolerated and data remains available as long as the disks that fail are in different mirrors.

You can optionally configure a spare for each underlying mirrored array, or configure a spare to serve a spare group that serves all mirrors.

10 (0+1)

RAID 1 (mirror) built with RAID 0 (stripe) arrays

RAID 0+1 provides high levels of I/O performance and data redundancy, but slightly less fault tolerance than a 1+0. If multiple disks fail on one side of the mirror, then the other mirror is available. However, if disks are lost concurrently on both sides of the mirror, all data is lost.

This solution offers less disk fault tolerance than a 1+0 solution, but if you need to perform maintenance or maintain the mirror on a different site, you can take an entire side of the mirror offline and still have a fully functional storage device. Also, if you lose the connection between the two sites, either site operates independently of the other. That is not true if you stripe the mirrored segments, because the mirrors are managed at a lower level.

If a device fails, the mirror on that side fails because RAID 0 is not fault-tolerant. Create a new RAID 0 to replace the failed side, then resynchronize the mirrors.
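The difference in fault tolerance can be worked through for a four-disk array. The sketch below assumes disks numbered 0 to 3, grouped as {0,1} and {2,3} in both nestings. The function names are made up for illustration.

```shell
# has_failed DISK FAILED...: succeed if DISK is among the failed disks.
has_failed() {
    disk=$1
    shift
    for f in "$@"; do
        [ "$f" = "$disk" ] && return 0
    done
    return 1
}

# RAID 1+0: data survives unless both disks of the same mirror have failed.
survives_1plus0() {
    if has_failed 0 "$@" && has_failed 1 "$@"; then return 1; fi
    if has_failed 2 "$@" && has_failed 3 "$@"; then return 1; fi
    return 0
}

# RAID 0+1: data survives only if at least one stripe has no failed disk.
survives_0plus1() {
    if ! has_failed 0 "$@" && ! has_failed 1 "$@"; then return 0; fi
    if ! has_failed 2 "$@" && ! has_failed 3 "$@"; then return 0; fi
    return 1
}

for failed in "0 2" "0 1"; do
    # Word splitting of $failed into separate arguments is intended here.
    if survives_1plus0 $failed; then r10=ok; else r10=lost; fi
    if survives_0plus1 $failed; then r01=ok; else r01=lost; fi
    echo "failed disks: $failed  1+0: $r10  0+1: $r01"
done
```

Losing disks 0 and 2 leaves the 1+0 array available but breaks both stripes of the 0+1 array. Losing disks 0 and 1 has the opposite effect.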


10.2.2. Creating Nested RAID 10 (1+0) with mdadm

A nested RAID 1+0 is built by creating two or more RAID 1 (mirror) devices, then using them as component devices in a RAID 0.

[Important]

If you need to manage multiple connections to the devices, you must configure multipath I/O before configuring the RAID devices. For information, see Chapter 7, Managing Multipath I/O for Devices.

The procedure in this section uses the device names shown in the following table. Ensure that you modify the device names with the names of your own devices.

Table 10.3. Scenario for Creating a RAID 10 (1+0) by Nesting

Raw Devices           RAID 1 (mirror)  RAID 1+0 (striped mirrors)
/dev/sdb1, /dev/sdc1  /dev/md0         /dev/md2
/dev/sdd1, /dev/sde1  /dev/md1         /dev/md2


  1. Open a terminal console, then log in as the root user or equivalent.

  2. Create two software RAID 1 devices, using two different devices for each RAID 1 device. At the command prompt, enter these two commands:

    mdadm --create /dev/md0 --run --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1
    
    mdadm --create /dev/md1 --run --level=1 --raid-devices=2 /dev/sdd1 /dev/sde1
    
  3. Create the nested RAID 1+0 device. At the command prompt, enter the following command using the software RAID 1 devices you created in Step 2:

    mdadm --create /dev/md2 --run --level=0 --chunk=64 --raid-devices=2 /dev/md0 /dev/md1
    

    The default chunk size is 64 KB.

  4. Create a file system on the RAID 1+0 device /dev/md2, such as a Reiser file system (reiserfs). For example, at the command prompt, enter

    mkfs.reiserfs /dev/md2
    

    Modify the command if you want to use a different file system.

  5. Edit the /etc/mdadm.conf file to add entries for the component devices and the RAID device /dev/md2.

  6. Edit the /etc/fstab file to add an entry for the RAID 1+0 device /dev/md2.

  7. Reboot the server.

    The RAID 1+0 device is mounted to /local.
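Because a nested RAID 1+0 can use two or more mirrors, the commands in Steps 2 and 3 generalize to any even number of devices. The following sketch only prints the commands and runs nothing. The function name nested_1plus0_cmds and the sequential /dev/mdN numbering are assumptions made for illustration.

```shell
# nested_1plus0_cmds DEV...: print the mdadm commands for a nested RAID 1+0.
# Devices are paired in the order given; an even count of at least 4 is required.
nested_1plus0_cmds() {
    if [ $# -lt 4 ] || [ $(($# % 2)) -ne 0 ]; then
        echo "need an even number of devices, at least 4" >&2
        return 1
    fi
    mirrors=""
    i=0
    while [ $# -gt 0 ]; do
        echo "mdadm --create /dev/md$i --run --level=1 --raid-devices=2 $1 $2"
        mirrors="$mirrors /dev/md$i"
        i=$((i + 1))
        shift 2
    done
    # The stripe takes the next free md number and uses the mirrors as components.
    echo "mdadm --create /dev/md$i --run --level=0 --chunk=64 --raid-devices=$i$mirrors"
}

nested_1plus0_cmds /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
```

For the four devices in Table 10.3, the output matches the three commands in the procedure.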

10.2.3. Creating Nested RAID 10 (0+1) with mdadm

A nested RAID 0+1 is built by creating two to four RAID 0 (striping) devices, then mirroring them as component devices in a RAID 1.

[Important]

If you need to manage multiple connections to the devices, you must configure multipath I/O before configuring the RAID devices. For information, see Chapter 7, Managing Multipath I/O for Devices.

In this configuration, spare devices cannot be specified for the underlying RAID 0 devices because RAID 0 cannot tolerate a device loss. If a device fails on one side of the mirror, you must create a replacement RAID 0 device, then add it into the mirror.

The procedure in this section uses the device names shown in the following table. Ensure that you modify the device names with the names of your own devices.

Table 10.4. Scenario for Creating a RAID 10 (0+1) by Nesting

Raw Devices           RAID 0 (stripe)  RAID 0+1 (mirrored stripes)
/dev/sdb1, /dev/sdc1  /dev/md0         /dev/md2
/dev/sdd1, /dev/sde1  /dev/md1         /dev/md2


  1. Open a terminal console, then log in as the root user or equivalent.

  2. Create two software RAID 0 devices, using two different devices for each RAID 0 device. At the command prompt, enter these two commands:

    mdadm --create /dev/md0 --run --level=0 --chunk=64 --raid-devices=2 /dev/sdb1 /dev/sdc1
    
    mdadm --create /dev/md1 --run --level=0 --chunk=64 --raid-devices=2 /dev/sdd1 /dev/sde1
    

    The default chunk size is 64 KB.

  3. Create the nested RAID 0+1 device. At the command prompt, enter the following command using the software RAID 0 devices you created in Step 2:

    mdadm --create /dev/md2 --run --level=1 --raid-devices=2 /dev/md0 /dev/md1
    
  4. Create a file system on the RAID 0+1 device /dev/md2, such as a Reiser file system (reiserfs). For example, at the command prompt, enter

    mkfs.reiserfs /dev/md2
    

    Modify the command if you want to use a different file system.

  5. Edit the /etc/mdadm.conf file to add entries for the component devices and the RAID device /dev/md2.

  6. Edit the /etc/fstab file to add an entry for the RAID 0+1 device /dev/md2.

  7. Reboot the server.

    The RAID 0+1 device is mounted to /local.
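Replacing a failed side of the mirror might look like the following sketch. It assumes a hypothetical scenario in which /dev/sdc1 in /dev/md0 has failed and /dev/sdf1 is the replacement partition. The run helper is made up for illustration and only prints each command unless DRY_RUN=0 is set. Verify every step against your own array before executing anything.

```shell
# run CMD...: print the command by default; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Assumed scenario: /dev/sdc1 (part of /dev/md0) failed; /dev/sdf1 replaces it.
run mdadm /dev/md2 --fail /dev/md0      # mark the broken stripe as faulty in the mirror
run mdadm /dev/md2 --remove /dev/md0    # detach it from the mirror
run mdadm --stop /dev/md0               # release the surviving component device
run mdadm --create /dev/md0 --run --level=0 --chunk=64 --raid-devices=2 /dev/sdb1 /dev/sdf1
run mdadm /dev/md2 -a /dev/md0          # add the new stripe; the mirror resynchronizes
```

The data on the new stripe is rebuilt entirely from the surviving side, so the array has no redundancy until the resynchronization finishes.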

10.3. Creating a Complex RAID 10 with mdadm

10.3.1. Understanding the mdadm RAID10

In mdadm, the RAID10 level creates a single complex software RAID that combines features of both RAID 0 (striping) and RAID 1 (mirroring). Multiple copies of all data blocks are arranged on multiple drives following a striping discipline. Component devices should be the same size.

10.3.1.1. Comparing the Complex RAID10 and Nested RAID 10 (1+0)

The complex RAID 10 is similar in purpose to a nested RAID 10 (1+0), but differs in the following ways:

Table 10.5. Complex vs. Nested RAID 10

Feature

mdadm RAID10 Option

Nested RAID 10 (1+0)

Number of devices

Allows an even or odd number of component devices

Requires an even number of component devices

Component devices

Managed as a single RAID device

Managed as a nested RAID device

Striping

Striping occurs in the near or far layout on component devices.

The far layout provides sequential read throughput that scales by number of drives, rather than number of RAID 1 pairs.

Striping occurs consecutively across component devices

Multiple copies of data

Two or more copies, up to the number of devices in the array

Copies on each mirrored segment

Hot spare devices

A single spare can service all component devices

Configure a spare for each underlying mirrored array, or configure a spare to serve a spare group that serves all mirrors.


10.3.1.2. Number of Replicas in the mdadm RAID10

When configuring an mdadm RAID10 array, you must specify the number of replicas of each data block that are required. The default number of replicas is 2, but the value can be anything from 2 to the number of devices in the array.

10.3.1.3. Number of Devices in the mdadm RAID10

You must use at least as many component devices as the number of replicas you specify. However, the number of component devices in a RAID10 array does not need to be a multiple of the number of replicas of each data block. The effective storage size is the number of devices divided by the number of replicas.

For example, if you specify 2 replicas for an array created with 5 component devices, a copy of each block is stored on two different devices. The effective storage size for one copy of all data is 5/2 or 2.5 times the size of a component device.
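The same division can be scripted for other combinations. This is a minimal sketch; raid10_capacity is a hypothetical helper, and the result is expressed in units of one component device.

```shell
# raid10_capacity DEVICES REPLICAS
# Prints the usable capacity in units of one component device.
raid10_capacity() {
    devices=$1
    replicas=$2
    if [ "$replicas" -lt 2 ] || [ "$replicas" -gt "$devices" ]; then
        echo "replicas must be between 2 and the number of devices" >&2
        return 1
    fi
    awk -v d="$devices" -v r="$replicas" 'BEGIN { printf "%.2f\n", d / r }'
}

raid10_capacity 5 2   # prints 2.50
raid10_capacity 4 2   # prints 2.00
raid10_capacity 6 3   # prints 2.00
```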

10.3.1.4. Near Layout

With the near layout, copies of a block of data are striped near each other on different component devices. That is, multiple copies of one data block are at similar offsets in different devices. For example, if you use an odd number of component devices and two copies of data, some copies might be one chunk further into the device. Near is the default layout for RAID10.

The near layout for the mdadm RAID10 yields read and write performance similar to RAID 0 over half the number of drives.

Near layout with an even number of disks and two replicas:

sda1 sdb1 sdc1 sde1
  0    0    1    1
  2    2    3    3
  4    4    5    5
  6    6    7    7
  8    8    9    9

Near layout with an odd number of disks and two replicas:

sda1 sdb1 sdc1 sde1 sdf1
  0    0    1    1    2
  2    3    3    4    4
  5    5    6    6    7
  7    8    8    9    9
  10   10   11   11   12
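The diagrams above follow a simple rule: the copies of each chunk occupy consecutive positions, and positions wrap from the last device to the first device of the next row. The sketch below reproduces the diagrams under that rule. The function name near_layout is made up for illustration.

```shell
# near_layout DEVICES REPLICAS ROWS: print which chunk each device holds per row.
near_layout() {
    devices=$1
    replicas=$2
    rows=$3
    pos=0
    line=""
    while [ "$pos" -lt $((devices * rows)) ]; do
        # Position pos holds a copy of chunk pos / replicas (integer division).
        line="$line$(printf '%4d' $((pos / replicas)))"
        pos=$((pos + 1))
        if [ $((pos % devices)) -eq 0 ]; then
            echo "$line"
            line=""
        fi
    done
}

near_layout 4 2 3   # even number of devices
near_layout 5 2 3   # odd number of devices
```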

10.3.1.5. Far Layout

The far layout stripes data over the early part of all drives, then stripes a second copy of the data over the later part of all drives, making sure that all copies of a block are on different drives. The second set of values starts halfway through the component drives.

With a far layout, the read performance of the mdadm RAID10 is similar to a RAID 0 over the full number of drives, but write performance is substantially slower than a RAID 0 because there is more seeking of the drive heads. It is best used for read-intensive operations such as for read-only file servers.

In practice, the write speed of a raid10 is similar to that of other mirrored RAID types, such as raid1 and raid10 using the near layout, because the elevator of the file system schedules the writes in a more optimal way than raw writing. A raid10 in the far layout is therefore also well suited to mirrored writing applications.

Far layout with an even number of disks and two replicas:

sda1 sdb1 sdc1 sde1
  0    1    2    3
  4    5    6    7       
  . . .
  3    0    1    2
  7    4    5    6

Far layout with an odd number of disks and two replicas:

sda1 sdb1 sdc1 sde1 sdf1
  0    1    2    3    4
  5    6    7    8    9
  . . .
  4    0    1    2    3
  9    5    6    7    8
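In the diagrams above, the second section repeats the first with each chunk moved one device to the right. The sketch below generates both sections under that assumption, for two replicas only. The function name far_layout is made up for illustration.

```shell
# far_layout DEVICES ROWS: print both sections of a far layout with two copies.
far_layout() {
    devices=$1
    rows=$2
    for shift_by in 0 1; do
        if [ "$shift_by" -eq 1 ]; then
            echo "  . . ."
        fi
        row=0
        while [ "$row" -lt "$rows" ]; do
            line=""
            dev=0
            while [ "$dev" -lt "$devices" ]; do
                # In the second section, each chunk sits one device to the right.
                chunk=$((row * devices + (dev - shift_by + devices) % devices))
                line="$line$(printf '%4d' "$chunk")"
                dev=$((dev + 1))
            done
            echo "$line"
            row=$((row + 1))
        done
    done
}

far_layout 4 2
```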

10.3.1.6. Offset Layout

The offset layout duplicates stripes so that the multiple copies of a given chunk are laid out on consecutive drives and at consecutive offsets. Effectively, each stripe is duplicated and the copies are offset by one device. This should give similar read characteristics to a far layout if a suitably large chunk size is used, but without as much seeking for writes.

Offset layout with an even number of disks and two replicas:

sda1 sdb1 sdc1 sde1
  0    1    2    3
  3    0    1    2       
  4    5    6    7
  7    4    5    6
  8    9   10   11
 11    8    9   10

Offset layout with an odd number of disks and two replicas:

sda1 sdb1 sdc1 sde1 sdf1
  0    1    2    3    4
  4    0    1    2    3
  5    6    7    8    9
  9    5    6    7    8
 10   11   12   13   14
 14   10   11   12   13                
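The offset diagrams can be generated by printing each stripe twice, with the second copy rotated by one device. This is a minimal sketch for two replicas. The function name offset_layout is made up for illustration.

```shell
# offset_layout DEVICES STRIPES: print each stripe followed by its shifted copy.
offset_layout() {
    devices=$1
    stripes=$2
    stripe=0
    while [ "$stripe" -lt "$stripes" ]; do
        for shift_by in 0 1; do
            line=""
            dev=0
            while [ "$dev" -lt "$devices" ]; do
                # The copy row holds the same chunks, moved one device to the right.
                chunk=$((stripe * devices + (dev - shift_by + devices) % devices))
                line="$line$(printf '%4d' "$chunk")"
                dev=$((dev + 1))
            done
            echo "$line"
        done
        stripe=$((stripe + 1))
    done
}

offset_layout 4 3
```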

10.3.2. Creating a RAID 10 with mdadm

The RAID10 option for mdadm creates a RAID 10 device without nesting. For information about RAID10, see Section 10.3.1, “Understanding the mdadm RAID10”.

The procedure in this section uses the device names shown in the following table. Ensure that you modify the device names with the names of your own devices.

Table 10.6. Scenario for Creating a RAID 10 Using the mdadm RAID10 Option

Raw Devices                                 RAID10 (near or far striping scheme)
/dev/sdf1, /dev/sdg1, /dev/sdh1, /dev/sdi1  /dev/md3


  1. In YaST, create a 0xFD Linux RAID partition on the devices you want to use in the RAID, such as /dev/sdf1, /dev/sdg1, /dev/sdh1, and /dev/sdi1.

  2. Open a terminal console, then log in as the root user or equivalent.

  3. Create a RAID 10 device. At the command prompt, enter (all on the same line):

    mdadm --create /dev/md3 --run --level=10 --chunk=4 --raid-devices=4 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1
    
  4. Create a Reiser file system on the RAID 10 device /dev/md3. At the command prompt, enter

    mkfs.reiserfs /dev/md3
    
  5. Edit the /etc/mdadm.conf file to add entries for the component devices and the RAID device /dev/md3. For example:

    DEVICE /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1
    ARRAY /dev/md3 devices=/dev/sdf1,/dev/sdg1,/dev/sdh1,/dev/sdi1
    
  6. Edit the /etc/fstab file to add an entry for the RAID 10 device /dev/md3.

  7. Reboot the server.

    The RAID10 device is mounted to /raid10.
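The command in Step 3 uses the default near layout with two replicas. To choose a different layout or replica count, mdadm accepts a --layout value made of a letter (n for near, f for far, o for offset) followed by the number of replicas. The following sketch only prints the resulting command. The function name raid10_create_cmd is made up for illustration.

```shell
# raid10_create_cmd LAYOUT REPLICAS MD_DEVICE COMPONENT...
# LAYOUT is near, far, or offset. Prints the mdadm command; runs nothing.
raid10_create_cmd() {
    case "$1" in
        near)   code=n ;;
        far)    code=f ;;
        offset) code=o ;;
        *) echo "unknown layout: $1" >&2; return 1 ;;
    esac
    replicas=$2
    md=$3
    shift 3
    if [ "$replicas" -lt 2 ] || [ "$replicas" -gt $# ]; then
        echo "replicas must be between 2 and the number of devices" >&2
        return 1
    fi
    echo "mdadm --create $md --run --level=10 --layout=$code$replicas --raid-devices=$# $*"
}

raid10_create_cmd far 2 /dev/md3 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1
```

For the devices in Table 10.6, this prints a command with --layout=f2, which requests the far layout with two copies of each block.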

10.4. Creating a Degraded RAID Array

A degraded array is one in which some devices are missing. Degraded arrays are supported only for RAID 1, RAID 4, RAID 5, and RAID 6. These RAID types are designed to withstand some missing devices as part of their fault-tolerance features. Typically, degraded arrays occur when a device fails. It is possible to create a degraded array on purpose.

RAID Type  Allowable Number of Slots Missing
RAID 1     All but one device
RAID 4     One slot
RAID 5     One slot
RAID 6     One or two slots

To create a degraded array in which some devices are missing, simply give the word missing in place of a device name. This causes mdadm to leave the corresponding slot in the array empty.
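As a sketch of how the missing keyword combines with the limits listed above, the following helpers check the number of empty slots and print the corresponding mdadm command without running it. The function names max_missing and degraded_create_cmd are made up for illustration.

```shell
# max_missing LEVEL DEVICES: print how many slots may be given as "missing".
max_missing() {
    case "$1" in
        1)   echo $(($2 - 1)) ;;   # all but one device
        4|5) echo 1 ;;
        6)   echo 2 ;;
        *)   echo 0 ;;             # no degraded creation described for other levels
    esac
}

# degraded_create_cmd LEVEL MD_DEVICE SLOT...: each SLOT is a device or "missing".
degraded_create_cmd() {
    level=$1
    md=$2
    shift 2
    missing=0
    for slot in "$@"; do
        if [ "$slot" = "missing" ]; then
            missing=$((missing + 1))
        fi
    done
    if [ "$missing" -gt "$(max_missing "$level" $#)" ]; then
        echo "RAID $level cannot start with $missing of $# slots missing" >&2
        return 1
    fi
    echo "mdadm --create $md --level=$level --raid-devices=$# $*"
}

degraded_create_cmd 6 /dev/md0 /dev/sda1 /dev/sdb1 missing missing
degraded_create_cmd 5 /dev/md0 /dev/sda1 /dev/sdb1 missing
```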

When creating a RAID 5 array, mdadm automatically creates a degraded array with an extra spare drive. This is because building the spare into a degraded array is generally faster than resynchronizing the parity on a non-degraded, but not clean, array. You can override this feature with the --force option.

Creating a degraded array might be useful if you want to create a RAID, but one of the devices you want to use already has data on it. In that case, you create a degraded array with other devices, copy data from the in-use device to the RAID that is running in degraded mode, add the device into the RAID, then wait while the RAID is rebuilt so that the data is now across all devices. An example of this process is given in the following procedure:

  1. Create a degraded RAID 1 device /dev/md0 using a single drive, /dev/sda1. At the command prompt, enter:

    mdadm --create /dev/md0 -l 1 -n 2 /dev/sda1 missing
    

    The device should be the same size or larger than the device you plan to add to it.

  2. If the device you want to add to the mirror contains data that you want to move to the RAID array, copy it now to the RAID array while it is running in degraded mode.

  3. Add a device to the mirror. For example, to add /dev/sdb1 to the RAID, enter the following at the command prompt:

    mdadm /dev/md0 -a /dev/sdb1
    

    You can add only one device at a time. You must wait for the kernel to build the mirror and bring it fully online before you add another device.

  4. Monitor the build progress by entering the following at the command prompt:

    cat /proc/mdstat
    

    To see the rebuild progress while being refreshed every second, enter

    watch -n 1 cat /proc/mdstat
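For scripted monitoring, the progress figure can be extracted from the /proc/mdstat text. The following sketch assumes the usual "recovery = NN.N%" or "resync = NN.N%" status line. The function name rebuild_progress is made up, and the sample text is illustrative, not output captured from a real system.

```shell
# rebuild_progress: read /proc/mdstat-style text on stdin and print the
# percentage from the first recovery or resync line, or "idle" if none exists.
rebuild_progress() {
    awk '
        /(recovery|resync) *=/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) { print $i; found = 1; exit }
        }
        END { if (!found) print "idle" }
    '
}

# Illustrative sample only (made-up numbers).
sample='md0 : active raid1 sdb1[2] sda1[0]
      104320 blocks [2/1] [U_]
      [==>..................]  recovery = 12.6% (13184/104320) finish=0.1min speed=13184K/sec'

printf '%s\n' "$sample" | rebuild_progress   # prints 12.6%

# On a live system:
#   rebuild_progress < /proc/mdstat
#   mdadm --wait /dev/md0    # blocks until resync or recovery on the array finishes
```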