mdadm: growing a 3-disk RAID-0 array to a 4-disk RAID-5 one

Like lots of other people, I strongly believe that a proper backup policy is mandatory in every scenario.

Unfortunately (again, like lots of other people) I also know that a proper backup policy is hard to implement, especially when serious budget constraints are involved.

In short: for one of my customers, I set up an HP MicroServer Gen8 as a CentOS 7 box acting:

  • as an NFS server towards both a XenServer 6.2 and an ESXi 5.5 host, to receive proper snapshots of running VMs (thanks to two great pieces of open-source software: snapback and ghettoVCB);
  • as a BareOS server, relying on local storage for common/standard client-server backups.

As you might guess, I needed as much local storage as possible, so… I chose to buy the largest HDDs currently available: 8TB. I decided to buy them on Amazon as they were really cheap (0.027€ per GB!) but, unfortunately, you can only buy three HDDs per order. So I said to myself: “OK! I’m going to buy the first three and set everything up as a 24TB RAID-0 array. Then I’ll buy the fourth and simply change the RAID-0 array to a RAID-5 one, while keeping everything intact: no reinstallation!“. At least, that was what I thought!
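A quick back-of-the-envelope check shows why this plan does not cost any capacity: a 3-disk RAID-0 array stripes data across all members, while a 4-disk RAID-5 array dedicates the equivalent of one member to parity, so both end up with the same usable space. A minimal sketch of the arithmetic (sizes in TB, assuming the 8TB members used here):

```shell
# RAID-0: usable capacity is the sum of all members.
raid0_capacity=$((3 * 8))
# RAID-5: one member's worth of space goes to parity, (n-1) hold data.
raid5_capacity=$(((4 - 1) * 8))
echo "RAID-0 x3: ${raid0_capacity} TB, RAID-5 x4: ${raid5_capacity} TB"
```

Both come out at 24TB, which is why the reshape can keep the array (and the filesystem on top of it) exactly the same size while adding redundancy.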

Here is the RAID-0 initially built:

[root@srv-backup bareos]# cat /proc/mdstat 
Personalities : [raid0] 
md127 : active raid0 sda5[0] sdc5[2] sdb5[1]
      23268661248 blocks super 1.2 512k chunks
      
unused devices: <none>
[root@srv-backup bareos]#
[root@srv-backup bareos]#
[root@srv-backup bareos]# mdadm --detail /dev/md127
/dev/md127:
        Version : 1.2
  Creation Time : Tue Dec 1 22:24:59 2015
     Raid Level : raid0
     Array Size : 23268661248 (22190.73 GiB 23827.11 GB)
   Raid Devices : 3
  Total Devices : 3
    Persistence : Superblock is persistent

    Update Time : Tue Dec 1 22:24:59 2015
          State : clean 
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 512K

           Name : srv-backup:127 (local to host srv-backup)
           UUID : 0adf1beb:e111ab11:c2f0d75b:9bc1ca9f
         Events : 0

      Number Major Minor RaidDevice State
         0     8      5      0      active sync  /dev/sda5
         1     8     21      1      active sync  /dev/sdb5
         2     8     37      2      active sync  /dev/sdc5

which is actually already well used:

[root@srv-backup bareos]# df -h /REPOSITORY 
Filesystem      Size  Used Avail Use% Mounted on
/dev/md127       22T  9.2T   13T  43% /REPOSITORY

and here are the details of the first three 8TB HDDs:

[root@srv-backup bareos]# smartctl -a /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.10.0-229.20.1.el7.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     ST8000AS0002-1NA17Z
Serial Number:    Z8409Q6K
LU WWN Device Id: 5 000c50 08749d0aa
Firmware Version: AR15
User Capacity:    8,001,563,222,016 bytes [8.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5980 rpm
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sun Dec 27 22:36:40 2015 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

(Note: sdb and sdc are exactly the same, the only exception being the Serial Number.)

After a couple of weeks I ordered and received the fourth HDD, so it was time for a nice exercise. My two-item wondering list was:

  1. I have a free slot in my MicroServer: even though it is explicitly documented as NOT having a hot-plug HDD controller, will it recognize the fourth disk without a reboot?
  2. Will I be able to reshape the currently-perfectly-working RAID-0 array to a RAID-5 one, without any reboot/recreation/annoyance?

The reason underlying item 1) was simply that I did NOT want to interrupt the BareOS services. Even though no backup was running at the time, I said to myself: “OK! But what happens if backup jobs are running and you don’t want to stop them?“. So I simply pulled out the HDD caddy of the empty fourth bay, properly screwed in the new HDD and… pushed it into the MicroServer.

Good news! No kernel panic; no burning flames; no problem at all! On my remote console (I had a tail -f system.log running in my SSH session) I saw:

Dec 23 12:22:09 srv-backup kernel:[1863844.854189] ata6: exception Emask 0x10 SAct 0x0 SErr 0x40c0000 action 0xe frozen
Dec 23 12:22:09 srv-backup kernel:[1863844.854433] ata6: irq_stat 0x00000040, connection status changed
Dec 23 12:22:09 srv-backup kernel:[1863844.854578] ata6: SError: { CommWake 10B8B DevExch }
Dec 23 12:22:09 srv-backup kernel:[1863844.865458] ata6: hard resetting link
Dec 23 12:22:10 srv-backup kernel:[1863845.308549] ata5: exception Emask 0x10 SAct 0x0 SErr 0x40c0202 action 0xe frozen
Dec 23 12:22:10 srv-backup kernel:[1863845.308735] ata5: irq_stat 0x00000040, connection status changed
Dec 23 12:22:10 srv-backup kernel:[1863845.308879] ata5: SError: { RecovComm Persist CommWake 10B8B DevExch }
Dec 23 12:22:10 srv-backup kernel:[1863845.309036] ata5: hard resetting link
Dec 23 12:22:19 srv-backup kernel:[1863854.865924] ata6: softreset failed (1st FIS failed)
Dec 23 12:22:19 srv-backup kernel:[1863854.866063] ata6: hard resetting link
Dec 23 12:22:20 srv-backup kernel:[1863855.308946] ata5: softreset failed (1st FIS failed)
Dec 23 12:22:20 srv-backup kernel:[1863855.309083] ata5: hard resetting link
Dec 23 12:22:24 srv-backup kernel:[1863859.247140] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec 23 12:22:24 srv-backup kernel:[1863859.258953] ata5.00: configured for UDMA/133
Dec 23 12:22:24 srv-backup kernel:[1863859.270091] ata5: EH complete
Dec 23 12:22:29 srv-backup kernel:[1863864.867391] ata6: softreset failed (1st FIS failed)
Dec 23 12:22:29 srv-backup kernel:[1863864.867529] ata6: hard resetting link
Dec 23 12:22:30 srv-backup kernel:[1863865.898457] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec 23 12:22:30 srv-backup kernel:[1863865.906925] ata6.00: ATA-9: ST8000AS0002-1NA17Z, AR15, max UDMA/133
Dec 23 12:22:30 srv-backup kernel:[1863865.906938] ata6.00: 15628053168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
Dec 23 12:22:30 srv-backup kernel:[1863865.908351] ata6.00: configured for UDMA/133
Dec 23 12:22:30 srv-backup kernel:[1863865.919412] ata6: EH complete
Dec 23 12:22:30 srv-backup kernel:[1863865.919626] scsi 5:0:0:0: Direct-Access ATA ST8000AS0002-1NA AR15 PQ: 0 ANSI: 5
Dec 23 12:22:30 srv-backup kernel:[1863865.920263] sd 5:0:0:0: [sdd] 15628053168 512-byte logical blocks: (8.00 TB/7.27 TiB)
Dec 23 12:22:30 srv-backup kernel:[1863865.920279] sd 5:0:0:0: [sdd] 4096-byte physical blocks
Dec 23 12:22:30 srv-backup kernel:[1863865.921033] sd 5:0:0:0: [sdd] Write Protect is off
Dec 23 12:22:30 srv-backup kernel:[1863865.921047] sd 5:0:0:0: [sdd] Mode Sense: 00 3a 00 00
Dec 23 12:22:30 srv-backup kernel:[1863865.921298] sd 5:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Dec 23 12:22:31 srv-backup kernel:[1863865.939621] sdd: unknown partition table
Dec 23 12:22:31 srv-backup kernel:[1863865.940106] sd 5:0:0:0: [sdd] Attached SCSI disk

So item 1) was solved: even if it is risky, an additional SATA HDD can be added to a running system even when it has no explicit hot-swap capabilities.
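In my case the kernel spotted the new disk on its own, as the log above shows. Had it not, a manual rescan of the SCSI/SATA hosts is a commonly used fallback; a minimal sketch (writes to sysfs, needs root, and the wildcard loop over every host is just the simplest approach):

```shell
# Ask every SCSI host to rescan its bus for newly attached devices.
# "- - -" means: all channels, all targets, all LUNs.
for host in /sys/class/scsi_host/host*; do
    echo "- - -" > "$host/scan"
done
```

After the rescan, the new disk should show up in dmesg and /proc/partitions just as it did here.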

After cloning the (GPT-based) partition table from sda to sdd (with sgdisk --replicate=/dev/sdd /dev/sda) and ensuring that the new partitions were properly recognized by the running system (with partprobe), I ended up with this partition layout:
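For the record, here are the cloning steps sketched as commands. Note that --replicate copies the GPT verbatim, including the partition GUIDs, so it is wise to randomize them on the new disk afterwards:

```shell
# Copy the GPT from /dev/sda onto /dev/sdd.
# Mind the argument order: the DESTINATION is the --replicate argument!
sgdisk --replicate=/dev/sdd /dev/sda

# The copy duplicated sda's disk and partition GUIDs; give sdd fresh ones.
sgdisk --randomize-guids /dev/sdd

# Make the running kernel re-read sdd's new partition table.
partprobe /dev/sdd
```

The GUID randomization did not matter for my mdadm use case, but duplicate GUIDs can confuse anything that identifies partitions by GUID, so it is a cheap precaution.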

[root@srv-backup 23]# cat /proc/partitions 
major minor  #blocks  name

  11        0    1048575 sr0
   8        0 7814026584 sda
   8        1       2048 sda1
   8        2    1048576 sda2
   8        3   52428800 sda3
   8        4    4194304 sda4
   8        5 7756351815 sda5
   8       32 7814026584 sdc
   8       33       2048 sdc1
   8       34    1048576 sdc2
   8       35   52428800 sdc3
   8       36    4194304 sdc4
   8       37 7756351815 sdc5
   8       16 7814026584 sdb
   8       17       2048 sdb1
   8       18    1048576 sdb2
   8       19   52428800 sdb3
   8       20    4194304 sdb4
   8       21 7756351815 sdb5
   9      127 23268661248 md127
   8       48 7814026584 sdd
   8       49       2048 sdd1
   8       50    1048576 sdd2
   8       51   52428800 sdd3
   8       52    4194304 sdd4
   8       53 7756351815 sdd5

So sdd5 was ready to be included in the RAID array, which in turn needed to be reshaped from the running RAID-0 to the new RAID-5.

After some (actually… not very useful/effective) online research, I decided to go with this command (note that the --backup-file must live on a device outside the array being reshaped):

[root@srv-backup bareos]# mdadm /dev/md127 --grow --add /dev/sdd5 --raid-devices=4 --level=raid5 --backup-file /mnt/temp/md127-backup
mdadm: level of /dev/md127 changed to raid5        
mdadm: added /dev/sdd5

Very good news! It seemed to work!

Obviously, I perfectly understood that reshaping an existing 24TB RAID-0 array into RAID-5 would take some time (a LOT of time)… so the output below, after more than 30 minutes of reshaping, was no surprise:

[root@srv-backup bareos]# cat /proc/mdstat                                                                                               
Personalities : [raid0] [raid6] [raid5] [raid4] 
md127 : active raid5 sdd5[4] sda5[0] sdc5[2] sdb5[1]
      23268661248 blocks super 1.2 level 5, 512k chunk, algorithm 5 [4/3] [UUU_]
      [>....................]  reshape =  0.1% (15489024/7756220416) finish=64596.5min speed=1996K/sec
      
unused devices: <none>
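That finish estimate deserves a second look: 64596.5 minutes is about 45 days! A tiny sketch of the arithmetic, parsing the sample progress line from the mdstat output above:

```shell
# Sample reshape progress line, as printed by /proc/mdstat above.
line='[>....................]  reshape =  0.1% (15489024/7756220416) finish=64596.5min speed=1996K/sec'

# Extract the finish estimate (in minutes) and convert it to days.
mins=$(echo "$line" | sed -n 's/.*finish=\([0-9.]*\)min.*/\1/p')
days=$(awk -v m="$mins" 'BEGIN { printf "%.1f", m / 1440 }')
echo "~${days} days to go"
```

If that is too slow, the kernel exposes some knobs worth experimenting with: the dev.raid.speed_limit_min / dev.raid.speed_limit_max sysctls throttle resync/reshape bandwidth, and raising /sys/block/md127/md/stripe_cache_size can help RAID-5 write throughput, at the cost of RAM.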

Just to further check that everything was progressing properly, I looked at the details with:

[root@srv-backup bareos]# mdadm --detail /dev/md127
/dev/md127:
        Version : 1.2
  Creation Time : Tue Dec  1 22:24:59 2015
     Raid Level : raid5
     Array Size : 23268661248 (22190.73 GiB 23827.11 GB)
  Used Dev Size : 7756220416 (7396.91 GiB 7942.37 GB)
   Raid Devices : 4
  Total Devices : 4
    Persistence : Superblock is persistent

    Update Time : Sun Dec 27 23:07:23 2015
          State : clean, degraded, reshaping 
 Active Devices : 3
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 1

         Layout : parity-last
     Chunk Size : 512K

 Reshape Status : 0% complete
     New Layout : left-symmetric

           Name : srv-backup:127  (local to host srv-backup)
           UUID : 0adf1beb:e111ab11:c2f0d75b:9bc1ca9f
         Events : 8883

    Number   Major   Minor   RaidDevice State
       0       8        5        0      active sync   /dev/sda5
       1       8       21        1      active sync   /dev/sdb5
       2       8       37        2      active sync   /dev/sdc5
       4       8       53        3      spare rebuilding   /dev/sdd5

As for the reshaping activity, it’s interesting to note that it (correctly…) involves:

  • the three existing drives (sda, sdb, sdc) with READ activity;
  • the new drive (sdd) with WRITE activity;
  • a WRITE volume that is, more or less, the sum of all the READs.

This can be clearly seen with an iostat -d -x 1 100, like this:

[root@srv-backup 23]# iostat -d -x 1 100
Linux 3.10.0-229.20.1.el7.x86_64 (srv-backup) 	12/27/15 	_x86_64_	(2 CPU)

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               1.90     2.05    0.44    8.66    35.80  4002.87   887.27     1.03  113.00   14.39  118.04   4.14   3.77
sdc               1.90     2.03    0.30    8.41    32.64  3998.11   925.57     0.88  100.63   11.96  103.81   3.45   3.01
sdb               1.90     2.03    0.30    8.39    32.65  3998.01   927.73     0.88  101.38   11.96  104.60   3.47   3.01
md127             0.00     0.00    0.82   25.53    52.05 11971.10   912.82     0.00    0.00    0.00    0.00   0.00   0.00
sdd               0.00     1.91    0.00    0.07     0.00    31.06   871.34     0.01  158.94    2.97  160.45  14.12   0.10

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda            1016.00  1016.00   18.00   10.00  9216.00  4096.50   950.89     0.41   15.82   14.00   19.10   6.61  18.50
sdc            1016.00  1016.00   17.00   10.00  8704.00  4097.00   948.22     0.37   14.44   12.71   17.40   6.22  16.80
sdb            1016.00  1016.00   17.00    9.00  8704.00  4096.50   984.65     1.88   33.58   31.18   38.11  27.27  70.90
md127             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdd               0.00  1018.00    0.00   64.00     0.00 28704.50   897.02     3.46   54.38    0.00   54.38   5.70  36.50

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda             762.00   508.00   16.00    5.00  8192.00  2048.50   975.29     0.19    9.14    7.38   14.80   6.29  13.20
sdc             762.00   508.00   16.00    5.00  8192.00  2048.50   975.29     0.20    9.29    7.31   15.60   5.67  11.90
sdb             762.00   508.00   16.00    6.00  8192.00  2049.00   931.00     0.34   61.32   40.50  116.83   9.09  20.00
md127             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdd               0.00   508.00    0.00   40.00     0.00 19460.50   973.02    22.61  308.98    0.00  308.98  22.57  90.30

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda            2286.00  2540.00   34.00   26.00 17408.00 10243.00   921.70     0.89   14.82    9.82   21.35   7.55  45.30
sdc            2286.00  2540.00   35.00   26.00 17920.00 10243.00   923.38     0.93   15.31   10.91   21.23   7.18  43.80
sdb            2286.00  2540.00   35.00   26.00 17920.00 10243.00   923.38     1.31   21.64   15.94   29.31  10.70  65.30
md127             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdd               0.00  2543.00    0.00   95.00     0.00 42027.00   884.78     4.20  151.13    0.00  151.13   7.23  68.70

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda            1905.00  1905.00   31.00   19.00 15872.00  7682.00   942.16     0.63   12.54    9.16   18.05   6.46  32.30
sdc            1905.00  1905.00   31.00   19.00 15872.00  7682.00   942.16     0.71   14.18   10.87   19.58   6.88  34.40
sdb            1905.00  1905.00   31.00   19.00 15872.00  7682.00   942.16     4.70   93.96   40.16  181.74  15.58  77.90
md127             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdd               0.00  1906.00    0.00   22.00     0.00  7701.50   700.14     4.35   23.73    0.00   23.73  21.18  46.60

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
md127             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdd               0.00     0.00    0.00   31.00     0.00 15364.50   991.26    43.92  881.65    0.00  881.65  32.26 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda             127.00   127.00    1.00    1.00   512.00   512.00  1024.00     0.18    4.00    3.00    5.00  88.50  17.70
sdc             127.00   127.00    1.00    1.00   512.00   512.00  1024.00     0.18    4.00    3.00    5.00  88.50  17.70
sdb             127.00   127.00    1.00    2.00   512.00   512.50   683.00     0.03    8.33    3.00   11.00   8.33   2.50
md127             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdd               0.00   128.00    0.00   23.00     0.00  9740.50   847.00    15.56 1568.22    0.00 1568.22  39.87  91.70

[...]
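A hypothetical follow-up for when the reshape eventually completes (I have not needed to resize anything, since, as computed earlier, the usable capacity is unchanged; these steps just persist the new geometry):

```shell
# Block until the reshape/resync on md127 has finished.
mdadm --wait /dev/md127

# Record the new RAID-5 layout so it is assembled consistently at boot.
mdadm --detail --scan > /etc/mdadm.conf

# On CentOS 7, rebuild the initramfs so early boot sees the updated config.
dracut -f
```

Without an updated mdadm.conf the array will usually still auto-assemble from the superblocks, but pinning the geometry avoids surprises with device naming.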

That’s all!

Hope this will be useful for someone 😉
