RAID 5 + 1 hot spare setup
==========================
Notes: 1. The following steps assume the DiskSuite software has been
installed already.
2. There are a lot more DiskSuite options available (e.g. different
levels of RAID setup and adding more disks to a RAID that is in use).
See the DiskSuite Tool manual for details.
--------------------------
1. metadb setup
- Create at least 3 metadb copies (state database replicas) on 3 separate
disks. Why at least 3 copies? metadb is where RAID setup information
and logs are stored, and DiskSuite needs more than half of the replicas
to be intact to boot normally. If you lose all your metadbs, your RAID
is pretty much hosed. Just ask John Ward. ;-)
- A 5MB partition for each metadb should be enough.
A. As root, run "format" to create a 5MB partition for the metadb. Make
sure you know the exact devices for your RAID setup.
Note: since slice "s0" is there already (usually 128MB), you
can use it for metadb.
B. Enable the metadbs.
metadb -c 3 -a -f /dev/dsk/c0t8d0s0 /dev/dsk/c0t9d0s0 /dev/dsk/c0t10d0s0
(-c 3 ==> 3 copies of metadb on each slice listed (9 replicas in all here)
-a ==> add metadb, can be used later to add more metadbs w/o -f
-f ==> force, required for the initial creation of metadb)
*In this case, I am using slice 0 of disks c0t8d0, c0t9d0, and c0t10d0.
2. md.tab file
- Defines how you want to configure your RAID (mirror, stripe, etc)
and how many disks you want in your RAID
- For RAID 5 (5 disks) and 1 hot spare...
A. Edit /etc/lvm/md.tab and add the following to it (d8 and hsp001)
d8 -r /dev/dsk/c0t8d0s6 /dev/dsk/c0t9d0s6 /dev/dsk/c0t10d0s6 \
/dev/dsk/c0t11d0s6 /dev/dsk/c0t12d0s6 -h hsp001
hsp001 /dev/dsk/c0t13d0s6
Note: the comments in md.tab explain the syntax. "-r" makes d8 a
RAID 5 metadevice built from the five slices that follow, and
"-h hsp001" associates the hot spare pool hsp001 with it. (A
line of the form "d8 1 5 ..." would instead define a plain
stripe, 1 stripe that is 5 slices wide, with no parity.)
hsp001 is my hot spare. You want to keep this the same size
as the other disks.
B. metainit - enable the hot spare pool and the metadevice (RAID data
disks). Create the pool first, since d8 refers to it.
metainit hsp001
metainit d8
3. metatool utility
Run the "metatool" tool to make sure you've set up the disks as one
unit properly.
4. create a new filesystem on that RAID data unit
- /dev/md/dsk/ is to metadevices what /dev/dsk/ is to regular disks.
- since we defined "d8" as the RAID in /etc/lvm/md.tab, to create
a new fs on it type:
newfs -m 0 /dev/md/dsk/d8
(this command takes about 20mins to complete on a Sun Blade 1000
for a 175GB disk)
5. mkdir /raid_disk
6. mount /dev/md/dsk/d8 /raid_disk
7. modify /etc/vfstab to reflect the new RAID
8. as a precaution, I recommend backing up the RAID unit as well.
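For step 7, a vfstab entry for a metadevice names both the block device (/dev/md/dsk/) and the raw device (/dev/md/rdsk/). The following is only a minimal sketch that prints such a line. The helper name vfstab_line, the fsck pass of 2, and "yes" for mount-at-boot are assumptions, not settings taken from the notes above. Review the output before adding it to /etc/vfstab.

```shell
#!/bin/sh
# Print (do not install) a vfstab entry for a UFS file system on a metadevice.
# Fields: device to mount, device to fsck, mount point, FS type,
#         fsck pass, mount at boot, mount options.
# vfstab_line is a hypothetical helper name used for illustration only.
vfstab_line() {
    md=$1      # metadevice name, e.g. d8
    mnt=$2     # mount point, e.g. /raid_disk
    printf '/dev/md/dsk/%s\t/dev/md/rdsk/%s\t%s\tufs\t2\tyes\t-\n' "$md" "$md" "$mnt"
}

# Entry for the RAID created above.
vfstab_line d8 /raid_disk
```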
Example one
In this example, the boot disk mirror c0t1d0 failed. All submirrors on c0t1d0 were placed in "maintenance" state, so no reads or writes were occurring on the disk. The replacement disk has the same geometry as boot disk c0t0d0.
1. Delete any state database replicas from the failed disk. A "W" in metadb output indicates replica device write errors.
# metadb
flags first blk block count
a m p luo 16 8192 /dev/dsk/c0t0d0s6
a p luo 8208 8192 /dev/dsk/c0t0d0s6
W p l 16 8192 /dev/dsk/c0t1d0s6
W p l 8208 8192 /dev/dsk/c0t1d0s6
# metadb -d /dev/dsk/c0t1d0s6
# metadb
flags first blk block count
a m p luo 16 8192 /dev/dsk/c0t0d0s6
a p luo 8208 8192 /dev/dsk/c0t0d0s6
2. Replace the failed disk.
3. Copy the partition table from the good disk to the replacement disk.
# prtvtoc /dev/rdsk/c0t0d0s2 | fmthard -s - /dev/rdsk/c0t1d0s2
4. Create state database replicas on the replacement disk.
# metadb -a -c 2 /dev/dsk/c0t1d0s6
5. Determine which submirrors need to be resynchronized.
# metastat | grep 'Invoke: metareplace'
Invoke: metareplace d30 c0t1d0s0
Invoke: metareplace d31 c0t1d0s1
Invoke: metareplace d33 c0t1d0s3
Invoke: metareplace d34 c0t1d0s4
Invoke: metareplace d35 c0t1d0s5
6. Resynchronize the submirrors.
# metareplace -e d30 c0t1d0s0
d30: device c0t1d0s0 is enabled
# metareplace -e d31 c0t1d0s1
d31: device c0t1d0s1 is enabled
# metareplace -e d33 c0t1d0s3
d33: device c0t1d0s3 is enabled
# metareplace -e d34 c0t1d0s4
d34: device c0t1d0s4 is enabled
# metareplace -e d35 c0t1d0s5
d35: device c0t1d0s5 is enabled
You can monitor the resynchronization progress with metastat, or poll it with a loop:
while true
do metastat | grep %
sleep 3
done
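7. Install the boot block on the root slice of the replacement disk.
# installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c0t1d0s0
8. Set the secondary boot disk at the OpenBoot prompt. (disk5 here stands for the device alias of the mirror disk. Check devalias for the alias on your system.)
# init 0
ok printenv
ok setenv boot-device disk disk5 net
ok reset-all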
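Steps 5 and 6 can be combined by turning the "Invoke:" hints into commands. The following is a rough sketch, not a tested procedure: gen_metareplace is a hypothetical helper name, and it only prints the metareplace -e commands so they can be reviewed before anything is run.

```shell
#!/bin/sh
# Turn "Invoke: metareplace <metadevice> <slice>" hints from metastat into
# "metareplace -e" commands. Nothing is executed; the commands are only printed.
# gen_metareplace is a hypothetical helper name used for illustration.
gen_metareplace() {
    awk '$1 == "Invoke:" && $2 == "metareplace" { print "metareplace -e", $3, $4 }'
}

# Sample input copied from step 5.
# On a live system the input would come from: metastat | gen_metareplace
gen_metareplace <<'EOF'
Invoke: metareplace d30 c0t1d0s0
Invoke: metareplace d31 c0t1d0s1
EOF
```

Printing instead of executing is deliberate: each line can be checked against the disk that was actually replaced before it is pasted into a root shell.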
Example two
In this example, the mirror disk is failing, but one of its submirrors is in an Okay state.
# format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c1t0d0
/pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w21000004cf222bde,0
1. c1t1d0
/pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w2100002037f3ce1d,0
# metastat
...
d20: Submirror of d0
State: Needs maintenance
Invoke: metareplace d0 c1t1d0s0
Size: 3073896 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
c1t1d0s0 0 No Maintenance
d21: Submirror of d1
State: Okay
Size: 8389656 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
c1t1d0s1 0 No Okay
d23: Submirror of d3
State: Needs maintenance
Invoke: metareplace d3 c1t1d0s3
Size: 525798 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
c1t1d0s3 0 No Maintenance
d24: Submirror of d4
State: Needs maintenance
Invoke: metareplace d4 c1t1d0s4
Size: 59114718 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
c1t1d0s4 0 No Maintenance
Since submirror d21 is still in an Okay state, it must be detached from the mirror before the disk is replaced.
# metadetach d1 d21
d1: submirror d21 is detached
Remaining steps:
1. Delete the state databases from the failed disk.
2. Replace the failed disk.
3. Duplicate the partition table from the good disk to the new disk.
4. Re-create the state databases.
5. Run metareplace -e on the metadevices named in the "Invoke:" lines, whose submirrors were in "Needs maintenance" state (i.e., d0, d3, and d4 for submirrors d20, d23, and d24 in this example).
6. Run metattach to reattach the detached submirrors (i.e., metattach d1 d21 in this example).
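Finding the submirrors that are still Okay on the failing disk can be scripted from the metastat output shown above. This is a minimal sketch under assumptions: okay_submirrors is a hypothetical helper name, the parsing assumes the metastat layout used in this example, and the metadetach commands are only printed, never run.

```shell
#!/bin/sh
# List submirrors on a given disk that are still in the Okay state and print
# the metadetach command for each. Reads metastat output on stdin.
# okay_submirrors is a hypothetical helper name used for illustration.
okay_submirrors() {
    disk=$1    # failing disk without slice, e.g. c1t1d0
    awk -v disk="$disk" '
        # "d21: Submirror of d1" starts a new submirror block.
        /: Submirror of / { sm = $1; sub(/:$/, "", sm); mirror = $4; state = ""; next }
        # Remember the first word of the state ("Okay" or "Needs").
        $1 == "State:"    { state = $2; next }
        # Component row: only act on slices of the failing disk.
        sm != "" && index($1, disk "s") == 1 {
            if (state == "Okay") print "metadetach", mirror, sm
            sm = ""
        }'
}

# Abbreviated sample taken from the metastat output in this example.
# On a live system the input would come from: metastat | okay_submirrors c1t1d0
okay_submirrors c1t1d0 <<'EOF'
d20: Submirror of d0
    State: Needs maintenance
    Invoke: metareplace d0 c1t1d0s0
    c1t1d0s0 0 No Maintenance
d21: Submirror of d1
    State: Okay
    c1t1d0s1 0 No Okay
EOF
```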