Replacing a faulty disk while the system is up and running is a normal procedure. Pull out the faulty disk, insert the replacement, and the OS recognizes the new disk. If Solaris Volume Manager (SVM) is in use, the broken sub-mirror then has to be resynced (rebuilt). Simply swapping the disk, however, is not the right way to do it. Even though the OS recognizes the new disk and SVM rebuilds the mirror, a problem is left behind: the "iostat -En" command does not show that the disk has been replaced. It still reports the serial number of the old disk.
root@sun1: # iostat -En
snip
c0t1d0 Soft Errors: 1 Hard Errors: 42 Transport Errors: 10
Vendor: SEAGATE Product: ST373207LSUN72G Revision: 045A Serial No: 053432A5HL
Size: 73.40GB <73400057856>
Media Error: 36 Device Not Ready: 0 No Device: 6 Recoverable: 1
Illegal Request: 0 Predictive Failure Analysis: 0
snip
Note:
Serial No: 053432A5HL is the serial number of the old disk.
Furthermore, the following error may appear when the system is rebooted:
Sep 22 10:15:25 sun1 metadevadm: [ID 209699 daemon.error] Invalid device relocation information detected in Solaris Volume Manager
As a result, the sub-mirror breaks again and a manual resync (mirror rebuild) has to be performed. After the system has been rebooted, the new disk shows its correct serial number:
root@sun1: # iostat -En
snip
c0t1d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: FUJITSU Product: MAW3073NCSUN72G Revision: 1703 Serial No: 0749B0PDPJ
Size: 73.40GB <73400057856>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
snip
Note:
Serial No: 0749B0PDPJ is the serial number of the new disk.
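A quick way to spot this mismatch is to pull the serial number of one disk out of the "iostat -En" output and compare it with the label on the physical drive. The function below is only a sketch: disk_serial is a made-up name, it assumes the output layout shown in the samples above, and it needs an awk that accepts -v (on Solaris that probably means nawk or /usr/xpg4/bin/awk rather than the stock awk).

```sh
# Sketch: print the serial number that "iostat -En" reports for one disk.
# Usage: iostat -En | disk_serial c0t1d0
# disk_serial is a hypothetical helper name, not part of Solaris.
disk_serial() {
    # $1 = device name as shown by iostat, e.g. c0t1d0
    awk -v dev="$1" '
        # A line starting with a cNtNdN name opens a new device record.
        $1 ~ /^c[0-9]+t[0-9]+d[0-9]+$/ { cur = $1 }
        # Inside the wanted record, look for the "Serial No: <value>" triple.
        cur == dev {
            for (i = 1; i < NF - 1; i++)
                if ($i == "Serial" && $(i + 1) == "No:") {
                    print $(i + 2)
                    exit
                }
        }
    '
}
```

If the value printed for c0t1d0 is still the serial number of the disk that was pulled out, the OS has not picked up the new disk's identity yet.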
The following is an illustration of how to replace a faulty disk properly.
Configuration:
- all disks are SCSI disks
- disks are mirrored with Solaris Volume Manager
- each sub-mirror is protected by a hot spare disk (hot spare pool)
- the faulty disk is c1t0d0, and the disk of the other sub-mirror is c0t0d0
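For this configuration, the whole replacement sequence described in this post can be previewed as a dry run before touching the system. The sketch below only prints commands and executes nothing. replace_plan and its arguments are illustrative names, and the s7 replica slice and s0 data slice come from this example rather than being general defaults.

```sh
# Dry-run sketch: print (do not execute) the disk replacement commands.
# replace_plan is a hypothetical helper; review its output before running
# any of the printed commands by hand.
replace_plan() {
    bad=$1      # faulty disk, e.g. c1t0d0
    good=$2     # surviving sub-mirror disk, e.g. c0t0d0
    ctrl=$3     # controller part of the cfgadm attachment point, e.g. c1
    mirror=$4   # mirror metadevice, e.g. d20
    # Slice s7 (replicas) and s0 (data) are assumptions from this example.
    cat <<EOF
metadb -d ${bad}s7
cfgadm -c unconfigure ${ctrl}::dsk/${bad}
cfgadm -c configure ${ctrl}::dsk/${bad}
prtvtoc /dev/rdsk/${good}s2 | fmthard -s - /dev/rdsk/${bad}s2
metadb -a ${bad}s7
metareplace -e ${mirror} ${bad}s0
metadevadm -u ${bad}
EOF
}
```

For the disks named in the configuration, the call would be: replace_plan c1t0d0 c0t0d0 c1 d20. The physical swap of the disk happens between the unconfigure and configure lines.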
The following steps and commands can be used during disk replacement:
1. Check whether there are any state database replicas on the faulty disk, and remove any that are found:
root@sun1: # metadb
root@sun1: # metadb -d c1t0d0s7
Verify that no replicas are left on the faulty disk:
root@sun1: # metadb | grep c1t0d0
2. Run the "cfgadm" command to unconfigure the failed disk so that it can be removed.
root@sun1: # cfgadm -c unconfigure c1::dsk/c1t0d0
3. Insert the new disk and configure it.
root@sun1: # cfgadm -c configure c1::dsk/c1t0d0
Verify that the disk is properly configured:
root@sun1: # cfgadm -al
If necessary, run the following disk-related commands:
root@sun1: # devfsadm
root@sun1: # format (to verify the new disk)
4. Create the desired partition table on the new disk by copying it from the other sub-mirror disk with the prtvtoc and fmthard commands:
root@sun1: # prtvtoc /dev/rdsk/c0t0d0s2 | fmthard -s - /dev/rdsk/c1t0d0s2
5. Recreate the replicas on the new disk:
root@sun1: # metadb -a c1t0d0s7
6. Run metareplace to enable and resync the new disk. The general form is:
metareplace -e (mirror-md) cxtyd0sz
root@sun1: # metareplace -e d20 c1t0d0s0
7. In case the SVM device ID is not up to date, run "metadevadm", which updates the device ID of the new disk.
root@sun1: # metadevadm -u c1t0d0

Reference:
sunsolve.sun.com
=================================================================
Document ID: 208671
Title: Solaris Volume Manager software: Replacing Disks
=================================================================
Description
Beginning with the Solaris[TM] 9 Operating System, Solaris[TM] Volume Manager (VM) software uses a new feature called Device-ID (DevID). This feature identifies each disk not only by its c#t#d# name, but also by a unique ID which is generated from the disk's WWN or serial number.
Solaris Volume Manager (VM) relies on the Solaris OS to supply it with each disk's correct DevID. When a disk fails and is replaced, a specific procedure is required to make sure that the Solaris OS is updated with the new disk's DevID.
If this procedure is not followed exactly, the errors below may be seen:
Jun 22 18:22:57 host1 metadevadm: [ID 209699 daemon.error] Invalid device relocation information detected in Solaris Volume Manager
As a result, Solaris OS will not update the DevID until the next reboot, meaning that although a NEW disk is in the system, the DevID being reported by Solaris OS to the Solaris VM software is still the OLD disk's DevID. (..truncated..)
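To check whether a system has already hit this condition, the syslog can be searched for the metadevadm message quoted above. The snippet is a minimal sketch: devid_error_count is a made-up name, and /var/adm/messages is assumed to be where syslog writes on the host in question.

```sh
# Sketch: count metadevadm DevID errors in a syslog stream read from stdin.
# devid_error_count is a hypothetical helper name.
# Like grep itself, it exits non-zero when nothing matches (count 0).
devid_error_count() {
    grep -c 'metadevadm: .*Invalid device relocation information'
}

# Example call (log path is an assumption, adjust to the local syslog setup):
#   devid_error_count < /var/adm/messages
```

A non-zero count after a disk swap suggests the device ID was not updated and the sub-mirror should be checked before the next reboot.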
When I replaced the disk, everything went normally. But when I checked the serial number with iostat -En, why was it still showing the serial number of the old disk? That was confusing. A few days later, when the server was rebooted, the newly replaced disk turned out to have "dropped" out of its mirror again and had to be resynced. So the overtime from the other day was somewhat wasted. But overtime is still overtime... not bad.