Friday, May 15, 2009

How to locate a missing disk among several JBODs with dd

IHAC (I have a customer) who emailed asking whether one of the disks had gone missing from the server configuration. The email included the following "format" command output:

Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c0t0d0
/pci@1c,600000/scsi@1/sd@0,0
1. c0t1d0
/pci@1c,600000/scsi@1/sd@1,0
2. c0t2d0
/pci@1c,600000/scsi@1/sd@2,0
3. c0t3d0
/pci@1c,600000/scsi@1/sd@3,0
4. c0t4d0
/pci@1c,600000/scsi@1/sd@4,0
5. c0t5d0
/pci@1c,600000/scsi@1/sd@5,0
6. c1t0d0
/pci@1c,600000/scsi@1,1/sd@0,0
7. c1t1d0
/pci@1c,600000/scsi@1,1/sd@1,0
8. c1t2d0
/pci@1c,600000/scsi@1,1/sd@2,0
9. c1t3d0
/pci@1c,600000/scsi@1,1/sd@3,0
10. c1t4d0
/pci@1c,600000/scsi@1,1/sd@4,0
11. c1t5d0
/pci@1c,600000/scsi@1,1/sd@5,0
12. c3t0d0
/pci@1f,700000/scsi@2/sd@0,0
13. c3t1d0
/pci@1f,700000/scsi@2/sd@1,0
14. c4t0d0
/pci@1d,700000/scsi@1/sd@0,0
15. c4t1d0
/pci@1d,700000/scsi@1/sd@1,0
16. c4t2d0
/pci@1d,700000/scsi@1/sd@2,0
17. c4t3d0
/pci@1d,700000/scsi@1/sd@3,0
18. c4t4d0
/pci@1d,700000/scsi@1/sd@4,0
19. c4t5d0
/pci@1d,700000/scsi@1/sd@5,0
20. c5t0d0
/pci@1d,700000/scsi@1,1/sd@0,0
21. c5t1d0
/pci@1d,700000/scsi@1,1/sd@1,0
22. c5t2d0
/pci@1d,700000/scsi@1,1/sd@2,0
23. c5t3d0
/pci@1d,700000/scsi@1,1/sd@3,0
24. c5t4d0
/pci@1d,700000/scsi@1,1/sd@4,0
25. c5t5d0
/pci@1d,700000/scsi@1,1/sd@5,0
26. c7t0d0
/pci@1d,700000/scsi@2/sd@0,0
27. c7t1d0
/pci@1d,700000/scsi@2/sd@1,0
28. c7t2d0
/pci@1d,700000/scsi@2/sd@2,0
29. c7t3d0
/pci@1d,700000/scsi@2/sd@3,0
30. c7t4d0
/pci@1d,700000/scsi@2/sd@4,0
31. c7t5d0
/pci@1d,700000/scsi@2/sd@5,0
32. c7t8d0
/pci@1d,700000/scsi@2/sd@8,0
33. c7t9d0
/pci@1d,700000/scsi@2/sd@9,0
34. c7t10d0
/pci@1d,700000/scsi@2/sd@a,0
35. c7t11d0
/pci@1d,700000/scsi@2/sd@b,0
36. c7t13d0
/pci@1d,700000/scsi@2/sd@d,0
37. c8t0d0
/pci@1d,700000/scsi@2,1/sd@0,0
38. c8t1d0
/pci@1d,700000/scsi@2,1/sd@1,0
39. c8t2d0
/pci@1d,700000/scsi@2,1/sd@2,0
40. c8t3d0
/pci@1d,700000/scsi@2,1/sd@3,0
41. c8t4d0
/pci@1d,700000/scsi@2,1/sd@4,0
42. c8t5d0
/pci@1d,700000/scsi@2,1/sd@5,0
43. c8t8d0
/pci@1d,700000/scsi@2,1/sd@8,0
44. c8t9d0
/pci@1d,700000/scsi@2,1/sd@9,0
45. c8t10d0
/pci@1d,700000/scsi@2,1/sd@a,0
46. c8t11d0
/pci@1d,700000/scsi@2,1/sd@b,0
47. c8t12d0
/pci@1d,700000/scsi@2,1/sd@c,0
48. c8t13d0
/pci@1d,700000/scsi@2,1/sd@d,0


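Phew.. Which one is missing?
Comparing the entries for c7 with those for c8, it's clear that disk c7t12d0 is missing: c8 lists targets 0-5 and 8-13, while c7 jumps straight from target 11 to target 13.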

35. c7t11d0
/pci@1d,700000/scsi@2/sd@b,0

==>> c7t12d0
==>> /pci@1d,700000/scsi@2/sd@c,0
36. c7t13d0
/pci@1d,700000/scsi@2/sd@d,0
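
Spotting the gap by eye works, but with 49 disks it is easy to miss. The check can be sketched as a small filter: a minimal example, assuming the "format" listing layout shown above. The function name find_gaps and the SKIP variable are made up for illustration, and treating targets 6 and 7 as reserved for the host adapters is an assumption based on both c7 and c8 skipping them here.

```shell
#!/bin/sh
# find_gaps (hypothetical helper): read a "format" disk listing on stdin and
# print every cXtYd0 name missing between the lowest and highest target seen
# on each controller.
# SKIP holds target IDs to ignore. Assumption: 6 and 7 belong to the host
# adapters in this dual-host setup, so their absence is expected.
find_gaps() {
    awk -v skip="${SKIP-6 7}" '
    BEGIN {
        n = split(skip, s, " ")
        for (i = 1; i <= n; i++) reserved[s[i] + 0] = 1
    }
    {
        for (f = 1; f <= NF; f++) {
            if ($f ~ /^c[0-9]+t[0-9]+d[0-9]+$/) {
                name = $f
                sub(/^c/, "", name)
                split(name, p, /[td]/)      # p[1] = controller, p[2] = target
                c = p[1] + 0
                t = p[2] + 0
                seen[c, t] = 1
                if (!(c in max) || t > max[c]) max[c] = t
                if (!(c in min) || t < min[c]) min[c] = t
            }
        }
    }
    END {
        for (c in max)
            for (t = min[c]; t <= max[c]; t++)
                if (!((c, t) in seen) && !(t in reserved))
                    printf "c%dt%dd0\n", c, t
    }' | sort
}

# Possible usage on the server (not run here):
#   format </dev/null | find_gaps
```

This only catches holes inside a controller's target range. A missing first or last disk would still need a comparison against a sibling controller such as c8.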


Next question: where is the physical disk? In which tray? In which slot? We need to fix it without server downtime!

root@sun1 # ./decode
Enter path? /pci@1d,700000/scsi@2/sd@c,0
Possible machines:
0. Sun Fire V440
1. Sun Fire V250
Selection
[default: V440]: 0
Sun Fire V440
System Board
PCI Slot 2 3.3V 33/66Mhz 32/64Bit PCI
Node: scsi@2
Desc: Dual Differential Ultra/Wide SCSI Host Adapter
Desc: Dual Single-Ended Ultra/Wide SCSI Host Adapter
Desc: PCI-X - Dual Ultra-320 SCSI/RAID
Desc: QLogic QLA-22xx PCI Fibre Channel Adaptor
Port: SCSI Channel A
Info: VHDC168 Connector - SE/LVD Ultra-320 SCSI
Node:
Desc: Storage Array
Node: sd@c,0
Desc: SCSI Emulated FCAL Disk Device
Location: LUN 0


Now we have the missing disk's information in hand:
- device name: c7t12d0
- SCSI connection: PCI Slot 2 <=> Port: SCSI Channel A <=> Storage Array (Node: sd@c,0)
- adjacent disks: c7t11d0 and c7t13d0
root@sun1 # iostat -En
<..truncated..>
c7t11d0 Soft Errors: 0 Hard Errors: 1 Transport Errors: 0
Vendor: FUJITSU Product: MAX3073NCSUN72G Revision: 1503 Serial No: 0714F03WY1
c7t13d0 Soft Errors: 0 Hard Errors: 1 Transport Errors: 0
Vendor: SEAGATE Product: ST373455LSUN72G Revision: 0491 Serial No: 0717R0WPXR

<..truncated..>
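
To pull just the neighbours' details out of a long listing, the output can be condensed to one line per disk. This is a rough sketch that assumes the two-line layout shown in the excerpt above (an error-counter line followed by a Vendor line ending in the serial number); disk_summary is a hypothetical name, and other iostat versions may lay the fields out differently.

```shell
#!/bin/sh
# disk_summary (hypothetical helper): condense "iostat -En" style output to
# "name vendor serial hard=N", one line per disk.
# Assumes the layout shown in the post: a "Soft Errors:" line carrying the
# disk name, then a "Vendor:" line whose last field is the serial number.
disk_summary() {
    awk '
    /Soft Errors:/ {
        name = $1
        hard = "?"
        vendor = "?"
        for (i = 1; i < NF - 1; i++)
            if ($i == "Hard" && $(i + 1) == "Errors:") hard = $(i + 2)
    }
    /Serial No:/ {
        for (i = 1; i < NF; i++)
            if ($i == "Vendor:") vendor = $(i + 1)
        print name, vendor, $NF, "hard=" hard
    }'
}

# Possible usage on the server (not run here):
#   iostat -En | disk_summary | grep -e c7t11d0 -e c7t13d0
```

The serial numbers are worth noting because they are usually printed on the drive label, which gives a second way to confirm the right slot once the tray is found.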

However, even with all this data, we still couldn't find the missing disk. There were 2 servers (in cluster mode) connected to 4 disk trays (SE3310) through 12 rigid SCSI cables. Worse, none of the SCSI cables had a label or cable tag. So we located the disk by generating I/O activity on disks c7t11d0 and c7t13d0, since we knew the missing disk was sitting between them.

In one terminal window, run:
root@sun1 # dd if=/dev/rdsk/c7t11d0s2 of=/dev/null
and in another terminal window, run the same command concurrently against the other disk:
root@sun1 # dd if=/dev/rdsk/c7t13d0s2 of=/dev/null
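
The same thing can be done from a single terminal by putting each dd in the background. A minimal sketch, where blink is a hypothetical helper name and the 1 MB block size is an arbitrary choice (the commands above use dd's default):

```shell
#!/bin/sh
# blink (hypothetical helper): read every given device to /dev/null in
# parallel so that the activity LEDs of those disks light up together.
# Reads only; nothing is written to the disks.
blink() {
    for dev in "$@"; do
        echo "reading $dev"
        dd if="$dev" of=/dev/null bs=1024k 2>/dev/null &
    done
    wait    # returns once every dd has finished or been interrupted
}

# Possible usage on the server (not run here):
#   blink /dev/rdsk/c7t11d0s2 /dev/rdsk/c7t13d0s2
```

On a 72 GB disk a full read takes a while, which leaves plenty of time to walk over to the trays.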

Now we look for the disk drives showing heavy I/O activity, indicated by a steadily blinking LED. Finally we found c7t12d0: the drive with no activity at all, sitting in the slot between c7t11d0 and c7t13d0. We then proceeded with the normal disk replacement procedure.

