Mabuhay

Hello world! This is it. I've always wanted to blog. I'm not after fame; I just want to be heard. No! Just to express myself. So I don't really care whether anyone believes what I write here, or whether anyone finds it interesting. My posts may be novel-length or one-liners; it doesn't matter. Still, I'm willing to listen to your views, as long as they justify mine... Well, enjoy your stay, and I hope you'll learn something new, because I just did and I'm sharing it with you. Welcome!

Monday, July 21, 2008

LUN: POWERFAILED

I ran into this error eons ago but never really had the chance to write about it.

Anyway, I'll be using some input provided by my colleagues. Please note that this post only covers converting the dev_t value from the log, that is, identifying the disk that reported a power failure.

From the /var/adm/syslog.log:
...
Jul 20 21:54:50 server4385 vmunix: LVM: Performed a switch for Lun ID = 0 (pv = 0x0000000048753840), from raw device 0x1f060100 (with priority: 0, and current flags: 0x40) to raw device 0x1f078100 (with priority: 1, and current flags: 0x0).
Jul 20 21:54:50 server4385 vmunix: LVM: Performed a switch for Lun ID = 0 (pv = 0x0000000048753840), from raw device 0x1f060100 (with priority: 0, and current flags: 0x40) to raw device 0x1f078100 (with priority: 1, and current flags: 0x0).
Jul 20 21:54:50 server4385 vmunix: LVM: Restored PV 1 to VG 1.
Jul 20 21:54:50 server4385 vmunix: LVM: Restored PV 1 to VG 1.
Jul 20 21:54:54 server4385 vmunix: LVM: vg[1]: pvnum=1 (dev_t=0x1f078100) is POWERFAILED
Jul 20 21:54:54 server4385 vmunix: LVM: vg[1]: pvnum=1 (dev_t=0x1f078100) is POWERFAILED
Jul 20 21:55:04 server4385 vmunix: LVM: Recovered Path (device 0x1f060100) to PV 1 in VG 1.
Jul 20 21:55:04 server4385 vmunix: LVM: Recovered Path (device 0x1f060100) to PV 1 in VG 1.
Jul 20 21:55:04 server4385 vmunix: LVM: Performed a switch for Lun ID = 0 (pv = 0x0000000048753840), from raw device 0x1f078100 (with priority: 1, and current flags: 0x0) to raw device 0x1f060100 (with priority: 0, and current flags: 0x80).
Jul 20 21:55:04 server4385 vmunix: LVM: Performed a switch for Lun ID = 0 (pv = 0x0000000048753840), from raw device 0x1f078100 (with priority: 1, and current flags: 0x0) to raw device 0x1f060100 (with priority: 0, and current flags: 0x80).
...


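We'll use the entry "dev_t=0x1f078100". The leading 1f is the major number (31 in decimal), and the last six (6) hex digits, 078100, are the minor number. To convert this to the exact disk, take those six digits and look for them in /dev/dsk: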

# ll /dev/dsk | grep 078100
brw-r----- 1 bin sys 31 0x078100 Apr 19 23:47 c7t8d1
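
The same conversion can be sketched as a small shell function. This is only an illustration: the name decode_dev_t is made up, and it assumes the legacy minor-number layout 0xCCTL00 (controller instance, target, LUN), which matches 0x078100 showing up as c7t8d1 above. The ll /dev/dsk lookup is still the check to trust.

```shell
# Split a dev_t value from the LVM syslog message into major and minor
# numbers and guess the cXtYdZ device name.
# Assumption: legacy minor layout 0xCCTL00
#   CC = controller instance, T = target, L = LUN.
decode_dev_t() {
    dev=$(( $1 ))                      # accepts hex such as 0x1f078100
    major=$(( (dev >> 24) & 0xff ))    # top byte, e.g. 0x1f = 31
    minor=$(( dev & 0xffffff ))        # last six hex digits
    ctrl=$(( (minor >> 16) & 0xff ))   # controller instance
    tgt=$(( (minor >> 12) & 0xf ))     # target
    lun=$(( (minor >> 8) & 0xf ))      # LUN
    printf 'major=%d minor=0x%06x device=c%dt%dd%d\n' \
        "$major" "$minor" "$ctrl" "$tgt" "$lun"
}

decode_dev_t 0x1f078100   # prints: major=31 minor=0x078100 device=c7t8d1
```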

So c7t8d1 (that is, /dev/dsk/c7t8d1) is the device file for the disk. Since the disk is used by LVM, we can run pvdisplay, vgdisplay, or lvdisplay to partly check the status of the data written on it.

1 comment:

  1. Another error was logged for this, again from syslog:

    Aug 6 21:50:14 server1009 EMS [3974]: ------ EMS Event Notification ------ Value: "CRITICAL (5)" for Resource: "/storage/events/disks/default/1_0_1_0_0_1_1.6.0" (Threshold: >= " 3") Execute the following command to obtain event details: /opt/resmon/bin/resdata -R 260440069 -r /storage/events/disks/default/1_0_1_0_0_1_1.6.0 -n 260440065 -a
    Aug 6 21:50:14 server1009 EMS [3974]: ------ EMS Event Notification ------ Value: "CRITICAL (5)" for Resource: "/storage/events/disks/default/1_0_1_0_0_1_1.6.0" (Threshold: >= " 3") Execute the following command to obtain event details: /opt/resmon/bin/resdata -R 260440069 -r /storage/events/disks/default/1_0_1_0_0_1_1.6.0 -n 260440065 -a
    Aug 6 21:50:16 server1009 vmunix: LVM: vg[0]: pvnum=1 (dev_t=0x1f076000) is POWERFAILED
    Aug 6 21:50:16 server1009 vmunix: LVM: vg[0]: pvnum=1 (dev_t=0x1f076000) is POWERFAILED
    Aug 6 21:50:17 server1009 EMS [3974]: ------ EMS Event Notification ------ Value: "MAJORWARNING (3)" for Resource: "/storage/events/disks/default/1_0_1_0_0_1_1.6.0" (Threshold: >= " 3") Execute the following command to obtain event details: /opt/resmon/bin/resdata -R 260440069 -r /storage/events/disks/default/1_0_1_0_0_1_1.6.0 -n 260440066 -a
    Aug 6 21:50:17 server1009 EMS [3974]: ------ EMS Event Notification ------ Value: "MAJORWARNING (3)" for Resource: "/storage/events/disks/default/1_0_1_0_0_1_1.6.0" (Threshold: >= " 3") Execute the following command to obtain event details: /opt/resmon/bin/resdata -R 260440069 -r /storage/events/disks/default/1_0_1_0_0_1_1.6.0 -n 260440066 -a



    In addition to the steps in the post, we can also run these:

    # ioscan -fnCdisk
    Class  I   H/W Path                 Driver  S/W State  H/W Type  Description
    ==========================================================================
    disk   18  0/0/6/0/0.2.196.0.0.0.0  sdisk   CLAIMED    DEVICE    HP OPEN-V
               /dev/dsk/c28t0d0 /dev/rdsk/c28t0d0
    disk   19  0/0/6/0/0.2.196.0.0.0.1  sdisk   CLAIMED    DEVICE    HP OPEN-V
               /dev/dsk/c28t0d1 /dev/rdsk/c28t0d1
    disk   20  0/0/6/0/0.2.196.0.0.0.2  sdisk   CLAIMED    DEVICE    HP OPEN-V
               /dev/dsk/c28t0d2 /dev/rdsk/c28t0d2
    ....
    disk   1   1/0/1/0/0/1/1.6.0        sdisk   CLAIMED    DEVICE    HP 36.4GST336752LC
               /dev/dsk/c7t6d0 /dev/rdsk/c7t6d0
    ....

    Or:

    # ioscan -H 1/0/1/0/0/1/1.6.0
    H/W Path Class Description
    ===========================================================
    1/0/1/0/0/1/1.6.0 disk HP 36.4GST336752LC


    Also, we can run the resdata command given in the EMS notification (resmon is a system monitoring tool) to get the event report, like the one below:


    # /opt/resmon/bin/resdata -R 260440069 -r /storage/events/disks/default/1_0_1_0_0_1_1.6.0 -n 260440066 -a

    CURRENT MONITOR DATA:

    Event Time..........: Wed Aug 6 21:50:17 2008
    Severity............: MAJORWARNING
    Monitor.............: disk_em
    Event #.............: 100091
    System..............: server1009

    Summary:
    Disk at hardware path 1/0/1/0/0/1/1.6.0 : Software configuration error


    Description of Error:

    The device is in a condition where it requires action on the part of the
    device driver or a human operator.

    Probable Cause / Recommended Action:

    The device has been reset by a Bus Device Reset message, a hard reset
    condition, or a power-on reset.

    If this is the case, no action is necessary.
    ...


    For more info on the system events, please do check:

    http://docs.hp.com/en/diag/ems/scsi.htm#E100091
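
One detail in the comment above is easy to miss: the last part of the EMS resource name (1_0_1_0_0_1_1.6.0) looks like the hardware path given to ioscan -H (1/0/1/0/0/1/1.6.0) with the slashes written as underscores. Assuming that pattern holds for other disk resources, which is inferred only from this one example, a rough sketch of the conversion follows. The function name ems_to_hw_path is hypothetical.

```shell
# Turn the last component of an EMS disk resource name into a hardware path.
# Assumption: the resource name is the hardware path with "/" written as "_".
ems_to_hw_path() {
    name=${1##*/}                         # keep the text after the last "/"
    printf '%s\n' "$name" | tr '_' '/'    # underscores become slashes
}

ems_to_hw_path /storage/events/disks/default/1_0_1_0_0_1_1.6.0
# prints: 1/0/1/0/0/1/1.6.0
```

The result can then be passed to ioscan -H as shown in the comment.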
