Mabuhay

Hello world! This is it. I've always wanted to blog. I don't want fame; I just want to be heard. No! Just to express myself. So I don't really care whether anyone believes what I write here, or whether anyone finds it interesting. My posts may be novel-length or one-liners; it doesn't matter. Still, I'm willing to listen to your views, as long as they justify mine... Well, enjoy your stay, and I hope you'll learn something new, because I just did and I'm sharing it with you. Welcome!

Sunday, June 29, 2008

Reboot after panic: Data page fault

One of the servers that we monitor rebooted after a panic. The dumps, I think, will be submitted to the HPRC to decode what caused the panic. For a data page fault (DPF), though, a hardware failure is an unlikely cause. More probably, some application passed something to the kernel that the kernel didn't know how to process. Anyway, that's just my opinion. I'm not sure if I can get anything from the DTS. A root cause analysis is needed for this.

Here are the files in the crash dump directory.

[root@box1:/var/adm/crash/crash.0]
# ll
total 2113190
-rw-r--r-- 1 root root 1556 Jun 28 19:05 INDEX
-rw-r--r-- 1 root root 281784 Jun 28 19:02 SEOS
-rw-r--r-- 1 root root 134189056 Jun 28 19:02 image.1.1
-rw-r--r-- 1 root root 134197248 Jun 28 19:02 image.1.2
-rw-r--r-- 1 root root 134152192 Jun 28 19:03 image.1.3
-rw-r--r-- 1 root root 89419776 Jun 28 19:03 image.1.4
-rw-r--r-- 1 root root 134180864 Jun 28 19:04 image.2.1
-rw-r--r-- 1 root root 134180864 Jun 28 19:04 image.2.2
-rw-r--r-- 1 root root 134193152 Jun 28 19:04 image.2.3
-rw-r--r-- 1 root root 134168576 Jun 28 19:05 image.2.4
-rw-r--r-- 1 root root 36978688 Jun 28 19:05 image.2.5
-rw-r--r-- 1 root root 16007272 Jun 28 19:02 vmunix

[root@box1:/var/adm/crash/crash.0]
# more INDEX
comment savecrash crash dump INDEX file
version 2
hostname box1
modelname 9000/800/N4000-44
panic Data page fault
dumptime 1214693427 Sat Jun 28 18:50:27 EDT 2008
savetime 1214694120 Sat Jun 28 19:02:00 EDT 2008
release @(#) $Revision: vmunix: vw: -proj selectors: CUPI80_BL2000_1108 -c 'Vw for CUPI80_BL2000_1108 build' -- cupi80_bl2000_1108 'CUPI80_BL2000_1108' Wed Nov 8 19:24:56 PST 2000 $
memsize 4294967296
chunksize 134217728
module /stand/vmunix vmunix 16007272 3341476060
module /stand/dlkm/mod.d/SEOS SEOS 281784 3144992042
image image.1.1 0x0000000000000000 0x0000000007ff9000 0x0000000000000000 0x0000000000008987 356976270
image image.1.2 0x0000000000000000 0x0000000007ffb000 0x0000000000008988 0x0000000000015447 637338283
image image.1.3 0x0000000000000000 0x0000000007ff0000 0x0000000000015448 0x0000000000066d7f 35130470
image image.1.4 0x0000000000000000 0x0000000005547000 0x0000000000066d80 0x000000000007ffff 3529390633
image image.2.1 0x0000000000000000 0x0000000007ff7000 0x0000000000180000 0x00000000001a7307 719265648
image image.2.2 0x0000000000000000 0x0000000007ff7000 0x00000000001a7308 0x00000000001bec17 3529725656
image image.2.3 0x0000000000000000 0x0000000007ffa000 0x00000000001bec18 0x00000000001ca9df 560273249
image image.2.4 0x0000000000000000 0x0000000007ff4000 0x00000000001ca9e0 0x00000000001f5e0f 3332528375
image image.2.5 0x0000000000000000 0x0000000002344000 0x00000000001f5e10 0x00000000001fffff 3748535493
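The INDEX file is easy to cross-check by hand, so here is a minimal Python sketch that does it. It assumes, based only on the listings above and not on HP documentation, that the second field of each line is the value and that the second hex number on an image line is the size of that image file in bytes (it matches the sizes shown by ll). The function name parse_index is my own, and only the lines needed for the check are copied in.

```python
from datetime import datetime, timezone

# Selected lines copied from the INDEX file shown above.
INDEX_TEXT = """\
panic Data page fault
dumptime 1214693427 Sat Jun 28 18:50:27 EDT 2008
savetime 1214694120 Sat Jun 28 19:02:00 EDT 2008
memsize 4294967296
image image.1.1 0x0000000000000000 0x0000000007ff9000 0x0000000000000000 0x0000000000008987 356976270
image image.1.2 0x0000000000000000 0x0000000007ffb000 0x0000000000008988 0x0000000000015447 637338283
image image.1.3 0x0000000000000000 0x0000000007ff0000 0x0000000000015448 0x0000000000066d7f 35130470
image image.1.4 0x0000000000000000 0x0000000005547000 0x0000000000066d80 0x000000000007ffff 3529390633
image image.2.1 0x0000000000000000 0x0000000007ff7000 0x0000000000180000 0x00000000001a7307 719265648
image image.2.2 0x0000000000000000 0x0000000007ff7000 0x00000000001a7308 0x00000000001bec17 3529725656
image image.2.3 0x0000000000000000 0x0000000007ffa000 0x00000000001bec18 0x00000000001ca9df 560273249
image image.2.4 0x0000000000000000 0x0000000007ff4000 0x00000000001ca9e0 0x00000000001f5e0f 3332528375
image image.2.5 0x0000000000000000 0x0000000002344000 0x00000000001f5e10 0x00000000001fffff 3748535493
"""


def parse_index(text):
    """Hypothetical helper: pull the panic string, timestamps and image sizes."""
    info = {"images": {}}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        key = fields[0]
        if key in ("dumptime", "savetime", "memsize"):
            info[key] = int(fields[1])  # epoch seconds, or bytes for memsize
        elif key == "panic":
            info["panic"] = " ".join(fields[1:])
        elif key == "image":
            # Assumption: the second hex field is the image file size in bytes.
            info["images"][fields[1]] = int(fields[3], 16)
    return info


info = parse_index(INDEX_TEXT)
save_delay = info["savetime"] - info["dumptime"]  # seconds from dump to save
image_bytes = sum(info["images"].values())
dump_utc = datetime.fromtimestamp(info["dumptime"], tz=timezone.utc)

print("panic:", info["panic"])
print("dump time (UTC):", dump_utc.isoformat())
print("seconds between dump and savecrash:", save_delay)
print("total image bytes:", image_bytes, "of", info["memsize"], "bytes of memory")
```

With the values above, the gap between dumptime and savetime comes out to 693 seconds, which agrees with the 18:50:27 and 19:02:00 timestamps printed on the same lines.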

[root@box1:/var/adm/crash/crash.0]
# uptime
8:29pm up 1:28, 2 users, load average: 0.14, 0.18, 0.11

[root@box1:/var/adm/crash/crash.0]
# date
Sat Jun 28 20:29:31 EDT 2008

[root@box1:/var/adm/crash/crash.0]
# more /etc/shutdownlog
10:02 Mon Oct 10, 2005. Reboot:
11:22 Mon Oct 10, 2005. Reboot: (by SAM)
11:25 Mon Oct 10, 2005. Reboot: (by bdhp4420!root)
06:16 Tue Oct 11, 2005. Reboot: (by bdhp4420!root)
09:56 Thu Oct 20, 2005. Reboot: (by bdhp4420!root)
10:06 Thu Apr 20, 2006. Reboot: (by SAM)
10:12 Thu Apr 20, 2006. Reboot: (by bdhp4420!root)
19:05 Sat Jun 28 2008. Reboot after panic: Data page fault
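To pick the panic reboots out of a longer /etc/shutdownlog, something like the following Python sketch would do. The regular expression is my own guess at the line format, based only on the entries shown above (note that the panic line has no comma after the day), and find_panics is a made-up name.

```python
import re
from datetime import datetime

# A few lines copied from the shutdownlog output above.
SHUTDOWNLOG = """\
10:02 Mon Oct 10, 2005. Reboot:
11:22 Mon Oct 10, 2005. Reboot: (by SAM)
10:12 Thu Apr 20, 2006. Reboot: (by bdhp4420!root)
19:05 Sat Jun 28 2008. Reboot after panic: Data page fault
"""

# Assumed layout: HH:MM, weekday, month, day (optional comma), year, then the text.
LINE_RE = re.compile(
    r"^(\d{2}:\d{2})\s+\w{3}\s+(\w{3})\s+(\d{1,2}),?\s+(\d{4})\.\s+"
    r"Reboot after panic:\s*(.*)$"
)


def find_panics(text):
    """Hypothetical helper: return (timestamp, panic string) for panic reboots."""
    panics = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            hhmm, month, day, year, reason = match.groups()
            when = datetime.strptime(f"{hhmm} {month} {day} {year}", "%H:%M %b %d %Y")
            panics.append((when, reason.strip()))
    return panics


panics = find_panics(SHUTDOWNLOG)
for when, reason in panics:
    print(when.isoformat(), reason)
```

On this box the log holds only the one panic entry, so the sketch mostly helps on servers with a longer history.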

[root@box1:/var/tombstones]
# ll -rt
total 252
-rw-r--r-- 1 root root 14720 Oct 7 2005 ts93
-rw-r--r-- 1 root root 14720 Oct 10 2005 ts94
-rw-r--r-- 1 root root 14720 Oct 10 2005 ts95
-rw-r--r-- 1 root root 14720 Oct 11 2005 ts96
-rw-r--r-- 1 root root 14720 Oct 20 2005 ts97
-rw-r--r-- 1 root root 14720 Apr 20 2006 ts98
-rw-r--r-- 1 root root 14720 Jun 28 19:02 ts99
-rw-r--r-- 1 root root 20873 Jun 28 19:08 cpumap

[root@box1:/var/tombstones]
#


On a side note, some application processes are not running. I saw earlier that one of them, Oracle, was started manually, so I guess the same goes for the rest.

UPDATE: The team has already opened an HPRC case for this and, at the same time, a Problem Record (PR#54269).
