ilinch
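Sun Fire 4800 keeps going down; fault messages below. Could everyone take a look? I suspect a problem with the SB board or the fans.
Serial console output: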
You have new mail.
Jan 31 00:04:58 SF4800-2-SC0 Platform.SC: Notice: /N0/SB4 temperature is approaching warning limit of 100C.
Jan 31 00:04:58 SF4800-2-SC0 Platform.SC: /N0/SB4 SDC 0 Temp. 0 value: 96 Degrees C
EJMAIN2% Jan 31 00:10:53 SF4800-2-SC0 Platform.SC: WARNING: /N0/SB4 temperature is approaching max limit of 93C
Jan 31 00:10:53 SF4800-2-SC0 Platform.SC: /N0/SB4 CPU 3 Temp. 0 value: 88 Degrees C
Jan 31 00:10:53 SF4800-2-SC0 Platform.SC: Check for abnormal environmental operating conditions.
Jan 31 00:10:53 SF4800-2-SC0 Platform.SC: /N0/SB4, sensor status, outside acceptable limits (7,1,0x204040603030000)
Jan 31 00:12:31 SF4800-2-SC0 Platform.SC: WARNING: /N0/SB4 temperature is approaching max limit of 93C
Jan 31 00:12:31 SF4800-2-SC0 Platform.SC: /N0/SB4 CPU 0 Temp. 0 value: 88 Degrees C
Jan 31 00:12:31 SF4800-2-SC0 Platform.SC: Check for abnormal environmental operating conditions.
Jan 31 00:12:31 SF4800-2-SC0 Platform.SC: /N0/SB4, sensor status, outside acceptable limits (7,1,0x204040600030000)
Jan 31 00:22:06 SF4800-2-SC0 Platform.SC: WARNING: /N0/SB4 temperature is approaching max limit of 105C
Jan 31 00:22:06 SF4800-2-SC0 Platform.SC: /N0/SB4 SDC 0 Temp. 0 value: 101 Degrees C
Jan 31 00:22:06 SF4800-2-SC0 Platform.SC: Check for abnormal environmental operating conditions.
Jan 31 00:22:06 SF4800-2-SC0 Platform.SC: /N0/SB4, sensor status, outside acceptable limits (7,1,0x204040200030000)
Jan 31 00:25:59 SF4800-2-SC0 Platform.SC: WARNING: /N0/SB4 temperature is approaching max limit of 93C
Jan 31 00:25:59 SF4800-2-SC0 Platform.SC: /N0/SB4 CPU 2 Temp. 0 value: 88 Degrees C
Jan 31 00:25:59 SF4800-2-SC0 Platform.SC: Check for abnormal environmental operating conditions.
Jan 31 00:25:59 SF4800-2-SC0 Platform.SC: /N0/SB4, sensor status, outside acceptable limits (7,1,0x204040602030000)
logout
EJMAIN2 console login: nari
Password:
Last login: Wed Jan 30 16:35:15 from ejop1
Sun Microsystems Inc. SunOS 5.9 Generic May 2002
You have new mail.
EJMAIN2% Jan 31 12:05:49 SF4800-2-SC0 Platform.SC: Notice: Shutting down /N0/SB4 as temperature exceeds max limit of 93C
Jan 31 12:05:49 SF4800-2-SC0 Platform.SC: /N0/SB4 CPU 3 Temp. 0 value: 93 Degrees C
Jan 31 12:05:49 SF4800-2-SC0 Platform.SC: Check for abnormal environmental operating conditions.
Jan 31 12:05:50 SF4800-2-SC0 Platform.SC: /N0/SB4: has been queued for power off.
SF4800-2-SC0:A> Jan 31 12:05:50 SF4800-2-SC0 Domain-A.SC: /N0/SB4: powering off active board
Jan 31 12:05:50 SF4800-2-SC0 Domain-A.SC: changing domain A keyswitch position to standby
Jan 31 12:05:50 SF4800-2-SC0 Platform.SC: /N0/SB4, sensor status, shutdown (7,3,0x204040603030000)
Jan 31 12:06:01 SF4800-2-SC0 Platform.SC: /N0/SB4: powered off
Jan 31 12:08:13 SF4800-2-SC0 Platform.SC: FT0, fan speed, Low (4,1)
Jan 31 12:08:13 SF4800-2-SC0 Platform.SC: FT2, fan speed, Low (4,1)
Jan 31 12:08:13 SF4800-2-SC0 Platform.SC: FT1, fan speed, Low (4,1)
You have new mail.
EJMAIN2% Feb 01 10:15:35 SF4800-2-SC0 Platform.SC: Notice: /N0/SB4 temperature is approaching warning limit of 100C.
Feb 01 10:15:35 SF4800-2-SC0 Platform.SC: /N0/SB4 SDC 0 Temp. 0 value: 96 Degrees C
Feb 01 10:20:07 SF4800-2-SC0 Platform.SC: Notice: /N0/SB4 temperature is approaching warning limit of 88C.
Feb 01 10:20:07 SF4800-2-SC0 Platform.SC: /N0/SB4 CPU 1 Temp. 0 value: 83 Degrees C
Feb 01 10:20:15 SF4800-2-SC0 Platform.SC: WARNING: /N0/SB4 temperature is approaching max limit of 93C
Feb 01 10:20:15 SF4800-2-SC0 Platform.SC: /N0/SB4 CPU 3 Temp. 0 value: 88 Degrees C
Feb 01 10:20:15 SF4800-2-SC0 Platform.SC: Check for abnormal environmental operating conditions.
Feb 01 10:20:15 SF4800-2-SC0 Platform.SC: /N0/SB4, sensor status, outside acceptable limits (7,1,0x204040603030000)
Feb 01 10:24:01 SF4800-2-SC0 Platform.SC: WARNING: /N0/SB4 temperature is approaching max limit of 93C
Feb 01 10:24:01 SF4800-2-SC0 Platform.SC: /N0/SB4 CPU 0 Temp. 0 value: 88 Degrees C
Feb 01 10:24:01 SF4800-2-SC0 Platform.SC: Check for abnormal environmental operating conditions.
Feb 01 10:24:01 SF4800-2-SC0 Platform.SC: /N0/SB4, sensor status, outside acceptable limits (7,1,0x204040600030000)
Feb 01 10:33:20 SF4800-2-SC0 Platform.SC: WARNING: /N0/SB4 temperature is approaching max limit of 105C
Feb 01 10:33:20 SF4800-2-SC0 Platform.SC: /N0/SB4 SDC 0 Temp. 0 value: 101 Degrees C
Feb 01 10:33:20 SF4800-2-SC0 Platform.SC: Check for abnormal environmental operating conditions.
Feb 01 10:33:20 SF4800-2-SC0 Platform.SC: WARNING: /N0/SB4 temperature is approaching max limit of 93C
Feb 01 10:33:20 SF4800-2-SC0 Platform.SC: /N0/SB4 CPU 2 Temp. 0 value: 88 Degrees C
Feb 01 10:33:20 SF4800-2-SC0 Platform.SC: Check for abnormal environmental operating conditions.
Feb 01 10:33:20 SF4800-2-SC0 Platform.SC: /N0/SB4, sensor status, outside acceptable limits (7,1,0x204040200030000)
Feb 01 10:33:20 SF4800-2-SC0 Platform.SC: /N0/SB4, sensor status, outside acceptable limits (7,1,0x204040602030000)
Feb 01 14:28:13 SF4800-2-SC0 Platform.SC: Notice: Shutting down /N0/SB4 as temperature exceeds max limit of 93C
Feb 01 14:28:13 SF4800-2-SC0 Platform.SC: /N0/SB4 CPU 3 Temp. 0 value: 93 Degrees C
Feb 01 14:28:13 SF4800-2-SC0 Platform.SC: Check for abnormal environmental operating conditions.
SF4800-2-SC0:A> Feb 01 14:28:13 SF4800-2-SC0 Platform.SC: /N0/SB4: has been queued for power off.
Feb 01 14:28:13 SF4800-2-SC0 Domain-A.SC: /N0/SB4: powering off active board
Feb 01 14:28:13 SF4800-2-SC0 Domain-A.SC: changing domain A keyswitch position to standby
Feb 01 14:28:13 SF4800-2-SC0 Platform.SC: /N0/SB4, sensor status, shutdown (7,3,0x204040603030000)
Feb 01 14:28:24 SF4800-2-SC0 Platform.SC: /N0/SB4: powered off
Feb 01 14:30:36 SF4800-2-SC0 Platform.SC: FT0, fan speed, Low (4,1)
Feb 01 14:30:36 SF4800-2-SC0 Platform.SC: FT2, fan speed, Low (4,1)
Feb 01 14:30:36 SF4800-2-SC0 Platform.SC: FT1, fan speed, Low (4,1)
[Last edited by ilinch on 2008-2-3 12:12]
ilinch
Output of prtdiag -v; it reports no hardware errors:
System Configuration: Sun Microsystems sun4u Sun Fire 4800
System clock frequency: 150 MHz
Memory size: 4096 Megabytes
========================= CPUs ===============================================
            CPU      Run   E$    CPU      CPU
FRU Name    ID       MHz   MB    Impl.    Mask
----------  -------  ----  ----  -------  ----
/N0/SB4/P0  16       1200  8.0   US-III+  11.0
/N0/SB4/P1  17       1200  8.0   US-III+  11.0
/N0/SB4/P2  18       1200  8.0   US-III+  11.0
/N0/SB4/P3  19       1200  8.0   US-III+  11.0
========================= Memory Configuration ===============================
                     Logical  Logical  Logical
               Port  Bank     Bank     Bank         DIMM    Interleave  Interleave
FRU Name       ID    Num      Size     Status       Size    Factor      Segment
-------------  ----  ----     ------   -----------  ------  ----------  ----------
/N0/SB4/P0/B0  16    0        512MB    pass         256MB   8-way       0
/N0/SB4/P0/B0  16    2        512MB    pass         256MB   8-way       0
/N0/SB4/P1/B0  17    0        512MB    pass         256MB   8-way       0
/N0/SB4/P1/B0  17    2        512MB    pass         256MB   8-way       0
/N0/SB4/P2/B0  18    0        512MB    pass         256MB   8-way       0
/N0/SB4/P2/B0  18    2        512MB    pass         256MB   8-way       0
/N0/SB4/P3/B0  19    0        512MB    pass         256MB   8-way       0
/N0/SB4/P3/B0  19    2        512MB    pass         256MB   8-way       0
========================= IO Cards =========================
                                Bus  Max
            IO   Port Bus       Freq Bus  Dev,
FRU Name    Type ID   Side Slot MHz  Freq Func State Name                              Model
----------  ---- ---- ---- ---- ---- ---- ---- ----- --------------------------------  ----------------------
/N0/IB6/P0  PCI  24   B    0    33   33   1,0  ok    network-pci108e,abba.11           SUNW,pci-ce
/N0/IB6/P0  PCI  24   A    3    66   66   1,0  ok    pci-pci8086,b154.0/network (netw+ pci-bridge
/N0/IB6/P0  PCI  24   A    3    66   66   0,0  ok    network-pci108e,abba.20           SUNW,pci-ce
/N0/IB6/P0  PCI  24   A    3    66   66   1,0  ok    network-pci108e,abba.20           SUNW,pci-ce
/N0/IB6/P0  PCI  24   A    3    66   66   2,0  ok    scsi-pci1000,b.1000.1000.7/disk +
/N0/IB6/P0  PCI  24   A    3    66   66   2,1  ok    scsi-pci1000,b.1000.1000.7/disk +
/N0/IB6/P1  PCI  25   A    7    66   66   1,0  ok    SUNW,qlc-pci1077,2300.1077.106.1+ 0x106
/N0/IB8/P0  PCI  28   A    3    66   66   1,0  ok    SUNW,qlc-pci1077,2300.1077.106.1+ 0x106
/N0/IB8/P1  PCI  29   A    7    66   66   1,0  ok    network-pci108e,abba.11           SUNW,pci-ce
========================= Active Boards for Domain ===========================
           Board        Receptacle   Occupant
FRU Name   Type         Status       Status        Condition  Info
---------  -----------  -----------  ------------  ---------  ----------------------------------------
/N0/SB4    CPU_V2       connected    configured    ok         powered-on, assigned
/N0/IB6    PCI_I/O_Boa  connected    configured    ok         powered-on, assigned
/N0/IB8    PCI_I/O_Boa  connected    configured    ok         powered-on, assigned
========================= Available Boards/Slots for Domain ===========================
           Board        Receptacle   Occupant
FRU Name   Type         Status       Status        Condition  Info
---------  -----------  -----------  ------------  ---------  ----------------------------------------
/N0/SB0    unknown      empty        unconfigured  unknown    assigned
/N0/SB2    unknown      empty        unconfigured  unknown    assigned
========================= Hardware Failures ==================================
No Hardware failures found in System
========================= HW Revisions =======================================
System PROM revisions:
----------------------
OBP 5.20.5 02/07/07 13:51
IO ASIC revisions:
------------------
                              Port
FRU Name     Model            ID    Status  Version
-----------  ---------------  ----  ------  -------
/N0/IB6/P0   SUNW,schizo      24    ok      4
/N0/IB6/P1   SUNW,schizo      25    ok      4
/N0/IB8/P0   SUNW,schizo      28    ok      4
/N0/IB8/P1   SUNW,schizo      29    ok      4
/N0/IB6/P0   SUNW,sgsbbc      24    ok      2
/N0/IB8/P0   SUNW,sgsbbc      28    ok      2
ilinch
This is all I can see in messages:
warning
Jan 22 17:30:38 EJMAIN1 genunix: [ID 408789 kern.warning] WARNING: ce0: fault detected external to device; service degraded
Jan 22 17:30:38 EJMAIN1 genunix: [ID 451854 kern.warning] WARNING: ce0: xcvr addr:0x00 - link down
Jan 22 17:30:38 EJMAIN1 in.routed[218]: [ID 238047 daemon.warning] interface ce0 to 172.16.0.130 turned off
Jan 22 17:30:38 EJMAIN1 genunix: [ID 408789 kern.notice] NOTICE: ce0: fault cleared external to device; service available
Jan 22 17:30:38 EJMAIN1 genunix: [ID 451854 kern.notice] NOTICE: ce0: xcvr addr:0x00 - link up 1000 Mbps full duplex
Jan 22 17:30:38 EJMAIN1 in.routed[218]: [ID 300549 daemon.warning] interface ce0 to 172.16.0.130 restored
Jan 22 17:30:53 EJMAIN1 genunix: [ID 408789 kern.warning] WARNING: ce3: fault detected external to device; service degraded
Jan 22 17:30:53 EJMAIN1 genunix: [ID 451854 kern.warning] WARNING: ce3: xcvr addr:0x00 - link down
Jan 22 17:30:53 EJMAIN1 in.routed[218]: [ID 238047 daemon.warning] interface ce3 to 172.16.1.2 turned off
Jan 22 17:30:53 EJMAIN1 genunix: [ID 408789 kern.notice] NOTICE: ce3: fault cleared external to device; service available
Jan 22 17:30:53 EJMAIN1 genunix: [ID 451854 kern.notice] NOTICE: ce3: xcvr addr:0x00 - link up 1000 Mbps full duplex
Jan 22 17:30:53 EJMAIN1 in.routed[218]: [ID 300549 daemon.warning] interface ce3 to 172.16.1.2 restored
Jan 22 17:30:54 EJMAIN1 genunix: [ID 408789 kern.warning] WARNING: ce3: fault detected external to device; service degraded
Jan 22 17:30:54 EJMAIN1 genunix: [ID 451854 kern.warning] WARNING: ce3: xcvr addr:0x00 - link down
Jan 22 17:30:54 EJMAIN1 in.routed[218]: [ID 238047 daemon.warning] interface ce3 to 172.16.1.2 turned off
Jan 22 17:30:54 EJMAIN1 genunix: [ID 408789 kern.notice] NOTICE: ce3: fault cleared external to device; service available
Jan 22 17:30:54 EJMAIN1 genunix: [ID 451854 kern.notice] NOTICE: ce3: xcvr addr:0x00 - link up 1000 Mbps full duplex
Jan 22 17:30:54 EJMAIN1 in.routed[218]: [ID 300549 daemon.warning] interface ce3 to 172.16.1.2 restored
Jan 22 17:30:55 EJMAIN1 genunix: [ID 408789 kern.warning] WARNING: ce3: fault detected external to device; service degraded
Jan 22 17:30:55 EJMAIN1 genunix: [ID 451854 kern.warning] WARNING: ce3: xcvr addr:0x00 - link down
Jan 22 17:30:55 EJMAIN1 in.routed[218]: [ID 238047 daemon.warning] interface ce3 to 172.16.1.2 turned off
Jan 22 17:30:55 EJMAIN1 genunix: [ID 408789 kern.notice] NOTICE: ce3: fault cleared external to device; service available
Jan 22 17:30:55 EJMAIN1 genunix: [ID 451854 kern.notice] NOTICE: ce3: xcvr addr:0x00 - link up 1000 Mbps full duplex
Jan 22 17:30:55 EJMAIN1 in.routed[218]: [ID 300549 daemon.warning] interface ce3 to 172.16.1.2 restored
Jan 22 17:33:32 EJMAIN1 genunix: [ID 408789 kern.warning] WARNING: ce0: fault detected external to device; service degraded
Jan 22 17:33:32 EJMAIN1 genunix: [ID 451854 kern.warning] WARNING: ce0: xcvr addr:0x00 - link down
Jan 22 17:33:32 EJMAIN1 in.routed[218]: [ID 238047 daemon.warning] interface ce0 to 172.16.0.130 turned off
Jan 22 17:33:32 EJMAIN1 genunix: [ID 408789 kern.warning] WARNING: ce3: fault detected external to device; service degraded
Jan 22 17:33:32 EJMAIN1 genunix: [ID 451854 kern.warning] WARNING: ce3: xcvr addr:0x00 - link down
Jan 22 17:33:32 EJMAIN1 in.routed[218]: [ID 238047 daemon.warning] interface ce3 to 172.16.1.2 turned off
ilinch
One last follow-up question:
Why doesn't prtdiag -v show the fan status or any temperature information?
柯雅
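The messages mainly say that system board SB4 is running too hot and has exceeded the warning threshold. Connect to the main SC and run the following commands to check the detailed status: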
sc> showboards -v
sc> showenvironment -v
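If the SC console output has been captured to a file, the readings can also be tallied offline to see which sensors run hottest. A minimal Python sketch, assuming the log lines look like the serial output posted above; the function name peak_temps is made up for illustration and is not part of any Sun tool:

```python
import re
from collections import defaultdict

# Matches reading lines such as:
#   ... Platform.SC: /N0/SB4 CPU 3 Temp. 0 value: 88 Degrees C
READING = re.compile(
    r"(?P<board>/N\d+/\w+) (?P<sensor>\w+ \d+) Temp\. \d+ value: (?P<temp>\d+) Degrees C"
)

def peak_temps(lines):
    """Return {(board, sensor): highest temperature seen} for the given log lines."""
    peaks = defaultdict(int)
    for line in lines:
        m = READING.search(line)
        if m:
            key = (m.group("board"), m.group("sensor"))
            peaks[key] = max(peaks[key], int(m.group("temp")))
    return dict(peaks)

# Three lines copied from the console log in the first post.
sample = [
    "Jan 31 00:10:53 SF4800-2-SC0 Platform.SC: /N0/SB4 CPU 3 Temp. 0 value: 88 Degrees C",
    "Jan 31 00:22:06 SF4800-2-SC0 Platform.SC: /N0/SB4 SDC 0 Temp. 0 value: 101 Degrees C",
    "Jan 31 12:05:49 SF4800-2-SC0 Platform.SC: /N0/SB4 CPU 3 Temp. 0 value: 93 Degrees C",
]
for (board, sensor), temp in sorted(peak_temps(sample).items()):
    print(board, sensor, temp)
```

Lines that carry no reading (the "sensor status" and fan speed messages, for example) simply do not match and are skipped.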
tomboy
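The SB4 board is running too hot, hence the alarms. It may be that the dust filter at the front has collected too much dust and is restricting airflow. I ran into the same problem once, and cleaning the dust filter fixed it.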
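To judge whether cleaning the filter helped, one option is to compare each reading with the limit the SC printed just before it, before and after the cleaning. A rough Python sketch, assuming each "limit of NNC" line is followed by its "value:" line as in the log above; the name headroom is hypothetical:

```python
import re

# "approaching warning limit of 100C", "exceeds max limit of 93C", ...
LIMIT = re.compile(r"(?:warning|max) limit of (\d+)C")
# "/N0/SB4 CPU 3 Temp. 0 value: 88 Degrees C"
VALUE = re.compile(r"(\w+ \d+) Temp\. \d+ value: (\d+) Degrees C")

def headroom(lines):
    """Yield (sensor, reading, limit, limit - reading) for each limit/value pair."""
    limit = None
    for line in lines:
        m = LIMIT.search(line)
        if m:
            limit = int(m.group(1))  # remember the limit until its reading arrives
            continue
        m = VALUE.search(line)
        if m and limit is not None:
            reading = int(m.group(2))
            yield m.group(1), reading, limit, limit - reading
            limit = None

# Lines copied from the console log in the first post.
sample = [
    "Jan 31 00:10:53 SF4800-2-SC0 Platform.SC: WARNING: /N0/SB4 temperature is approaching max limit of 93C",
    "Jan 31 00:10:53 SF4800-2-SC0 Platform.SC: /N0/SB4 CPU 3 Temp. 0 value: 88 Degrees C",
    "Jan 31 00:22:06 SF4800-2-SC0 Platform.SC: WARNING: /N0/SB4 temperature is approaching max limit of 105C",
    "Jan 31 00:22:06 SF4800-2-SC0 Platform.SC: /N0/SB4 SDC 0 Temp. 0 value: 101 Degrees C",
    "Jan 31 12:05:49 SF4800-2-SC0 Platform.SC: Notice: Shutting down /N0/SB4 as temperature exceeds max limit of 93C",
    "Jan 31 12:05:49 SF4800-2-SC0 Platform.SC: /N0/SB4 CPU 3 Temp. 0 value: 93 Degrees C",
]
for sensor, reading, limit, margin in headroom(sample):
    print(f"{sensor}: {reading}C of {limit}C, {margin}C headroom")
```

If the margins get clearly larger after the cleaning, airflow was probably the cause. If they stay at a few degrees, the fans or the board itself are still worth checking.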