HACMP problem

yinxiuping
errpt output:
1BA7DF4E   0618211806 P S SRC            SOFTWARE PROGRAM ERROR
BA431EB7   0618211806 P S SRC            SOFTWARE PROGRAM ERROR
BA431EB7   0618211806 P S SRC            SOFTWARE PROGRAM ERROR

[H50-2][root][/usr/sbin/cluster]>errpt -aj 1BA7DF4E
---------------------------------------------------------------------------
LABEL:          SRC_TRYX
IDENTIFIER:     1BA7DF4E

Date/Time:       Sun Jun 18 21:18:44 BEIS
Sequence Number: 4933
Machine Id:      000055034C00
Node Id:         H50-2
Class:           S
Type:            PERM
Resource Name:   SRC            

Description
SOFTWARE PROGRAM ERROR

Probable Causes
APPLICATION PROGRAM

Failure Causes
SOFTWARE PROGRAM

        Recommended Actions
        DETERMINE WHY SUBSYSTEM CANNOT RESTART

Detail Data
SYMPTOM CODE
         256
SOFTWARE ERROR CODE
       -9020
ERROR CODE
           0
DETECTING MODULE
'srchevn.c'@line:'343'
FAILING MODULE
clsmuxpdES

[H50-2][root][/usr/sbin/cluster]>errpt -aj BA431EB7
---------------------------------------------------------------------------
LABEL:          SRC_RSTRT
IDENTIFIER:     BA431EB7

Date/Time:       Sun Jun 18 21:18:43 BEIS
Sequence Number: 4932
Machine Id:      000055034C00
Node Id:         H50-2
Class:           S
Type:            PERM
Resource Name:   SRC            

Description
SOFTWARE PROGRAM ERROR

Probable Causes
APPLICATION PROGRAM

Failure Causes
SOFTWARE PROGRAM

        Recommended Actions
        VERIFY SUBSYSTEM RESTARTED AUTOMATICALLY

Detail Data
SYMPTOM CODE
         256
SOFTWARE ERROR CODE
       -9035
ERROR CODE
           0
DETECTING MODULE
'srchevn.c'@line:'217'
FAILING MODULE
clsmuxpdES
---------------------------------------------------------------------------
LABEL:          SRC_RSTRT
IDENTIFIER:     BA431EB7

Date/Time:       Sun Jun 18 21:18:43 BEIS
Sequence Number: 4931
Machine Id:      000055034C00
Node Id:         H50-2
Class:           S
Type:            PERM
Resource Name:   SRC            

Description
SOFTWARE PROGRAM ERROR

Probable Causes
APPLICATION PROGRAM

Failure Causes
SOFTWARE PROGRAM

        Recommended Actions
        VERIFY SUBSYSTEM RESTARTED AUTOMATICALLY

Detail Data
SYMPTOM CODE
         256
SOFTWARE ERROR CODE
       -9035
ERROR CODE
           0
DETECTING MODULE
'srchevn.c'@line:'217'
FAILING MODULE
clsmuxpdES

[H50-2][root][/usr/sbin/cluster]>lssrc -g cluster
Subsystem         Group            PID          Status
clstrmgrES       cluster          22250        active
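
The listing above shows only clstrmgrES; clsmuxpdES is missing from the group. A minimal sketch for checking this from a script, assuming output shaped like the `lssrc -g cluster` listing above. The function name `subsystem_active` is hypothetical and not part of HACMP or AIX:

```shell
# Hypothetical helper: succeed if the named subsystem appears with status
# "active" in `lssrc -g <group>` style output read from standard input.
subsystem_active() {
    awk -v name="$1" '
        $1 == name && $NF == "active" { found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# Possible usage on the cluster node:
#   lssrc -g cluster | subsystem_active clsmuxpdES || echo "clsmuxpdES is not active"
```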

[H50-2][root][/usr/sbin/cluster]>more /usr/es/adm/cluster.log
Jun 18 06:19:03 H50-2 RMCdaemon[8940]: (Recorded using libct_ffdc.a cv 2):::Error ID: 6eKora0Lz5Z2/JlI/422e.1...................:::Reference ID:  :::Template ID: a6df45aa:::Details File:  :::Location: RSCT,rmcd.c,1.37,202                          :::RMCD_INFO_0_ST The daemon is started.
Jun 18 21:01:50 H50-2 syslog: 0821-285 ioctl returns 70
Jun 18 21:02:05 H50-2 syslog: 0821-285 ioctl returns 70
Jun 18 21:16:25 H50-2 topsvcs[22714]: (Recorded using libct_ffdc.a cv 2):::Error ID: 6UpNEL0d6JZ2/e3v0422e.1...................:::Reference ID:  :::Template ID: 97419d60:::Details File:  :::Location: rsct,bootstrp.C,1.176,4010                    :::TS_START_ST Topology Services daemon started Topology Services daemon started by: SRC Topology Services daemon log file location /var/ha/log/topsvcs.18.211625.H50_cluster.en_/var/ha/run/topsvcs.H50_cluster/ Topology Services daemon run directory /var/ha/run/topsvcs.H50_cluster/
Jun 18 21:16:28 H50-2 grpsvcs[21492]: (Recorded using libct_ffdc.a cv 2):::Error ID: 63Y7ej0g6JZ2/je61422e.1...................:::Reference ID:  :::Template ID: afa89905:::Details File:  :::Location: RSCT,pgsd.C,1.51,541                          :::GS_START_ST Group Services daemon started DIAGNOSTIC EXPLANATION HAGS daemon started by SRC. Log file is /var/ha/log/grpsvcs_2_12.H50_cluster.
Jun 18 21:16:40 H50-2 clstrmgrES[22250]: Sun Jun 18 21:16:40 HACMP/ES Cluster Manager Started
Jun 18 21:17:44 H50-2 HACMP for AIX: EVENT START: node_up H50_2
Jun 18 21:17:52 H50-2 HACMP for AIX: EVENT START: acquire_service_addr
Jun 18 21:17:57 H50-2 HACMP for AIX: EVENT START: acquire_aconn_service en0 net_ether_01
Jun 18 21:17:57 H50-2 HACMP for AIX: EVENT COMPLETED: acquire_aconn_service en0 net_ether_01
Jun 18 21:17:58 H50-2 HACMP for AIX: EVENT COMPLETED: acquire_service_addr
Jun 18 21:18:43 H50-2 clsmuxpdES[23012]: 3 clsmuxpd 23012 (root    )  smuxp_doit: SMUX registration of 1.3.6.1.4.1.2.3.1.2.1.5 failed
Jun 18 21:18:43 H50-2 clsmuxpdES[23014]: 3 clsmuxpd 23014 (root    )  smuxp_doit: SMUX registration of 1.3.6.1.4.1.2.3.1.2.1.5 failed
Jun 18 21:18:44 H50-2 clsmuxpdES[23016]: 4 clsmuxpd 23016 (root    )  smuxp_doit: SMUX registration of 1.3.6.1.4.1.2.3.1.2.1.5 failed
Jun 18 21:18:44 H50-2 HACMP for AIX: clexit.rc : Unexpected termination of clsmuxpdES.
Jun 18 21:18:57 H50-2 HACMP for AIX: EVENT COMPLETED: node_up H50_2
Jun 18 21:18:59 H50-2 HACMP for AIX: EVENT START: node_up_complete H50_2
Jun 18 21:19:01 H50-2 HACMP for AIX: EVENT START: start_server H50_2_app
Jun 18 21:19:02 H50-2 syslog: entry not in table or multiple matches
Jun 18 21:19:03 H50-2 HACMP for AIX: EVENT COMPLETED: start_server H50_2_app
Jun 18 21:19:06 H50-2 HACMP for AIX: EVENT COMPLETED: node_up_complete H50_2
Jun 18 21:19:09 H50-2 HACMP for AIX: EVENT START: fail_interface H50_2 192.168.64.6
Jun 18 21:19:10 H50-2 HACMP for AIX: EVENT COMPLETED: fail_interface H50_2 192.168.64.6

[H50-2][root][/usr/sbin/cluster]>./clstat


                clstat - HACMP Cluster Status Monitor
                -------------------------------------


                THERE ARE NO CLUSTERS CURRENTLY ACTIVE



                THE PROGRAM WILL CONTINUE SEARCHING FOR ONE


That is all the information I could gather. Could someone help me work out what the actual problem is? Also, what exactly does clsmuxpd do? Thanks :)
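
In the cluster.log excerpt above, the lines that matter for clstat are the clsmuxpdES ones: three failed SMUX registrations followed by the clexit.rc termination message. A rough sketch for pulling just those lines out of a long log, assuming the same log format as shown; the function name `clsmuxpd_events` is made up for illustration:

```shell
# Hypothetical helper: print only the log lines that mention clsmuxpd,
# a SMUX registration, or a termination reported by clexit.rc.
# Reads the files given as arguments, or standard input if there are none.
clsmuxpd_events() {
    grep -E 'clsmuxpd|SMUX|clexit\.rc' "$@"
}

# Possible usage on the cluster node:
#   clsmuxpd_events /usr/es/adm/cluster.log
```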

framerelay
Could the standby network adapter have failed?

Try applying a patch.

yinxiuping
[H50-2][root][/usr/sbin/cluster]>netstat -i
Name  Mtu   Network     Address            Ipkts Ierrs    Opkts Oerrs  Coll
en0   1500  link#2      0.6.29.dc.95.e9     150401     0   127455     0     0
en0   1500  192.168.65  H50_2_boot          150401     0   127455     0     0
en0   1500  10.10.65    H50_2_svc           150401     0   127455     0     0
en1   1500  link#3      0.4.ac.49.7c.f8          0     0    38593     0     0
en1   1500  192.168.64  H50_2_stdby              0     0    38593     0     0
lo0   16896 link#1                          100138     0   113191     0     0
lo0   16896 127         loopback            100138     0   113191     0     0
lo0   16896 ::1                             100138     0   113191     0     0

This suggests the adapter isn't broken, right?

framerelay
Check with ifconfig -a.

It's probably not a hardware failure; it may be a bug. Try applying a patch.

mxin
The stdby interface does look like it has a problem: its Ipkts count is 0. Try pinging the other machine's stdby address.

[Last edited by mxin on 2006-6-19 16:04]
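
The zero-Ipkts check described above can be sketched as a small filter over `netstat -i` output, assuming the column layout shown earlier in the thread. The function name `zero_ipkts_interfaces` is hypothetical, and a zero count is only a hint that an interface receives no traffic, not proof of a fault:

```shell
# Hypothetical helper: list interfaces whose input packet count (Ipkts)
# is 0 in `netstat -i` style output read from standard input.
# Ipkts is taken as the fifth field from the end, because the Address
# column is empty on some rows (for example the "lo0 ... link#1" row).
zero_ipkts_interfaces() {
    awk '$1 != "Name" && NF >= 8 && $(NF-4) == 0 { print $1 }' | sort -u
}

# Possible usage on the cluster node:
#   netstat -i | zero_ipkts_interfaces
```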

yixianq
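I ran into this problem too: the same errors, and ./clstat likewise cannot see the HA status. IBM's official documentation says that HACMP on AIX 5.2 hits an SNMP agent version mismatch, which is a bug (see below). With HA stopped on all nodes, I downgraded the operating system's SNMP agent from version 3 to version 1, but that did not solve the problem. When I start clsmuxpdES manually, I am told it is already active, yet lssrc -g cluster shows that clsmuxpdES is not active, and the log reports the same errors as the original poster's. Does anyone have a reasonably complete solution? I would be very grateful!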

IY37779: DOC: AIX 5.2 SNMP CFG CHANGE NEEDED FOR CLSTAT, CLINFO AND CSPOC

APAR status
Closed as documentation error.

Error description
HACMP C-SPOC cluster start and stop, as well as the
CLINFO utility and CLSTAT require SNMP Version 1 agents.
These utilities will not work with the default AIX 5.2
configuration.

clstat fails with
"THERE ARE NO CLUSTERS CURRENTLY ACTIVE - THE PROGRAM WILL
CONTINUE SEARCHING FOR ONE"
Local fix
Problem summary
HACMP C-SPOC cluster start and stop, as well as the
CLINFO utility and CLSTAT require SNMP Version 1 agents.
These utilities will not work with the default AIX 5.2
configuration.

This APAR is being used to document this requirement.
Problem conclusion
The following information will be added to a future
version of the HACMP PTF README File.

=======================
SNMP Issue with AIX 5.2
=======================

AIX version 5.2 defaults to using SNMP version 3 agents,
where HACMP uses SNMP version 1 agents. Since HACMP uses
SNMP for C-SPOC cluster start and stop, as well as the
CLINFO utility, these features will not work under the
AIX 5.2 default configuration.

AIX 5.2 provides a utility to change which SNMP agent it
uses. By executing the following command, you can change the
SNMP agent used to version 1. This restores compatibility
with HACMP's use of SNMP.

/usr/sbin/snmpv3_ssw -1

Note that the command line parameter is a numeral one.
Temporary fix
Comments
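
Before and after running snmpv3_ssw, it can help to confirm which agent is actually selected. The sketch below rests on an assumption that is not stated in the APAR: that /usr/sbin/snmpd is a symbolic link whose target name reflects the agent version (for example snmpdv1 or snmpdv3ne). The function name `snmp_agent_version` is hypothetical; verify the link layout on your own system before relying on it:

```shell
# Assumption: the path given is a symbolic link whose target name
# indicates the agent version (for example snmpdv1 or snmpdv3ne).
# Hypothetical helper: print "v1", "v3", or "unknown" for that path.
snmp_agent_version() {
    # Parse the link target from `ls -l` so the sketch does not depend
    # on a readlink command being installed.
    target=$(ls -l "$1" 2>/dev/null | sed -n 's/.* -> //p')
    case "$target" in
        *snmpdv1*) echo v1 ;;
        *snmpdv3*) echo v3 ;;
        *)         echo unknown ;;
    esac
}

# Possible usage on the cluster node:
#   snmp_agent_version /usr/sbin/snmpd
```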

yixianq
If the original poster has solved this, please share how. Thanks!

redliquid
On OS 5.2, SNMP for 5.1 has to use the version 1 agent.