txl829
Asking for help with RHCS (Red Hat Enterprise Linux 5)
I ran service cman start on one of the machines (LINUX1); before that, cman was not running on either machine.
The log showed the following:
Jun 23 18:49:13 LINUX1 openais[29836]: [CMAN ] CMAN 2.0.60 (built Jan 23 2007 12:42:29) started
Jun 23 18:49:13 LINUX1 openais[29836]: [SYNC ] Not using a virtual synchrony filter.
Jun 23 18:49:13 LINUX1 openais[29836]: [TOTEM] Creating commit token because I am the rep.
Jun 23 18:49:13 LINUX1 openais[29836]: [TOTEM] Saving state aru 0 high seq received 0
Jun 23 18:49:13 LINUX1 openais[29836]: [TOTEM] entering COMMIT state.
Jun 23 18:49:13 LINUX1 openais[29836]: [TOTEM] entering RECOVERY state.
Jun 23 18:49:13 LINUX1 openais[29836]: [TOTEM] position [0] member 192.168.10.31:
Jun 23 18:49:13 LINUX1 openais[29836]: [TOTEM] previous ring seq 0 rep 192.168.10.31
Jun 23 18:49:13 LINUX1 openais[29836]: [TOTEM] aru 0 high delivered 0 received flag 0
Jun 23 18:49:13 LINUX1 openais[29836]: [TOTEM] Did not need to originate any messages in recovery.
Jun 23 18:49:13 LINUX1 openais[29836]: [TOTEM] Storing new sequence id for ring 4
Jun 23 18:49:13 LINUX1 openais[29836]: [TOTEM] Sending initial ORF token
Jun 23 18:49:13 LINUX1 openais[29836]: [CLM ] CLM CONFIGURATION CHANGE
Jun 23 18:49:13 LINUX1 openais[29836]: [CLM ] New Configuration:
Jun 23 18:49:13 LINUX1 openais[29836]: [CLM ] Members Left:
Jun 23 18:49:13 LINUX1 openais[29836]: [CLM ] Members Joined:
Jun 23 18:49:13 LINUX1 openais[29836]: [SYNC ] This node is within the primary component and will provide service.
Jun 23 18:49:13 LINUX1 openais[29836]: [CLM ] CLM CONFIGURATION CHANGE
Jun 23 18:49:13 LINUX1 openais[29836]: [CLM ] New Configuration:
Jun 23 18:49:13 LINUX1 openais[29836]: [CLM ] r(0) ip(192.168.10.31)
Jun 23 18:49:13 LINUX1 openais[29836]: [CLM ] Members Left:
Jun 23 18:49:13 LINUX1 openais[29836]: [CLM ] Members Joined:
Jun 23 18:49:13 LINUX1 openais[29836]: [CLM ] r(0) ip(192.168.10.31)
Jun 23 18:49:13 LINUX1 openais[29836]: [SYNC ] This node is within the primary component and will provide service.
Jun 23 18:49:13 LINUX1 openais[29836]: [TOTEM] entering OPERATIONAL state.
Jun 23 18:49:13 LINUX1 openais[29836]: [CMAN ] quorum regained, resuming activity
Jun 23 18:49:13 LINUX1 openais[29836]: [CLM ] got nodejoin message 192.168.10.31
Jun 23 18:49:13 LINUX1 ccsd[29830]: Initial status:: Quorate
Jun 23 18:49:19 LINUX1 fenced[29854]: LINUX2 not a cluster member after 3 sec post_join_delay
Jun 23 18:49:19 LINUX1 fenced[29854]: fencing node "LINUX2"
Jun 23 18:49:19 LINUX1 fence_manual: Node LINUX2 needs to be reset before recovery can procede. Waiting for LINUX2 to rejoin the cluster or for manual acknowledgement that it has been reset (i.e. fence_ack_manual -n LINUX2)
Then I ran fence_ack_manual -n LINUX2.
The log showed:
Jun 23 18:50:43 LINUX1 fenced[29854]: fence "LINUX2" success
Jun 23 18:50:48 LINUX1 ccsd[29830]: Attempt to close an unopened CCS descriptor (180).
Jun 23 18:50:48 LINUX1 ccsd[29830]: Error while processing disconnect: Invalid request descriptor
If I start cman on the other machine first, the same log entries appear, except that LINUX2 is replaced by LINUX1.
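In both logs each node reports "quorum regained" and "Initial status:: Quorate" while it is the only member of its ring. A minimal sketch of the vote arithmetic behind that, assuming the usual simplified CMAN rule (quorum is a strict majority of expected votes, and two_node="1" lowers the threshold to a single vote so either node of a two-node cluster can run alone). The helpers cman_quorum and is_quorate are hypothetical and do not reproduce cman's actual source.

```python
def cman_quorum(expected_votes, two_node):
    """Hypothetical helper: votes needed for quorum under a simplified CMAN rule."""
    if two_node:
        # Two-node special case: one vote is enough, so each node can be
        # quorate on its own and go on to fence the node it cannot see.
        return 1
    # Normal case: strict majority of the expected votes.
    return expected_votes // 2 + 1


def is_quorate(live_votes, expected_votes, two_node):
    return live_votes >= cman_quorum(expected_votes, two_node)


# Values from the posted cluster.conf: <cman expected_votes="1" two_node="1"/>, one vote per node.
print(is_quorate(live_votes=1, expected_votes=1, two_node=True))   # True
# Without two_node, a lone node holding 1 of 2 expected votes would not be quorate.
print(is_quorate(live_votes=1, expected_votes=2, two_node=False))  # False
```

This is why a single node can start services and fence its peer in this configuration even when the two nodes never see each other.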
After cman is running on both machines, this is what each one shows:
[root@LINUX1 cluster]# clustat -l
Member Status: Quorate
Member Name ID Status
------ ---- ---- ------
LINUX1 1 Online, Local
LINUX2 2 Offline
[root@LINUX2 etc]# clustat -l
Member Status: Quorate
Member Name ID Status
------ ---- ---- ------
LINUX1 1 Offline
LINUX2 2 Online, Local
Could anyone tell me what the problem is and how to fix it?
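A minimal sketch that compares the two clustat -l views posted above. It assumes the column layout shown in this thread (member name, numeric ID, free-form status); online_nodes is a hypothetical helper. The Local flag only marks the machine where clustat was run, so it is ignored; what matters is whether both machines agree on which members are Online.

```python
# clustat -l output as posted from each machine.
VIEW_FROM_LINUX1 = """\
Member Status: Quorate
Member Name ID Status
------ ---- ---- ------
LINUX1 1 Online, Local
LINUX2 2 Offline
"""

VIEW_FROM_LINUX2 = """\
Member Status: Quorate
Member Name ID Status
------ ---- ---- ------
LINUX1 1 Offline
LINUX2 2 Online, Local
"""


def online_nodes(clustat_output):
    """Return the names of members whose status begins with 'Online'."""
    nodes = set()
    for line in clustat_output.splitlines():
        parts = line.split(None, 2)  # name, numeric ID, rest of the status text
        if len(parts) == 3 and parts[1].isdigit() and parts[2].startswith("Online"):
            nodes.add(parts[0])
    return nodes


a = online_nodes(VIEW_FROM_LINUX1)
b = online_nodes(VIEW_FROM_LINUX2)
print(sorted(a), sorted(b))                         # ['LINUX1'] ['LINUX2']
print("views agree" if a == b else "views disagree")  # views disagree
```

If both views listed LINUX1 and LINUX2 as Online, the sets would be equal even though Local appears on a different line on each machine.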
[[i] Last edited by txl829 on 2008-6-28 14:32 [/i]]
txl829
[root@LINUX1 cluster]# cat cluster.conf
<?xml version="1.0" ?>
<cluster config_version="2" name="_cluster">
<fence_daemon post_fail_delay="0" post_join_delay="3"/>
<clusternodes>
<clusternode name="LINUX1" nodeid="1" votes="1">
<fence>
<method name="1">
<device name="svr_ip" nodename="LINUX1"/>
</method>
</fence>
</clusternode>
<clusternode name="LINUX2" nodeid="2" votes="1">
<fence>
<method name="1">
<device name="svr_ip" nodename="LINUX2"/>
</method>
</fence>
</clusternode>
</clusternodes>
<cman expected_votes="1" two_node="1"/>
<fencedevices>
<fencedevice agent="fence_manual" name="svr_ip"/>
</fencedevices>
<rm>
<failoverdomains>
<failoverdomain name="" ordered="0" restricted="0">
<failoverdomainnode name="LINUX1" priority="1"/>
<failoverdomainnode name="LINUX2" priority="1"/>
</failoverdomain>
</failoverdomains>
<resources/>
<service autostart="1" domain="" name="serv_ip" recovery="relocate">
<ip address="192.168.10.32" monitor_link="1"/>
</service>
</rm>
</cluster>
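A minimal sketch that reads the cluster.conf above with Python's standard xml.etree.ElementTree and pulls out the settings that show up in the logs: two_node and expected_votes (why one node alone is quorate), post_join_delay (the "3 sec post_join_delay" in the fenced messages) and the fence agent (fence_manual, which is why fence_ack_manual was needed). The embedded XML is a trimmed copy of the posted file with the <rm> section left out; summarize and _int_attr are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the posted cluster.conf (the <rm> section is omitted).
CLUSTER_CONF = """
<cluster config_version="2" name="_cluster">
  <fence_daemon post_fail_delay="0" post_join_delay="3"/>
  <clusternodes>
    <clusternode name="LINUX1" nodeid="1" votes="1">
      <fence><method name="1"><device name="svr_ip" nodename="LINUX1"/></method></fence>
    </clusternode>
    <clusternode name="LINUX2" nodeid="2" votes="1">
      <fence><method name="1"><device name="svr_ip" nodename="LINUX2"/></method></fence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_manual" name="svr_ip"/>
  </fencedevices>
</cluster>
"""


def _int_attr(elem, name):
    # Return the attribute as an int, or None if the element or attribute is missing.
    if elem is None or elem.get(name) is None:
        return None
    return int(elem.get(name))


def summarize(conf_text):
    """Hypothetical helper: extract the settings relevant to the logs in this thread."""
    root = ET.fromstring(conf_text.strip())
    cman = root.find("cman")
    fence_daemon = root.find("fence_daemon")
    return {
        "cluster": root.get("name"),
        # Node name -> votes, as declared under <clusternodes>.
        "nodes": {n.get("name"): int(n.get("votes", "1")) for n in root.iter("clusternode")},
        "two_node": cman is not None and cman.get("two_node") == "1",
        "expected_votes": _int_attr(cman, "expected_votes"),
        # Seconds fenced waits for other nodes to join before fencing them.
        "post_join_delay": _int_attr(fence_daemon, "post_join_delay"),
        # Fence device name -> agent program.
        "fence_agents": {d.get("name"): d.get("agent") for d in root.iter("fencedevice")},
    }


info = summarize(CLUSTER_CONF)
for key, value in info.items():
    print(key, value)
```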
hosts
[root@LINUX1 etc]# cat hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
192.168.10.11 1
192.168.10.13 2
192.168.10.31 LINUX1
192.168.10.33 LINUX2
192.168.10.32 svr_ip
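On RHEL 5, cman is generally understood to resolve each clusternode name to find the address it runs the ring on, so the node names in cluster.conf should resolve to the addresses seen in the openais logs (192.168.10.31 for LINUX1, 192.168.10.33 for LINUX2). A minimal sketch that checks this against the posted hosts file; parse_hosts is a hypothetical helper that approximates the resolver by letting the first matching line win.

```python
# The hosts file as posted in this thread.
HOSTS = """\
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
192.168.10.11 1
192.168.10.13 2
192.168.10.31 LINUX1
192.168.10.33 LINUX2
192.168.10.32 svr_ip
"""


def parse_hosts(text):
    """Map each host name to the first address listed for it."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments; skip blank lines
        if not line:
            continue
        addr, *names = line.split()
        for name in names:
            mapping.setdefault(name, addr)  # earlier lines take precedence
    return mapping


# Ring addresses each node used, according to the openais logs in this thread.
EXPECTED = {"LINUX1": "192.168.10.31", "LINUX2": "192.168.10.33"}

hosts = parse_hosts(HOSTS)
for node, addr in EXPECTED.items():
    status = "ok" if hosts.get(node) == addr else "MISMATCH"
    print(node, hosts.get(node), status)
```

For the posted file both names resolve to the expected addresses, and the same file should be present on both machines.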
Log
Jun 23 22:24:42 LINUX2 ccsd[10879]: Starting ccsd 2.0.60:
Jun 23 22:24:42 LINUX2 ccsd[10879]: Built: Jan 23 2007 12:42:25
Jun 23 22:24:42 LINUX2 ccsd[10879]: Copyright (C) Red Hat, Inc. 2004 All rights reserved.
Jun 23 22:24:42 LINUX2 ccsd[10879]: cluster.conf (cluster name = _cluster, version = 2) found.
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] AIS Executive Service RELEASE 'subrev 1324 version 0.80.2'
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] Copyright (C) 2002-2006 MontaVista Software, Inc and contributors.
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] Copyright (C) 2006 Red Hat, Inc.
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] AIS Executive Service: started and ready to provide service.
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] Using default multicast address of 239.192.88.13
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] openais component openais_cpg loaded.
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] Registering service handler 'openais cluster closed process group service v1.01'
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] openais component openais_cfg loaded.
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] Registering service handler 'openais configuration service'
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] openais component openais_msg loaded.
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] Registering service handler 'openais message service B.01.01'
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] openais component openais_lck loaded.
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] Registering service handler 'openais distributed locking service B.01.01'
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] openais component openais_evt loaded.
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] Registering service handler 'openais event service B.01.01'
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] openais component openais_ckpt loaded.
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] Registering service handler 'openais checkpoint service B.01.01'
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] openais component openais_amf loaded.
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] Registering service handler 'openais availability management framework B.01.01'
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] openais component openais_clm loaded.
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] Registering service handler 'openais cluster membership service B.01.01'
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] openais component openais_evs loaded.
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] Registering service handler 'openais extended virtual synchrony service'
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] openais component openais_cman loaded.
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] Registering service handler 'openais CMAN membership service 2.01'
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] Token Timeout (10000 ms) retransmit timeout (495 ms)
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] token hold (386 ms) retransmits before loss (20 retrans)
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] join (60 ms) send_join (0 ms) consensus (4800 ms) merge (200 ms)
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] downcheck (1000 ms) fail to recv const (50 msgs)
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] seqno unchanged const (30 rotations) Maximum network MTU 1500
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] window size per rotation (50 messages) maximum messages per rotation (17 messages)
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] send threads (0 threads)
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] RRP token expired timeout (495 ms)
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] RRP token problem counter (2000 ms)
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] RRP threshold (10 problem count)
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] RRP mode set to none.
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] heartbeat_failures_allowed (0)
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] max_network_delay (50 ms)
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] HeartBeat is Disabled. To enable set heartbeat_failures_allowed > 0
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] Receive multicast socket recv buffer size (262142 bytes).
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] The network interface [192.168.10.33] is now up.
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] Created or loaded sequence id 0.192.168.10.33 for this ring.
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] entering GATHER state from 15.
Jun 23 22:24:45 LINUX2 openais[10885]: [SERV ] Initialising service handler 'openais extended virtual synchrony service'
Jun 23 22:24:45 LINUX2 openais[10885]: [SERV ] Initialising service handler 'openais cluster membership service B.01.01'
Jun 23 22:24:45 LINUX2 openais[10885]: [SERV ] Initialising service handler 'openais availability management framework B.01.01'
Jun 23 22:24:45 LINUX2 openais[10885]: [SERV ] Initialising service handler 'openais checkpoint service B.01.01'
Jun 23 22:24:45 LINUX2 openais[10885]: [SERV ] Initialising service handler 'openais event service B.01.01'
Jun 23 22:24:45 LINUX2 openais[10885]: [SERV ] Initialising service handler 'openais distributed locking service B.01.01'
Jun 23 22:24:45 LINUX2 openais[10885]: [SERV ] Initialising service handler 'openais message service B.01.01'
Jun 23 22:24:45 LINUX2 openais[10885]: [SERV ] Initialising service handler 'openais configuration service'
Jun 23 22:24:45 LINUX2 openais[10885]: [SERV ] Initialising service handler 'openais cluster closed process group service v1.01'
Jun 23 22:24:45 LINUX2 openais[10885]: [SERV ] Initialising service handler 'openais CMAN membership service 2.01'
Jun 23 22:24:45 LINUX2 openais[10885]: [CMAN ] CMAN 2.0.60 (built Jan 23 2007 12:42:29) started
Jun 23 22:24:45 LINUX2 openais[10885]: [SYNC ] Not using a virtual synchrony filter.
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] Creating commit token because I am the rep.
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] Saving state aru 0 high seq received 0
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] entering COMMIT state.
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] entering RECOVERY state.
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] position [0] member 192.168.10.33:
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] previous ring seq 0 rep 192.168.10.33
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] aru 0 high delivered 0 received flag 0
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] Did not need to originate any messages in recovery.
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] Storing new sequence id for ring 4
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] Sending initial ORF token
Jun 23 22:24:45 LINUX2 openais[10885]: [CLM ] CLM CONFIGURATION CHANGE
Jun 23 22:24:45 LINUX2 openais[10885]: [CLM ] New Configuration:
Jun 23 22:24:45 LINUX2 openais[10885]: [CLM ] Members Left:
Jun 23 22:24:45 LINUX2 openais[10885]: [CLM ] Members Joined:
Jun 23 22:24:45 LINUX2 openais[10885]: [SYNC ] This node is within the primary component and will provide service.
Jun 23 22:24:45 LINUX2 openais[10885]: [CLM ] CLM CONFIGURATION CHANGE
Jun 23 22:24:45 LINUX2 openais[10885]: [CLM ] New Configuration:
Jun 23 22:24:45 LINUX2 openais[10885]: [CLM ] r(0) ip(192.168.10.33)
Jun 23 22:24:45 LINUX2 openais[10885]: [CLM ] Members Left:
Jun 23 22:24:45 LINUX2 openais[10885]: [CLM ] Members Joined:
Jun 23 22:24:45 LINUX2 openais[10885]: [CLM ] r(0) ip(192.168.10.33)
Jun 23 22:24:45 LINUX2 openais[10885]: [SYNC ] This node is within the primary component and will provide service.
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] entering OPERATIONAL state.
Jun 23 22:24:45 LINUX2 openais[10885]: [CMAN ] quorum regained, resuming activity
Jun 23 22:24:45 LINUX2 openais[10885]: [CLM ] got nodejoin message 192.168.10.33
Jun 23 22:24:45 LINUX2 ccsd[10879]: Initial status:: Quorate
Jun 23 22:24:50 LINUX2 fenced[10901]: LINUX1 not a cluster member after 3 sec post_join_delay
Jun 23 22:24:50 LINUX2 fenced[10901]: fencing node "LINUX1"
Jun 23 22:24:50 LINUX2 fence_manual: Node LINUX1 needs to be reset before recovery can procede. Waiting for LINUX1 to rejoin the cluster or for manual acknowledgement that it has been reset (i.e. fence_ack_manual -n LINUX1)
Jun 23 22:25:42 LINUX2 fenced[10901]: fence "LINUX1" success
Jun 23 22:25:47 LINUX2 ccsd[10879]: Attempt to close an unopened CCS descriptor (180).
Jun 23 22:25:47 LINUX2 ccsd[10879]: Error while processing disconnect: Invalid request descriptor
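The openais CLM "New Configuration:" block lists the members of the ring a node has formed. A minimal sketch that extracts that list from a log excerpt, assuming the line format shown in this thread; last_configuration is a hypothetical helper. Applied to the LINUX2 log it returns only LINUX2's own address; if both nodes had joined one ring, the last block would list both 192.168.10.31 and 192.168.10.33.

```python
import re

# Excerpt of the LINUX2 log from this thread (openais CLM lines only).
LOG = """\
Jun 23 22:24:45 LINUX2 openais[10885]: [CLM ] CLM CONFIGURATION CHANGE
Jun 23 22:24:45 LINUX2 openais[10885]: [CLM ] New Configuration:
Jun 23 22:24:45 LINUX2 openais[10885]: [CLM ] Members Left:
Jun 23 22:24:45 LINUX2 openais[10885]: [CLM ] Members Joined:
Jun 23 22:24:45 LINUX2 openais[10885]: [CLM ] CLM CONFIGURATION CHANGE
Jun 23 22:24:45 LINUX2 openais[10885]: [CLM ] New Configuration:
Jun 23 22:24:45 LINUX2 openais[10885]: [CLM ] r(0) ip(192.168.10.33)
Jun 23 22:24:45 LINUX2 openais[10885]: [CLM ] Members Left:
Jun 23 22:24:45 LINUX2 openais[10885]: [CLM ] Members Joined:
Jun 23 22:24:45 LINUX2 openais[10885]: [CLM ] r(0) ip(192.168.10.33)
"""

MEMBER_RE = re.compile(r"r\(\d+\) ip\(([\d.]+)\)")


def last_configuration(log_text):
    """Return member IPs listed under the last CLM 'New Configuration:' header."""
    members = []
    collecting = False
    for line in log_text.splitlines():
        if "[CLM" not in line:
            continue
        if "New Configuration:" in line:
            members, collecting = [], True  # start a fresh membership list
        elif "Members Left:" in line or "Members Joined:" in line:
            collecting = False  # end of the 'New Configuration' section
        elif collecting:
            match = MEMBER_RE.search(line)
            if match:
                members.append(match.group(1))
    return members


print(last_configuration(LOG))  # ['192.168.10.33']
```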
[[i] Last edited by txl829 on 2008-6-28 14:33 [/i]]
txl829
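Reply to post #1 by txl829
1. I configured the cluster with system-config-cluster;
2. The 10.31 and 10.33 addresses of the two machines are reachable: the machines can ping each other, and the firewall is disabled on both;
3. Because the information shown by clustat -l looked wrong, I did not start rgmanager;
4. The kernel should not be a Xen kernel; I installed from the Red Hat CD and never changed the kernel;
5. Later I reconfigured everything with Conga and still got the same problem;
6. I have also seen this situation: checking with clustat -l on both machines, both nodes are Online, but Local appears on each machine's own node. That is, viewed from node1, Local is on node1, and viewed from node2, Local is on node2.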
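Point 2 confirms unicast reachability with ping, while the openais log in this thread reports "Using default multicast address of 239.192.88.13", and ping does not exercise multicast delivery. A minimal sketch of a separate multicast check between the two nodes, assuming Python is available on both. The port 5999, the script name mcast_probe.py and the helpers membership_request, listen and send are hypothetical, chosen only for this test and not taken from the cluster configuration.

```python
import socket
import struct
import sys

GROUP = "239.192.88.13"  # default multicast address reported by openais in this thread
PORT = 5999              # hypothetical test port, chosen only for this check


def membership_request(group, iface):
    """Pack a struct ip_mreq: group address followed by the local interface address."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface))


def listen(iface, timeout=10.0):
    """Join GROUP on iface and wait for one datagram; raises socket.timeout if none arrives."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", PORT))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        membership_request(GROUP, iface))
        sock.settimeout(timeout)
        return sock.recvfrom(1024)
    finally:
        sock.close()


def send(iface, message=b"multicast probe"):
    """Send one datagram to GROUP out of the given local interface."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF, socket.inet_aton(iface))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)  # stay on the local segment
        sock.sendto(message, (GROUP, PORT))
    finally:
        sock.close()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python mcast_probe.py listen 192.168.10.31   (on LINUX1, start first)
    #        python mcast_probe.py send   192.168.10.33   (on LINUX2)
    mode, iface = sys.argv[1], sys.argv[2]
    if mode == "listen":
        data, sender = listen(iface)
        print("received", data, "from", sender)
    else:
        send(iface)
```

If the listener times out while ping between the same addresses still works, multicast is not getting from one node to the other.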
jerrywjl
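[quote]6. I have also seen this situation: checking with clustat -l on both machines, both nodes are Online, but Local appears on each machine's own node. That is, viewed from node1, Local is on node1, and viewed from node2, Local is on node2.[/quote]
This is normal. As for the other situations you described above, going by what you said, there should be no problem.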