Two basic questions about MPICH

wxqdelphi

1. Does the executable need to be copied to "all" of the nodes?
2. Is it mandatory to create an /etc/hosts.equiv file so that the machines recognize each other?

troyme
Reply to post #1 by wxqdelphi

1. No, sharing it over the network also works.
2. No. ssh authenticates with keys, and rsh authenticates through .rhosts.
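A minimal sketch of how the key-based ssh setup could be verified before starting an MPICH job. The function name check_passwordless and the SSH_CMD override are hypothetical, introduced only for illustration; BatchMode and ConnectTimeout are standard OpenSSH client options.

```shell
#!/bin/sh
# Sketch: report which nodes accept non-interactive (key-based) ssh logins.
# SSH_CMD is a hypothetical override so the check can be tried with a stub.
SSH_CMD="${SSH_CMD:-ssh}"

check_passwordless() {
    # Usage: check_passwordless node1 node2 ...
    failed=0
    for host in "$@"; do
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        if "$SSH_CMD" -o BatchMode=yes -o ConnectTimeout=5 "$host" true \
                </dev/null >/dev/null 2>&1; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
            failed=1
        fi
    done
    # Non-zero exit status if any node refused the login.
    return "$failed"
}
```

Calling check_passwordless with the node names from your machines file prints one line per node; any node marked FAILED would still ask for a password (or is unreachable) and should be fixed first.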

zarcoder_neu
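You can export the directories that hold the libraries you need to every node.
For example, I export
/usr/local
/opt
/home
to the compute nodes.
All software is installed under /opt,
and users submit their jobs from their directories under /home.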
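As a rough sketch, assuming the export is done with NFS, the entries for the three directories above could be generated like this. The helper print_exports, the client subnet, and the export options are assumptions for illustration and are not taken from the post.

```shell
#!/bin/sh
# Sketch: print /etc/exports entries for the shared directories.
# The client range and the options passed in are assumed values.
print_exports() {
    # Usage: print_exports CLIENTS OPTIONS DIR...
    clients=$1
    options=$2
    shift 2
    for dir in "$@"; do
        # One entry per directory: <dir> <clients>(<options>)
        printf '%s %s(%s)\n' "$dir" "$clients" "$options"
    done
}

# Example (review the output before adding it to /etc/exports on the head node):
#   print_exports 192.168.1.0/24 rw,sync /usr/local /opt /home
# Afterwards the export table can be reloaded with: exportfs -ra
```

Each compute node then mounts the same directories at the same mount points, so libraries and user programs appear under identical paths everywhere.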


mpich
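When I installed with xCAT, I just ran makempich directly.
The method described in the post above definitely works; I have done it that way too.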

wxqdelphi
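Thanks for the suggestions, everyone!
I now have two workstations, and I can ssh between them without a password.
But why does running a very simple small program produce the following error: child process exited while making connection to remote process on?

The same program runs fine on a single workstation.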

I found the following two pieces of advice online:
I think the message you got may be caused by your rsh, rexec, or rlogin command not being set up correctly. You can run "rsh {node} {command}" to check whether rsh works.
I solved the same problem this way. Give it a try.
-----------------------------------------
You reported:

util/tstmachines Errors while trying to run ssh client1.mydomain.com -n true Unexpected response from client1.mydomain.com:
--> Warning: No xauth data; using fake authentication
data for X11 forwarding.

Check whether ssh is working, for example:
" % ssh slavehost ls "

Check that the passwordless login really works.

If you have problems in the second step of tstmachines, check that the folders are shared between the hosts.

I read online that compiling with mpiCC prevents the following errors that you reported.

p0_4293: p4_error: Child process exited while making connection to remote process on client1.mydomain.com: 0
p0_4293: (10.253768) net_send: could not write to fd=4, errno = 32
[root@master basic]#

It worked here.

Sorry, I don't have much experience with MPI yet.
I hope this helps.
-----------------------------------------
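This looks complicated. Could it be a problem with the machines?
I am using Dell 690 workstations, each with 4 CPU cores (two dual-core CPUs).
Could running in parallel on two dual-core CPUs be causing the problem?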

troyme
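Are you sure your program runs without problems on a single machine when you start it with multiple processes under MPICH?


Make sure the path to your program is the same on both machines.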
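The path check suggested above can be sketched as a small script. The function name check_same_path and the SSH_CMD override are hypothetical helpers, and the script assumes passwordless ssh to each node is already working.

```shell
#!/bin/sh
# Sketch: confirm the executable exists at the same absolute path on every node.
# SSH_CMD is a hypothetical override so the check can be tried with a stub.
SSH_CMD="${SSH_CMD:-ssh}"

check_same_path() {
    # Usage: check_same_path /abs/path/to/prog node1 node2 ...
    prog=$1
    shift
    missing=0
    for host in "$@"; do
        # "test -x" succeeds only if the file exists and is executable there.
        if "$SSH_CMD" "$host" test -x "$prog" </dev/null; then
            echo "$host: found $prog"
        else
            echo "$host: MISSING $prog"
            missing=1
        fi
    done
    # Non-zero exit status if any node lacks the program.
    return "$missing"
}
```

For the first point, a multi-process run on one machine would typically be started with something like mpirun -np 4 followed by the full path to the program; if that already fails, the problem is not in the connection between the two workstations.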

wxqdelphi
Thanks!
I will double-check.