Theodore Sung
Post Count: 7
|
4/21/2008 11:23 am
Thanks for the previous help with MPI. I'm now trying another experiment where I want to run my program as a worker on the SiCortex machine and have the master reside on my Windows box. Since SLURM is responsible for managing the MPI processes, how would I go about doing something like that? On a Windows setup, it would be something like this:
mpiexec -n 1 \\teds\d_drive\mywork\cmochk\cmochk_alpha.exe -zcheck -mpi ... -path \\juliet\s_drive\cmo_cdi -cdu_path
: -n 4 -host blade2 -map s:\\juliet\s_drive \\teds\d_drive\mywork\cmochk\cmochk_alpha.exe -zcheck ...
|
|
Lawrence Stewart
SiCortex software group
Post Count: 2
|
4/21/2008 2:04 pm
There would seem to be two issues to figure out:
* How to launch a heterogeneous job, with one process (the master) on your Windows system and the others (the slaves) on the SC system.
* How to communicate using MPI in this heterogeneous environment, with some processes on SC and one on Windows.
Do I have this right? The issues I see are these. First, while we ship SLURM configured to launch and manage jobs on the SC, it knows nothing about Windows or indeed any nodes "outside the cabinet". Second, the SiCortex MPI library communicates only over the SC fabric hardware; it doesn't know how to communicate with any nodes "outside the cabinet" and doesn't even use TCP/IP.
Regarding the job control, I suspect there are two approaches: get SLURM to work across the SC AND your Windows system, OR run the master manually and use SLURM only to launch the slaves. We've never tried extending SLURM to run heterogeneous jobs, and know little about it. From my web reading, I think it could be made to work, but I don't really know how to do it. Running the master manually is easy, provided the communications can be made to work.
Does the master communicate with the slaves via MPI? If so, I think you will need an implementation of MPI that supports heterogeneous systems, such as OpenMPI. We haven't ported that to SiCortex, but there are rumors that other folks are working on it.
If the master communicates with the slaves merely by TCP/IP, or if its MPI operations could be remoted to a proxy master running on an SC node, then the SiCortex MPI should be OK to manage communications among ranks inside the SC box.
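To make the proxy-master idea concrete, here is a minimal Python sketch of the TCP side only. It is an illustration under stated assumptions, not something SiCortex ships: the length-prefixed framing, the function names, and the `fan_out_to_workers` stub are all hypothetical. In a real setup the stub would be replaced by MPI calls made by the proxy rank inside the SC box, and the proxy would listen on an address the Windows master can reach rather than on localhost.

```python
import socket
import struct
import threading


def send_msg(sock, payload: bytes) -> None:
    """Send one message: 4-byte big-endian length, then the payload."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock, n: int) -> bytes:
    """Read exactly n bytes, since recv() may return fewer than requested."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        buf.extend(chunk)
    return bytes(buf)


def recv_msg(sock) -> bytes:
    """Receive one length-prefixed message."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)


def fan_out_to_workers(task: bytes, n_workers: int) -> list:
    """HYPOTHETICAL stand-in for the MPI side of the proxy.

    A real proxy would distribute the task to the other ranks over MPI
    and gather their results. Here each "rank" just tags the task.
    """
    return [task.upper() + b"#" + str(rank).encode() for rank in range(n_workers)]


def proxy_serve_once(server_sock, n_workers: int) -> None:
    """Proxy master: accept one TCP connection, relay one task, reply."""
    conn, _ = server_sock.accept()
    with conn:
        task = recv_msg(conn)
        results = fan_out_to_workers(task, n_workers)
        send_msg(conn, b"\n".join(results))


def run_demo(n_workers: int = 4) -> bytes:
    """Run proxy and master in one process over localhost, for illustration."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    proxy = threading.Thread(target=proxy_serve_once, args=(server, n_workers))
    proxy.start()
    # The "master" side: in the scenario above this would run on Windows.
    with socket.create_connection(("127.0.0.1", port)) as master:
        send_msg(master, b"zcheck")
        reply = recv_msg(master)
    proxy.join()
    server.close()
    return reply
```

Calling `run_demo(2)` sends one task from the master to the proxy and returns the two stubbed worker results joined by newlines. The point of the design is that only the proxy needs to speak both protocols: the master uses plain TCP/IP, and everything inside the cabinet can stay on the vendor MPI.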
|