We have assumed that LSB_MCPU_HOSTS contains a list of "hostname cores" pairs, as follows:
$ echo $LSB_MCPU_HOSTS
host1 7 host2 7 host3 7 host4 7
The number of cores on each host is then computed as follows, with a trick to handle duplicated host names:
echo "Num cpus per host is:" $LSB_MCPU_HOSTS
IFS=' ' read -r -a array <<< "$LSB_MCPU_HOSTS"
declare -A associative
i=0
len=${#array[@]}
while [ $i -lt $len ]
do
    key=${array[$i]}
    value=${array[$i+1]}
    associative[$key]+=$value
    i=$((i=i+2))
done
The problem is that LSB_MCPU_HOSTS is actually a list of "hostname slots" pairs, as described in Running parallel jobs on specific hosts. A slot may contain multiple cores, so the calculation above can produce wrong numbers.
Here is an example.
# job submitted by: bsub -n 4 -R "affinity[core(7,same=socket)]" -gpu num=1/task
$ echo $LSB_MCPU_HOSTS
host1 1 host2 3
$ cat $LSB_AFFINITY_HOSTFILE
host1 1,2,3,4,5,6,7
host2 0,2,3,4,6,7,8
host2 19,21,22,23,24,26,27
host2 28,29,37,41,48,49,50
I requested a job consisting of 4 slots, where each slot has 7 cores and 1 GPU. As a result, 1 slot is allocated on host1 and 3 slots are allocated on host2, as reported by the LSB_MCPU_HOSTS variable.
The file specified by LSB_AFFINITY_HOSTFILE contains one line per slot with the core allocation for that slot. Each line has the form "hostname core-list", where core-list is a comma-separated list of core IDs.
So a possible solution is to count the core IDs for each host in the $LSB_AFFINITY_HOSTFILE file.
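A minimal sketch of that counting step, assuming every line of the affinity host file has exactly the "hostname core-list" form shown in the example. The function name count_cores_per_host is hypothetical and is not part of ray_launch_cluster.sh, and using awk here is my choice rather than something the script already does:

```bash
# Sketch only: count the core IDs per host in an LSF affinity host file.
# Assumes each line looks like "hostname id1,id2,...", as in the example above.
# count_cores_per_host is a hypothetical helper name.
count_cores_per_host() {
    # $1: path to the affinity host file
    awk 'NF >= 2 {
             # split() returns the number of comma-separated core IDs
             cores[$1] += split($2, ids, ",")
         }
         END {
             for (host in cores) print host, cores[host]
         }' "$1" | sort
}

# Example use inside a job:
#   count_cores_per_host "$LSB_AFFINITY_HOSTFILE"
```

For the affinity file in the example, this prints "host1 7" and "host2 21", because the three host2 lines are summed. Duplicated host names are therefore handled by addition, and the result no longer depends on the slot counts in LSB_MCPU_HOSTS.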
The script quoted above is ray-integration/ray_launch_cluster.sh, lines 64 to 75 in c63630a.