The code is below and basically times how long it takes to do a bunch of useless work on each processor. I then sorts the times and prints the fastest and slowest results so you can compare. Finally it tells you which node the slowest processor is on.
FYI, a node is basically a small computer that is part of the cluster. On the node are usually multiple processors.
Ok, here is the code:
PROGRAM main
use mpi
implicit none
!include 'mpif.h'
integer :: ierr, n, i, myrank, nproc, temp_i, minIndex, slowest
real*8 :: x, start_time, end_time, temp_r, mytime
real*8, dimension(:), allocatable :: time
integer, dimension(:), allocatable :: rank
character(len=MPI_MAX_PROCESSOR_NAME) :: myname
! Initialize MPI environment
call MPI_INIT(ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD, nproc, ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD, myrank, ierr)
call MPI_GET_PROCESSOR_NAME(myname,n,ierr)
if (myrank.eq.0) print *,'Starting test'
if (myrank.eq.0) print *,' results accurate to ',MPI_WTICK()
allocate( time(nproc)); time=real(0,8)
allocate( rank(nproc));
do n = 1, 10
! Start timing
call MPI_BARRIER(MPI_COMM_WORLD,ierr)
start_time = MPI_Wtime()
! Do a bunch of useless work
x=real(0,8)
do i = 1, 100000000
if (mod(i,2)>0) then
x = x + real(i,8)
else
x = x - real(i,8)
end if
end do
! Stop timing
end_time = MPI_Wtime()
! Collect times
mytime = end_time - start_time
call MPI_GATHER(mytime,1,MPI_REAL8, &
time,1,MPI_REAL8,0,MPI_COMM_WORLD,ierr)
! Find slowest processors
if (myrank.eq.0) then
! Form rank array
do i = 1, nproc
rank(i)=i-1
end do
! Selection sort
do i = 1, nproc-1
minIndex = minloc(time(i:), 1) + i - 1
if (time(i) > time(minIndex)) then
temp_r = time(i)
temp_i = rank(i)
time(i) = time(minIndex)
rank(i) = rank(minIndex)
time(minIndex) = temp_r
rank(minIndex) = temp_i
end if
end do
! Print fastest 5 and slowest 5 processors rank and time
print *,''
write(*,'(A,5I10.1,A,5I10.1)') &
'Rank (fastest-slowest)',rank(1:5),' ...',rank(nproc-4:nproc)
write(*,'(A,5F10.6,A,5F10.6)') &
'Time (fastest-slowest)',time(1:5),' ...',time(nproc-4:nproc)
end if
! Print name of slowest processor
slowest=rank(nproc)
call MPI_BCAST(slowest,1,MPI_INTEGER,0,MPI_COMM_WORLD,ierr)
if (myrank.eq.slowest) print *, &
'The slowest processor is on ',trim(myname)
end do
call MPI_BARRIER(MPI_COMM_WORLD,ierr)
if (myrank.eq.0) print *,'Test complete'
! Close parallel environment
call MPI_FINALIZE(ierr)
END PROGRAM main
Save the code to node-test.f90. Compile the code using a command like "mpif90 -o node-test node-test.f90" and then run the code on your cluster using your usual approach.
The output shows the 5 fastest and 5 slowest processor ranks and the time it took each of them to complete the test. The test is repeated 10 time to see if one processor is consistently the slowest.
If you use the code and find issues or improvements please post a comment below.