Configure MPI for Distributed Computing

This section will cover how to configure ADCME for MPI functionalities.

Configure the MPI backend

The first step is to configure your MPI backend. There are many choices depending on your operating system: Windows has Microsoft MPI, and OpenMPI and Intel MPI are available on most Linux distributions. If you want to use your own MPI backend, you need to locate the MPI libraries, header files, and executables (e.g., mpirun), and build ADCME with the following environment variables:

MPI_C_LIBRARIES: the MPI shared library. On Windows with Microsoft MPI, this may be

C:\Program Files (x86)\Microsoft SDKs\MPI\Lib\x64\msmpi.lib

On Unix systems, it may be /opt/ohpc/pub/compiler/intel-18/compilers_and_libraries_2018.2.199/linux/mpi/intel64/lib/release/. Note that you must include the shared library in the variable.

MPI_INCLUDE_PATH: the directory containing mpi.h. On Windows, this may be

C:\Program Files (x86)\Microsoft SDKs\MPI\Include

On a Unix system, it may be /opt/ohpc/pub/compiler/intel-18/compilers_and_libraries_2018.2.199/linux/mpi/intel64/include/

The simplest way is to set these environment variables permanently. For example, on Linux, we can add the following lines to the ~/.bashrc file.

export MPI_C_LIBRARIES=/opt/ohpc/pub/compiler/intel-18/compilers_and_libraries_2018.2.199/linux/mpi/intel64/lib/
export MPI_INCLUDE_PATH=/opt/ohpc/pub/compiler/intel-18/compilers_and_libraries_2018.2.199/linux/mpi/intel64/include/
alias mpirun=/opt/ohpc/pub/compiler/intel-18/compilers_and_libraries_2018.2.199/linux/mpi/intel64/bin/mpirun
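
Since all three lines above point into the same installation, the values can also be derived from a single prefix. The following is a sketch; MPI_ROOT is a hypothetical example path and must be replaced with your actual MPI installation:

```shell
#!/bin/sh
# Derive the variables ADCME expects from one MPI install prefix.
# MPI_ROOT is a hypothetical example path -- substitute your own.
MPI_ROOT=/opt/mpi/intel64

export MPI_C_LIBRARIES="$MPI_ROOT/lib/release/"
export MPI_INCLUDE_PATH="$MPI_ROOT/include/"
# Put the matching mpirun first on PATH (an alternative to the alias above).
PATH="$MPI_ROOT/bin:$PATH"

# Show what ADCME will see at build time.
echo "MPI_C_LIBRARIES=$MPI_C_LIBRARIES"
echo "MPI_INCLUDE_PATH=$MPI_INCLUDE_PATH"
```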

If you do not have an MPI backend, ADCME provides a convenient way to install MPI by compiling it from source. Just run

using ADCME

This should install an OpenMPI library for you. Note that this functionality does not work on Windows and has only been tested on Linux.

Build MPI Libraries

Configuring the MPI backend alone does not complete the MPI setup. To enable MPI support, you need to recompile the built-in custom operators.

using ADCME

At this point, you will be able to use MPI features.

Build MPI Custom Operators

You can also build MPI-enabled custom operators by calling

using ADCME

In this case, extra lines are added to CMakeLists.txt to set up the MPI dependencies. If MPI_INCLUDE_PATH and/or MPI_C_LIBRARIES is not set, the MPI operators are skipped and CMake reports:

  message("MPI_INCLUDE_PATH and/or MPI_C_LIBRARIES is not set. MPI operators are not compiled.")
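
This guard can be mirrored outside CMake as a quick pre-flight check before recompiling. The script below is a sketch and not part of ADCME itself:

```shell
#!/bin/sh
# check_mpi_vars mirrors the CMake guard: it reports whether the MPI custom
# operators would be compiled given the current environment.
check_mpi_vars() {
  if [ -z "$MPI_INCLUDE_PATH" ] || [ -z "$MPI_C_LIBRARIES" ]; then
    echo "MPI_INCLUDE_PATH and/or MPI_C_LIBRARIES is not set. MPI operators are not compiled."
  else
    echo "MPI variables are set; MPI operators will be compiled."
  fi
}

check_mpi_vars
```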

Running MPI Applications with Slurm

To run MPI applications with Slurm, the following command is useful

sbatch -n 4 -c 8 mpirun -n 4 julia app.jl 

This specifies 4 tasks, each using 8 cores. You can also replace sbatch with salloc. Note that sbatch normally expects a job script, so a bare command line may need to be wrapped (e.g., with sbatch --wrap).

To diagnose the application, you can also let mpirun print out the rank binding information; e.g., with OpenMPI:

sbatch -n 4 -c 8 mpirun --report-bindings -n 4 julia app.jl
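
Because sbatch normally takes a job script rather than a bare command line, the same resources can also be requested from a script. The filename job.sh and the resource values below are illustrative, mirroring the example above:

```shell
#!/bin/sh
# job.sh -- submit with: sbatch job.sh   (job.sh is a hypothetical filename)
#SBATCH -n 4   # 4 MPI tasks
#SBATCH -c 8   # 8 cores per task
mpirun --report-bindings -n 4 julia app.jl
```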