MPI Gather Function in Python
The gather function collects data from multiple processes into a single process. We’ll go through the code below line by line to see how gather works.
Code
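from mpi4py import MPI

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

# Each process contributes one value derived from its rank.
data = (rank + 1)
# Rank 0 receives a list of all contributions; every other rank receives None.
data = comm.gather(data, root=0)
print(f"Process {rank}: Calculated data = {data}")

if rank == 0:
    for i in range(size):
        # data[i] was sent by rank i, which contributed i + 1.
        assert data[i] == (i + 1)
        print(f"Process 0: Checked data received from process {i}")
else:
    assert data is None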
Explanation
from mpi4py import MPI
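This line imports the MPI module from the mpi4py library. By default, importing it also initializes the MPI environment, so no explicit initialization call is needed.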
comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()
These lines get the default communicator, comm (MPI.COMM_WORLD, which contains every process launched for the program), and obtain the total number of processes (size) and the rank of the current process (rank), an integer from 0 to size - 1.
data = (rank + 1)
data = comm.gather(data, root=0)
Each process computes its own data value from its rank (rank + 1). Then gather is called on the communicator comm. It collects the value from every process and returns it to the root process as a list ordered by rank, so data[i] holds the value sent by rank i; on every other process it returns None. Here root=0 makes process 0 the root. Because the result is assigned back to data, data now holds the gathered list on rank 0 and None everywhere else.
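To make the return value concrete without an MPI installation, here is a pure-Python model of what comm.gather(obj, root) hands back on each rank. The helper model_gather is hypothetical and not part of mpi4py: it sees every rank's contribution at once, which no real MPI process does, and simply reproduces the rank-ordered list on the root and None elsewhere.

```python
from typing import Any, List, Optional


def model_gather(contributions: List[Any], root: int = 0) -> List[Optional[List[Any]]]:
    """Hypothetical helper: what comm.gather(obj, root) would return on each rank.

    contributions[r] is the object passed by rank r. The root receives a list
    ordered by rank; every other rank receives None.
    """
    size = len(contributions)
    if not 0 <= root < size:
        raise ValueError("root must be a valid rank")
    return [list(contributions) if rank == root else None for rank in range(size)]


# Model a 4-process run where each rank contributes rank + 1, as in the tutorial.
results = model_gather([rank + 1 for rank in range(4)], root=0)
for rank, data in enumerate(results):
    print(f"Process {rank}: data = {data}")
```

Changing root only moves the list to a different rank; for example, model_gather(["a", "b", "c"], root=2) places the list on rank 2 and None on ranks 0 and 1.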
print(f"Process {rank}: Calculated data = {data}")
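Each process then prints data. Because data was overwritten by the return value of gather, rank 0 prints the full gathered list (for example [1, 2, 3, 4] with four processes) and every other rank prints None rather than its own calculated value. Lines from different processes may appear in any order.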
if rank == 0:
    for i in range(size):
        assert data[i] == (i + 1)
        print(f"Process 0: Checked data received from process {i}")
On the root process (rank 0), a loop iterates over every rank i from 0 to size - 1 and asserts that the entry received from rank i equals the expected value, i + 1. If an entry does not match, the assert raises an AssertionError; otherwise, the loop prints a confirmation message naming the rank that sent the entry.
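The root's check can be wrapped in a small helper and exercised on a stand-in list without launching MPI. The name check_gathered is hypothetical; it repeats the tutorial's per-rank assertion and shows that the whole loop is equivalent to comparing against list(range(1, size + 1)). Note that Python skips assert statements when run with the -O flag, so production code would typically raise an exception explicitly instead.

```python
from typing import List


def check_gathered(data: List[int], size: int) -> None:
    """Hypothetical helper mirroring the root's verification loop."""
    assert len(data) == size, "root should receive exactly one item per rank"
    for i, value in enumerate(data):
        # data[i] came from rank i, which contributed i + 1.
        assert value == i + 1, f"rank {i} sent {value}, expected {i + 1}"
        print(f"Process 0: Checked data received from process {i}")
    # The loop above is equivalent to this single comparison.
    assert data == list(range(1, size + 1))


# Stand-in for what rank 0 would receive in a 4-process run.
check_gathered([1, 2, 3, 4], size=4)
```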
else:
    assert data is None
On non-root processes (ranks other than 0), the assertion confirms that data is None, since gather returns the collected list only on the root.
In this tutorial, we’ve learned how to use the gather function in MPI to collect data from multiple processes into a single process. This is a fundamental operation in parallel programming, often used to collect results from worker processes on a master process for further processing or analysis. Understanding MPI collective operations like gather is essential for developing efficient parallel programs.
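As an illustration of the worker-to-master pattern, the sketch below models a parallel sum in plain Python: each rank sums a round-robin slice of the input, the root gathers the partial sums, and then combines them. The round-robin split, the helper partial_sum, and the 4-process size are assumptions for illustration, and the gather step is simulated with an ordinary list rather than mpi4py.

```python
from typing import List


def partial_sum(numbers: List[int], rank: int, size: int) -> int:
    """Hypothetical worker step: sum every size-th element starting at this rank."""
    return sum(numbers[rank::size])


numbers = list(range(1, 101))
size = 4  # assumed number of processes

# What each rank would pass to comm.gather(...), collected in rank order
# exactly as the root would receive it.
gathered = [partial_sum(numbers, rank, size) for rank in range(size)]

# Only the root would run this step on the gathered list.
total = sum(gathered)
print(f"partial sums = {gathered}, total = {total}")
```

In a real mpi4py program, each rank would call comm.gather(partial_sum(numbers, rank, size), root=0) and only rank 0 would sum the result. When the combined value is all that is needed, comm.reduce(value, op=MPI.SUM, root=0) performs the collect-and-combine in a single collective call.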