Parallel Summation using MPI in Python with mpi4py

3 min read · May 7, 2024

Parallel summation distributes the task of summing a large set of numbers across multiple processes, which may run on several cores or computing nodes. Each process handles a portion of the data, computes a local sum, and sends that partial sum to a designated root process. The root combines the partial sums into the global sum. Because the partial sums are computed simultaneously, this approach can shorten the computation for large data sets. This tutorial implements parallel summation with MPI in Python using mpi4py and shows how MPI handles communication and coordination among the processes.

Code Example

from mpi4py import MPI

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

# Define the total number of elements
N = 100

# Calculate the number of elements to be handled by each process
# (integer division; the last rank also takes any remainder)
chunk_size = N // size
start = rank * chunk_size
end = N if rank == size - 1 else start + chunk_size

# Perform local summation for each process (the integers start + 1 .. end)
local_sum = sum(range(start + 1, end + 1))

# Reduce all local sums to the root process (rank 0)
global_sum = comm.reduce(local_sum, op=MPI.SUM, root=0)

# Print the result at the root process
if rank == 0:
    print("Global sum:", global_sum)

Explanation

Import MPI Module and Initialize MPI Environment

from mpi4py import MPI

This line imports the MPI module from the mpi4py package, enabling the use of MPI functionalities.

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

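These lines set up the communication context. By default, mpi4py initializes the MPI environment when the MPI module is imported, so no explicit initialization call is needed. MPI.COMM_WORLD is the predefined communicator that contains all processes started for the program. comm.Get_size() returns the number of processes in the communicator, and comm.Get_rank() returns the rank of the current process, an integer from 0 to size - 1.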

Define the Total Number of Elements

N = 100

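This line defines the total number of elements to be processed. The program sums the integers 1 through N, so for N = 100 the expected result is N(N + 1)/2 = 5050.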

Calculate Chunk Size and Local Range

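chunk_size = N // size
start = rank * chunk_size
end = N if rank == size - 1 else start + chunk_size

These lines calculate the number of elements handled by each process (chunk_size) and the bounds of the local range for the current process. Integer division (//) is required here: in Python 3, N / size yields a float, which range() rejects. Each process sums the integers from start + 1 through end, and the last rank extends its range to N so that no elements are dropped when size does not divide N evenly.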
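The split can be checked without MPI by computing the bounds for every rank in plain Python. The sketch below is only an illustration: local_bounds is a hypothetical helper, and it assumes integer division with the last rank absorbing any remainder, which is one possible convention among several.

```python
def local_bounds(n, size, rank):
    """Return (start, end) so that this rank sums the integers start+1 .. end."""
    chunk_size = n // size  # integer division keeps the bounds usable by range()
    start = rank * chunk_size
    # Assumption: the last rank absorbs the remainder when size does not divide n.
    end = n if rank == size - 1 else start + chunk_size
    return start, end


# Bounds for 4 processes (even split) and 3 processes (remainder of 1).
even_split = [local_bounds(100, 4, r) for r in range(4)]
uneven_split = [local_bounds(100, 3, r) for r in range(3)]
print(even_split)    # [(0, 25), (25, 50), (50, 75), (75, 100)]
print(uneven_split)  # [(0, 33), (33, 66), (66, 100)]
```

With three processes, the last rank handles 34 elements instead of 33, so the ranges still cover 1 through 100 with no gaps or overlaps.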

Perform Local Summation

local_sum = sum(range(start + 1, end + 1))

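This line performs the local summation for each process. range(start + 1, end + 1) produces the integers from start + 1 through end (the upper bound of range is exclusive, hence end + 1), and the built-in sum function adds them up. The offset of 1 makes the ranges of all processes together cover 1 through N rather than 0 through N - 1.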
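To see what each process contributes, the local sums can be reproduced in a single plain-Python loop. This is a sketch that assumes 4 processes (a hypothetical count standing in for comm.Get_size()), so that N = 100 splits evenly; it does not use MPI at all.

```python
N = 100
size = 4  # hypothetical process count, stands in for comm.Get_size()

local_sums = []
for rank in range(size):
    chunk_size = N // size
    start = rank * chunk_size
    end = start + chunk_size
    # Same expression each MPI process would evaluate for its own rank.
    local_sums.append(sum(range(start + 1, end + 1)))

print(local_sums)       # [325, 950, 1575, 2200]
print(sum(local_sums))  # 5050

# The partial sums must add up to the closed form N(N + 1)/2.
assert sum(local_sums) == N * (N + 1) // 2
```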

Reduce Local Sums to Root Process

global_sum = comm.reduce(local_sum, op=MPI.SUM, root=0)

This line combines the local sums from all processes with a reduction operation. The values are added using the MPI sum operation (MPI.SUM), and the result is delivered to the root process (rank 0). On every other rank, comm.reduce returns None, so global_sum holds the total only on the root.
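The difference between a rooted reduction and one that delivers the result everywhere can be modeled in plain Python. In the sketch below, model_reduce and model_allreduce are hypothetical helpers that only imitate what each rank would see; they are not mpi4py calls. In mpi4py itself, comm.allreduce(local_sum, op=MPI.SUM) is the call that returns the total on every rank.

```python
from functools import reduce
import operator


def model_reduce(local_values, root=0):
    """Model of a rooted sum reduction: total on the root, None elsewhere."""
    total = reduce(operator.add, local_values)
    return [total if rank == root else None for rank in range(len(local_values))]


def model_allreduce(local_values):
    """Model of an all-reduce: every rank receives the combined value."""
    total = reduce(operator.add, local_values)
    return [total] * len(local_values)


local_sums = [325, 950, 1575, 2200]  # local sums for N = 100 on 4 ranks
print(model_reduce(local_sums))      # [5050, None, None, None]
print(model_allreduce(local_sums))   # [5050, 5050, 5050, 5050]
```

A rooted reduce is sufficient when only one process reports the result, as in this tutorial; an all-reduce is the usual choice when every process needs the total for further computation.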

Print Result at Root Process

if rank == 0:
    print("Global sum:", global_sum)

This conditional statement ensures that only the root process (rank 0) prints the final result. Other processes contribute their local sums but do not print the result.

Written by Afzal Badshah, PhD