# Parallel Summation using MPI in Python with mpi4py

Parallel summation involves distributing the task of summing a large set of numbers across multiple processors or computing nodes, enabling simultaneous computation and aggregation of partial results. Each processor handles a portion of the data, performs local summation, and then communicates its partial sum to a designated root processor. The root processor collects and combines these partial sums to compute the global sum, thereby leveraging parallelism to accelerate the computation and efficiently handle large-scale data sets. In this tutorial, we explore the implementation of parallel summation using MPI in Python, demonstrating how MPI facilitates communication and coordination among processes to achieve efficient parallel computation.

# Code Example

```python
# Run with, e.g.: mpiexec -n 4 python <script.py>
from mpi4py import MPI

# Initialize the MPI environment
comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

# Define the total number of elements (assumed divisible by the process count)
N = 100

# Calculate the number of elements to be handled by each process
# (integer division, so start/end are valid range bounds)
chunk_size = N // size
start = rank * chunk_size
end = start + chunk_size

# Perform local summation for each process
local_sum = sum(range(start + 1, end + 1))

# Reduce all local sums to the root process (rank 0)
global_sum = comm.reduce(local_sum, op=MPI.SUM, root=0)

# Print the result at the root process
if rank == 0:
    print("Global sum:", global_sum)
```

# Explanation

**Import MPI Module and Initialize MPI Environment**

`from mpi4py import MPI`

This line imports the MPI module from the mpi4py package, enabling the use of MPI functionalities.

```python
comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()
```

These lines initialize the MPI environment. `MPI.COMM_WORLD` is a communicator object representing all processes in the MPI world. `comm.Get_size()` returns the number of processes in the communicator, and `comm.Get_rank()` returns the rank of the current process within it.

**Define the Total Number of Elements**

`N = 100`

This line defines the total number of elements to be processed.

**Calculate Chunk Size and Local Range**

```python
chunk_size = N // size
start = rank * chunk_size
end = start + chunk_size
```

These lines calculate the number of elements to be handled by each process (`chunk_size`, using integer division `//` so the result can serve as a `range` bound) and determine the start and end indices of the local range of elements to be processed by the current process.
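To see the arithmetic concretely, here is a plain-Python sketch of the chunk calculation, assuming 4 processes (an example value; in the real program `size` comes from `comm.Get_size()`):

```python
N = 100
size = 4  # assumed process count for this illustration

for rank in range(size):
    chunk_size = N // size      # 100 // 4 == 25 elements per process
    start = rank * chunk_size   # first index owned by this rank
    end = start + chunk_size    # one past the last owned index
    print(f"rank {rank}: elements {start + 1}..{end}")
```

Each rank thus owns a disjoint slice of 25 numbers, and the four slices together cover 1 through 100 exactly.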

**Perform Local Summation**

`local_sum = sum(range(start + 1, end + 1))`

This line performs the local summation of elements for each process. It creates a range of numbers from `start + 1` up to and including `end` (Python's `range` excludes its upper bound, hence the `+ 1`), and then adds these numbers using the built-in `sum` function.
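The per-rank results can be reproduced outside MPI. The sketch below computes, for N = 100 and an assumed 4 processes, the `local_sum` that each rank would produce (in the real program each rank computes only its own value):

```python
N, size = 100, 4  # size is an assumed example value
chunk_size = N // size

local_sums = [
    sum(range(r * chunk_size + 1, r * chunk_size + chunk_size + 1))
    for r in range(size)
]
print(local_sums)       # → [325, 950, 1575, 2200]
print(sum(local_sums))  # → 5050, i.e. sum(range(1, N + 1))
```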

**Gather Local Sums to Root Process**

`global_sum = comm.reduce(local_sum, op=MPI.SUM, root=0)`

This line gathers the local sums from all processes and combines them with a reduction operation using the MPI sum operation (`MPI.SUM`). The resulting global sum is stored in the variable `global_sum` on the root process (rank 0); on every other rank, `comm.reduce` returns `None`.
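A plain-Python sketch of what the `MPI.SUM` reduction delivers to the root (the partial sums below are the example values the four ranks would contribute for N = 100; no MPI is involved here):

```python
# Hypothetical per-rank contributions for N = 100 across 4 processes
partial_sums = [325, 950, 1575, 2200]

# The reduction with MPI.SUM is equivalent to summing the contributions
global_sum = sum(partial_sums)
print(global_sum)  # → 5050
```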

**Print Result at Root Process**

```python
if rank == 0:
    print("Global sum:", global_sum)
```

This conditional statement ensures that only the root process (rank 0) prints the final result. Other processes contribute their local sums but do not print the result.