A matrix of size
is striped row-wise amoung p processes, so that each
processor stores n/p rows of the matrix. A Vector of size n is sent to each processor.
Each processor multiplies it's chunk of the matrix with the corresponding rows of the vector.
The root process then gathers in all the values. This is more efficient than the previous
solution, as the entire matrix is broadcast to the nodes at the start, instead of looping around
and sending one row at a time.