Connecting MapReduce Computations to Realistic Machine Models
We explain how the popular, highly abstract MapReduce model of parallel computation (MRC) can be rooted in reality by showing how to simulate it on realistic distributed-memory parallel machine models such as BSP. We first refine the model (MRC^+) to include parameters for total work w, bottleneck work ŵ, data volume m, and maximum object size m̂. We then show matching upper and lower bounds for executing a MapReduce computation on a distributed-memory machine: Θ(w/p + ŵ + log p) work and Θ(m/p + m̂ + log p) bottleneck communication volume using p processors.
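To make the shape of these bounds concrete, the following minimal sketch evaluates the three terms of each bound for a toy workload. It is an illustration only: the function name `mrc_plus_bound_terms` is hypothetical, the hidden constants of the Θ-notation are ignored, and it assumes that ŵ can be read as the largest work of any single task and m̂ as the largest single object, which the abstract does not spell out.

```python
import math

def mrc_plus_bound_terms(task_work, object_sizes, p):
    """Evaluate w/p + w_hat + log p and m/p + m_hat + log p (constants ignored).

    task_work:    work of each individual task (assumed unit: operations)
    object_sizes: size of each data object (assumed unit: machine words)
    p:            number of processors
    """
    w = sum(task_work)          # total work w
    w_hat = max(task_work)      # bottleneck work (assumption: largest single task)
    m = sum(object_sizes)       # total data volume m
    m_hat = max(object_sizes)   # maximum object size
    log_p = math.log2(p)        # latency term for coordinating p processors
    return {
        "work": w / p + w_hat + log_p,
        "communication": m / p + m_hat + log_p,
    }

# Toy workload: three tasks and three objects on p = 4 processors.
terms = mrc_plus_bound_terms([4, 4, 8], [2, 2, 4], p=4)
print(terms)
```

In this toy case the single largest task (ŵ = 8) contributes more than the evenly divided share w/p = 4, which illustrates why adding processors alone cannot push the cost below the bottleneck term.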