Master's Thesis Defense "A Workload Balanced MapReduce Framework on GPU Platforms" by Yue Zhang
Abstract
MapReduce is a programming model proposed by Google for processing large datasets. It is efficient and applicable in many areas, such as social networks, scientific research, and electronic business. As a result, MapReduce frameworks have been implemented on a growing number of platforms, including Phoenix (based on multicore CPUs), MapCG (based on GPUs), and StreamMR (based on GPUs). However, these frameworks have limitations: they do not handle the collision problem in the map phase or the unbalanced workload problem in the reduce phase. To improve the performance of MapReduce on GPGPUs, this thesis proposes and develops a workload-balanced MapReduce framework for GPUs, B_MapCG, built on the MapCG framework. B_MapCG reduces the number of collisions that occur while inserting key-value pairs in the map phase and handles the unbalanced workload problem in the reduce phase. The proposed B_MapCG framework is evaluated on a Tesla K40 GPU with four benchmarks and eight different datasets. The experimental results show that B_MapCG achieves significant performance improvements over MapCG on all four benchmarks, in both the map phase and the reduce phase.
Committee: Drs. Meilin Liu (Advisor), Jack Jean, and Travis Doom