Comparative Study on Speculative Execution Strategy to Improve MapReduce Performance
Keywords:
MapReduce, Hadoop, Straggler, speculative execution
Abstract
MapReduce is a widely used programming model for processing large amounts of data, and Hadoop is an open-source implementation of the MapReduce framework. The performance of Hadoop depends on metrics such as job execution time and cluster throughput. In MapReduce, a job is divided into multiple map and reduce tasks. Some tasks may execute slowly for internal or external reasons; these slow tasks are called straggler tasks. Stragglers prolong job execution time, which degrades Hadoop performance. To overcome this, the current MapReduce framework uses speculative execution, in which each slow task is backed up on another node in order to reduce job execution time. However, the current speculative execution mechanism does not estimate task progress properly, which leads to tasks being incorrectly identified as slow. It also does not consider data skew among tasks. This paper studies several speculative execution strategies, namely History-based Auto-Tuning (HAT), Longest Approximate Time to End (LATE), and Maximum Cost Performance (MCP). These strategies overcome the drawbacks of default speculative execution to improve MapReduce performance.
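As an illustration of the idea behind LATE, the following is a minimal sketch of how a task's remaining time can be estimated from its progress score and elapsed run time, and how the running task with the longest estimated time to end can be chosen for backup. The `Task` class, its field names, and the function names are hypothetical and are not taken from Hadoop or from the studied papers; the sketch also omits the thresholds and speculation caps that a real scheduler would apply.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    """Hypothetical view of a running task (not a Hadoop class)."""
    task_id: str
    progress: float  # progress score in [0, 1]
    elapsed: float   # seconds since the task started


def estimated_time_left(task: Task) -> Optional[float]:
    """Estimate remaining time as (1 - progress) / progress_rate.

    progress_rate = progress / elapsed. Returns None when no estimate
    is possible yet (assumption: such tasks are simply skipped).
    """
    if task.progress <= 0.0 or task.elapsed <= 0.0:
        return None
    rate = task.progress / task.elapsed
    return (1.0 - task.progress) / rate


def pick_speculation_candidate(tasks: List[Task]) -> Optional[Task]:
    """Return the unfinished task with the longest estimated time to end."""
    candidates = []
    for t in tasks:
        if t.progress >= 1.0:
            continue  # finished tasks need no backup
        left = estimated_time_left(t)
        if left is not None:
            candidates.append((left, t))
    if not candidates:
        return None
    # The task expected to finish last is the one worth backing up first.
    return max(candidates, key=lambda pair: pair[0])[1]


tasks = [
    Task("map_0", progress=0.9, elapsed=90.0),
    Task("map_1", progress=0.2, elapsed=100.0),
    Task("map_2", progress=0.5, elapsed=50.0),
]
straggler = pick_speculation_candidate(tasks)
```

In this example `map_1` has made the least progress per second, so it has the longest estimated time to end and would be selected for a backup copy. Ranking by estimated remaining time rather than by raw progress score is what distinguishes this approach from simply comparing progress values.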
References
Qi Chen, Cheng Liu and Zhen Xiao, “Improving MapReduce Performance Using Smart Speculative Execution Strategy,” IEEE Transactions on Computers, vol. 63, no. 4, April 2014.
Apache hadoop, http://hadoop.apache.org/, Accessed on 26 December 2015
J. Dean and S. Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters,” Comm. ACM, vol. 51, pp. 107-113, Jan. 2008.
Exponential Weighted Moving Average, http://en.wikipedia.org/wiki1, Accessed on 7 January 2015
MapReduce. [Online] Available: http://www.ibm.com, Accessed on 15 January 2015
M. Zaharia, A. Konwinski, A. D. Joseph, R. Katz, and I. Stoica, “Improving MapReduce Performance in Heterogeneous Environments,” in Proc. of the 8th USENIX Conference on Operating Systems Design and Implementation, ser. OSDI, 2008.
Zhe Wang, Zhengdong Zhu, Pengfei Zheng, Qiang Liu, Xiaoshe Dong, “New Scheduler Strategy for Heterogeneous Workload-aware in Hadoop,” 8th Annual ChinaGrid Conference, 2013.
Huanle Xu, Wing Cheong Lau, “Optimization for Speculative Execution of Multiple Jobs in a MapReduce-like Cluster,” 8th Annual ChinaGrid Conference, 2013.
Xuelian Lin, Chunming Hu, Richong Zhang, Chengzhang Wang, “Modeling the Performance of MapReduce under Resource Contentions and Task Failures,” Cloud Computing Technology and Science (CloudCom), IEEE 5th International Conference on (Vol. 1), December 2013.
Tao Gu, Chuang Zuo, Qun Liao, Yulu Yang and Tao Li, “Improving MapReduce Performance by Data Prefetching in Heterogeneous or Shared Environments,” International Journal of Grid and Distributed Computing, Vol. 6, No. 5, 2013.
G. Ananthanarayanan, S. Kandula, A. Greenberg, I. Stoica, Y. Lu, B. Saha, and E. Harris, “Reining in the Outliers in Map-Reduce Clusters Using Mantri,” Proc. Ninth USENIX Conf. Operating Systems Design and Implementation (OSDI), 2010.
Y. Kwon, M. Balazinska, and B. Howe, “A Study of Skew in MapReduce Applications,” Proc. Fifth Open Cirrus Summit, 2011.
Open Stack Cloud Operating System, http://www.openstack.org/, Accessed on 13 February 2015.
Amazon Elastic Compute Cloud (EC2), http://aws.amazon.com/ec2/, Accessed on 28 January 2015
Quan Chen, Minyi Guo, Qianni Deng, Long Zheng, Song Guo, Yao Shen, “HAT: History-based auto-tuning MapReduce in heterogeneous environments,” Springer Science+Business Media, LLC, 2011
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.
