Optimization of Map Reduce Using Maximum Cost Performance Strategy

Authors

  • A Saran Kumar Dept. of CSE, Kumaraguru College of Technology, Coimbatore, India
  • V Vanitha Devi Dept. of CSE, Kumaraguru College of Technology, Coimbatore, India

Keywords:

Map reduce, Cost Performance strategy, Big Data, Stragglers, Speculation

Abstract

Big data is a term used to describe volumes of structured and unstructured data so large that they are difficult to process with traditional database and software techniques. In most enterprise scenarios the data is too big, moves too fast, or exceeds current processing capacity. Big data has the potential to help companies improve operations and make faster, more intelligent decisions. Parallel computing is a frequently used method for large-scale data processing: many computing tasks involve heavy mathematical calculation or analysis of large amounts of data, and such operations can take a long time to complete on a single computer. MapReduce is one of the most commonly used parallel computing frameworks, and task execution time and throughput are its two key performance parameters. Speculative execution is a method of running backup copies of slow tasks on alternate machines. Multiple speculative execution strategies have been proposed, but they have pitfalls: (i) they use the average progress rate to identify slow tasks, while in reality the progress rate can be unstable and misleading, and (ii) they do not consider whether backup tasks can finish earlier than the originals when choosing backup worker nodes. This project aims to improve the effectiveness of speculative execution significantly. To identify the appropriate tasks accurately and promptly, the following methods are employed: (i) use both the progress rate and the process bandwidth within a phase to select slow tasks, (ii) use an exponentially weighted moving average (EWMA) to predict process speed and calculate a task’s remaining time, and (iii) determine which task to back up based on the load of the cluster using a cost-benefit model.
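The prediction step described in the abstract (EWMA-smoothed progress rate, predicted remaining time, and a cost-benefit backup decision) can be sketched as follows. This is a minimal illustration, not the paper's exact model: the function names, the smoothing factor `alpha`, and the simple "back up only if a fresh copy would finish earlier and slots are idle" rule are all assumptions for the sake of the example.

```python
# Illustrative sketch of the speculation heuristics described in the abstract.
# Names, alpha, and the decision rule are assumptions, not the paper's model.

def ewma(prev, sample, alpha=0.2):
    """Exponentially weighted moving average of the observed progress rate."""
    return alpha * sample + (1 - alpha) * prev

def remaining_time(progress, rate):
    """Predicted time left for a task at `progress` (0..1) moving at `rate`."""
    return float("inf") if rate <= 0 else (1.0 - progress) / rate

def worth_backup(progress, rate, fresh_run_estimate, idle_slots):
    """Cost-benefit test: back up only if a fresh copy is expected to finish
    earlier than the straggler AND the cluster has spare capacity."""
    return idle_slots > 0 and fresh_run_estimate < remaining_time(progress, rate)

# Smooth a noisy per-heartbeat progress-rate series for one task.
rates = [0.010, 0.012, 0.004, 0.011, 0.009]   # progress fraction per second
smoothed = rates[0]
for r in rates[1:]:
    smoothed = ewma(smoothed, r)

# A task 40% done at this smoothed rate, vs. an estimated 50 s fresh run.
print(remaining_time(0.4, smoothed))
print(worth_backup(0.4, smoothed, fresh_run_estimate=50.0, idle_slots=2))
```

The EWMA damps the single-heartbeat spikes that make a raw progress rate misleading, which is the instability pitfall (i) the abstract calls out.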

References

J. Dean and S. Ghemawat, “MapReduce: simplified data processing on large clusters,” Commun. ACM, vol. 51, pp. 107–113, Jan. 2008.

Apache Hadoop, http://hadoop.apache.org/.

M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly, “Dryad: distributed data-parallel programs from sequential building blocks,” in Proc. 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems (EuroSys ’07), 2007.

M. Zaharia, A. Konwinski, A. D. Joseph, R. Katz, and I. Stoica, “Improving MapReduce performance in heterogeneous environments,” in Proc. 8th USENIX Conference on Operating Systems Design and Implementation (OSDI ’08), 2008.

G. Ananthanarayanan, S. Kandula, A. Greenberg, I. Stoica, Y. Lu, B. Saha, and E. Harris, “Reining in the outliers in map-reduce clusters using Mantri,” in Proc. 9th USENIX Conference on Operating Systems Design and Implementation (OSDI ’10), 2010.

Y. Kwon, M. Balazinska, and B. Howe, “A study of skew in MapReduce applications,” in Proc. 5th Open Cirrus Summit, 2011.

P. H. Ellaway, “Cumulative sum technique and its application to the analysis of peristimulus time histograms,” Electroencephalography and Clinical Neurophysiology, vol. 45, no. 2, pp. 302–304, 1978.

A. Kivity, Y. Kamay, D. Laor, U. Lublin, and A. Liguori, “KVM: the Linux virtual machine monitor,” in Proc. of the Linux Symposium, Ottawa, Ontario, 2007.

J.-A. Quiané-Ruiz, C. Pinkel, J. Schad, and J. Dittrich, “RAFTing MapReduce: Fast recovery on the RAFT,” in Proc. IEEE 27th International Conference on Data Engineering (ICDE), Hannover, 2011.

G. Ananthanarayanan, S. Agarwal, S. Kandula, A. Greenberg, I. Stoica, D. Harlan, and E. Harris, “Scarlett: Coping with skewed content popularity in MapReduce clusters,” in Proc. 6th Conference on Computer Systems (EuroSys ’11), 2011.

B. Nicolae, D. Moise, G. Antoniu, L. Bougé, and M. Dorier, “BlobSeer: Bringing high throughput under heavy concurrency to Hadoop Map-Reduce applications,” in Proc. IEEE International Symposium on Parallel & Distributed Processing (IPDPS), Apr. 2010.

J. Leverich and C. Kozyrakis, “On the energy (in)efficiency of Hadoop clusters,” ACM SIGOPS Operating Systems Review, vol. 44, pp. 61–65, Mar. 2010.

T. Sandholm and K. Lai, “MapReduce optimization using regulated dynamic prioritization,” in Proc. 11th International Joint Conference on Measurement and Modeling of Computer Systems (SIGMETRICS ’09), 2009.

M. Isard, V. Prabhakaran, J. Currey, U. Wieder, K. Talwar, and A. Goldberg, “Quincy: Fair scheduling for distributed computing clusters,” in Proc. ACM SIGOPS 22nd Symposium on Operating Systems Principles (SOSP ’09), 2009.

M. Zaharia, D. Borthakur, J. Sen Sarma, K. Elmeleegy, S. Shenker, and I. Stoica, “Delay scheduling: A simple technique for achieving locality and fairness in cluster scheduling,” in Proc. 5th European Conference on Computer Systems (EuroSys ’10), 2010.

A. Kala Karun and K. Chitharanjan, “A review on Hadoop — HDFS infrastructure extensions,” in Proc. IEEE Conference on Information & Communication Technologies (ICT), JeJu Island, Apr. 2013, pp. 132–137.

D. Deepika and K. Pugazhmathi, “Efficient indexing and searching of big data in HDFS,” International Journal of Computer Sciences and Engineering (IJCSE), vol. 4, no. 4, Apr. 2016, E-ISSN: 2347-2693.

A. Tanuja and D. Swetha Ramana, “Processing and analyzing big data using Hadoop,” International Journal of Computer Sciences and Engineering (IJCSE), vol. 4, no. 4, pp. 91–94, Apr. 2016, E-ISSN: 2347-2693.

Published

2025-11-11

How to Cite

[1]
A. Saran Kumar and V. Vanitha Devi, “Optimization of Map Reduce Using Maximum Cost Performance Strategy”, Int. J. Comp. Sci. Eng., vol. 4, no. 6, pp. 78–87, Nov. 2025.

Section

Research Article