A MapReduce Approach to Deal with Big Data Preprocessing and Classification Problems Based on Evolutionary Algorithms
DOI:
https://doi.org/10.26438/ijcse/v6i8.725730
Keywords:
Big Data, MapReduce, Neural Network, Ant Colony, Preprocessing, Classification, Execution Time
Abstract
Big data is a term used to describe the exponential growth in data that has occurred in recent years, and it poses an immense challenge for traditional learning techniques. To deal with big data preprocessing and classification problems, a novel MapReduce-Neuro Ant Colony (MR-NAC) algorithm is proposed. The algorithm uses the MapReduce framework to preprocess and classify large datasets that are difficult to handle without such a framework. Experiments are carried out on two different datasets, and the results obtained are discussed. The results are satisfactory and support the use of the proposed algorithm for big data preprocessing and classification. Two metrics, AUC and execution time, are used to measure the performance of the proposed MR-NAC algorithm.
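The abstract does not describe how MR-NAC divides its work between map and reduce tasks. As a generic illustration of the MapReduce pattern it builds on, the following Python sketch runs one common preprocessing step, min-max feature scaling, in three stages: a map phase that computes per-partition feature minima and maxima, a reduce phase that merges them into global statistics, and a second map pass that rescales each partition. This is a sketch under stated assumptions, not the MR-NAC algorithm: the neural network and ant colony components are not modeled, min-max scaling is chosen only for illustration, every function name (`split`, `map_stats`, `reduce_stats`, `map_normalize`, `minmax_normalize`) is hypothetical, and partitions are simulated in a single process rather than on a cluster.

```python
from functools import reduce
from typing import List, Tuple

Row = List[float]
Stats = Tuple[List[float], List[float]]  # (per-feature minima, per-feature maxima)

# Hypothetical helpers illustrating the generic MapReduce pattern; this is NOT MR-NAC.


def split(rows: List[Row], n_chunks: int) -> List[List[Row]]:
    """Partition rows round-robin, standing in for a MapReduce job's input splits."""
    chunks = [rows[i::n_chunks] for i in range(n_chunks)]
    return [c for c in chunks if c]  # drop empty splits when n_chunks > len(rows)


def map_stats(chunk: List[Row]) -> Stats:
    """Map task: per-feature min and max over one partition."""
    columns = list(zip(*chunk))
    return [min(c) for c in columns], [max(c) for c in columns]


def reduce_stats(a: Stats, b: Stats) -> Stats:
    """Reduce task: merge two partial results (associative and commutative)."""
    return ([min(x, y) for x, y in zip(a[0], b[0])],
            [max(x, y) for x, y in zip(a[1], b[1])])


def map_normalize(chunk: List[Row], stats: Stats) -> List[Row]:
    """Second map pass: rescale every feature to [0, 1] using the global stats."""
    lows, highs = stats
    return [[(v - mn) / (mx - mn) if mx > mn else 0.0
             for v, mn, mx in zip(row, lows, highs)]
            for row in chunk]


def minmax_normalize(rows: List[Row], n_chunks: int = 3) -> List[List[Row]]:
    """Run map -> reduce -> map over the partitions; output stays grouped by partition."""
    chunks = split(rows, n_chunks)
    global_stats = reduce(reduce_stats, map(map_stats, chunks))
    return [map_normalize(c, global_stats) for c in chunks]


if __name__ == "__main__":
    data = [[1.0, 10.0], [3.0, 20.0], [5.0, 30.0], [2.0, 15.0]]
    for part in minmax_normalize(data, n_chunks=2):
        print(part)
    # [[0.0, 0.0], [1.0, 1.0]]
    # [[0.5, 0.5], [0.25, 0.25]]
```

Because `reduce_stats` is associative and commutative, partial results can be merged in any order, which is what allows a MapReduce runtime to combine the outputs of many map tasks. Note that the output is grouped by partition rather than kept in the original row order.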
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, on the condition that the authors are given credit and that, in the event of reuse or distribution, the terms of this license are made clear.
