A Mapreduce Approach To Deal with Big Data Pre Processing And Classification Problems Based On Evolutionary Algorithms

Authors

  • Saranya MS, Computer Science, Khadir Mohideen College, Bharathidasan University, Thiruchirapalli, India
  • Jayaveeran N, Computer Science, Khadir Mohideen College, Bharathidasan University, Thiruchirapalli, India

DOI:

https://doi.org/10.26438/ijcse/v6i8.725730

Keywords:

Big Data, MapReduce, Neural Network, Ant Colony, Preprocessing, Classification, Execution Time

Abstract

Big data is a term used to describe the recent exponential growth in data, which poses an immense challenge for traditional learning techniques. To deal with big data preprocessing and classification problems, a novel MapReduce-Neuro Ant Colony (MR-NAC) algorithm is proposed. The proposed algorithm uses the MapReduce framework to preprocess and classify large datasets, a task found to be difficult without MapReduce. Experiments are carried out on two different datasets, and the results obtained are discussed. The results are satisfactory and support the proposed algorithm for big data preprocessing and classification. AUC and execution time are the two metrics used to measure the performance of the proposed MR-NAC algorithm.
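
As an illustration of the two evaluation metrics named above, the following is a minimal sketch in Python. It is not the authors' implementation: the function names `auc` and `timed` are hypothetical, AUC is computed with the standard rank-sum (Mann-Whitney) identity, and execution time is assumed to mean wall-clock time of a single call.

```python
import time


def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: iterable of 0/1 class labels (1 = positive class).
    scores: iterable of classifier scores, higher = more likely positive.
    """
    pairs = sorted(zip(scores, labels))
    ranks = [0.0] * len(pairs)

    # Assign 1-based ranks, giving tied scores their average rank.
    i = 0
    while i < len(pairs):
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative example")

    pos_rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    value, seconds = timed(auc, [0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
    print(f"AUC = {value:.2f}, elapsed = {seconds:.6f} s")
```

An AUC of 1.0 means every positive example is scored above every negative one, and 0.5 corresponds to random ranking. On a cluster, execution time would more likely be measured per job than per function call, so `timed` stands in for that measurement only.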

Published

2025-11-15

How to Cite

[1]
M. Saranya and N. Jayaveeran, “A Mapreduce Approach To Deal with Big Data Pre Processing And Classification Problems Based On Evolutionary Algorithms”, Int. J. Comp. Sci. Eng., vol. 6, no. 8, pp. 725–730, Nov. 2025.

Section

Research Article