Implementation of K-Means Clustering in Big Data Environment
DOI:
https://doi.org/10.26438/ijcse/v7i11.3844
Keywords:
Big Data, Big Data Analytics, Unsupervised learning, Clustering Algorithm, improvements
Abstract
In recent years, the volume of digital data has grown rapidly. Handling and processing such bulky data is complex and demands human attention. Moreover, existing techniques and methods are not well suited to computation of this complexity, and big data analytics plays an essential role in dealing with it. In this work, the unsupervised learning technique k-means clustering is first implemented and its performance is measured. To enhance the performance of the system, a modified k-means clustering algorithm is then proposed that improves the centroid selection technique and uses the RBF kernel. A comparative performance analysis of the two versions of k-means clustering demonstrates that the modified k-means clustering is efficient and has a lower algorithm run time. It is therefore a promising approach for analytics, and its future extension is also presented in this work.
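The abstract does not state how centroid selection is improved or how the RBF kernel enters the algorithm. As an illustrative sketch only, the Python code below shows one common way to combine the two ideas: kernel k-means with an RBF kernel, seeded by a deterministic farthest-first rule. The function names, the `gamma` parameter, and the seeding rule are assumptions made for illustration and should not be read as the authors' method.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """RBF (Gaussian) kernel: exp(-gamma * ||a - b||^2)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)


def farthest_first_seeds(K, k):
    """Assumed seeding rule: start at point 0, then repeatedly take the point
    farthest (in kernel-induced distance) from the seeds chosen so far."""
    n = len(K)
    seeds = [0]
    while len(seeds) < k:
        def gap(i):
            # Squared feature-space distance from point i to its nearest seed.
            return min(K[i][i] + K[s][s] - 2 * K[i][s] for s in seeds)
        seeds.append(max(range(n), key=gap))
    return seeds


def kernel_kmeans(points, k, gamma=1.0, max_iter=100):
    """Cluster points with kernel k-means; returns one label per point."""
    n = len(points)
    # Full n x n kernel matrix (quadratic memory, so small inputs only).
    K = [[rbf_kernel(p, q, gamma) for q in points] for p in points]
    seeds = farthest_first_seeds(K, k)
    # Initial assignment: for the RBF kernel K[i][i] == 1 for every point,
    # so the nearest seed is the one with the largest kernel value.
    labels = [max(range(k), key=lambda c: K[i][seeds[c]]) for i in range(n)]

    for _ in range(max_iter):
        members = [[j for j in range(n) if labels[j] == c] for c in range(k)]
        # Mean of all pairwise kernel values inside each cluster.
        within = [
            sum(K[j][l] for j in m for l in m) / len(m) ** 2 if m else 0.0
            for m in members
        ]

        def dist(i, c):
            # Squared feature-space distance from point i to the mean of cluster c.
            m = members[c]
            if not m:
                return math.inf  # never assign a point to an empty cluster
            cross = sum(K[i][j] for j in m) / len(m)
            return K[i][i] - 2 * cross + within[c]

        new_labels = [min(range(k), key=lambda c: dist(i, c)) for i in range(n)]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
    return labels
```

Calling `kernel_kmeans(points, k=2, gamma=0.5)` on a list of coordinate tuples returns a list of integer cluster labels. Because this sketch builds the full kernel matrix in memory, a big data setting would likely need a distributed or approximate variant, which the sketch does not attempt.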
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.
