A Hybrid Data Clustering Technique in Big Data using Machine Learning

Authors

  • Sharma K, Dept. of Computer Science and Engineering, Guru Nanak Dev Engineering College, Ludhiana, Punjab, India
  • Rehan P, Dept. of Computer Science and Engineering, Guru Nanak Dev Engineering College, Ludhiana, Punjab, India

DOI:

https://doi.org/10.26438/ijcse/v8i1.4047

Keywords:

Data mining, Big data, Clustering, Classification, Support Vector Machine

Abstract

Big Data refers to very large collections of data, such as banking data, social media data, and repository data. Systems in these fields must retrieve and process relevant data on a day-to-day basis. Clustering is one of the major tasks performed on such data to reduce time delay and enable efficient information retrieval. In this work, we use similarity indices in the form of cosine and soft cosine similarity to count the total connections between documents in the data. We then combine the Cosine and Soft Cosine measures into a hybrid similarity algorithm that takes in the threshold policy of K-means and the correlation linkage property of linkage clustering to form new clusters. The proposed model is cross-validated using a Support Vector Machine followed by K-Medoids to improve clustering accuracy. This research work also discusses different clustering and classification techniques. Its main focus is optimizing clustering performance on Big Data so that rich information can be retrieved at the least cost.
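The abstract names cosine and soft cosine similarity as the two ingredients of the hybrid measure but gives no formulas. The following is a minimal Python sketch of the standard definitions, not the authors' implementation. It assumes documents are nonnegative term-frequency vectors over a shared vocabulary, and that soft cosine uses a symmetric term-similarity matrix S (called `term_sim` here, a hypothetical name) with ones on the diagonal and nonnegative entries. Soft cosine is computed as a^T S b / (sqrt(a^T S a) * sqrt(b^T S b)), so it reduces to plain cosine when S is the identity. How the paper combines the two measures with the K-means threshold policy and linkage clustering is not described on this page, so the sketch does not attempt that step.

```python
import math
from typing import Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Plain cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention assumed here: an empty document matches nothing
    return dot / (norm_a * norm_b)


def soft_inner(a: Vector, s: Matrix, b: Vector) -> float:
    """Computes a^T S b, where s[i][j] is the similarity between terms i and j."""
    n = len(a)
    return sum(a[i] * s[i][j] * b[j] for i in range(n) for j in range(n))


def soft_cosine_similarity(a: Vector, b: Vector, s: Matrix) -> float:
    """Soft cosine: a^T S b / (sqrt(a^T S a) * sqrt(b^T S b)).

    Assumes nonnegative vectors and a nonnegative S, so both
    quadratic forms under the square roots are >= 0.
    """
    denom = math.sqrt(soft_inner(a, s, a)) * math.sqrt(soft_inner(b, s, b))
    if denom == 0:
        return 0.0
    return soft_inner(a, s, b) / denom


if __name__ == "__main__":
    # Hypothetical toy vocabulary: ["car", "automobile", "bank"]
    doc1 = [1.0, 0.0, 1.0]  # mentions "car" and "bank"
    doc2 = [0.0, 1.0, 1.0]  # mentions "automobile" and "bank"
    # Hypothetical term similarities: "car" and "automobile" are related (0.8).
    term_sim = [[1.0, 0.8, 0.0],
                [0.8, 1.0, 0.0],
                [0.0, 0.0, 1.0]]
    print(round(cosine_similarity(doc1, doc2), 3))                 # → 0.5
    print(round(soft_cosine_similarity(doc1, doc2, term_sim), 3))  # → 0.9
```

In the toy example, plain cosine only credits the shared term "bank", while soft cosine also credits the related pair "car"/"automobile", which is why it scores the two documents as more similar.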

Published

2020-01-31

How to Cite

K. Sharma and P. Rehan, “A Hybrid Data Clustering Technique in Big Data using Machine Learning”, Int. J. Comp. Sci. Eng., vol. 8, no. 1, pp. 40–47, Jan. 2020.

Section

Research Article