Prevention of Empty Clusters and Incomplete Data Problems using Modified K-Means and Gaussian Mixture Model
Keywords:
Unsupervised Learning, Clustering Analysis, K-Means, Expectation Maximization
Abstract
Cluster analysis, a form of unsupervised learning, divides data into groups, or clusters, of similar items that are meaningful and useful. Because it performs well on massive data sets, K-Means clustering is applied in many areas of science and technology. However, clustering algorithms can suffer from two problems: empty clusters and incomplete data. The empty-cluster problem is caused by poor initialization of the cluster centers, and it can lead to significant performance degradation. In this work, the K-Means clustering algorithm is revisited from a probabilistic viewpoint and reformulated using the similarity between K-Means and the finite Gaussian Mixture Model (GMM). The initial centroids, or current best estimates of the parameters, are calculated from all of the data, both known and unknown. Therefore, no two initial centroids are equal or very close to each other, and data points are assigned to the appropriate clusters with fairly placed centroids. The proposed modified K-Means, which combines K-Means with a GMM estimated by the Expectation Maximization (EM) approach, efficiently eliminates the empty-cluster and incomplete-data problems.
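The abstract relies on the standard correspondence between K-Means and a Gaussian mixture fitted by EM. If all components share a variance σ² that shrinks toward zero, the soft E-step responsibilities become hard nearest-centroid assignments, and the M-step becomes the K-Means centroid update. The Python sketch below assumes 1-D data and contrasts two cases. In the first, plain Lloyd's K-Means is given a poor initialization and one cluster stays empty. In the second, EM is run on a 1-D GMM from a data-driven initialization. The function names `kmeans_1d`, `quantile_init`, and `gmm_em_1d` are hypothetical. `quantile_init`, which picks evenly spaced order statistics of all the data, is only an illustrative stand-in for the paper's initialization and is not the authors' modified K-Means.

```python
import math


def kmeans_1d(data, centroids, iters=10):
    """Plain Lloyd's K-Means on 1-D data; returns (centroids, cluster sizes)."""
    centroids = list(centroids)
    sizes = [0] * len(centroids)
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for x in data:
            # Hard assignment: each point joins its nearest centroid.
            j = min(range(len(centroids)), key=lambda k: (x - centroids[k]) ** 2)
            clusters[j].append(x)
        # An empty cluster has no mean, so plain Lloyd's keeps its old centroid,
        # which may never attract a point again.
        centroids = [sum(c) / len(c) if c else centroids[k]
                     for k, c in enumerate(clusters)]
        sizes = [len(c) for c in clusters]
    return centroids, sizes


def quantile_init(data, k):
    """HYPOTHETICAL data-driven initialization (not the paper's exact rule):
    take k evenly spaced order statistics of all the data, so every initial
    centroid lies inside the data and, for distinct values with n >= k,
    no two initial centroids coincide."""
    s = sorted(data)
    n = len(s)
    return [s[int((j + 0.5) * n / k)] for j in range(k)]


def gmm_em_1d(data, means, iters=50, var_floor=1e-6):
    """EM for a 1-D Gaussian mixture; returns (means, weights, hard labels)."""
    n, k = len(data), len(means)
    means = list(means)
    mu = sum(data) / n
    # Start every component with the variance of the whole data set.
    variances = [max(sum((x - mu) ** 2 for x in data) / n, var_floor)] * k
    weights = [1.0 / k] * k
    labels = [0] * n
    for _ in range(iters):
        # E-step: responsibilities r[i][j] = P(component j | x_i), via log-sum-exp.
        resp = []
        for x in data:
            logs = [math.log(weights[j])
                    - 0.5 * math.log(2 * math.pi * variances[j])
                    - (x - means[j]) ** 2 / (2 * variances[j]) for j in range(k)]
            top = max(logs)
            exps = [math.exp(v - top) for v in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
        # Hard labels: the component with the largest responsibility.
        labels = [max(range(k), key=lambda j: r[j]) for r in resp]
        # M-step: every point contributes a soft weight to every component,
        # so n_j > 0 unless its responsibilities underflow to zero.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj,
                var_floor)
    return means, weights, labels


data = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2]

# Poor initialization: both centroids lie to the right of every data point,
# so the centroid at 20.0 never wins a point and its cluster stays empty.
centroids, sizes = kmeans_1d(data, [10.0, 20.0])
print("K-Means, poor init:", centroids, sizes)  # sizes -> [6, 0]

means, weights, labels = gmm_em_1d(data, quantile_init(data, 2))
print("EM from data-driven init:", means, weights, labels)
```

The E-step is computed in log space (log-sum-exp) to avoid underflow. `var_floor` is an assumed value that keeps a component's variance from collapsing to zero. Soft responsibilities do not guarantee that every component keeps substantial weight. They only keep the M-step well defined while each n_j stays positive.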
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.
