Balanced Data Clustering Algorithm for Both Hard and Soft Clustering
DOI: https://doi.org/10.26438/ijcse/v6i2.176183
Keywords: k-Means, Global k-Means, Fast Global k-Means, Data Streaming
Abstract
Clustering is a widely studied problem in many application domains, such as neural networks and statistics. It is the process of partitioning a set of patterns into disjoint clusters so that patterns in the same cluster are similar and patterns in different clusters are dissimilar. There are many ways to approach this problem. K-means is a simple and effective algorithm that produces good clustering results in many practical applications. However, it is sensitive to the choice of starting points and is inefficient for clustering large datasets. Incremental approaches have recently been developed to overcome the difficulties with the choice of starting points. The global k-means and fast global k-means algorithms are based on such an approach: they iteratively add one cluster center at a time. Fuzzy c-means is also very popular for fuzzy (soft) data clustering. However, all of these clustering algorithms are strongly affected by imbalance in the data values. Each data point in a dataset has multiple attributes, and the values of some attributes may be so large that the other attributes are effectively ignored during clustering. In this paper we propose a data balancing technique for both the fast global k-means and fuzzy c-means algorithms. We balance the attribute values of each data point so that all attributes contribute to the clustering process.
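The following is a minimal, dependency-free sketch of the problem the abstract describes and of where a balancing step would sit. The abstract does not give the balancing formula, so this sketch assumes plain per-attribute min-max scaling to [0, 1] as a stand-in. The functions `balance`, `fast_global_kmeans` and `fuzzy_cmeans`, and the toy data, are hypothetical illustrations and not the authors' code. The fast global k-means step uses the candidate bound of Likas et al. (start the new center at the point x_n maximizing b_n = sum_j max(d_j - ||x_n - x_j||^2, 0)). Fuzzy c-means is the standard formulation with fuzzifier m = 2.

```python
import math
import random


def balance(data):
    """ASSUMED balancing step (min-max scaling), not necessarily the paper's:
    rescale every attribute to [0, 1] so that no attribute dominates the
    squared Euclidean distance merely because its raw values are large."""
    cols = list(zip(*data))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]  # constant attribute -> span 1
    return [[(v - lo) / s for v, lo, s in zip(row, lows, spans)] for row in data]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(data, centers):
    """Index of the nearest center for every point (hard assignment)."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j])) for p in data]


def kmeans(data, centers, iters=100):
    """Lloyd iterations started from the given centers; returns (centers, labels)."""
    centers = [list(c) for c in centers]
    for _ in range(iters):
        labels = assign(data, centers)
        new = []
        for j, c in enumerate(centers):
            members = [p for p, lab in zip(data, labels) if lab == j]
            new.append([sum(col) / len(members) for col in zip(*members)] if members else c)
        if new == centers:  # converged
            break
        centers = new
    return centers, assign(data, centers)


def fast_global_kmeans(data, k):
    """Fast global k-means: add one center per step, starting it at the point
    x_n that maximizes b_n = sum_j max(d_j - ||x_n - x_j||^2, 0)."""
    centers = [[sum(col) / len(data) for col in zip(*data)]]  # optimal 1-cluster solution
    for _ in range(1, k):
        d = [min(sq_dist(p, c) for c in centers) for p in data]  # d_j of the (k-1)-solution

        def b(x):
            return sum(max(dj - sq_dist(x, p), 0.0) for dj, p in zip(d, data))

        centers, _ = kmeans(data, centers + [list(max(data, key=b))])
    return centers, assign(data, centers)


def fuzzy_cmeans(data, c, m=2.0, iters=100, seed=0):
    """Standard fuzzy c-means (soft clustering); returns (centers, memberships)."""
    centers = [list(p) for p in random.Random(seed).sample(data, c)]
    for _ in range(iters):
        u = []
        for p in data:
            dist = [math.sqrt(sq_dist(p, v)) for v in centers]
            if min(dist) == 0.0:  # point coincides with a center: crisp membership
                row = [1.0 if dd == 0.0 else 0.0 for dd in dist]
                row = [r / sum(row) for r in row]
            else:
                row = [1.0 / sum((di / dk) ** (2 / (m - 1)) for dk in dist) for di in dist]
            u.append(row)
        w = [[uij ** m for uij in row] for row in u]  # fuzzified weights u_ij^m
        centers = [
            [sum(w[i][j] * data[i][a] for i in range(len(data))) / sum(w[i][j] for i in range(len(data)))
             for a in range(len(data[0]))]
            for j in range(c)
        ]
    return centers, u


# Hypothetical toy data: attribute 0 separates two groups; attribute 1 has
# large values but carries no group information (same values in both groups).
data = [[0.0, 1000.0], [0.1, 2000.0], [0.0, 4000.0], [0.1, 5000.0],
        [1.0, 1000.0], [0.9, 2000.0], [1.0, 4000.0], [0.9, 5000.0]]

_, raw_labels = fast_global_kmeans(data, 2)
_, bal_labels = fast_global_kmeans(balance(data), 2)
_, memberships = fuzzy_cmeans(balance(data), 2)
print(raw_labels)  # split driven by attribute 1 alone
print(bal_labels)  # split along attribute 0
```

On this toy data, the raw run should split the points by the large second attribute alone, mixing the two groups. The balanced run should recover the grouping carried by the first attribute. Pure Python is used only to keep the sketch self-contained; a practical implementation would more likely vectorize these loops.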
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.
