Balanced Data Clustering Algorithm for Both Hard and Soft Clustering

Authors

  • Das P, Dept. of Computer Science, Assam University, Silchar, India
  • Ranjan Roy B, Dept. of Computer Science, Assam University, Silchar, India
  • Paul S, Dept. of Computer Science, Assam University, Silchar, India

DOI:

https://doi.org/10.26438/ijcse/v6i2.176183

Keywords:

k-Means, Global k-Means, Fast Global k-Means, Data Streaming

Abstract

Clustering is a widely studied problem in application domains such as neural networks and statistics. It is the process of partitioning a set of patterns into disjoint clusters such that patterns in the same cluster are similar and patterns in different clusters are dissimilar. There are many ways to approach this problem. K-means is a simple and effective algorithm that produces good clustering results in many practical applications. However, it is sensitive to the choice of starting points and is inefficient on large datasets. Recently, incremental approaches have been developed to resolve the difficulty of choosing starting points. The global k-means and fast global k-means algorithms are based on such an approach: they iteratively add one cluster center at a time. Fuzzy c-means is also very popular for fuzzy data clustering. However, all of these algorithms are strongly affected by imbalance in the data values. Each data point in a dataset has multiple attributes, and the values of some attributes may be so large that the other attributes are effectively ignored during clustering. In this paper we propose a data balancing technique for both the fast global k-means and fuzzy c-means algorithms. We balance the attribute values of each data point so that all attributes carry weight during the clustering process.
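The effect of imbalanced attribute values described in the abstract can be illustrated with a small sketch. The abstract does not specify the balancing technique itself, so the example below substitutes plain min-max scaling as an assumed stand-in; the function names (`min_max_scale`, `nearest`) and the records are hypothetical and chosen only for illustration. It shows how a large-valued attribute dominates Euclidean distance, the measure both k-means and fuzzy c-means rely on, and how rescaling changes which point counts as nearest.

```python
import math

def min_max_scale(rows):
    """Rescale every attribute (column) to [0, 1].

    Assumption: min-max scaling stands in for the paper's balancing
    technique, which the abstract does not describe.
    Constant columns are mapped to 0.0.
    """
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def nearest(point, candidates):
    """Index of the candidate closest to point in Euclidean distance."""
    return min(range(len(candidates)),
               key=lambda i: math.dist(point, candidates[i]))

# Hypothetical records: (age in years, income in currency units).
data = [
    [25.0, 50_000.0],  # query point
    [60.0, 50_500.0],  # close in income, far in age
    [26.0, 52_000.0],  # close in age, moderately different income
    [40.0, 58_000.0],  # far in both attributes
]

# On raw values the income attribute dominates the distance.
raw_choice = nearest(data[0], data[1:])

# After scaling, both attributes contribute on a comparable footing.
scaled = min_max_scale(data)
balanced_choice = nearest(scaled[0], scaled[1:])

print(raw_choice, balanced_choice)
```

On the raw values the 60-year-old record is selected as nearest because its income differs by only 500 units; after scaling, the 26-year-old record is selected instead. Any cluster assignment built on these distances would shift in the same way.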

Published

2025-11-12

How to Cite

[1]
P. Das, B. Ranjan Roy, and S. Paul, “Balanced Data Clustering Algorithm for Both Hard and Soft Clustering”, Int. J. Comp. Sci. Eng., vol. 6, no. 2, pp. 176–183, Nov. 2025.

Section

Research Article