Clustering Algorithms Validated Using Relative Index Validation

Authors

  • Selvi TS, Dept. of Computer Science, Periyar E.V.R. College, Tiruchirappalli, Tamilnadu, India
  • Parimala R, Dept. of Computer Science, Periyar E.V.R. College, Tiruchirappalli, Tamilnadu, India

DOI:

https://doi.org/10.26438/ijcse/v6i10.8595

Keywords:

Clustering, Relative Validity Measures, PCA, KPCA

Abstract

Clustering is the task of finding groups of objects such that objects within the same group are similar to one another and dissimilar to objects in other groups. This work uses a feature selection technique, Document Frequency Feature Selection (DFFS), and feature extraction techniques, Principal Component Analysis (PCA) and Kernel Principal Component Analysis (KPCA), which construct a small set of new features from the original features. The K-Means algorithm is then run on the newly constructed features without any loss of information. The accuracy of the clustering algorithms is evaluated over several runs and the results are recorded, and cluster validity is then assessed on the obtained results. Internal validation measures are used to evaluate cluster validity, and a relative validation measure based on these internal measures is used to determine the best clustering algorithm. Experiments conducted on various benchmark datasets of unlabelled documents show that DFFS and KPCA followed by the K-Means algorithm give the best clustering accuracy.
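The pipeline summarised above can be illustrated with a minimal Python/NumPy sketch. It is not the authors' implementation: the toy corpus, the document-frequency thresholds (`min_df`, `max_df_ratio`), the RBF kernel width `gamma`, the deterministic farthest-point seeding of K-Means, the use of the silhouette and Davies-Bouldin indices as the internal measures, and the rank-sum rule standing in for the relative validation step are all assumptions made for illustration. Each candidate representation (raw DFFS counts, DFFS+PCA, DFFS+KPCA) is clustered with K-Means, scored with both internal indices, and the candidate with the best combined rank is selected.

```python
import numpy as np

def dffs(docs, min_df=2, max_df_ratio=0.9):
    # Assumed DFFS rule: keep terms whose document frequency lies in
    # [min_df, max_df_ratio * n_docs], then count kept terms per document.
    tokenized = [d.lower().split() for d in docs]
    token_sets = [set(toks) for toks in tokenized]
    vocab = sorted(set().union(*token_sets))
    df = {t: sum(t in s for s in token_sets) for t in vocab}
    kept = [t for t in vocab if min_df <= df[t] <= max_df_ratio * len(docs)]
    col = {t: j for j, t in enumerate(kept)}
    X = np.zeros((len(docs), len(kept)))
    for i, toks in enumerate(tokenized):
        for t in toks:
            if t in col:
                X[i, col[t]] += 1.0
    return X, kept

def pca(X, n_components=2):
    # Linear PCA: project the mean-centred data on the top right-singular vectors.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def kernel_pca(X, n_components=2, gamma=0.1):
    # RBF-kernel PCA: eigendecompose the centred Gram matrix and return the
    # training-point projections sqrt(lambda_k) * v_k.
    sq = (X ** 2).sum(axis=1)
    K = np.exp(-gamma * np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0))
    one = np.full(K.shape, 1.0 / len(X))
    Kc = K - one @ K - K @ one + one @ K @ one
    vals, vecs = np.linalg.eigh(Kc)                 # ascending eigenvalues
    top = np.argsort(vals)[::-1][:n_components]
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

def kmeans(X, k, n_iter=100):
    # Lloyd's K-Means with deterministic farthest-point (maximin) seeding.
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def silhouette(X, labels):
    # Mean silhouette width (internal index, higher is better).
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False
        if not same.any():
            continue                                # singleton cluster: s(i) = 0
        a = D[i, same].mean()
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        s[i] = 0.0 if max(a, b) == 0 else (b - a) / max(a, b)
    return float(s.mean())

def davies_bouldin(X, labels):
    # Davies-Bouldin index (internal index, lower is better).
    ks = np.unique(labels)
    cents = np.array([X[labels == c].mean(axis=0) for c in ks])
    scat = [np.linalg.norm(X[labels == c] - cents[i], axis=1).mean() for i, c in enumerate(ks)]
    worst = [max((scat[i] + scat[j]) / np.linalg.norm(cents[i] - cents[j])
                 for j in range(len(ks)) if j != i) for i in range(len(ks))]
    return float(np.mean(worst))

def relative_validation(scores):
    # Assumed relative-validation rule: rank candidates on each internal index
    # and return the one with the smallest rank sum (ties keep input order).
    names = list(scores)
    by_sil = sorted(names, key=lambda n: -scores[n][0])
    by_db = sorted(names, key=lambda n: scores[n][1])
    return min(names, key=lambda n: by_sil.index(n) + by_db.index(n))

# Hypothetical toy corpus standing in for a benchmark set of unlabelled documents.
docs = [
    "cluster kmeans centroid cluster data",
    "kmeans centroid cluster distance data",
    "centroid cluster distance kmeans data",
    "kernel pca eigen vector data",
    "pca eigen vector kernel data",
    "eigen kernel pca component data",
]
X, terms = dffs(docs)   # drops "data" (every document) and "component" (one document)
reps = {"DFFS": X, "DFFS+PCA": pca(X), "DFFS+KPCA": kernel_pca(X)}
scores, labelings = {}, {}
for name, Z in reps.items():
    labelings[name] = kmeans(Z, k=2)
    scores[name] = (silhouette(Z, labelings[name]), davies_bouldin(Z, labelings[name]))
best = relative_validation(scores)
print(terms)
print(best, scores[best], labelings[best])
```

In this sketch each index is computed in the space the clustering was run in, so scores for different representations are only loosely comparable; a stricter variant could score every labelling against the same DFFS matrix.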

Published

2025-11-17

How to Cite

T. S. Selvi and R. Parimala, “Clustering Algorithms Validated Using Relative Index Validation”, Int. J. Comp. Sci. Eng., vol. 6, no. 10, pp. 85–95, Nov. 2025.

Section

Research Article