Automatic Clustering Based On Outward Statistical Testing Using Advanced Density Metrics

Authors

  • A. A. Jadhav, Department of Computer Engineering, Rajashree Shahu School of Engineering and Research, JSPM NTC, Pune, INDIA
  • V. S. Gaikwad, Department of Computer Engineering, Rajashree Shahu School of Engineering and Research, JSPM NTC, Pune, INDIA

Keywords:

Clustering, Clustering Center Identification, Long-tailed Distribution, Outward Statistical Testing

Abstract

Clustering is the process of organizing objects into groups whose members are similar in some way. It is an important technique in data mining, with applications in areas such as marketing, biology, and pattern recognition. Various clustering algorithms have been proposed and implemented. The one published by Rodriguez and Laio is sensitive to its specified parameters and faces difficulties in identifying the ideal clusters. The one published by G. Wang and Q. Song does not enumerate all possible numbers of nearest neighbors, and its accuracy on the Olivetti face data set is not improved, which in turn affects performance. To overcome these problems, this paper proposes a new clustering method that identifies cluster centers automatically via statistical testing. First, we define a new metric, named K-density, to evaluate the local density of an object, and a second metric to evaluate the distance of an object to its neighbors with higher density. The product of these two metrics is then used to evaluate the centrality of each object. After analyzing the distribution of these metrics, we transform clustering center identification into a problem of extreme-value detection from a long-tailed distribution. Finally, we apply an outward statistical testing method to detect the clustering centers automatically and complete the clustering process by assigning each remaining object to the cluster that contains its nearest neighbor with higher K-density.
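
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions. The abstract does not give the K-density formula, so the sketch assumes k divided by the sum of distances to the k nearest neighbors. The outward statistical test is not specified here either, so center detection is replaced by a plain largest-gap rule on the sorted centrality values, which is only a stand-in. The function name cluster_by_density_peaks and the parameter k are hypothetical.

```python
import math


def cluster_by_density_peaks(points, k):
    """Sketch of density/distance clustering; see the assumptions in the comments."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # ASSUMPTION: K-density = k / (sum of distances to the k nearest neighbours).
    # The small constant only guards against division by zero for duplicates.
    dens = []
    for i in range(n):
        nearest = sorted(dist[i][j] for j in range(n) if j != i)[:k]
        dens.append(k / (sum(nearest) + 1e-12))

    # Visit objects from densest to sparsest (ties broken by index).
    order = sorted(range(n), key=lambda i: -dens[i])

    # delta[i]: distance to the nearest object that ranks higher in density.
    # parent[i]: index of that object (-1 for the densest object).
    delta = [0.0] * n
    parent = [-1] * n
    for pos, i in enumerate(order):
        if pos == 0:
            # Convention borrowed from density-peak methods: the densest
            # object gets its largest distance to any other object.
            delta[i] = max(dist[i])
        else:
            parent[i] = min(order[:pos], key=lambda j: dist[i][j])
            delta[i] = dist[i][parent[i]]

    # Centrality is the product of the two metrics.
    gamma = [dens[i] * delta[i] for i in range(n)]

    # STAND-IN for outward statistical testing: cut the sorted centrality
    # values at their largest drop and treat everything above it as a center.
    ranked = sorted(range(n), key=lambda i: -gamma[i])
    gaps = [gamma[ranked[r]] - gamma[ranked[r + 1]] for r in range(n - 1)]
    n_centers = gaps.index(max(gaps)) + 1 if gaps else 1
    centers = set(ranked[:n_centers])

    # Assign every remaining object to the cluster of its nearest
    # higher-density neighbour; parents are always labelled first.
    labels = [-1] * n
    next_label = 0
    for i in order:
        if i in centers or parent[i] == -1:
            labels[i] = next_label
            next_label += 1
        else:
            labels[i] = labels[parent[i]]
    return labels, sorted(centers)


points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
labels, centers = cluster_by_density_peaks(points, k=2)
print(labels, centers)
```

On this toy input of two well-separated squares, each with a point in the middle, the two middle points have the highest centrality and the remaining points follow their nearest denser neighbor. The largest-gap rule is crude and can fail on real long-tailed data, which is the situation the statistical test in the paper is meant to handle.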

References

G. Wang and Q. Song, “Automatic clustering via outward statistical testing on density metrics,” IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 8, pp. 1971–1985, Aug. 2016.

A. Rodriguez and A. Laio, “Clustering by fast search and find of density peaks,” Science, vol. 344, no. 6191, pp. 1492–1496, 2014.

C.-P. Lai, P.-C. Chung, and V. S. Tseng, “A novel two-level clustering method for time series data analysis,” Expert Systems with Applications, vol. 37, no. 9, pp. 6319–6326, 2010.

R. Xu and D. C. Wunsch II, “Clustering algorithms in biomedical research: a review,” IEEE Reviews in Biomedical Engineering, vol. 3, pp. 120–154, 2010.

X. Yang and W. Cui, “A novel spatial clustering algorithm based on Delaunay triangulation,” J. Software Engineering & Applications, vol. 3, pp. 141–149, 2010.

B. Nadler and M. Galun, “Fundamental limitations of spectral clustering,” in Advances in Neural Information Processing Systems, 2006, pp. 1017–1024.

T. Warren Liao, “Clustering of time series data-a survey,” Pattern Recognition, vol. 38, no. 11, pp. 1857–1874, Nov. 2005.

D. Liu and O. Sourina, “Free-parameters clustering of spatial data with non-uniform density,” in IEEE Conference on Cybernetics and Intelligent Systems, 2004, pp. 387–392.

T. Kanungo, D. M. Mount, N. S. Netanyahu, C. D. Piatko, R. Silverman, and A. Y. Wu, “An efficient k-means clustering algorithm: analysis and implementation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 881–892, 2002.

V. Estivill-Castro and I. Lee, “Argument free clustering for large spatial point-data sets via boundary extraction from Delaunay diagram,” Computers, Environment and Urban Systems, vol. 26, no. 4, pp. 315–334, 2002.

P. S. Bradley, O. L. Mangasarian, and W. N. Street, “Clustering via concave minimization,” in Advances in Neural Information Processing Systems, 1997, pp. 368–374.

M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, “A density-based algorithm for discovering clusters in large spatial databases with noise,” in Proceedings of International Conference on Knowledge Discovery and Data Mining, vol. 96, no. 34, 1996, pp. 226–231.

L. Hagen and A. B. Kahng, “New spectral methods for ratio cut partitioning and clustering,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 11, no. 9, pp. 1074–1085, 1992.

W. E. Donath and A. J. Hoffman, “Lower bounds for the partitioning of graphs,” IBM Journal of Research and Development, vol. 17, no. 5, pp. 420–425, 1973.

Published

2025-11-11

How to Cite

[1]
A. A. Jadhav and V. Gaikwad, “Automatic Clustering Based On Outward Statistical Testing Using Advanced Density Metrics”, Int. J. Comp. Sci. Eng., vol. 5, no. 1, pp. 32–35, Nov. 2025.