Clustering approach based on Efficient Coverage with Minimum Weight for Document Data

Authors

D.S. Rajput, Department of Computer Application, MANIT, Bhopal (MP)
R.S. Thakur, Department of Computer Application, MANIT, Bhopal (MP)
G.S. Thakur, Department of Computer Application, MANIT, Bhopal (MP)

Keywords:

Minimum Spanning Tree, Document Clustering, World Wide Web, K-Means Algorithm

Abstract

At present, a huge amount of useful data is available on the web, and this shared information can be used by anyone who wishes to access it. The availability of document data of many types and natures has led to the task of clustering large datasets. Clustering is one of the most important techniques used for the classification of large datasets and is widely applicable in many areas. High-quality, fast document clustering algorithms play a significant role in successfully navigating, summarizing, and organizing this information. Recent studies have shown that partitional clustering algorithms are suitable for large datasets. The k-means algorithm [9, 10] is generally used as the partitional clustering algorithm because it is easy to implement and efficient in terms of execution time. Its major problems are its sensitivity to the choice of the initial partition and its convergence to local optima. In this study, we refine the useful information in a document dataset using a minimum spanning tree for document clustering. Good-quality clusters were generated on several document datasets, and the results indicate an effective improvement in performance.
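As an illustration of the general idea only, the following is a minimal sketch of classical minimum-spanning-tree clustering in the style of Zahn (cited in the references below): build a spanning tree over pairwise document distances, then cut the heaviest edges. It is not the method of this paper, which the abstract does not specify. The bag-of-words representation, the cosine distance, the function names, and the toy documents are all assumptions made for the example.

```python
from collections import Counter
from math import sqrt


def cosine_distance(a, b):
    """1 - cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0


def minimum_spanning_tree(dist):
    """Prim's algorithm on a dense distance matrix; returns (weight, u, v) edges."""
    n = len(dist)
    in_tree = [False] * n
    in_tree[0] = True
    best = [(dist[0][j], 0) for j in range(n)]  # cheapest known link into the tree
    edges = []
    for _ in range(n - 1):
        v = min((j for j in range(n) if not in_tree[j]), key=lambda j: best[j][0])
        edges.append((best[v][0], best[v][1], v))
        in_tree[v] = True
        for j in range(n):
            if not in_tree[j] and dist[v][j] < best[j][0]:
                best[j] = (dist[v][j], v)
    return edges


def mst_clusters(docs, k):
    """Cut the k-1 heaviest MST edges and label the resulting components."""
    vectors = [Counter(d.lower().split()) for d in docs]  # assumed: raw term counts
    n = len(vectors)
    dist = [[cosine_distance(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    kept = sorted(minimum_spanning_tree(dist))[: n - k]  # keep the n-k lightest edges
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _, u, v in kept:
        parent[find(u)] = find(v)
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Hypothetical toy corpus: two documents about trees, two about k-means.
docs = [
    "minimum spanning tree graph edge",
    "spanning tree graph weight edge",
    "kmeans centroid partition cluster",
    "kmeans cluster centroid initial partition",
]
labels = mst_clusters(docs, k=2)
```

Unlike k-means, this procedure has no random initial partition, so repeated runs on the same data give the same clusters. The price is the quadratic cost of the dense distance matrix, which a practical implementation on large document collections would need to avoid.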

References

A. Vathy-Fogarassy, A. Kiss, and J. Abonyi, “Hybrid Minimal Spanning Tree and Mixture of Gaussians Based Clustering Algorithms”, Proceedings of the IEEE International Conference on Tools with Artificial Intelligence, pp. 73-81, 2006.

Andreas C. Muller, S. Nowozin, Christoph H. Lampert, “Information theoretic clustering using minimum spanning tree”, Pattern Recognition, pp. 205-215, 2012.

Bhaskar Adepu, K.K. Bejjanki, “A Novel Approach for Minimum Spanning Tree based Clustering Algorithm”.

B. Eswara Reddy, K. Rajendra Prasad, “Reducing runtime values in minimum spanning tree based clustering by visual access tendency”, International Journal of Data Mining & Knowledge Management Process (IJDKP), Vol. 2, No. 3, pp. 11-22, May 2012.

C. Zahn, “Graph-theoretical methods for detecting and describing gestalt clusters”, IEEE Transactions on Computers, C-20, pp. 68-86, 1971.

J. Chang, J. Luo, J.Z. Huang, S. Feng, J. Fan, “Minimum spanning tree based classification model for massive data with MapReduce implementation”, in: W. Fan, W. Hsu, G.I. Webb, B. Liu, C. Zhang, D. Gunopulos, X. Wu (eds.), ICDM Workshops, IEEE Computer Society, pp. 129–137, 2010.

Congnan Luo, Yanjun Li, Soon M. Chung, “Text document clustering based on neighbours”, Data & Knowledge Engineering, Volume 68, Issue 11, pp. 1271–1288, November 2009.

D.S. Rajput, R.S. Thakur, G.S. Thakur, “Rule Generation from Textual Data by using Graph Based Approach”, International Journal of Computer Application (IJCA) 0975 – 8887, New York, USA, ISBN: 978-93-80865-11-8, Vol. 31, No. 9, pp. 36-43, October 2011.

D.S. Rajput, R.S. Thakur, G.S. Thakur, Neeraj Sahu, “Analysis of Social Networking Sites Using K-Mean Clustering Algorithm”, International Journal of Computer & Communication Technology (IJCCT), ISSN (Online): 2231-0371, ISSN (Print): 0975-7449, Vol. 3, Iss. 3, pp. 88-92, 2012.

J. Han and M. Kamber, “Data Mining: Concepts and Techniques”, Morgan Kaufmann Publishers, pp. 335–389, 2000.

Jiaxiang Lin, Dongyi Ye, Chongcheng Chen, Miaoxian Gao, “Minimum Spanning Tree Based Spatial Outlier Mining and Its Applications”, Third International Conference, RSKT 2008, Chengdu, China, May 17-19, pp. 508-515, 2008.

J. Zhang and N. Wang, “Detecting outlying subspaces for high-dimensional data: the new task, algorithms and performance”, Knowledge and Information Systems, 10(3), pp. 333-555, 2006.

Lijuan Zhou, Linshuang Wang, Xuebin Ge, Qian Shi, “A clustering-based KNN improved algorithm CLKNN for text classification”, Informatics in Control, Automation and Robotics (CAR), 2nd International Asia Conference on, Vol. 3, pp. 212–215, 2010.

M. Laszlo and S. Mukherjee, “Minimum Spanning Tree Partitioning Algorithm for Microaggregation”, IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No. 7, pp. 902-911, July 2005.

O. Grygorash, Y. Zhou, Z. Jorgensen, “Minimum spanning tree based clustering algorithm”, in Proceedings of the 18th International Conference on Tools with Artificial Intelligence, pp. 73–81, 2006.

Piotr Juszczak, David M.J. Tax, Elżbieta Pękalska, Robert P.W. Duin, “Minimum spanning tree based one-class classifier”, Advances in Machine Learning and Computational Intelligence, Volume 72, Issues 7–9, pp. 1859–1869, March 2009.

P. Sampurnima, J. Srinivas & Harikrishna, “Performance of Improved Minimum Spanning Tree Based on Clustering Technique”, Global Journal of Computer Science and Technology: Software & Data Engineering, ISSN: 0975-4172, Volume 12, Issue 13, pp. 16-22, 2012.

A. Vathy-Fogarassy, A. Kiss, J. Abonyi, “Hybrid Minimal Spanning tree based clustering and mixture of Gaussians based clustering algorithm”, Foundations of Information and Knowledge Systems, Springer, pp. 313-330, 2006.

William B. March, Parikshit Ram, Alexander G. Gray, “Fast Euclidean minimum spanning tree: algorithm, analysis, and applications”, in Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, July 25-28, 2010.

Y. Xu, V. Olman and D. Xu, “Minimum spanning trees for gene expression data clustering”, Genome Informatics, 12, pp. 24-33, 2001.

Published

2013-09-30

How to Cite

[1]

D. Rajput, R. Thakur, and G. Thakur, “Clustering approach based on Efficient Coverage with Minimum Weight for Document Data”, Int. J. Comp. Sci. Eng., vol. 1, no. 1, pp. 6–13, Sep. 2013.

Issue

Vol. 1 No. 1 (2013): IJCSE September Edition

Section

Research Article

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.
