Evaluation of Clustering Algorithm in Data Mining
Keywords:
Cluster Algorithm, Data Mining, bisecting k-means, FIHC, CFWS, CFWMS
Abstract
Text mining is the application of data mining techniques to unstructured text in order to extract important and nontrivial knowledge. Text clustering, the unsupervised classification of related documents into categories, is one of the key methods of text mining. This study aims to improve the performance of text clustering. We examined four aspects of text clustering algorithms: document representation, document similarity analysis, reduction of high dimensionality, and parallelization. We propose a collection of efficient text clustering techniques that exploit the special features of unstructured text databases. All of the proposed algorithms have undergone thorough performance studies, in which we compared them against existing text clustering algorithms.
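To make the first two of these aspects concrete, the sketch below illustrates one common form of document representation and document similarity analysis: TF-IDF weighting with cosine similarity. It is a minimal illustration under stated assumptions, not the representation or similarity measure proposed in this study. The function names, the whitespace tokenizer, the smoothed IDF formula, and the sample documents are all hypothetical choices made for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document.

    Assumption: simple lowercase whitespace tokenization, chosen only
    to keep the sketch short.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF (an assumed variant) keeps every weight positive.
        vectors.append(
            {
                term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
                for term, count in tf.items()
            }
        )
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical sample documents, not data from the study.
docs = [
    "frequent itemset mining in large databases",
    "mining frequent itemset patterns",
    "text clustering groups related documents",
]
vectors = tfidf_vectors(docs)
sim_01 = cosine_similarity(vectors[0], vectors[1])  # share three terms
sim_02 = cosine_similarity(vectors[0], vectors[2])  # share no terms
```

A clustering algorithm such as bisecting k-means would then group documents whose pairwise similarity is high. Here the first two documents score higher against each other than either does against the third, which shares no terms with them.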
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.
