Text Categorization using Apriori Algorithm
DOI:
https://doi.org/10.26438/ijcse/v6i8.212217
Keywords:
Itemsets, Tokenization, Stemming, Apriori algorithm
Abstract
Knowledge discovery from the large volumes of data generated by various data processing activities is an effective application of data mining. Text mining has likewise become an important application area in document processing, and document clustering has become an important process that helps information retrieval systems organize vast amounts of data. Such techniques can also be tried with categorical data and for image categorization. At the same time, frequent pattern mining has become a very important task in data mining. In the research work described in this paper, the Apriori algorithm is applied to generate frequent itemsets; the method consists mainly of two steps, candidate generation and pruning. The aim of this paper is to focus on frequent itemset generation from datasets of variable length. Several steps were executed to achieve the desired result. The primary goal was to build a method that can find significant items in a text database easily and efficiently.
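The two steps named in the abstract, candidate generation and pruning, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `tokenize` and `apriori`, the sample documents, and the absolute support threshold of 2 are all assumptions made for illustration, and stemming is omitted to keep the example dependency-free. Each document is treated as one transaction whose items are its distinct tokens, so transactions naturally vary in length.

```python
from itertools import combinations


def tokenize(text):
    """Lowercase a document and return its distinct alphabetic tokens."""
    cleaned = "".join(ch if ch.isalpha() else " " for ch in text.lower())
    return frozenset(cleaned.split())


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset found in at least
    min_support transactions (min_support is an absolute count)."""
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Pruning: keep a candidate only if all (k-1)-subsets are frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)

        # Support counting over the transactions.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical sample documents, used only to exercise the sketch.
docs = [
    "data mining finds patterns",
    "text mining finds patterns in text",
    "data mining and text mining",
    "frequent patterns in data",
]
transactions = [tokenize(d) for d in docs]
result = apriori(transactions, min_support=2)

for itemset, count in sorted(result.items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

In this sample, the triple {data, mining, patterns} survives pruning because all of its pairs are frequent, yet it is rejected at the counting stage because only one document contains all three words. Pruning narrows the candidate set, but the support count makes the final decision.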
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.
