Text Mining Using Frequent Pattern Analysis and Message Passing
DOI:
https://doi.org/10.26438/ijcse/v7i2.658667
Keywords:
Parallel FP-Growth, Frequent Keywords Mining, Multi core Systems
Abstract
Text mining, also called text analysis, is a computer science technique for deriving high-quality information from text by converting the text into data suitable for analysis. It makes it possible to investigate relationships among patterns that would otherwise be extremely difficult to find. Various techniques are used to mine frequent patterns in a given text, and they are applicable to analyzing the information in huge documents. Parallel construction of FP-Trees with parallel mining on multiple cores is a popular tree-projection-based mining approach. Once each processor has counted the frequency of each item in its local data partition, all worker processors send their local counts to the master processor, which combines them to generate the global count. The parallel implementation of the FP-Tree may show good speedups, but sending the local results to the master in a distributed environment and merging the pattern counts on the master core are overheads that consume considerable time. This study analyzes various frequent pattern mining techniques used to extract information from text, especially on multiple cores, and adopts a new technique for finding frequent patterns that uses the dictionary-based compression algorithm LZW. The new technique is implemented on a single processor as well as on multiple processors using message passing. The main objective of this research is to increase the speed and reduce the memory consumption required to extract frequent patterns from the given textual data. The parallel implementation of the proposed LZW-based algorithm is compared with a parallel implementation of FP-Growth on single and multiple cores using three datasets: Webdoc, Kosarak and Trump. The results show good performance in speedup, latency and efficiency for the proposed LZW-based algorithm.
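The two building blocks named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not describe how LZW is used for pattern mining, so the code below only shows (a) the count-and-merge step, with the workers simulated sequentially instead of communicating by message passing, and (b) textbook LZW dictionary construction over a token sequence. All function names (`partition`, `local_count`, `merge_counts`, `frequent_items`, `lzw_encode`) and the round-robin partitioning are assumptions made for illustration.

```python
from collections import Counter


def partition(transactions, n_workers):
    """Split the transactions round-robin into n_workers local partitions (assumed scheme)."""
    return [transactions[i::n_workers] for i in range(n_workers)]


def local_count(part):
    """Worker step: count each item once per transaction in the local partition."""
    counts = Counter()
    for transaction in part:
        counts.update(set(transaction))
    return counts


def merge_counts(local_counts):
    """Master step: sum the workers' local counts into one global count."""
    total = Counter()
    for counts in local_counts:
        total.update(counts)
    return total


def frequent_items(global_counts, min_support):
    """Keep only the items whose global count reaches min_support."""
    return {item: n for item, n in global_counts.items() if n >= min_support}


def lzw_encode(tokens):
    """Textbook LZW over a token sequence (not the paper's mining method).

    Returns the output codes and the phrase dictionary that LZW builds.
    """
    # Start with one dictionary entry per distinct token.
    dictionary = {(tok,): code for code, tok in enumerate(sorted(set(tokens)))}
    codes = []
    current = ()
    for tok in tokens:
        candidate = current + (tok,)
        if candidate in dictionary:
            current = candidate  # keep extending the current phrase
        else:
            codes.append(dictionary[current])  # emit the longest known phrase
            dictionary[candidate] = len(dictionary)  # learn the new phrase
            current = (tok,)
    if current:
        codes.append(dictionary[current])
    return codes, dictionary
```

In a message-passing setting, each worker would run `local_count` on its own partition and send the result to the master, which would run `merge_counts`; that transfer and merge is the overhead the abstract refers to. The phrase dictionary returned by `lzw_encode` contains the repeated token sequences seen in the input, which is presumably why a dictionary-based coder is attractive for finding frequent patterns.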
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.
