Survey on Feature Selection Techniques towards Text Mining in Cloud
Keywords:
Data mining, Feature selection, Text mining, Filtering, Factorization
Abstract
Cloud computing provides services to users over the Internet. Users store large volumes of data in the cloud, which is offered on demand as data as a service (DaaS) and charged according to usage. Text mining is used to retrieve information from massive databases, and cloud platforms use it to retrieve data efficiently from multiple cloud data centres. Text classification is a technique for discovering the classes of unlabelled data. Before any mining technique is applied, trivial features should be filtered out. Feature selection can improve the learning process, reduce computational complexity, yield models that generalize better, and decrease the storage required. This paper surveys the techniques used for feature selection and analyses the effectiveness of clustering-based feature selection methods. It further surveys feature selection and feature extraction techniques that extract features from documents for single-label and multi-label document classification. Finally, it surveys multiple-feature-based projective nonnegative matrix factorization techniques that cluster documents using the extracted features.
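To make the idea of filtering trivial features concrete, the following is a minimal sketch of one simple filter method, a document-frequency filter. It is not the clustering-based method analysed in the paper. The function names, the toy corpus, and the thresholds `min_df` and `max_df_ratio` are assumptions chosen for illustration only.

```python
from collections import Counter


def tokenize(doc):
    """Lowercase a document and keep purely alphabetic tokens."""
    return [t for t in doc.lower().split() if t.isalpha()]


def select_features(docs, min_df=2, max_df_ratio=0.9):
    """Keep terms that are neither too rare nor too common.

    A term is kept if it occurs in at least `min_df` documents and in
    at most `max_df_ratio` of all documents (both thresholds are
    illustrative assumptions).
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        # Count each term once per document (document frequency).
        df.update(set(tokenize(doc)))
    return sorted(
        t for t, c in df.items()
        if c >= min_df and c / n_docs <= max_df_ratio
    )


def vectorize(doc, vocabulary):
    """Represent a document as term counts over the selected features."""
    counts = Counter(tokenize(doc))
    return [counts[t] for t in vocabulary]


# Hypothetical toy corpus, not taken from the paper.
docs = [
    "the cloud stores text data",
    "the cloud provides data services",
    "text mining retrieves the data",
    "feature selection filters the text",
]

vocab = select_features(docs)
vectors = [vectorize(d, vocab) for d in docs]
print(vocab)
print(vectors)
```

In this sketch the term "the" is dropped because it appears in every document, and terms that appear in only one document are dropped as too rare, so each document is represented over a much smaller vocabulary before any classification or clustering step.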
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.
