Extracting Tasks of Text Files using Dictionary Based Approach for Classification and Indexing
Keywords:
Natural language processing, text mining, part-of-speech tagging, text files, machine learning techniques, WordNet library, etc.
Abstract
In software documentation, product knowledge and software requirements are very important for improving product quality. In the maintenance stage, developers cannot read the whole documentation of a large corpus; they need to retrieve information about software documentation entities (e.g., development, design and testing) in a short period of time. Software documentation records important information, yet there is a gap between the information developers want and the software documentation. This gap can be observed whenever developers try to find the right information in the right form at the right time. To solve this problem, we describe an approach for extracting relevant tasks from documentation under four phases of software entities (documentation, development, testing and other). The main idea is to extract tasks from the software documentation so that developers can easily get the required data through a customized portal using Natural Language Processing (NLP); the category of each task can then be generated from existing applications. The machine learning approach is based on a supervised learning technique, trained on a dataset of text files using text mining. Our approach uses the WordNet library to identify relevant tasks, calculates the frequency of each word so that developers can discover how words are used in a piece of software, and assigns a Part-of-Speech (POS) tag to each word. The results show that tasks are extracted by counting how many sentences, tokens and tasks appear in a document, and also show whether each task is relevant or not. The approach also reduces the gap between the information developers want and software documentation. Developer feedback is used to improve the performance of the system. The results are presented through a customized portal that helps developers get information in a short period of time.
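The counting and dictionary-matching steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a small hand-written keyword dictionary (the hypothetical `CATEGORY_KEYWORDS`) standing in for the WordNet-derived dictionary, treats each sentence as a candidate task, and uses regular expressions for sentence splitting and tokenization. POS tagging is omitted because it requires a trained tagger (for example, NLTK or Stanford CoreNLP).

```python
import re
from collections import Counter

# Hypothetical keyword dictionary standing in for the WordNet-derived one.
# Categories follow the abstract; the keywords themselves are illustrative only.
CATEGORY_KEYWORDS = {
    "documentation": {"document", "documentation", "manual", "guide", "describe"},
    "development": {"implement", "code", "develop", "build", "function"},
    "testing": {"test", "tests", "verify", "debug", "assert"},
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase word tokens made of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def classify(tokens):
    """Return the category with the most keyword hits, or 'other' if none match."""
    scores = {cat: sum(t in words for t in tokens)
              for cat, words in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "other"


def extract_tasks(text):
    """Count sentences and tokens, compute word frequencies, label each sentence."""
    sentences = split_sentences(text)
    per_sentence = [tokenize(s) for s in sentences]
    all_tokens = [t for toks in per_sentence for t in toks]
    tasks = []
    for sentence, toks in zip(sentences, per_sentence):
        category = classify(toks)
        # A sentence counts as a relevant task only if it hit some dictionary.
        tasks.append({"text": sentence, "category": category,
                      "relevant": category != "other"})
    return {
        "sentences": len(sentences),
        "tokens": len(all_tokens),
        "tasks": tasks,
        "frequencies": Counter(all_tokens),
    }


if __name__ == "__main__":
    doc = ("Implement the parser function. Test the parser with unit tests. "
           "The weather is nice.")
    result = extract_tasks(doc)
    print(result["sentences"], result["tokens"])  # 3 14
    for task in result["tasks"]:
        print(task["category"], task["relevant"], task["text"])
    print(result["frequencies"].most_common(2))  # [('the', 3), ('parser', 2)]
```

In this sketch the first two sentences are labelled development and testing, while the third matches no dictionary and is marked not relevant. A real pipeline would likely filter stop words such as "the" before reporting word frequencies.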
Based on developer feedback collected in the form of comments, the system extracts tasks with 80% precision.
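Precision here is the share of extracted tasks that developers judged relevant. The following is a minimal sketch, assuming each developer comment has already been reduced to a yes/no relevance judgment; the `feedback` list is hypothetical and is not the authors' data.

```python
def precision(judgments):
    """Fraction of extracted tasks marked relevant by developers.

    `judgments` holds one boolean per extracted task: True if the developer's
    comment confirmed the task as relevant, False otherwise.
    """
    if not judgments:
        raise ValueError("no feedback collected")
    return sum(judgments) / len(judgments)


# Hypothetical feedback on four extracted tasks: three confirmed, one rejected.
feedback = [True, True, False, True]
print(f"precision = {precision(feedback):.0%}")  # precision = 75%
```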
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.
