Rule based Stemmer for Marathi Language
DOI:
https://doi.org/10.26438/ijcse/v6i5.500505
Keywords:
natural language processing, stemming, corpus, Marathi, suffix stripping, stopwords
Abstract
Natural Language Processing (NLP) is a branch of artificial intelligence that deals with the analysis and synthesis of natural languages in the form of text and speech. NLP applications rely on stemming algorithms to remove derivational and inflectional affixes without performing a full morphological analysis of the input. These algorithms extract the root or stem of a word: the goal of stemming is to reduce inflected and derived word forms to their root forms. Accomplishing this requires language-specific knowledge. A stemmer can improve the efficiency of text summarization, text mining, information retrieval and sentiment analysis. In this paper, we propose a rule-based stemming approach for the Marathi language that uses a Marathi corpus, a stopword list and suffix-stripping rules.
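The general idea of stopword removal followed by suffix stripping can be sketched as below. This is only a minimal illustration, not the rule set proposed in the paper: the suffix list, the stopword list, the minimum stem length and the function names `stem` and `stem_text` are all assumptions chosen for the example.

```python
# Minimal sketch of a rule-based suffix-stripping stemmer.
# ASSUMPTION: these suffixes and stopwords are a small illustrative sample,
# not the rules or the stopword list used in the paper.
SUFFIXES = sorted(
    ["मध्ये", "साठी", "ला", "चा", "ची", "चे", "ने", "वर"],
    key=len,
    reverse=True,  # try the longest suffix first
)
STOPWORDS = {"आणि", "आहे", "व", "हा", "ही", "हे"}
MIN_STEM_LEN = 2  # hypothetical guard against over-stripping short words


def stem(word):
    """Strip the first (longest) matching suffix, if enough of the word remains."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]
    return word


def stem_text(text):
    """Split on whitespace, trim punctuation, drop stopwords, stem the rest."""
    tokens = [token.strip(".,!?।") for token in text.split()]
    return [stem(token) for token in tokens if token and token not in STOPWORDS]


if __name__ == "__main__":
    print(stem_text("मुंबईमध्ये आणि नदीवर"))
```

A purely suffix-based approach like this can over-stem words that merely happen to end in a listed suffix, which is one reason a corpus and a carefully curated rule list matter in practice.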
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.
