POS Tagging for Marathi Language using Hidden Markov Model
DOI:
https://doi.org/10.26438/ijcse/v6i1.409412
Keywords:
Marathi, HMM, POS, Part of Speech, Tagset, Supervised learning
Abstract
Part-of-speech (POS) tagging plays a significant role in almost every natural language processing task. This paper addresses the problem of POS tagging for the Marathi language. Marathi is a free-word-order, morphologically rich, and highly inflectional Indian language. A supervised learning method based on a Hidden Markov Model (HMM) is implemented to annotate Marathi text with POS tags. The training dataset consists of 12,000 Marathi sentences drawn from news in a popular Marathi newspaper. The tagging algorithm predicts the tag for the current word using the previous word-tag pair. The system achieves 86.61% accuracy in assigning the correct POS tag to words.
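As a rough illustration of the kind of tagger the abstract describes, the following is a minimal sketch of a first-order (bigram) HMM tagger decoded with the Viterbi algorithm. It is not the authors' implementation. The toy English corpus, the tag names (NN, VB, JJ), the START pseudo-tag, and the add-one smoothing are all assumptions made for this example; the paper's tagset, smoothing, and handling of unknown words are not given in the abstract.

```python
import math
from collections import defaultdict

START = "<s>"  # hypothetical sentence-start pseudo-tag


def train(tagged_sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    vocab = set()
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, sorted(emit), vocab


def log_prob(counts, key, n_outcomes):
    """Add-one smoothed log probability (the smoothing choice is an assumption)."""
    total = sum(counts.values())
    return math.log((counts.get(key, 0) + 1) / (total + n_outcomes))


def viterbi(words, model):
    """Return the most likely tag sequence for `words` under the bigram HMM."""
    trans, emit, tags, vocab = model
    if not words:
        return []
    n_words = len(vocab) + 1  # one extra outcome reserved for unseen words
    # best[i][t] = (log score of the best path ending in tag t at position i,
    #               the previous tag on that path)
    best = [{}]
    for t in tags:
        score = (log_prob(trans[START], t, len(tags))
                 + log_prob(emit[t], words[0], n_words))
        best[0][t] = (score, None)
    for i in range(1, len(words)):
        best.append({})
        for t in tags:
            e = log_prob(emit[t], words[i], n_words)
            best[i][t] = max(
                (best[i - 1][p][0] + log_prob(trans[p], t, len(tags)) + e, p)
                for p in tags
            )
    # Follow the back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


# Toy placeholder corpus; the paper trains on 12,000 Marathi news sentences.
corpus = [
    [("dog", "NN"), ("runs", "VB")],
    [("cat", "NN"), ("sleeps", "VB")],
    [("big", "JJ"), ("dog", "NN"), ("sleeps", "VB")],
]
model = train(corpus)
print(viterbi(["big", "cat", "runs"], model))  # ['JJ', 'NN', 'VB']
```

Working in log space avoids numerical underflow on long sentences. In this sketch the previous word influences the prediction only through its tag, which is the usual first-order HMM assumption.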
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
