POS Tagging for Marathi Language using Hidden Markov Model

Authors

  • Patil NV, School of Computer Sciences, North Maharashtra University, Jalgaon, India

DOI:

https://doi.org/10.26438/ijcse/v6i1.409412

Keywords:

Marathi, HMM, POS, Part of Speech, Tagset, Supervised learning

Abstract

Part-of-speech (POS) tagging plays a significant role in almost every natural language processing task. This paper addresses the problem of POS tagging for the Marathi language. Marathi is a free-word-order, morphologically rich, and highly inflectional Indian language. A supervised learning method based on a Hidden Markov Model (HMM) is implemented to annotate Marathi text with POS tags. The training dataset consists of 12,000 Marathi sentences drawn from news in a popular Marathi newspaper. The tagging algorithm predicts the tag for the current word using the previous word-tag pair. The POS tagging system achieves 86.61% accuracy in assigning the correct POS tags to words.
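The abstract does not give implementation details, so the following is only a minimal sketch of how a first-order (bigram) HMM tagger of this general kind might be trained and decoded. It assumes the tag of the current word depends on the previous tag, uses add-one smoothing, and decodes with the Viterbi algorithm. The function names, the three-tag toy tagset, and the romanized two-sentence corpus are all hypothetical illustrations and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # hypothetical sentence-start marker

def train_hmm(tagged_sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    transitions = defaultdict(Counter)  # previous tag -> counts of next tag
    emissions = defaultdict(Counter)    # tag -> counts of emitted words
    vocab = set()
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            transitions[prev][tag] += 1
            emissions[tag][word] += 1
            vocab.add(word)
            prev = tag
    return transitions, emissions, vocab

def log_prob(counter, key, num_outcomes):
    """Add-one smoothed log probability (the smoothing choice is an assumption)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + num_outcomes))

def viterbi(words, transitions, emissions, vocab):
    """Return the most probable tag sequence for `words` under the bigram HMM."""
    if not words:
        return []
    tags = list(emissions)
    n_tags = len(tags)
    n_words = len(vocab) + 1  # one extra slot for unseen words
    # best[i][tag] = (log score of best path ending in tag at i, previous tag)
    best = [{}]
    for t in tags:
        score = (log_prob(transitions[START], t, n_tags)
                 + log_prob(emissions[t], words[0], n_words))
        best[0][t] = (score, None)
    for i in range(1, len(words)):
        best.append({})
        for t in tags:
            emit = log_prob(emissions[t], words[i], n_words)
            prev_tag, score = max(
                ((p, best[i - 1][p][0] + log_prob(transitions[p], t, n_tags))
                 for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = (score + emit, prev_tag)
    # Follow the back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]

# Toy romanized corpus, for illustration only (not the paper's data or tagset).
TOY_CORPUS = [
    [("ram", "NNP"), ("amba", "NN"), ("khato", "VM")],
    [("sita", "NNP"), ("pustak", "NN"), ("vachate", "VM")],
]
transitions, emissions, vocab = train_hmm(TOY_CORPUS)
print(viterbi(["sita", "amba", "khato"], transitions, emissions, vocab))
# ['NNP', 'NN', 'VM']
```

Working in log space avoids numeric underflow on long sentences, and the extra vocabulary slot lets an unseen word fall back on the transition probabilities alone. A real system trained on 12,000 sentences would need a full tagset and likely a more careful treatment of unknown, inflected word forms.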

Published

2025-11-12

How to Cite

[1]
N. V. Patil, “POS Tagging for Marathi Language using Hidden Markov Model”, Int. J. Comp. Sci. Eng., vol. 6, no. 1, pp. 409–412, Nov. 2025.

Section

Research Article