Supervised Machine Learning approach for Extracting Named Entities from Hindi-English Mixed Social Media Text
Keywords:
Named Entity, Machine Learning, Support Vector Machine, Decision Tree, K-Nearest Neighbour
Abstract
Named Entity Recognition (NER) is the task of identifying named entities in natural-language text. The input is a string of text, such as a sentence or paragraph, and the output identifies the relevant nouns mentioned in it, such as the names of people, places, and organizations. The task belongs to Information Extraction, a subfield of Natural Language Processing (NLP). A significant amount of work has been carried out on named entity recognition, but most of this research has targeted resource-rich languages and domains. The task is more challenging for informal and code-mixed text, whose unstructured and incomplete information complicates the process. In this paper, we propose a method for extracting named entities from code-mixed data that applies different machine learning algorithms to content and contextual features extracted from that data.
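To make the idea of content and contextual features concrete, the following is a minimal sketch in plain Python. The abstract does not list the features or the data used, so the feature set, the toy Hindi-English sentences, the label names (PER, LOC, O), and the helper names `token_features`, `overlap`, and `knn_predict` are all assumptions made for illustration. The classifier is a simple nearest-neighbour vote, chosen because K-Nearest Neighbour appears among the keywords; it is not presented as the authors' implementation.

```python
from collections import Counter


def token_features(tokens, i):
    """Content and contextual features for tokens[i] (illustrative feature set)."""
    tok = tokens[i]
    return {
        # Content features: properties of the token itself.
        "lower": tok.lower(),
        "is_capitalized": tok[:1].isupper(),
        "is_hashtag": tok.startswith("#"),
        "is_mention": tok.startswith("@"),
        "has_digit": any(ch.isdigit() for ch in tok),
        "suffix3": tok[-3:].lower(),
        # Contextual features: neighbouring tokens, with boundary markers.
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def overlap(a, b):
    """Number of features on which two feature dicts agree."""
    return sum(1 for key in a if a[key] == b.get(key))


def knn_predict(train, feats, k=3):
    """Majority label among the k training examples most similar to feats."""
    ranked = sorted(train, key=lambda ex: overlap(ex[0], feats), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy, hand-labelled code-mixed sentences (invented for illustration only).
sentences = [
    (["Kal", "Delhi", "mein", "rain", "hogi"],
     ["O", "LOC", "O", "O", "O"]),
    (["@rahul", "ne", "Mumbai", "mein", "match", "dekha"],
     ["PER", "O", "LOC", "O", "O", "O"]),
]

# One training example per token: (feature dict, label).
train = [
    (token_features(toks, i), label)
    for toks, labels in sentences
    for i, label in enumerate(labels)
]

test_tokens = ["Main", "Pune", "mein", "hoon"]
predicted = [
    knn_predict(train, token_features(test_tokens, i), k=1)
    for i in range(len(test_tokens))
]
print(list(zip(test_tokens, predicted)))
```

In this sketch the unseen token "Pune" is matched to the labelled place names mainly through its capitalization and the following word "mein", which shows how a contextual feature can carry information that the token alone does not. A realistic system would need far more training data and a tuned feature set.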
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.
