Fast and Effective System for Name Entity Recognition on Big Data

Authors

Nigam J Department of CS, SRIT, Jabalpur, RGPV University
Sahu S Department of CS, SRIT, Jabalpur, RGPV University

Keywords:

Distributed computing, Big textual data, Named Entity Recognition (NER), Natural Language Processing (NLP), MapReduce, Hadoop, Maxent Tagger

Abstract

Today, most data is stored in digital form, and its volume is very large. The problem is how to manage this big data and extract information from it quickly and efficiently. Information extraction is a technique used in text mining: it extracts the information a user requires from unstructured text. Information extraction relies on NLP (Natural Language Processing) and NER (Named Entity Recognition). NER systems help machines recognize proper nouns (entities), events, relationships, and so on. Several NER systems exist, such as GATE, CRFClassifier, OpenNLP, and Stanford NLP (Natural Language Processing). These NER systems work fast on a limited number of documents, but their drawback is that they become slow on very large volumes of data. To overcome this drawback, this paper reports the implementation of an NER system based on MapReduce, a distributed programming model. This improvement helps achieve faster extraction and reduced storage cost, with better performance.
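The MapReduce-based NER pipeline described in the abstract can be illustrated with a minimal, single-process sketch. It assumes a toy dictionary lookup (the hypothetical `GAZETTEER`) in place of a real statistical NER model such as the Maxent tagger named in the keywords, and it simulates the map, shuffle, and reduce phases locally rather than on a Hadoop cluster. The function names `map_phase`, `shuffle`, `reduce_phase`, and `run_job` are illustrative only and are not the authors' implementation.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical toy gazetteer standing in for a real NER model.
# The paper's keywords mention a Maxent tagger; this lookup is NOT that model.
GAZETTEER: Dict[str, str] = {
    "Hadoop": "SOFTWARE",
    "Jabalpur": "LOCATION",
    "Google": "ORGANIZATION",
    "Stanford": "ORGANIZATION",
}

Key = Tuple[str, str]  # (entity text, entity type)


def map_phase(doc_id: str, text: str) -> Iterator[Tuple[Key, int]]:
    """Mapper: tag each token of one document and emit ((entity, type), 1).

    doc_id mirrors the input key a Hadoop mapper receives; the toy tagger ignores it.
    """
    for token in text.split():
        word = token.strip(".,;:!?\"'()")  # drop surrounding punctuation
        entity_type = GAZETTEER.get(word)
        if entity_type is not None:
            yield (word, entity_type), 1


def shuffle(pairs: Iterable[Tuple[Key, int]]) -> Dict[Key, List[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: Dict[Key, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: Key, values: List[int]) -> Tuple[Key, int]:
    """Reducer: sum the occurrence counts for one (entity, type) key."""
    return key, sum(values)


def run_job(docs: Dict[str, str]) -> Dict[Key, int]:
    # On a cluster, map tasks would run in parallel, each over one input split.
    intermediate = chain.from_iterable(map_phase(d, t) for d, t in docs.items())
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, grouped[key]) for key in sorted(grouped))


if __name__ == "__main__":
    corpus = {
        "d1": "Google released a paper on MapReduce, later implemented in Hadoop.",
        "d2": "Researchers in Jabalpur ran Hadoop with a Stanford tagger.",
    }
    for (entity, etype), count in run_job(corpus).items():
        print(f"{entity}\t{etype}\t{count}")
```

Because tagging one document does not depend on any other, the map phase parallelizes naturally. On a real cluster the mapper and reducer would typically be packaged as a Hadoop job (for example through Hadoop Streaming), with the input documents stored in HDFS.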

Published

2015-02-28

How to Cite

J. Nigam and S. Sahu, “Fast and Effective System for Name Entity Recognition on Big Data”, Int. J. Comp. Sci. Eng., vol. 3, no. 2, pp. 31–35, Feb. 2015.

Issue

Vol. 3 No. 2 (2015): IJCSE February Edition

Section

Research Article

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.