Fast and Effective System for Name Entity Recognition on Big Data
Keywords:
Distributed computing, Big textual data, Named Entity Recognition (NER), Natural Language Processing (NLP), MapReduce, Hadoop, Maxent Tagger
Abstract
Today, most data is stored in digital form, and its volume is very large. The problem is how to manage this big data and extract information from it quickly and efficiently. Information extraction is a text-mining technique that extracts the information a user requires from unstructured text. It relies on Natural Language Processing (NLP) and Named Entity Recognition (NER). NER systems help a machine recognize proper nouns (entities), events, relationships, and so on. Several NER systems exist, such as GATE, CRFClassifier, OpenNLP, and Stanford NLP. These systems work quickly on a limited number of documents, but they slow down on large amounts of data. To overcome this drawback, this paper reports the implementation of an NER system based on MapReduce, a distributed programming model. This improvement helps to achieve fast extraction and reduced storage cost with better performance.
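To illustrate how entity recognition can fit the MapReduce model, here is a minimal single-process sketch in Python. It is not the paper's implementation. The function names (`map_document`, `shuffle`, `reduce_entity`, `run_job`) are hypothetical, and the regular expression is only a toy stand-in for a real tagger such as a Maxent tagger. On a Hadoop cluster, the map step would run in parallel over document splits and the framework would perform the shuffle.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# ASSUMPTION: toy stand-in for a real NER tagger. A run of capitalized
# words is treated as one candidate entity; no entity types are assigned.
ENTITY_PATTERN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")


def map_document(doc_id: str, text: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (entity, 1) for each candidate entity in one document."""
    for match in ENTITY_PATTERN.finditer(text):
        yield match.group(0), 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, as the framework does between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_entity(entity: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce step: sum the occurrences of one entity across all documents."""
    return entity, sum(counts)


def run_job(documents: Dict[str, str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce sequentially over a small corpus."""
    mapped = (
        pair
        for doc_id, text in documents.items()
        for pair in map_document(doc_id, text)
    )
    grouped = shuffle(mapped)
    return dict(reduce_entity(entity, counts) for entity, counts in grouped.items())


if __name__ == "__main__":
    corpus = {
        "d1": "we met Alice Smith in Paris",
        "d2": "later Alice Smith left Paris for Berlin",
    }
    for entity, count in sorted(run_job(corpus).items()):
        print(entity, count)
```

Because each document is mapped independently, the tagging work can be spread across machines, which is the property a distributed NER system relies on for large corpora.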
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.
