Efficient Indexing and Searching of Big Data in HDFS
Keywords:
Hadoop, Big Data, Efficient Indexing, Data Structure
Abstract
An inverted index is an efficient, standard data structure well suited to search operations over a very large data set. Such data is mostly unstructured and does not fit into traditional database categories. Large-scale processing of it requires a distributed framework such as Hadoop, where computational resources can easily be shared and accessed. A search engine is implemented in Hadoop over millions of Wikipedia documents, using an inverted index to make search operations more efficient. An inverted index maps each word in a document, or set of documents, to its corresponding locations. The structure uses a hash table that stores each word as a key and its corresponding locations as the values, which provides easy lookup and retrieval of data and makes it suitable for search operations.
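As an illustration of the word-to-locations hash table described in the abstract, the following is a minimal single-machine sketch in Python. It is not the authors' Hadoop implementation: the function names `build_inverted_index` and `search`, the tokenization rule, and the three sample documents are all assumptions made for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words (assumed rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    """Build a hash table mapping each word to {doc_id: [word positions]}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for position, word in enumerate(tokenize(text)):
            index[word].setdefault(doc_id, []).append(position)
    return index


def search(index, query):
    """Return the sorted ids of documents that contain every query word."""
    words = tokenize(query)
    if not words:
        return []
    # .get() avoids creating empty entries for unknown words.
    postings = [set(index.get(word, {})) for word in words]
    return sorted(set.intersection(*postings))


# Hypothetical sample documents standing in for Wikipedia articles.
docs = {
    "doc1": "Hadoop stores data in HDFS",
    "doc2": "An inverted index maps words to documents",
    "doc3": "Hadoop can build an inverted index",
}
index = build_inverted_index(docs)
print(search(index, "inverted index"))
```

In a Hadoop job the same logic would typically be split in two: a map phase that emits (word, location) pairs for each document, and a reduce phase that groups the locations by word. The sketch above only shows the resulting data structure and the lookup it enables.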
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.
