Improved Analysis of Unstructured Datasets using Thesaurus Model

Authors

  • Kurian SM, Department of Computer Science & Engineering, Mangalam College of Engineering, Kerala, India
  • George N, Department of Computer Science & Engineering, Mangalam College of Engineering, Kerala, India
  • Sainudeen JP, Department of Computer Science & Engineering, Mangalam College of Engineering, Kerala, India
  • John NM, Department of Computer Science & Engineering, Mangalam College of Engineering, Kerala, India

DOI:

https://doi.org/10.26438/ijcse/v7i2.10331037

Keywords:

Hadoop, MapReduce, HDFS, NoSQL

Abstract

Humankind has stored more than 295 billion gigabytes (295 exabytes) of data since 1986, according to a report by the University of Southern California. Storing and monitoring this data around the clock in widely distributed environments is an enormous task for global service organizations. Because the data is stored in an unstructured format, these datasets require processing power that traditional databases cannot offer. The MapReduce paradigm, as implemented in Java-based Hadoop, can address this problem, but it does not provide maximum functionality. These drawbacks can be overcome with Hadoop Streaming techniques, which allow users to define non-Java executables for processing the datasets. This paper proposes a THESAURUS model that enables a faster and simpler form of business analysis.
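To make the Hadoop Streaming idea concrete, here is a minimal sketch of a non-Java map and reduce step written in Python. It is a generic word count, not the paper's THESAURUS model, which the abstract does not describe. The function names (map_lines, reduce_lines, run_local) are hypothetical, and the local sort only stands in for Hadoop's shuffle phase.

```python
"""Word count in the Hadoop Streaming style (illustrative sketch only)."""
from itertools import groupby


def map_lines(lines):
    """Map step: emit one tab-separated 'word<TAB>1' record per token."""
    for line in lines:
        for word in line.lower().split():
            yield f"{word}\t1"


def reduce_lines(sorted_lines):
    """Reduce step: sum the counts for each key.

    Assumes the records are already sorted by key, which is what the
    shuffle phase provides to a streaming reducer.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_local(lines):
    """Simulate map -> sort -> reduce in one process (no cluster needed)."""
    return list(reduce_lines(sorted(map_lines(lines))))


# Small in-memory sample standing in for an unstructured text dataset.
for record in run_local(["Big data", "big DATA data"]):
    print(record)
```

On a real cluster, each step would typically live in its own script that reads records from standard input and writes them to standard output, and the two scripts would be passed to the Hadoop Streaming job through its -mapper and -reducer options.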

Published

2019-02-28

How to Cite

S. M. Kurian, N. George, J. P. Sainudeen, and N. M. John, “Improved Analysis of Unstructured Datasets using Thesaurus Model”, Int. J. Comp. Sci. Eng., vol. 7, no. 2, pp. 1033–1037, Feb. 2019.

Section

Research Article