Improved Analysis of Unstructured Datasets using Thesaurus Model

Authors

Kurian SM, Department of Computer Science & Engineering, Mangalam College of Engineering, Kerala, India
George N, Department of Computer Science & Engineering, Mangalam College of Engineering, Kerala, India
Sainudeen JP, Department of Computer Science & Engineering, Mangalam College of Engineering, Kerala, India
John NM, Department of Computer Science & Engineering, Mangalam College of Engineering, Kerala, India

DOI:

https://doi.org/10.26438/ijcse/v7i2.10331037

Keywords:

Hadoop, MapReduce, HDFS, NoSQL

Abstract

Humankind has stored more than 295 billion gigabytes (295 exabytes) of data since 1986, according to a report by the University of Southern California. Storing and monitoring this data around the clock in widely distributed environments is an enormous task for global service organizations. These datasets require high processing power, which conventional databases cannot provide because the data is stored in an unstructured format. Although the MapReduce paradigm can address this problem using Java-based Hadoop, it does not offer maximum functionality. These drawbacks can be overcome with Hadoop Streaming techniques, which allow users to define non-Java executables for processing such datasets. This paper proposes a THESAURUS model that enables a faster and simpler form of business analysis.
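As an illustration of what a non-Java executable for Hadoop Streaming can look like, the following is a minimal Python sketch of a word-count mapper and reducer. It is not the THESAURUS model proposed in the paper; the function names `map_lines` and `reduce_lines` and the "map"/"reduce" command-line argument are hypothetical choices made for this example. It assumes the usual streaming convention of tab-separated key/value records, read line by line from standard input and written to standard output.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one tab-separated (word, 1) record per token."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(sorted_records):
    """Reducer: sum the counts for each key.

    The input must already be sorted by key, which is what the
    shuffle/sort phase provides between the map and reduce stages.
    """
    parsed = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical convention: the first argument selects the stage.
    stage = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in stage(sys.stdin):
        print(record)
```

Under Hadoop Streaming, a script like this would typically be supplied through the `-mapper` and `-reducer` options of the streaming jar, with the jar location and input/output paths depending on the installation. The same logic can be checked locally by piping text through the map stage, `sort`, and the reduce stage.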

Published

2019-02-28

How to Cite

S. M. Kurian, N. George, J. P. Sainudeen, and N. M. John, “Improved Analysis of Unstructured Datasets using Thesaurus Model”, Int. J. Comp. Sci. Eng., vol. 7, no. 2, pp. 1033–1037, Feb. 2019.

Issue

Vol. 7 No. 2 (2019): IJCSE February Edition

Section

Research Article

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.
