Comparative Study of Big Data Technologies and Frameworks
DOI:
https://doi.org/10.26438/ijcse/v6i8.488495
Keywords:
Big Data, Hadoop, MapReduce, HBase, Sqoop, Flume, Apache Spark, Cloudera, Hortonworks
Abstract
Organizations' hunger for data insights and the widespread adoption of the World Wide Web have exponentially increased the speed at which data is generated and collected. Capturing, storing, and analyzing this large volume of unstructured data, which has taken the shape of Big Data, is a challenge. In this paper, the definition of Big Data is introduced from different aspects to clarify the concept. The architecture of Big Data is analyzed to study its processing mechanism. Big Data technologies such as Hadoop, HBase, MapReduce, Pig, Hive, Sqoop, and Flume are studied and compared based on the features they support. Frameworks such as Apache Spark, Cloudera, and Hortonworks, which are used to run Big Data technologies, are studied comprehensively by highlighting their important features. This paper also presents how data from fields such as the stock market, agriculture, medical health records, and Internet traffic is stored, processed, and analyzed using Big Data technologies and frameworks.
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.
