Comparative Study of Big Data Technologies and Frameworks

Authors

  • Tripathi M, Research Scholar, Computer Science & Engineering, Kamla Nehru Institute of Technology, Sultanpur, Uttar Pradesh, India
  • Agarwal AK, Assistant Professor, Computer Science & Engineering, Kamla Nehru Institute of Technology, Sultanpur, Uttar Pradesh, India

DOI:

https://doi.org/10.26438/ijcse/v6i8.488495

Keywords:

Big Data, Hadoop, MapReduce, HBase, Sqoop, Flume, Apache Spark, Cloudera, Hortonworks

Abstract

Organizations' hunger for data insights and the widespread adoption of the World Wide Web have exponentially increased the speed at which data is generated and collected. Capturing, storing, and analyzing these large sets of unstructured data, which have taken the shape of Big Data, is a challenge. This paper introduces the definition of Big Data from several perspectives to clarify the concept, and analyzes the architecture of Big Data to study its processing mechanism. Big Data technologies such as Hadoop, HBase, MapReduce, Pig, Hive, Sqoop, and Flume are studied and compared based on the features they support. Frameworks used to run Big Data technologies, namely Apache Spark, Cloudera, and Hortonworks, are studied comprehensively by highlighting their important features. The paper also presents how data from fields such as the stock market, agriculture, medical health records, and Internet traffic is stored, processed, and analyzed using Big Data technologies and frameworks.

Published

2025-11-15

How to Cite

[1]
M. Tripathi and A. K. Agarwal, “Comparative Study of Big Data Technologies and Frameworks”, Int. J. Comp. Sci. Eng., vol. 6, no. 8, pp. 488–495, Nov. 2025.

Section

Research Article