A Study on Distributed Computing Framework: Hadoop, Spark and Storm

Authors

  • Gurusamy V, Department of Computer Applications, School of IT, Madurai Kamaraj University, Madurai, India
  • S Kannan, Department of Computer Applications, School of IT, Madurai Kamaraj University, Madurai, India
  • K Nandhini, Technical Support Engineer, Concentrix India Pvt Ltd, Chennai, India

DOI:

https://doi.org/10.26438/ijcse/v6i3.269274

Keywords:

Distributed framework, Big Data, Hadoop, Spark, Storm, distributed computing

Abstract

The storage and management of information have always been a challenge for software engineering. To meet it, new programming approaches had to be found: parallel processing and then distributed computing programming models were developed, along with new programming frameworks to assist software developers. This is where the Hadoop framework, an open-source implementation of the MapReduce programming model that also takes advantage of a distributed file system, took the lead. Since its introduction, however, MapReduce has evolved, and new programming models introduced by the Spark and Storm frameworks show promising results.

Published

2025-11-12

How to Cite

[1] V. Gurusamy, S. Kannan, and K. Nandhini, “A Study on Distributed Computing Framework: Hadoop, Spark and Storm”, Int. J. Comp. Sci. Eng., vol. 6, no. 3, pp. 269–274, Nov. 2025.

Section

Review Article