A Study on Distributed Computing Framework: Hadoop, Spark and Storm
DOI:
https://doi.org/10.26438/ijcse/v6i3.269274
Keywords:
Distributed framework, Big Data, Hadoop, Spark, Storm, distributed computing
Abstract
The storage and management of information has always been a challenge for software engineering. To meet it, new programming approaches had to be found: parallel processing and then distributed computing programming models were developed, along with new programming frameworks to assist software developers. This is where the Hadoop framework takes the lead: it is an open-source implementation of the MapReduce programming model that also takes advantage of a distributed file system. Since Hadoop's introduction, however, MapReduce itself has evolved, and the Spark and Storm frameworks have introduced new programming models that show promising results.
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.
