The Real Time Big Data Processing Framework: Advantages and Limitations
DOI:
https://doi.org/10.26438/ijcse/v5i12.305312
Keywords:
Big Data, Hadoop, HDFS, Spark, Storm, Flink, Samza
Abstract
Big data is a blanket term for the non-traditional strategies and technologies needed to collect, organize, process, and extract insights from large datasets. While the problem of working with data that exceeds the computing power or storage of a single computer is not new, the pervasiveness, scale, and value of this type of computing have greatly expanded in recent years. This paper examines one of the essential components of a big data system: processing frameworks. Processing frameworks compute over the data in the system, either by reading it from non-volatile storage or by operating on it as it is ingested. Computing over data is the process of extracting information and insight from large quantities of individual data points.
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.
