The Real Time Big Data Processing Framework: Advantages and Limitations

Authors

  • Gurusamy V, Department of Computer Applications, School of IT, Madurai Kamaraj University, Madurai, India
  • S Kannan, Department of Computer Applications, School of IT, Madurai Kamaraj University, Madurai, India
  • K Nandhini, Technical Support Engineer, Concentrix India Pvt Ltd, Chennai, India

DOI:

https://doi.org/10.26438/ijcse/v5i12.305312

Keywords:

Big Data, Hadoop, HDFS, Spark, Storm, Flink, Samza

Abstract

Big data is a blanket term for the non-traditional strategies and technologies needed to gather, organize, process, and extract insights from large datasets. While the problem of working with data that exceeds the computing power or storage of a single computer is not new, the pervasiveness, scale, and value of this type of computing have greatly expanded in recent years. In this paper, we examine one of the essential components of a big data system: processing frameworks. Processing frameworks compute over the data in the system, either by reading it from non-volatile storage or by operating on it as it is ingested. Computing over data is the process of extracting information and insight from large quantities of individual data points.

Published

2025-11-12

How to Cite

[1]
V. Gurusamy, S. Kannan, and K. Nandhini, “The Real Time Big Data Processing Framework: Advantages and Limitations”, Int. J. Comp. Sci. Eng., vol. 5, no. 12, pp. 305–312, Nov. 2025.