Big Data Platform-A Review
Keywords:
Hadoop, HDFS, Name node, Data node, Map Reduce, Data locality, Job Tracker, Task Tracker
Abstract
Hadoop is a popular distributed system for analyzing large amounts of data. It is built on distributed computing and combines the Hadoop Distributed File System (HDFS) with the MapReduce programming paradigm. Hadoop is highly fault-tolerant because it replicates data across multiple nodes, and it can be deployed on low-cost hardware. The file system, HDFS, is written in Java and designed to run on heterogeneous hardware and software. Hadoop is well suited to high volumes of data and to data in varied formats, such as semi-structured and unstructured data. It also provides high-speed access to application data. The Hadoop architecture is cluster based: a cluster consists of racks, and racks consist of nodes (name node and data nodes) that are, ideally, physically separate from one another. In Hadoop, a MapReduce program is used to collect data in response to a query. Because Hadoop handles massive amounts of data, scheduling and data storage must be efficient to achieve good performance. Owing to these features, Hadoop is replacing traditional systems. The research objective is to study and explore the scheduling techniques used to increase performance in Hadoop. This paper describes how Hadoop works, its internal details, and why Hadoop is preferable to traditional systems.
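The MapReduce paradigm named in the abstract can be illustrated with a minimal single-process sketch. It is not Hadoop code and does not use the Hadoop API. The function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the word-count task are assumptions chosen only for illustration. In a real cluster the map and reduce steps would run in parallel on different nodes.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(record: str) -> Iterator[tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key (the step between map and reduce)."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Combine all values that share a key into a single count."""
    return key, sum(values)


def run_job(records: list[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce sequentially over the input records."""
    intermediate = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical input: each string stands in for one block of a file.
    blocks = ["hadoop stores data in hdfs", "hadoop processes data"]
    print(run_job(blocks))
```

Each input string plays the role of one data block, so the sketch also hints at why data locality matters: the map step needs only the block it is given, and only the much smaller intermediate pairs have to move between nodes.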
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.
