Analyzing Failures of a Semi-Structured Supercomputer Log File Efficiently by Using PIG on Hadoop
Keywords:
Big Data, Parallel Processing, Hadoop, MapReduce, Data Mining, Business Intelligence, PIG, Log file analysis, Supercomputer
Abstract
Data sets used to fuel the recently popular concept of business intelligence are becoming increasingly large. Conventional database management software is no longer efficient enough at this scale; parallel database management systems and massive-scale data processing systems such as MapReduce look promising. Although MapReduce is a good option, it is difficult to work with, because the programmer has to think at the mapper and reducer level. In this paper, we present a simple yet efficient way to mine useful information, in which a program is written as a series of steps. We queried a supercomputer log file using Apache Hadoop and PIG, obtained results showing when and why the supercomputer had failed, and compared these results with those of a traditional program.
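To illustrate what "a series of steps" can look like, here is a minimal single-machine sketch in Python of a load, filter, group, count, and order pipeline over log lines. It is not the authors' PIG script and does not use Hadoop. The log layout (timestamp, node, severity, message), the `FATAL` severity label, the sample lines, and the names `load` and `failure_counts` are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical sample lines, assumed layout: "<timestamp> <node> <severity> <message>"
SAMPLE_LOG = [
    "2011-01-01T00:00:01 node01 INFO job started",
    "2011-01-01T00:05:00 node02 FATAL memory parity error",
    "2011-01-01T00:07:30 node03 FATAL link timeout",
    "2011-01-01T00:09:12 node02 FATAL memory parity error",
    "malformed line",
]


def load(lines):
    """Step 1 (load): split each line into four fields, skipping malformed lines."""
    for line in lines:
        parts = line.split(" ", 3)
        if len(parts) == 4:
            yield tuple(parts)


def failure_counts(lines):
    """Run the steps in order and return (message, count) pairs, most frequent first."""
    records = load(lines)                                 # step 1: load
    failures = (r for r in records if r[2] == "FATAL")    # step 2: filter on severity
    counts = Counter(r[3] for r in failures)              # steps 3-4: group by message, count
    return counts.most_common()                           # step 5: order by count


if __name__ == "__main__":
    for message, count in failure_counts(SAMPLE_LOG):
        print(count, message)
```

Each step consumes the output of the previous one, so the program reads top to bottom as a dataflow rather than as separate mapper and reducer functions.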
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.
