Improvement of Time Complexity on External Sorting using Refined Approach and Data Preprocessing
Keywords:
data preprocessing, external sorting, data cleaning, passes, inputs/outputs, runs
Abstract
Large organizational data sets generally contain redundancy, noise, and inconsistency. To eliminate these problems, data preprocessing should be performed on the raw data before a sorting technique is applied. Data preprocessing includes methods such as data cleaning, data integration, data transformation, and data reduction. Depending on the complexity of the given data, the appropriate methods are selected and applied to the raw data to produce quality data. External sorting is then applied. The proposed external sorting takes fewer passes than the log_B(N/M) + 1 passes required by traditional B-way external merge sort. The number of inputs/outputs of the proposed method is also less than the 2N(log_B(N/M) + 1) inputs/outputs of the traditional method, and the proposed method uses fewer runs than basic external sorting.
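To make the baseline figures concrete, the following is a minimal sketch of the traditional cost formulas quoted above, not the proposed method itself. It assumes that N is the input size in pages, M is the number of pages that fit in memory (so the first pass produces ceil(N/M) runs), and B is the merge fan-in; the ceilings and the function names are assumptions added for illustration. The input sizes in the usage lines are made-up values chosen only to show how a smaller N after data reduction lowers both counts.

```python
def initial_runs(n_pages: int, mem_pages: int) -> int:
    """Sorted runs produced by the first pass: ceil(N / M)."""
    return -(-n_pages // mem_pages)  # integer ceiling division


def total_passes(n_pages: int, mem_pages: int, fan_in: int) -> int:
    """Passes for B-way external merge sort: ceil(log_B(N / M)) + 1.

    The logarithm is computed by repeated ceiling division so that
    no floating-point rounding can affect the result.
    """
    runs = initial_runs(n_pages, mem_pages)
    merge_passes = 0
    while runs > 1:
        runs = -(-runs // fan_in)  # one merge pass combines up to B runs
        merge_passes += 1
    return merge_passes + 1  # +1 for the run-formation pass


def total_ios(n_pages: int, mem_pages: int, fan_in: int) -> int:
    """Page I/Os: every pass reads and writes all N pages, so 2 * N * passes."""
    return 2 * n_pages * total_passes(n_pages, mem_pages, fan_in)


# Illustrative (hypothetical) sizes: raw input vs. input after data reduction.
raw = (10_000, 100, 10)      # N, M, B
reduced = (900, 100, 10)     # same M and B, smaller N

print(initial_runs(*raw[:2]), total_passes(*raw), total_ios(*raw))
print(initial_runs(*reduced[:2]), total_passes(*reduced), total_ios(*reduced))
```

Under these assumptions, removing redundant or noisy records before sorting reduces N, which in turn reduces the number of initial runs, the number of merge passes, and the total I/O count.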
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
