A Modified Approach for Missing Values in Data Mining Based on Rough Set Theory, Divide-and-Conquer, Closest Fit Approach Idea
Keywords:
Data mining, missing data, Data preprocessing, Statistical methods, Prediction methods, Rough Set Theory, Serially
Abstract
Missing data is a significant problem in practical fields, and handling it is the main objective of the data preprocessing step in data mining. Statistical and prediction approaches are commonly used for missing data analysis, but both have disadvantages and are applicable to serially missing values in a column. This paper attempts to address the shortcomings of these two methods. The proposed algorithm merges the two approaches. It uses the potential knowledge and regularities suggested by the data in the information system, together with some basic mathematical concepts and some concepts from Rough Set Theory. Experimental results show that the proposed algorithm gives better results than either of the two methods.
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.
