A Modified Approach for Missing Values in Data Mining Based on Rough Set Theory, Divided –and-Conquer, Closest Fit Approach Idea

Authors

  • Sarkar A, M.E. Student, CSE/IT, University Institute of Technology, The University of Burdwan, Burdwan, India.
  • Ghosh K, Assistant Professor, CSE/IT, University Institute of Technology, The University of Burdwan, Burdwan, India.

Keywords:

Data mining, Missing data, Data preprocessing, Statistical methods, Prediction methods, Rough Set Theory, Serially

Abstract

Handling missing data plays a key role in practical fields, and filling these gaps is the main objective of the data preprocessing step in data mining. Statistical and prediction approaches are generally used for missing data analysis, but both have disadvantages and are applicable to serial missing values in a column. This paper tries to remove the gaps left by these two methods. The proposed algorithm attempts to merge the two approaches: it uses the potential knowledge and laws suggested by the data in the information system, together with some basic mathematical concepts and some concepts from Rough Set Theory. Experimental results show that the proposed algorithm gives better results than either of the two methods.

Published

2015-02-28

How to Cite

[1]
A. Sarkar and K. Ghosh, “A Modified Approach for Missing Values in Data Mining Based on Rough Set Theory, Divided –and-Conquer, Closest Fit Approach Idea”, Int. J. Comp. Sci. Eng., vol. 3, no. 1, pp. 51–58, Feb. 2015.