Analysis of Different Classifiers’ Performance After Applying Three Different Feature Selection Methods
Keywords:
Data Mining (DM), Feature Selection (FS), Rough Set, Degree of Dependency, Decision Tree (J48 Algorithm), Naive Bayes Algorithm (NB), K-Nearest Neighbor Algorithm (KNN), Classification, Statistical Analysis
Abstract
Feature selection (FS) is an important aspect of data mining. Nowadays, the availability of information with hundreds of variables leads to high-dimensional data containing irrelevant and redundant attributes. FS techniques should therefore be applied to datasets before classification or rule generation. FS aims to reduce the number of attributes by removing irrelevant or redundant ones, thereby reducing computation time and improving classifier performance. In this paper, three FS methods are used: correlation-based, information-gain-based, and rough-set-based. A statistical analysis of the performance of three different classifiers is also carried out to provide a detailed view.
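The pipeline the abstract describes (reduce attributes, then classify) can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes scikit-learn, uses `mutual_info_classif` as a stand-in for information-gain ranking, and uses `DecisionTreeClassifier` in place of J48 (a C4.5 implementation), since the paper does not specify code.

```python
# Illustrative sketch: information-gain-style feature selection followed by
# the three classifier families compared in the paper.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# Synthetic high-dimensional data: 20 attributes, only 5 informative.
X, y = make_classification(n_samples=300, n_features=20,
                           n_informative=5, random_state=0)

# Rank attributes by mutual information and keep the top 5,
# discarding the irrelevant/redundant remainder.
X_reduced = SelectKBest(mutual_info_classif, k=5).fit_transform(X, y)

# Compare the three classifiers on the reduced attribute set.
for name, clf in [("Decision tree (J48-like)", DecisionTreeClassifier(random_state=0)),
                  ("Naive Bayes", GaussianNB()),
                  ("KNN", KNeighborsClassifier(n_neighbors=5))]:
    acc = cross_val_score(clf, X_reduced, y, cv=5).mean()
    print(f"{name}: mean CV accuracy = {acc:.3f}")
```

The same loop could be rerun with a correlation-based or rough-set-based selector in place of `SelectKBest` to reproduce the kind of cross-method comparison the paper performs.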

