Feature Selection on High Dimensional Big Data of Gene Expression Using Filter Based Feature Selection Methods
Keywords:
Feature selection, Lung cancer, Gene expression, Classifier, Subset
Abstract
Feature selection addresses the dimensionality problem by removing irrelevant and redundant features. Big data has recently become widely available in information systems, and data mining has attracted considerable attention from analysts who want to turn such data into useful knowledge. Such data often contain low-quality, unreliable, redundant, and noisy information, which adversely affects the discovery of knowledge and useful patterns. Researchers therefore need feature selection methods to retain only the relevant part of big data. Feature selection identifies the most relevant attributes and removes the redundant and irrelevant ones. In this paper, we compare the results of different feature selection methods on a well-known dataset (a gene expression dataset), using classification algorithms to evaluate the performance of each method. The study shows that feature selection methods can improve the performance of learning algorithms. However, no single filter-based feature selection method is best in all cases. Overall, the Classifier AttEval, Correlation AttributeEval, Principal Components, and ReliefAttEval methods performed better than the others.
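The filter step described in the abstract can be illustrated with a minimal sketch. It is not the paper's experimental setup: it assumes a plain Pearson-correlation ranking of each feature against the class label (similar in spirit to a correlation-based attribute evaluator), a small synthetic dataset in place of the gene expression data, and hypothetical helper names (`pearson`, `rank_features`, `select_top_k`, `make_toy_data`) chosen only for this example.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no information about the label
    return cov / math.sqrt(var_x * var_y)


def rank_features(rows, labels):
    """Return feature indices sorted by |correlation with the label|, best first."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        scores.append(abs(pearson(column, labels)))
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def select_top_k(rows, ranking, k):
    """Keep only the k best-ranked feature columns of every row."""
    keep = ranking[:k]
    return [[row[j] for j in keep] for row in rows]


def make_toy_data(n_samples=60, n_features=20, seed=0):
    """Synthetic stand-in for expression data (assumption, not the paper's dataset).

    Features 3 and 7 are shifted by the class label; all others are pure noise.
    """
    rng = random.Random(seed)
    rows, labels = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        row[3] += 3.0 * label
        row[7] -= 3.0 * label
        rows.append(row)
        labels.append(label)
    return rows, labels


rows, labels = make_toy_data()
ranking = rank_features(rows, labels)
reduced = select_top_k(rows, ranking, k=2)
print("top-ranked features:", ranking[:2])
print("reduced shape:", len(reduced), "x", len(reduced[0]))
```

Because a filter scores each feature independently of any classifier, the reduced data can then be passed to whichever learning algorithm is being evaluated, which is how a comparison across several filters and classifiers would typically be organized.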
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.
