Feature Selection on High Dimensional Big Data of Gens Expression Using Filter Based Feature Selection Methods

Authors

  • Shrivas AK, Department of Computer Science, Mahant Laxmi Narayan Das College, Raipur, India
  • Chandrakar PK, Department of Information and Technology, Dr. C. V. Raman University, Bilaspur, India

Keywords:

Feature selection, Lung cancer, Gene expression, Classifier, Subset

Abstract

Feature selection addresses the dimensionality problem by removing irrelevant and redundant features. Big data has recently become widely available in information systems, and data mining has attracted considerable attention from analysts seeking to turn such data into useful knowledge. This abundance brings with it low-quality, unreliable, redundant, and noisy data, which adversely affect the process of discovering knowledge and useful patterns. Researchers therefore need feature selection methods to extract the relevant portion of big data. Feature selection identifies the most relevant attributes and removes the redundant and irrelevant ones. In this paper, we compare the results of different feature selection methods on a well-known dataset (a gene expression dataset), using classification algorithms to evaluate their performance. The study shows that feature selection methods can improve the performance of learning algorithms. However, no single filter-based feature selection method is the best. Overall, the Classifier AttEval, Correlation AttributeEval, Principal Components, and ReliefAttEval methods performed better than the others.
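
As an illustration of the filter idea described in the abstract, the following minimal sketch ranks attributes by the absolute Pearson correlation between each attribute and the class label, then keeps the top k. It is not the authors' implementation and does not use their tool or dataset: the function names (pearson, rank_features, select_top_k), the toy data, and the choice of k are all assumptions made for this example.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_features(X, y):
    """Return feature indices ordered from most to least correlated with y."""
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        scores.append((abs(pearson(column, y)), j))
    # Highest score first; ties broken by the lower feature index.
    scores.sort(key=lambda item: (-item[0], item[1]))
    return [j for _, j in scores]


def select_top_k(X, y, k):
    """Keep the k best-ranked features; returns (kept indices, reduced data)."""
    kept = sorted(rank_features(X, y)[:k])
    return kept, [[row[j] for j in kept] for row in X]


# Hypothetical toy data: 40 samples, binary class label, 4 features.
random.seed(0)
y = [i % 2 for i in range(40)]
X = [
    [
        label + random.gauss(0, 0.1),       # feature 0: informative
        random.gauss(0, 1.0),               # feature 1: pure noise
        5.0,                                # feature 2: constant (irrelevant)
        -2 * label + random.gauss(0, 0.1),  # feature 3: informative
    ]
    for label in y
]

kept, X_reduced = select_top_k(X, y, k=2)
print(kept)
```

A filter of this kind scores each attribute independently of any classifier, which keeps it cheap on high-dimensional data; the reduced data would then be passed to a classifier to measure whether accuracy improves.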

Published

2025-11-24

How to Cite

[1]
A. Shrivas and P. K. Chandrakar, “Feature Selection on High Dimensional Big Data of Gens Expression Using Filter Based Feature Selection Methods”, Int. J. Comp. Sci. Eng., vol. 7, no. 3, pp. 105–108, Nov. 2025.