Cross Validation Of Supervised Machine Learning Models Based On Random Forest and Support Vector Machine Techniques for 12S rRNA Molecular Marker: Implementation, Comparison and Utility

Authors

  • Pati R, ICAR-National Bureau of Fish Genetic Resources, Lucknow-226002 (U.P.), India
  • Pathak AK, Dept. of Computer Science, Awadhesh Pratap Singh University, Rewa-486003 (M.P.), India

DOI:

https://doi.org/10.26438/ijcse/v6i11.345349

Keywords:

Machine learning method, Random forest, Support vector machine, Folding level, 12S rRNA, Cross validation

Abstract

Folding plays an important role in cross-validation studies of machine learning models. Folding divides the original sample into training and test sets, which are used to evaluate the performance of the models and to explore ways of optimising their efficacy. The present study discusses the computational approaches used to prepare training and test sets at different fold levels from a 12S rRNA molecular marker sequence dataset of fish, and the application of these sets to estimate the performance of the proposed models based on two machine learning techniques, Random Forest and Support Vector Machine. The study also compares the efficacy of these models estimated at different fold levels. The findings showed that the fold level has a linear relationship with the efficacy of the model. The Random Forest model was found to perform better at classifying the molecular marker sequence data. This study provides an understanding of the utility of the fold level in increasing the efficacy of machine learning methods and suggests a suitable machine learning method for multiclass problems, especially where identification using molecular marker sequence data is involved.
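
The fold-based splitting described in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the function names `k_fold_indices`, `cross_validate` and `majority_class` are hypothetical, and the majority-class predictor is only a placeholder standing in for a real Random Forest or Support Vector Machine model.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle sample indices and return k (train, test) index pairs."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        # Training set: all samples that are not in the held-out fold.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def majority_class(train_x: Sequence, train_y: Sequence, test_x: Sequence) -> List:
    """Placeholder model (assumption): always predicts the most common training label."""
    label = Counter(train_y).most_common(1)[0][0]
    return [label for _ in test_x]


def cross_validate(x: Sequence, y: Sequence, k: int,
                   fit_predict: Callable[[Sequence, Sequence, Sequence], List]) -> float:
    """Return the mean test accuracy of fit_predict over k folds."""
    accuracies = []
    for train, test in k_fold_indices(len(x), k):
        predicted = fit_predict([x[i] for i in train],
                                [y[i] for i in train],
                                [x[i] for i in test])
        correct = sum(p == y[i] for p, i in zip(predicted, test))
        accuracies.append(correct / len(test))
    return sum(accuracies) / k
```

Calling `cross_validate(features, labels, k, fit_predict)` for several values of `k`, once per model, would give the kind of fold-by-fold comparison the abstract describes; any function with the same `fit_predict` signature can be substituted for the placeholder.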


Published

2025-11-18

How to Cite

[1]
R. Pati and A. K. Pathak, “Cross Validation Of Supervised Machine Learning Models Based On Random Forest and Support Vector Machine Techniques for 12S rRNA Molecular Marker: Implementation, Comparison and Utility”, Int. J. Comp. Sci. Eng., vol. 6, no. 11, pp. 345–349, Nov. 2025.

Section

Research Article