A Study on Benchmarking Parameters for Intelligent Systems
Keywords:
Predictive performance, Confusion Matrix, Receiver operating characteristic (ROC), Akaike information criteria (AIC), Kappa statistic, Lift, Cumulative gain, Probability Threshold
Abstract
Intelligent automated decision support systems have proven useful in a wide range of fields. In bioinformatics, and in machine learning generally, the predictive measures used to evaluate such systems vary considerably. Assessing the accuracy of a model's predictions is a vital step in model development; without it, applying the model has little merit. This work critically discusses different approaches to assessing predictive performance, along with various test statistics. Choosing an assessment or validation strategy suited to a specific application helps in determining the suitability of the model and in comparing the performance of different modeling techniques. The purpose of this paper is to serve as an introduction to important benchmarking parameters and as a guide to using them in research.
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
