A Rotation Forest Algorithm for Predicting BOD in River Water
Keywords:
BOD, rotation forest, ensemble, M5, MLP, PC, Correlation Coefficient, RMSE
Abstract
Biochemical oxygen demand (BOD) is an important parameter for measuring water quality, especially the extent of water pollution due to organic compounds. The standard BOD test takes 5 days and requires stringent control of temperature, available nutrients, and lighting conditions suitable for microbial growth. To predict the BOD of river water in a cost-effective and efficient manner, this paper implements a data-driven ensemble method, Rotation Forest (RF). The model uses M5 model trees as base learners, hence the name Rotation Forest. Each base learner is trained on rotated feature axes obtained by applying Principal Component Analysis (PCA) to feature subsets. This increases diversity among the base learners and hence improves predictive accuracy. Experimental analysis on the available data sets shows that the proposed approach achieves a correlation coefficient of 0.9386 and an RMSE of 0.5388. The predictive accuracy of the model is also compared with that of a Multilayer Perceptron (MLP) neural network model; the proposed model has a higher correlation coefficient and a lower RMSE than the MLP.
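The rotation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes NumPy, substitutes simple regression stumps for the M5 model trees, and runs on synthetic data, not the river data set. All function names (build_rotation, fit_stump, rotation_forest_fit, and so on) and parameter values are hypothetical and chosen for illustration only.

```python
import numpy as np


def build_rotation(X, n_subsets, rng, sample_frac=0.75):
    """Build a d x d rotation matrix from per-subset PCA axes (assumes n_subsets <= d)."""
    n, d = X.shape
    R = np.zeros((d, d))
    # Randomly partition the feature indices into disjoint subsets.
    for cols in np.array_split(rng.permutation(d), n_subsets):
        # Bootstrap a fraction of the rows so each learner gets different axes.
        size = max(len(cols), int(sample_frac * n))
        rows = rng.choice(n, size=size, replace=True)
        Xs = X[np.ix_(rows, cols)]
        Xs = Xs - Xs.mean(axis=0)
        # Rows of Vt are the principal axes of this feature subset.
        _, _, Vt = np.linalg.svd(Xs, full_matrices=False)
        R[np.ix_(cols, cols)] = Vt.T
    return R


def fit_stump(Z, y):
    """Fit a one-split regression stump (stand-in for an M5 model tree)."""
    best = (np.inf, 0, 0.0, y.mean(), y.mean())
    for j in range(Z.shape[1]):
        # Skip the largest value so the right-hand leaf is never empty.
        for t in np.unique(Z[:, j])[:-1]:
            left = Z[:, j] <= t
            yl, yr = y[left], y[~left]
            sse = ((yl - yl.mean()) ** 2).sum() + ((yr - yr.mean()) ** 2).sum()
            if sse < best[0]:
                best = (sse, j, t, yl.mean(), yr.mean())
    return best[1:]


def predict_stump(stump, Z):
    j, t, left_value, right_value = stump
    return np.where(Z[:, j] <= t, left_value, right_value)


def rotation_forest_fit(X, y, n_estimators=10, n_subsets=2, seed=0):
    """Train each base learner on its own rotated copy of the features."""
    rng = np.random.default_rng(seed)
    model = []
    for _ in range(n_estimators):
        R = build_rotation(X, n_subsets, rng)
        model.append((R, fit_stump(X @ R, y)))
    return model


def rotation_forest_predict(model, X):
    # Average the base learners' predictions, each in its own rotated space.
    return np.mean([predict_stump(s, X @ R) for R, s in model], axis=0)


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def pearson_r(y_true, y_pred):
    return float(np.corrcoef(y_true, y_pred)[0, 1])


# Synthetic demonstration data (an assumption, not the data used in the paper).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 6))
y = X[:, 0] + 2.0 * X[:, 1] - X[:, 2] + rng.normal(scale=0.1, size=200)
model = rotation_forest_fit(X[:150], y[:150])
pred = rotation_forest_predict(model, X[150:])
print(f"r = {pearson_r(y[150:], pred):.3f}, RMSE = {rmse(y[150:], pred):.3f}")
```

Because every feature subset gets its own PCA on a different bootstrap sample, each base learner sees a different orthogonal rotation of the same features, which is the source of ensemble diversity the abstract refers to. The two metric helpers correspond to the correlation coefficient and RMSE used to report results.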
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.
