An Ensemble Machine Learning Approach for Accurate Air Pollution Prediction and Environmental Monitoring
DOI:
https://doi.org/10.26438/ijcse/v13i2.2838
Keywords:
Air Pollution Prediction, Machine Learning, Ensemble Model, Environmental Monitoring, Data-Driven Modeling, Air Quality Forecasting
Abstract
Air pollution poses substantial risks to public health and environmental sustainability, which calls for predictive models that can monitor and forecast air quality reliably. This study designed and evaluated an air pollution prediction model using data-driven modeling techniques. The methodology involved aggregating global air pollution datasets, then preprocessing and transforming the data to ensure the accuracy and relevance of the inputs. Several machine learning algorithms were compared, namely AdaBoost, Decision Tree, Extra Tree, Random Forest, Naïve Bayes, K-Nearest Neighbor (KNN), and Neural Network, to determine how effectively each predicts different levels of air quality. Each algorithm was evaluated on precision, recall, F1-score, and overall accuracy, with particular attention to challenging air quality categories such as "Unhealthy" and "Very Unhealthy." The results showed that Decision Tree, Extra Tree, Random Forest, and Neural Network achieved high accuracy and F1-scores, whereas AdaBoost and Naïve Bayes had difficulty with certain air quality categories. To overcome these limitations and improve overall prediction accuracy, an ensemble model was developed by combining the top-performing algorithms. The ensemble model achieved perfect precision, recall, F1-score, and accuracy across all air quality categories, indicating its potential as a highly reliable tool for real-time air quality monitoring and prediction. The study concludes that the ensemble model represents a significant advance in air pollution prediction and offers an efficient solution for environmental monitoring systems.
The study highlights the importance of integrating multiple machine learning algorithms to improve model robustness and accuracy, providing useful insights for public health management and policymaking. It recommends further exploration of ensemble models in different geographic regions, as well as the integration of real-time data from IoT devices, to broaden the model's applicability and effectiveness across diverse environmental scenarios.
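The evaluation and combination steps described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the ensemble combines its member models, so hard majority voting is an assumption here, as are the function names `majority_vote` and `per_class_report`. The labels are made-up toy values for illustration only and are not data or results from the study.

```python
from collections import Counter


def majority_vote(model_predictions):
    """Combine per-model label lists by hard majority voting (assumed scheme).

    model_predictions: list of equal-length label lists, one per model.
    Ties go to the label predicted by the earliest model in the list.
    """
    combined = []
    for votes in zip(*model_predictions):
        counts = Counter(votes)
        best = max(counts.values())
        # The first label (in model order) that reaches the top count wins.
        combined.append(next(v for v in votes if counts[v] == best))
    return combined


def per_class_report(y_true, y_pred):
    """Return ({label: (precision, recall, f1)}, overall_accuracy)."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    report = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = (precision, recall, f1)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return report, accuracy


# Hypothetical toy labels; each "model" makes one different mistake.
y_true = ["Good", "Moderate", "Unhealthy", "Very Unhealthy", "Good", "Unhealthy"]
tree = ["Good", "Moderate", "Unhealthy", "Unhealthy", "Good", "Unhealthy"]
forest = ["Good", "Moderate", "Moderate", "Very Unhealthy", "Good", "Unhealthy"]
net = ["Good", "Good", "Unhealthy", "Very Unhealthy", "Good", "Unhealthy"]

ensemble = majority_vote([tree, forest, net])
report, accuracy = per_class_report(y_true, ensemble)

for label, (precision, recall, f1) in report.items():
    print(f"{label:15s} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
print(f"accuracy={accuracy:.2f}")
```

In this toy setup each member misclassifies a different sample, so the vote corrects all three errors. That only shows why per-class metrics can improve when models with uncorrelated mistakes are combined; it says nothing about how a real ensemble would perform on held-out data.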
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.
