Handling Imbalanced Heart Disease Data and Explaining the Factors
Keywords:
Heart Disease, SMOTE, Machine Learning, Explainable AI
Abstract
Heart disease is one of the most serious and life-threatening health problems, and many lives could be saved if it were predicted in advance. However, medical datasets are often highly imbalanced, which causes machine learning algorithms to perform poorly on the minority class and, in turn, to make wrong predictions. In healthcare, a wrong prediction is especially risky because people's lives are at stake. To obtain good results, the ratio of minority to majority class samples should be 1:1 or close to it. The Synthetic Minority Over-sampling Technique (SMOTE), an oversampling method that achieves this balance, is used in this work. In addition, we use eXplainable AI (XAI) to better visualise the predictions: the LIME (Local Interpretable Model-agnostic Explanations) and SHAP (SHapley Additive exPlanations) algorithms are applied to understand how each feature contributes to the predictions.
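The interpolation idea behind SMOTE can be illustrated with a small sketch. This is not the implementation used in this work: the function name `smote`, its parameters, and the two-feature toy data are hypothetical, chosen only to show how synthetic minority samples are placed on the line segment between a minority sample and one of its nearest minority neighbours until the classes reach a 1:1 ratio.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    (Hypothetical helper for illustration only.)"""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority samples by Euclidean distance to x.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(x, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # Pick a random point on the segment between x and the neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


# Assumed toy data with two numeric features per record:
# 8 majority-class records and 3 minority-class records.
majority = [[50.0 + i, 120.0 + 2 * i] for i in range(8)]
minority = [[60.0, 150.0], [62.0, 155.0], [65.0, 160.0]]

# Generate just enough synthetic minority samples to reach a 1:1 ratio.
new_points = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

In practice a maintained implementation such as the `SMOTE` class in the imbalanced-learn package would normally be preferred, and oversampling is usually applied to the training split only, so that synthetic samples do not leak into the evaluation data.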
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.
