Handling Imbalanced Heart Disease Data and Explaining the Factors

Authors

Keywords:

Heart Disease, SMOTE, Machine Learning, Explainable AI

Abstract

Heart disease is one of the most serious and life-threatening health problems, and many lives can be saved if it is predicted early. However, medical datasets are often highly imbalanced, which causes machine learning algorithms to perform poorly on the minority class and, in turn, to make wrong predictions. In healthcare, wrong predictions are especially risky because people's lives are at stake. To obtain good results, the ratio of minority to majority class samples should be 1:1 or close to it. The Synthetic Minority Oversampling Technique (SMOTE), an oversampling method that achieves this balance, is used in this work. In addition, we use eXplainable AI (XAI) to better visualise the predictions: LIME (Local Interpretable Model-agnostic Explanations) and SHAP (SHapley Additive exPlanations) are applied to understand how individual features contribute to the predictions.
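As a rough illustration of the oversampling idea behind SMOTE, the sketch below creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority-class neighbours until the two classes are the same size. It is not the implementation used in this work: the function name `smote_sketch`, its parameters, and the toy data are all hypothetical and chosen only for illustration.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority samples
    and their k nearest minority-class neighbours (SMOTE-style).

    Hypothetical helper for illustration; not the authors' code.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy, made-up data: 8 majority rows vs 3 minority rows (two features each).
majority = [(float(i), float(i % 3)) for i in range(8)]
minority = [(10.0, 10.0), (11.0, 10.5), (10.5, 11.5)]

# Generate enough synthetic rows to reach a 1:1 class ratio.
new_rows = smote_sketch(minority, n_synthetic=len(majority) - len(minority), k=2)
balanced_minority = minority + new_rows
print(len(majority), len(balanced_minority))  # prints: 8 8
```

Because every synthetic row lies on a line segment between two real minority rows, the new points stay inside the region already occupied by the minority class. In practice, oversampling of this kind is normally applied to the training split only, so that no synthetic rows leak into the evaluation data.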

References

[1] Deldar, K., Mahdavi, M., & Mohammadzadeh, N. (2020). Handling imbalanced healthcare data with supervised and unsupervised methods: A systematic literature review. Journal of biomedical informatics, 109, 103516.

[2] Alshammari, R., & Bahsoon, R. (2019). Handling imbalanced data in healthcare: A systematic review. ACM Computing Surveys (CSUR), Vol.52, Issue.5, pp.1-38, 2019.

[3] Wang, S., Yao, J., Hu, Y., Zhao, L., & Zhang, Y. (2020). Addressing imbalanced datasets in medical image analysis. IEEE Transactions on Medical Imaging, Vol.39, Issue.7, pp.2408-2418, 2020.

[4] Al-Bahrani, R., Huang, W., & El-Sheimy, N. (2019). imbalanced healthcare data using ensemble methods and data sampling techniques. Applied Sciences, Vol.9, Issue.13, 2721, 2019.

[5] https://www.cdc.gov/heartdisease/facts.htm [DATASET]

[6] Wang, H., Yang, X., & Zhang, Q. (2019). A deep learning framework for handling imbalanced medical data. IEEE Access, 7, 89154-89162.

[7] Yao, J., Wang, S., Li, W., & Zhang, Y. (2020). Handling imbalanced electronic health record data using convolutional neural networks with auxiliary training. Journal of biomedical informatics, 110, 103530.

[8] L.H. Yang, J. Liu, Y.M. Wang, L. Martínez, A micro-extended belief rule-based system for big data multiclass classification problems, IEEE Trans. Syst. Man Cybern. Syst. pp.1–21, 2018.

[9] P.V. Ngoc, C.V.T. Ngoc, T.V.T. Ngoc, D.N. Duy. A C4.5 algorithm for English emotional classification, Evolving Syst. 10, pp.425–451, 2019.

[10] Datta, Shounak, and Swagatam Das. Near-Bayesian Support Vector Machines for Imbalanced Data Classification with Equal or Unequal Misclassification Costs. Neural Networks 70: pp.39–52, 2015.

[11] ahajournals.org/doi/full/10.1161/CIRCULATIONAHA.114.008729

Published

2026-01-19

How to Cite

[1]
S. Das, G. Sajjan, T. Dasgupta, A. Poddar, S. Patty, and D. Ghosh, “Handling Imbalanced Heart Disease Data and Explaining the Factors”, Int. J. Comp. Sci. Eng., vol. 11, no. 1, pp. 42–65, Jan. 2026.