Handling Imbalanced Heart Disease Data and Explaining the Factors

Authors

Sandip Das, Dept. of Computer Science and Engineering, JIS University, Kolkata, India, https://orcid.org/0000-0002-7147-3898
Gairik Sajjan, Dept. of Computer Science and Engineering, JIS University, Kolkata, India, https://orcid.org/0000-0001-5778-1360
Tamojit Dasgupta, Dept. of Computer Science and Engineering, JIS University, Kolkata, India, https://orcid.org/0009-0001-0476-6746
Arkajyoti Poddar, Dept. of Computer Science and Engineering, JIS University, Kolkata, India, https://orcid.org/0009-0001-0476-6746
Sayani Patty, Dept. of Computer Science and Engineering, JIS University, Kolkata, India, https://orcid.org/0009-0000-3836-5353
Debmitra Ghosh, Dept. of Computer Science and Engineering, JIS University, Kolkata, India

Keywords:

Heart Disease, SMOTE, Machine Learning, Explainable AI

Abstract

Heart disease is among the most serious and life-threatening health problems, and many lives can be saved if it is predicted early. Medical datasets, however, are often highly imbalanced, which causes machine learning algorithms to perform poorly on the minority class and, in turn, to make wrong predictions. In healthcare a wrong prediction is especially risky because people's lives are at stake. For good results, the ratio of minority to majority class samples should be 1:1 or close to it. The Synthetic Minority Oversampling TEchnique (SMOTE) is an oversampling technique that achieves this balance, and it is used in this work. In addition, we use eXplainable AI (XAI) to better visualise the predictions: LIME (Local Interpretable Model-agnostic Explanations) and SHAP (SHapley Additive exPlanations) are applied to understand how individual features contribute to the predictions.
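
The abstract does not say how the oversampling was implemented, so the following is only a minimal sketch of the SMOTE idea under stated assumptions: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority-class neighbours, and samples are added until the classes reach a 1:1 ratio. The function names `smote` and `balance`, the toy two-feature data, and the choice of k = 5 are all hypothetical. As a simplification, base samples are drawn at random instead of being visited in turn.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority class until both classes have equal size."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(X) - len(minority)
    new_points = smote(minority, n_majority - len(minority), k=k, seed=seed)
    return X + new_points, y + [minority_label] * len(new_points)


# Hypothetical toy data (not from the paper): 90 majority, 10 minority samples.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(90)]
X += [[data_rng.gauss(3, 1), data_rng.gauss(3, 1)] for _ in range(10)]
y = [0] * 90 + [1] * 10

X_bal, y_bal = balance(X, y)
print(y_bal.count(0), y_bal.count(1))  # prints: 90 90
```

In practice a maintained implementation, such as the `SMOTE` class in the imbalanced-learn library, would normally be preferred over a hand-written one. Oversampling is usually applied to the training split only, so that the test data still reflects the original class ratio.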

Published

2026-01-19

How to Cite

S. Das, G. Sajjan, T. Dasgupta, A. Poddar, S. Patty, and D. Ghosh, “Handling Imbalanced Heart Disease Data and Explaining the Factors”, Int. J. Comp. Sci. Eng., vol. 11, no. 1, pp. 42–65, Jan. 2026.

Issue

Vol. 11 No. 1 (2023): Special Issue-1 Nov Edition

Section

Research Article

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.
