Two-Stage Email Classification Model for Enhanced Spam Filtering Through Feature Transformation and Iterative Learning

Authors

DOI:

https://doi.org/10.26438/ijcse/v13i2.1627

Keywords:

Email Spam Detection, Hybrid Classification Model, Logistic Regression, Principal Component Analysis, Feedforward Neural Network, Cybersecurity in Email Communication

Abstract

Email spam remains a persistent challenge, as cybercriminals constantly evolve their tactics to bypass traditional detection methods. In response, this research introduces a novel two-stage classification model that combines logistic regression, principal component analysis (PCA), and a feedforward neural network. The first stage uses a fast logistic regression classifier to filter out obvious spam emails, which reduces the computational load on the later stage. The remaining emails are then transformed with PCA, which extracts the most salient features while reducing noise and dimensionality. The transformed feature space is fed into a feedforward neural network, which can capture the complex, non-linear patterns characteristic of sophisticated spam attacks. Evaluated on the widely used SpamAssassin Public Corpus and Lingspam datasets, the hybrid approach achieved 98.0% spam detection accuracy on the SpamAssassin Public Corpus (revised from an initial 99.95% after further testing and optimization) and 99.34% on the Lingspam dataset. This combination of techniques overcomes the traditional speed-accuracy tradeoff and sets a new benchmark for both. The model's robustness, consistency, and scalability make it a practical and effective solution for real-world spam filtering, with significant implications for securing email communication and protecting users from cybercrime.
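
The routing logic of the two-stage design described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the 0.9 confidence threshold, the toy two-feature inputs, and the function names are all assumptions, and the second stage (PCA followed by a feedforward neural network in the paper) is represented only by a placeholder callable.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X: List[Vector], y: List[int],
                   lr: float = 0.5, epochs: int = 500) -> List[float]:
    """Fit logistic-regression weights by batch gradient descent.

    The bias term is stored as the last element of the returned list.
    """
    n_features = len(X[0])
    w = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for xi, yi in zip(X, y):
            # zip stops at the shorter sequence, so the bias w[-1] is added separately.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def spam_probability(w: List[float], x: Vector) -> float:
    """Stage-1 estimate of P(spam | x)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


def two_stage_classify(w: List[float], X: List[Vector],
                       second_stage: Callable[[Vector], int],
                       threshold: float = 0.9) -> List[int]:
    """Return labels (1 = spam, 0 = not spam).

    Stage 1 flags a message as spam when its probability reaches the
    (assumed) threshold; every other message is deferred to second_stage.
    """
    labels = []
    for x in X:
        if spam_probability(w, x) >= threshold:
            labels.append(1)                  # obvious spam, filtered early
        else:
            labels.append(second_stage(x))    # handled by the costlier model
    return labels


if __name__ == "__main__":
    # Hypothetical features: [share of suspicious words, scaled link count].
    X_train = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.9],
               [0.1, 0.2], [0.2, 0.1], [0.0, 0.3]]
    y_train = [1, 1, 1, 0, 0, 0]
    weights = train_logistic(X_train, y_train)

    # Placeholder for the PCA + neural-network stage described in the abstract.
    def placeholder_second_stage(x: Vector) -> int:
        return 1 if sum(x) / len(x) > 0.5 else 0

    print(two_stage_classify(weights, [[0.95, 0.9], [0.5, 0.6], [0.1, 0.1]],
                             placeholder_second_stage))
```

The point of the sketch is that only messages the cheap first stage is unsure about reach the more expensive second stage, which is where the computational saving claimed in the abstract would come from.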

Published

2025-02-28

How to Cite

[1] M. Stow and B.-ebi S. Ezonfa, “Two-Stage Email Classification Model for Enhanced Spam Filtering Through Feature Transformation and Iterative Learning”, Int. J. Comp. Sci. Eng., vol. 13, no. 2, pp. 16–27, Feb. 2025.

Section

Research Article