Spam Detection Approach Using Modified Pre-processing With NLP
Keywords:
Spam detection, email, NLP, spam classification
Abstract
The growth in email use has also led to an unprecedented increase in illegitimate mail, or spam: 49.7% of emails sent are spam, because current spam detection methods lack an accurate spam classifier. The recent decline in the volume of email spam is encouraging, but it raises the question of whether the email spam business is dying and will continue to decline. Beyond the change in volume, we also consider the quality of email spam and its impact, which may constitute a new trend in the email spam business. For instance, spammers may send spam in more sophisticated ways, using spoofed email addresses and changing email relay servers. Such spam may slip past spam filters. This motivated us to investigate the evolution of email spam using advanced techniques such as topic modelling and network analysis. We try to identify the real trend of the email spam business from email content, meta information such as headers, and the sender-to-receiver network over a long period of time.
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.
