IMDb Movie Data Classification using Voting Classifier for Sentiment Analysis

Authors

  • K. Kaushik, Department of Computer Science, Madhav Institute of Technology and Science, Gwalior, India
  • M. Parmar, Department of Computer Science, Madhav Institute of Technology and Science, Gwalior, India

DOI:

https://doi.org/10.26438/ijcse/v10i1.1823

Keywords:

Sentiment Analysis, Feature Extraction, Voting classifier, Machine Learning, IMDb data

Abstract

Social networking sites have become popular places where short texts express a wide range of emotions, such as sadness, happiness, fear, and anxiety. Analyzing these short texts helps identify the sentiments expressed by the crowd. For IMDb movie reviews, sentiment analysis identifies a reviewer's overall sentiment or opinion about a movie. In this paper, we work on the IMDb movie dataset, which was retrieved from Kaggle, where it had been crawled and labelled positive/negative. The available dataset contains emoticons, Id, Data, Query, and username fields and was converted into a standard form. We obtain our results using a Voting Classifier that combines Logistic Regression and Random Forest, both traditional machine learning algorithms. The results were evaluated using five metrics: accuracy (89.34), precision (88.71), recall (90.35), F1 measure (89.52), and area under the curve (89.33).
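As a rough illustration of the approach described in the abstract, the sketch below combines the outputs of two classifiers by soft voting and computes accuracy, precision, recall, and F1 from the resulting predictions. The abstract does not say whether hard or soft voting was used, so soft voting (averaging predicted probabilities) is an assumption here. The probability lists, labels, and function names are hypothetical toy values for illustration, not data or code from the paper. The last line checks that the reported F1 measure is consistent with the reported precision and recall.

```python
from statistics import mean


def soft_vote(prob_lists, threshold=0.5):
    """Average each model's positive-class probability, then threshold."""
    averaged = [mean(probs) for probs in zip(*prob_lists)]
    return [1 if p >= threshold else 0 for p in averaged]


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


# Hypothetical positive-class probabilities from two already-trained models
# (stand-ins for Logistic Regression and Random Forest) on six toy reviews.
lr_probs = [0.9, 0.4, 0.2, 0.7, 0.55, 0.1]
rf_probs = [0.8, 0.7, 0.3, 0.2, 0.65, 0.3]
y_true = [1, 1, 0, 1, 1, 0]  # 1 = positive review, 0 = negative review

y_pred = soft_vote([lr_probs, rf_probs])
metrics = binary_metrics(y_true, y_pred)
print(y_pred)
print(metrics)

# Consistency check on the figures reported in the abstract.
reported_f1 = round(f1_from(88.71, 90.35), 2)
print(reported_f1)
```

Soft voting lets a confident model outweigh an uncertain one, while hard voting would count each model's predicted label equally. With only two base classifiers, hard voting can tie, which is one reason to prefer averaging probabilities in a sketch like this.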

Published

2022-01-31

How to Cite

K. Kaushik and M. Parmar, “IMDb Movie Data Classification using Voting Classifier for Sentiment Analysis”, Int. J. Comp. Sci. Eng., vol. 10, no. 1, pp. 18–23, Jan. 2022.

Section

Research Article