Microblog Dimensionality Reduction With Semantic Analysis
DOI:
https://doi.org/10.26438/ijcse/v6i1.342346
Keywords:
Microblogging, Accessibility, Sentiment Classification, Latent Semantic Analysis
Abstract
Extracting useful information from the large volumes of text produced by microblogging services such as Twitter has attracted much attention in recent years. An important preprocessing step in microblog text mining is converting natural language text into suitable numerical representations. Because microblog texts are short, representing them as term frequency vectors causes a “sparse data” problem, so finding suitable representations for microblog texts is challenging. Previous work applied deep networks to map high-dimensional representations to low-dimensional ones, using retweets and hashtags as signals of semantic similarity. It used two types of approaches: modifying the training data and modifying the training objective. It also showed that deep models perform better than traditional methods such as the latent Dirichlet allocation topic model and latent semantic analysis.
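The “sparse data” problem mentioned in the abstract can be illustrated with a minimal sketch. The three posts below are invented for illustration, and the helper names (`tokenize`, `tf_vector`, `cosine`) are assumptions of this example rather than anything taken from the paper. The sketch builds term frequency vectors with the standard library only and measures how many entries are zero.

```python
import math
from collections import Counter

# Toy posts, invented for illustration only (not data from the paper).
posts = [
    "new phone battery lasts all day",
    "my handset charge survives until night",
    "rain expected in the city tomorrow",
]

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()

# Vocabulary over the whole toy corpus, in a fixed (sorted) order.
vocab = sorted({tok for p in posts for tok in tokenize(p)})

def tf_vector(text):
    """Term frequency vector of a post over the shared vocabulary."""
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocab]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

vectors = [tf_vector(p) for p in posts]

# Fraction of zero entries across all term frequency vectors.
zeros = sum(x == 0 for vec in vectors for x in vec)
sparsity = zeros / (len(vectors) * len(vocab))

print(f"vocabulary size: {len(vocab)}")
print(f"sparsity: {sparsity:.2f}")
print(f"cosine(post 0, post 1): {cosine(vectors[0], vectors[1]):.2f}")
```

In this toy corpus the first two posts describe the same topic but share no terms, so their cosine similarity is zero even though a reader would consider them related. This is the kind of gap that a low-dimensional representation, whether from latent semantic analysis or a deep network, is meant to close.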
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.
