Multimodal Emotion Recognition using Deep Neural Network - A Survey
DOI: https://doi.org/10.26438/ijcse/v6si6.9598

Keywords: DCNN, DBN, LSTM, SVR

Abstract
Emotion recognition is the process of identifying human emotional states. Most present methods use visual and audio information together, and recent advances in deep neural networks have yielded several methodologies for identifying emotional states. One method detects emotional states with a multimodal deep convolutional neural network (DCNN) that combines audio and visual cues in a single deep model. Another uses a bidirectional long short-term memory recurrent neural network (BLSTM-RNN) over multimodal features to capture emotions. A more efficient approach extracts features from speech with a convolutional neural network (CNN) and from the visual modality with a 50-layer deep residual network (ResNet-50); a long short-term memory (LSTM) network placed above these two models captures contextual information. Deep belief networks (DBNs) address multimodal emotion recognition by first learning audio and video features separately and then concatenating the two. Since visual features hold more importance in emotion recognition, a ResNet trained with support vector regression (SVR) can predict emotional states effectively.
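To make the fusion pipeline described above concrete, the following is a minimal PyTorch sketch, not any surveyed paper's implementation: all layer sizes, kernel widths, sequence lengths, and the two-dimensional output head are illustrative assumptions. It mirrors the CNN + ResNet-50 + LSTM approach the abstract outlines: a 1-D CNN extracts features from raw speech, a 50-layer residual network (ResNet-50, via torchvision) extracts per-frame visual features, and an LSTM over the concatenated features captures context.

```python
# Hypothetical sketch of the audio CNN + visual ResNet-50 + LSTM fusion
# architecture described in the abstract. Layer sizes are assumptions.
import torch
import torch.nn as nn
from torchvision.models import resnet50


class AudioCNN(nn.Module):
    """1-D CNN over raw speech samples; output dimension is an assumption."""
    def __init__(self, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=8, stride=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=8, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),          # pool over time to one vector
        )
        self.fc = nn.Linear(64, out_dim)

    def forward(self, x):                     # x: (batch, 1, samples)
        return self.fc(self.net(x).squeeze(-1))


class MultimodalEmotionNet(nn.Module):
    """Concatenate audio and visual features, then model context with an LSTM."""
    def __init__(self, n_outputs=2):          # e.g. arousal/valence (assumed)
        super().__init__()
        self.audio = AudioCNN(out_dim=128)
        self.visual = resnet50(weights=None)  # 50-layer residual network
        self.visual.fc = nn.Linear(self.visual.fc.in_features, 128)
        self.lstm = nn.LSTM(input_size=256, hidden_size=64, batch_first=True)
        self.head = nn.Linear(64, n_outputs)

    def forward(self, audio_seq, frame_seq):
        # audio_seq: (batch, T, 1, samples); frame_seq: (batch, T, 3, 224, 224)
        b, t = audio_seq.shape[:2]
        a = self.audio(audio_seq.flatten(0, 1)).view(b, t, -1)
        v = self.visual(frame_seq.flatten(0, 1)).view(b, t, -1)
        out, _ = self.lstm(torch.cat([a, v], dim=-1))  # fuse modalities per step
        return self.head(out)                 # per-time-step emotion prediction


# Example forward pass on dummy data (2 clips, 4 time steps each)
model = MultimodalEmotionNet()
preds = model(torch.randn(2, 4, 1, 16000), torch.randn(2, 4, 3, 224, 224))
print(preds.shape)                            # torch.Size([2, 4, 2])
```

Under the same assumptions, the DBN approach corresponds to learning the two feature extractors separately before the concatenation step, and the ResNet + SVR approach corresponds to dropping the LSTM head and training a support vector regressor on the pooled visual features instead.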