Automatic Image Caption Generation Using CNN, RNN and LSTM
DOI: https://doi.org/10.26438/ijcse/v9i8.6062
Keywords: image annotation, deep learning, CNN, RNN, LSTM, Python 3, Flask
Abstract
This paper aims at generating captions automatically by learning the contents of an image. At present, images are annotated with human intervention, which becomes a nearly impossible task for huge commercial databases. The image database is given as input to a deep Convolutional Neural Network (CNN) encoder, which produces a "thought vector" capturing the features and nuances of the image; a Recurrent Neural Network (RNN) decoder then translates these features and objects into a sequential, meaningful description of the image. In this paper we present a survey of image captioning and describe our proposed system.
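As a minimal sketch of the encoder-decoder pipeline outlined above, the following Python/Keras snippet wires a CNN feature vector into an LSTM-based decoder. The names and values (vocab_size, max_len, feature_dim, the 256-unit layers) are illustrative assumptions, not parameters taken from the paper, and the CNN features are assumed to come from a pretrained network such as InceptionV3.

```python
# Hypothetical sketch: CNN "thought vector" + LSTM decoder for captioning.
# All sizes below are assumed placeholders, not values from the paper.
import tensorflow as tf
from tensorflow.keras import layers, Model

vocab_size = 5000    # assumed vocabulary size
max_len = 20         # assumed maximum caption length
feature_dim = 2048   # e.g. feature length from a pretrained CNN encoder

# Encoder branch: project the CNN feature vector ("thought vector")
# into the same dimensionality as the word embeddings.
image_input = layers.Input(shape=(feature_dim,))
img_embed = layers.Dense(256, activation="relu")(image_input)

# Decoder branch: an LSTM consumes the partial caption generated so far.
caption_input = layers.Input(shape=(max_len,))
word_embed = layers.Embedding(vocab_size, 256, mask_zero=True)(caption_input)
lstm_out = layers.LSTM(256)(word_embed)

# Merge image and text representations and predict the next word.
merged = layers.add([img_embed, lstm_out])
hidden = layers.Dense(256, activation="relu")(merged)
output = layers.Dense(vocab_size, activation="softmax")(hidden)

model = Model(inputs=[image_input, caption_input], outputs=output)
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
model.summary()
```

At inference time, such a model would be fed the image features plus the caption generated so far, predicting one word per step until an end-of-sequence token is produced.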