Image Caption Generation: A Comprehensive Survey
DOI: https://doi.org/10.26438/ijcse/v6i3.230234
Keywords:
Automatic image captioning, Deep CNN, Hidden Markov Model, LSTM, Neural Network, RNN
Abstract
Humans and computers interpret images in fundamentally different ways. To a human, an image conveys a description of a scene, an action, or an environment; to a computer, it is merely a combination of pixels or digital numbers. Image captioning is the task of automatically assigning textual metadata, in the form of captions or keywords, to a digital image. This paper is a comprehensive survey of methodologies for generating appropriate image captions. We compare the various approaches available for implementing image captioning, and we describe the evaluation metrics that such systems can use. Appropriate captions help users search for images with long queries, and automatic image captioning can also help visually impaired people understand pictures.
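The CNN-LSTM pipeline named in the keywords is typically an encoder-decoder: a convolutional network encodes the image into a fixed-length feature vector, and an LSTM decodes that vector into a sequence of words. The following is a minimal, hypothetical sketch of such a model in PyTorch; the class name CaptionNet and all dimensions are illustrative assumptions, not the design of any particular system surveyed in the paper.

import torch
import torch.nn as nn
import torchvision.models as models

class CaptionNet(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        # Encoder: a CNN whose classifier head is replaced by a linear
        # projection into the word-embedding space (pass
        # weights=models.ResNet18_Weights.DEFAULT for pretrained features).
        cnn = models.resnet18(weights=None)
        cnn.fc = nn.Linear(cnn.fc.in_features, embed_dim)
        self.encoder = cnn
        # Decoder: an LSTM that reads the image feature as its first input
        # step, followed by the embedded caption tokens.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        feats = self.encoder(images).unsqueeze(1)   # (B, 1, E) image feature
        words = self.embed(captions)                # (B, T, E) caption tokens
        inputs = torch.cat([feats, words], dim=1)   # image first, then words
        hidden, _ = self.lstm(inputs)
        return self.out(hidden)                     # (B, T+1, vocab) word scores

# Toy forward pass: a batch of 2 RGB images with 5-token captions.
model = CaptionNet(vocab_size=1000)
scores = model(torch.randn(2, 3, 224, 224), torch.randint(0, 1000, (2, 5)))
print(scores.shape)  # torch.Size([2, 6, 1000])

For the evaluation side mentioned above, generated captions are usually scored against human-written references with measures such as BLEU, METEOR, or CIDEr; for example, nltk.translate.bleu_score.sentence_bleu provides a standard BLEU implementation.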