Image Caption Generation Using Deep Learning

Authors

  • Pawaskar S P, Computer Dept., Goa College of Engineering, Goa University, Farmagudi, Ponda, Goa, India
  • Laxminarayana J A, Computer Dept., Goa College of Engineering, Goa University, Farmagudi, Ponda, Goa, India

DOI:

https://doi.org/10.26438/ijcse/v6si10.5355

Keywords:

BLEU score, captions, CNN, deep neural network

Abstract

Humans and computers interpret a picture in very different ways. To a human, a picture clearly conveys a description of a scene, an action, or an environment, whereas to a computer it is merely an aggregate of pixels or digital numbers. Image captioning deals with assigning descriptive information to an image, in the form of captions, by extracting the relevant features from the input picture. This work aims at generating meaningful captions for a given image and is based on deep neural networks. The proposed system has three fundamental units. The first is the image module, which supplies the input to the feature extractor. The second is the feature extractor, based on a CNN (Convolutional Neural Network), which extracts the relevant features. The final unit is the language generator, which produces sentences describing the input image. The quality of the generated text is assessed using the BLEU (Bilingual Evaluation Understudy) score. Suitable captions help users search for images using long queries, and such systems can also help visually impaired people understand images.
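As a minimal sketch of the pipeline described above (assuming a PyTorch implementation with a pretrained ResNet-50 backbone as the CNN feature extractor and an LSTM as the language generator; the paper only specifies a CNN encoder and a language-generator unit, so the choice of backbone, decoder, class names, and layer sizes here are illustrative assumptions):

import torch
import torch.nn as nn
import torchvision.models as models

class CNNEncoder(nn.Module):
    # Feature extractor unit: maps an input image to a fixed-length feature vector.
    def __init__(self, embed_size=256):
        super().__init__()
        resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        # Drop the final classification layer; keep the convolutional backbone.
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])
        self.fc = nn.Linear(resnet.fc.in_features, embed_size)

    def forward(self, images):
        with torch.no_grad():                  # backbone used as a frozen feature extractor
            feats = self.backbone(images)      # (batch, 2048, 1, 1)
        return self.fc(feats.flatten(1))       # (batch, embed_size)

class CaptionDecoder(nn.Module):
    # Language generator unit: produces word scores conditioned on the image feature.
    def __init__(self, vocab_size, embed_size=256, hidden_size=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, image_feats, captions):
        word_embeds = self.embed(captions)                            # (batch, T, embed_size)
        # Prepend the image feature as the first step of the input sequence.
        inputs = torch.cat([image_feats.unsqueeze(1), word_embeds], dim=1)
        hidden, _ = self.lstm(inputs)                                 # (batch, T+1, hidden_size)
        return self.fc(hidden)                                        # scores over the vocabulary

The BLEU evaluation mentioned above can be computed, for example, with NLTK's sentence-level BLEU; the reference and candidate captions below are made-up placeholders:

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["a", "dog", "runs", "on", "the", "beach"]]          # tokenised ground-truth caption(s)
candidate = ["a", "dog", "is", "running", "on", "the", "beach"]   # caption produced by the model
score = sentence_bleu(reference, candidate,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.3f}")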

Published

2025-11-17

How to Cite

[1] S. Pawaskar and J. Laxminarayana, “Image Caption Generation Using Deep Learning”, Int. J. Comp. Sci. Eng., vol. 6, no. 10, pp. 53–55, Nov. 2025.