CNN-LSTM-BASED DEEP LEARNING FOR AUTOMATIC IMAGE CAPTIONING

Authors

  • Maria Vitória Sousa Ribeiro
  • Tiago do Carmo Nogueira
  • Gelson da Cruz Junior
  • Cássio Dener Noronha Vinhal
  • Matheus Rudolfo Diedrich Ullmann
  • Deller James Ferreira
  • Caio Henrique Rodrigues Carvalho
  • Danyele de Oliveira Santana

DOI:

https://doi.org/10.56238/arev6n3-145

Keywords:

Machine Learning, Deep Learning, Convolutional Neural Networks, Long Short-Term Memory, Image Captioning

Abstract

Advances in Computer Vision and Machine Learning have made natural-language image description techniques more efficient and accurate through deep neural networks. This study used an encoder-decoder architecture to identify the objects in an input image and generate a caption for it. The proposed model used the VGG16 and Inception-V3 architectures as encoders and an LSTM network as the decoder. Experiments were carried out on the Flickr8k dataset of 8,000 images, and the model was evaluated with the BLEU, METEOR, CIDEr, and ROUGE metrics, achieving a BLEU score of 58.40% and producing human-understandable descriptions.
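As an illustration of the kind of evaluation the abstract describes, the sketch below (not the authors' code) computes clipped unigram precision, the core idea behind the BLEU-1 score used to assess the generated captions. Full BLEU additionally combines higher-order n-grams and a brevity penalty, typically via a library such as NLTK; the example captions here are hypothetical.

```python
from collections import Counter

def bleu1(candidate: str, references: list[str]) -> float:
    """Clipped unigram precision (BLEU-1 without the brevity penalty)."""
    cand = candidate.lower().split()
    if not cand:
        return 0.0
    cand_counts = Counter(cand)
    # Clip each candidate word's count by its maximum count in any reference.
    max_ref: Counter = Counter()
    for ref in references:
        for word, count in Counter(ref.lower().split()).items():
            max_ref[word] = max(max_ref[word], count)
    clipped = sum(min(count, max_ref[word]) for word, count in cand_counts.items())
    return clipped / len(cand)

# Usage: score a generated caption against two human reference captions.
score = bleu1("a dog runs on the grass",
              ["a dog is running in the grass", "the dog runs through grass"])
# "on" appears in no reference, so 5 of 6 words are credited.
```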

Published

2024-11-13

Section

Articles

How to Cite

RIBEIRO, Maria Vitória Sousa; NOGUEIRA, Tiago do Carmo; JUNIOR, Gelson da Cruz; VINHAL, Cássio Dener Noronha; ULLMANN, Matheus Rudolfo Diedrich; FERREIRA, Deller James; CARVALHO, Caio Henrique Rodrigues; SANTANA, Danyele de Oliveira. CNN-LSTM-BASED DEEP LEARNING FOR AUTOMATIC IMAGE CAPTIONING. ARACÊ, [S. l.], v. 6, n. 3, p. 6725–6749, 2024. DOI: 10.56238/arev6n3-145. Available at: https://periodicos.newsciencepubl.com/arace/article/view/1339. Accessed: 8 Mar. 2025.