CNN-LSTM-BASED DEEP LEARNING FOR AUTOMATIC IMAGE CAPTIONING
DOI: https://doi.org/10.56238/arev6n3-145

Keywords: Machine Learning, Deep Learning, Convolutional Neural Networks, Long Short-Term Memory, Image Captioning

Abstract
Advances in Computer Vision and Machine Learning have made natural-language image description techniques more efficient and accurate through deep neural networks. This study adopted an encoder-decoder structure to identify objects in an input image and generate a caption for it. The proposed model used the VGG16 and Inception-V3 architectures as encoders and an LSTM network as the decoder. Experiments were carried out on the Flickr8k dataset, comprising 8,000 images, and the model was evaluated with the BLEU, METEOR, CIDEr, and ROUGE metrics, achieving a BLEU score of 58.40% and producing human-understandable descriptions.
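Since the headline result is reported as a BLEU score, the following sketch illustrates how a simplified, single-reference BLEU-1 (unigram precision with a brevity penalty) is computed. This is an illustrative assumption, not the paper's evaluation code: published scores are normally produced by the standard multi-reference, up-to-4-gram BLEU implementation.

```python
import math
from collections import Counter

def bleu1(candidate: str, reference: str) -> float:
    """Simplified BLEU-1: clipped unigram precision times a brevity
    penalty, against a single reference caption (illustrative only)."""
    cand = candidate.split()
    ref = reference.split()
    cand_counts = Counter(cand)
    ref_counts = Counter(ref)
    # Clipped matches: each candidate word counts at most as often
    # as it appears in the reference, to penalize word repetition.
    overlap = sum(min(n, ref_counts[w]) for w, n in cand_counts.items())
    precision = overlap / len(cand)
    # Brevity penalty discourages captions shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision

# A perfect match scores 1.0; a truncated caption is penalized.
print(bleu1("a dog runs on the grass", "a dog runs on the grass"))
print(bleu1("a dog", "a dog runs on the grass"))
```

In this toy setup, a score such as the paper's 58.40% would mean that, averaged over the test captions, roughly 58% of generated n-grams match the human references after clipping and the brevity penalty.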
Published: 2024-11-13
Section: Articles
How to Cite
RIBEIRO, Maria Vitória Sousa; NOGUEIRA, Tiago do Carmo; JUNIOR, Gelson da Cruz; VINHAL, Cássio Dener Noronha; ULLMANN, Matheus Rudolfo Diedrich; FERREIRA, Deller James; CARVALHO, Caio Henrique Rodrigues; SANTANA, Danyele de Oliveira. CNN-LSTM-BASED DEEP LEARNING FOR AUTOMATIC IMAGE CAPTIONING. ARACÊ, [S. l.], v. 6, n. 3, p. 6725–6749, 2024. DOI: 10.56238/arev6n3-145. Available at: https://periodicos.newsciencepubl.com/arace/article/view/1339. Accessed: 8 Mar. 2025.