CNN+CNN: Convolutional Decoders for Image Captioning

Wang, Qingzhong; Chan, Antoni B.

arXiv:1805.09019 (cs)

[Submitted on 23 May 2018]

Abstract: Image captioning is a challenging task that combines the fields of computer vision and natural language processing. A variety of approaches have been proposed to achieve the goal of automatically describing an image, and recurrent neural network (RNN) or long short-term memory (LSTM) based models dominate this field. However, RNNs and LSTMs cannot be computed in parallel and ignore the underlying hierarchical structure of a sentence. In this paper, we propose a framework that employs only convolutional neural networks (CNNs) to generate captions. Owing to parallel computing, our basic model is around 3 times faster than NIC (an LSTM-based model) during training, while also providing better results. We conduct extensive experiments on MSCOCO and investigate the influence of the model width and depth. Compared with LSTM-based models that apply similar attention mechanisms, our proposed models achieve comparable BLEU-1,2,3,4 and METEOR scores and higher CIDEr scores. We also test our model on the paragraph annotation dataset and obtain a higher CIDEr score than hierarchical LSTMs.

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1805.09019 [cs.CV]
	(or arXiv:1805.09019v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1805.09019

Submission history

From: Qingzhong Wang
[v1] Wed, 23 May 2018 09:16:59 UTC (3,379 KB)
