MedICaT: A Dataset of Medical Images, Captions, and Textual References

Subramanian, Sanjay; Wang, Lucy Lu; Mehta, Sachin; Bogin, Ben; van Zuylen, Madeleine; Parasa, Sravanthi; Singh, Sameer; Gardner, Matt; Hajishirzi, Hannaneh

arXiv:2010.06000 (cs)

[Submitted on 12 Oct 2020]

Abstract: Understanding the relationship between figures and text is key to scientific document understanding. Medical figures in particular are quite complex, often consisting of several subfigures (75% of figures in our dataset), with detailed text describing their content. Previous work studying figures in scientific papers focused on classifying figure content rather than understanding how images relate to the text. To address challenges in figure retrieval and figure-to-text alignment, we introduce MedICaT, a dataset of medical images in context. MedICaT consists of 217K images from 131K open access biomedical papers, and includes captions, inline references for 74% of figures, and manually annotated subfigures and subcaptions for a subset of figures. Using MedICaT, we introduce the task of subfigure to subcaption alignment in compound figures and demonstrate the utility of inline references in image-text matching. Our data and code can be accessed at this https URL.

Comments:	EMNLP-Findings 2020
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Cite as:	arXiv:2010.06000 [cs.CV]
	(or arXiv:2010.06000v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2010.06000

Submission history

From: Sanjay Subramanian
[v1] Mon, 12 Oct 2020 19:56:08 UTC (9,177 KB)
