Google Landmarks Dataset v2 -- A Large-Scale Benchmark for Instance-Level Recognition and Retrieval

Weyand, Tobias; Araujo, Andre; Cao, Bingyi; Sim, Jack

arXiv:2004.01804 (cs)

[Submitted on 3 Apr 2020 (v1), last revised 2 Nov 2020 (this version, v2)]

Abstract: While image retrieval and instance recognition techniques are progressing rapidly, there is a need for challenging datasets to accurately measure their performance -- while posing novel challenges that are relevant for practical applications. We introduce the Google Landmarks Dataset v2 (GLDv2), a new benchmark for large-scale, fine-grained instance recognition and image retrieval in the domain of human-made and natural landmarks. GLDv2 is the largest such dataset to date by a large margin, including over 5M images and 200k distinct instance labels. Its test set consists of 118k images with ground truth annotations for both the retrieval and recognition tasks. The ground truth construction involved over 800 hours of human annotator work. Our new dataset has several challenging properties inspired by real-world applications that previous datasets did not consider: an extremely long-tailed class distribution, a large fraction of out-of-domain test photos, and large intra-class variability. The dataset is sourced from Wikimedia Commons, the world's largest crowdsourced collection of landmark photos. We provide baseline results for both recognition and retrieval tasks based on state-of-the-art methods as well as competitive results from a public challenge. We further demonstrate the suitability of the dataset for transfer learning by showing that image embeddings trained on it achieve competitive retrieval performance on independent datasets. The dataset images, ground truth, and metric scoring code are available at this https URL.

Comments:	CVPR20 camera-ready (oral) + appendices
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2004.01804 [cs.CV]
	(or arXiv:2004.01804v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2004.01804

Submission history

From: Andre Araujo
[v1] Fri, 3 Apr 2020 22:52:17 UTC (486 KB)
[v2] Mon, 2 Nov 2020 18:30:45 UTC (9,352 KB)
