Focal Inverse Distance Transform Maps for Crowd Localization

Liang, Dingkang; Xu, Wei; Zhu, Yingying; Zhou, Yu

doi:10.1109/TMM.2022.3203870

arXiv:2102.07925 (cs)

[Submitted on 16 Feb 2021 (v1), last revised 4 Sep 2022 (this version, v3)]

Abstract:In this paper, we focus on crowd localization, a crucial task in crowd analysis. Most regression-based methods use convolutional neural networks (CNNs) to regress a density map, which cannot accurately locate instances in extremely dense scenes for two reasons: 1) the density map consists of a series of blurry Gaussian blobs, and 2) severe overlaps exist in the dense regions of the density map. To tackle this issue, we propose a novel Focal Inverse Distance Transform (FIDT) map for crowd localization. Compared with density maps, FIDT maps accurately describe people's locations without overlap in dense regions. Based on the FIDT maps, a Local-Maxima-Detection-Strategy (LMDS) is derived to effectively extract the center point of each individual. Furthermore, we introduce an Independent SSIM (I-SSIM) loss that encourages the model to learn local structural information and thus better recognize local maxima. Extensive experiments demonstrate that the proposed method achieves state-of-the-art localization performance on six crowd datasets and one vehicle dataset. Additionally, we find that the proposed method shows superior robustness on negative and extremely dense scenes, which further verifies the effectiveness of the FIDT maps. The code and model will be available at this https URL.
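
The idea of a sharply peaked inverse distance map followed by local-maxima extraction can be illustrated with a minimal NumPy sketch. This is not the authors' released code. The abstract does not give the map's formula, so the form 1 / (d^(alpha*d + beta) + c), the constants alpha, beta and c, the 3x3 window, and the 0.5 threshold are all assumptions made for illustration, and the function names are hypothetical.

```python
import numpy as np


def inverse_distance_map(points, shape, alpha=0.02, beta=0.75, c=1.0):
    """Build a focal-inverse-distance-style map from (row, col) head points.

    ASSUMPTION: the form 1 / (d ** (alpha * d + beta) + c) and the default
    constants are illustrative; they are not stated in the abstract.
    """
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    pts = np.asarray(points, dtype=float)
    # Euclidean distance from every pixel to each annotated point,
    # then keep the distance to the nearest one.
    d = np.sqrt((rows[..., None] - pts[:, 0]) ** 2
                + (cols[..., None] - pts[:, 1]) ** 2)
    dist = d.min(axis=-1)
    # The response is 1 exactly at a point and decays quickly with distance,
    # so nearby points keep separate peaks instead of merging into one blob.
    return 1.0 / (dist ** (alpha * dist + beta) + c)


def local_maxima(score_map, window=3, threshold=0.5):
    """Return (row, col) pixels that are the maximum of their window.

    ASSUMPTION: window size and threshold are hypothetical choices.
    """
    pad = window // 2
    padded = np.pad(score_map, pad, mode="constant", constant_values=-np.inf)
    h, w = score_map.shape
    # Stack every shifted view of the window and take the element-wise max.
    shifted = [padded[i:i + h, j:j + w]
               for i in range(window) for j in range(window)]
    neighbourhood_max = np.max(shifted, axis=0)
    keep = (score_map == neighbourhood_max) & (score_map >= threshold)
    return [(int(r), int(col)) for r, col in np.argwhere(keep)]


# Two heads only three pixels apart, plus one isolated head.
heads = [(5, 5), (5, 8), (20, 12)]
fidt_like = inverse_distance_map(heads, shape=(32, 32))
recovered = local_maxima(fidt_like)
```

In this toy case each annotated point is its own peak, so thresholded local-maxima detection separates the two close heads. A Gaussian density map with a comparable spread would tend to blur them together.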

Comments:	Accepted by IEEE Transactions on Multimedia (TMM). The code and models are available at this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2102.07925 [cs.CV]
	(or arXiv:2102.07925v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2102.07925
Related DOI:	https://doi.org/10.1109/TMM.2022.3203870

Submission history

From: Dingkang Liang
[v1] Tue, 16 Feb 2021 02:25:55 UTC (17,427 KB)
[v2] Thu, 18 Mar 2021 11:45:10 UTC (21,336 KB)
[v3] Sun, 4 Sep 2022 03:45:09 UTC (24,112 KB)
