Forecasting Human-Object Interaction: Joint Prediction of Motor Attention and Actions in First Person Video

Liu, Miao; Tang, Siyu; Li, Yin; Rehg, James

arXiv:1911.10967 (cs)

[Submitted on 25 Nov 2019 (v1), last revised 20 Jul 2020 (this version, v2)]

Abstract: We address the challenging task of anticipating human-object interaction in first person videos. Most existing methods ignore how the camera wearer interacts with the objects, or simply consider body motion as a separate modality. In contrast, we observe that intentional hand movement reveals critical information about the future activity. Motivated by this, we adopt intentional hand movement as a future representation and propose a novel deep network that jointly models and predicts the egocentric hand motion, interaction hotspots and future action. Specifically, we consider the future hand motion as the motor attention, and model this attention using latent variables in our deep model. The predicted motor attention is further used to characterise the discriminative spatial-temporal visual features for predicting actions and interaction hotspots. We present extensive experiments demonstrating the benefit of the proposed joint model. Importantly, our model produces new state-of-the-art results for action anticipation on both the EGTEA Gaze+ and EPIC-Kitchens datasets. Our project page is available at this https URL

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1911.10967 [cs.CV]
	(or arXiv:1911.10967v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1911.10967

Submission history

From: Miao Liu
[v1] Mon, 25 Nov 2019 15:10:20 UTC (1,297 KB)
[v2] Mon, 20 Jul 2020 01:58:19 UTC (1,212 KB)
