Detecting Human-Object Interactions with Object-Guided Cross-Modal Calibrated Semantics

Hangjie Yuan, Mang Wang, Dong Ni, Liangpeng Xu

[AAAI-22] Main Track
Abstract: Human-Object Interaction (HOI) detection is an essential task for understanding human-centric images from a fine-grained perspective. Although end-to-end HOI detection models thrive, their paradigm of parallel human/object detection and verb class prediction loses a key merit of two-stage methods: the object-guided hierarchy. The object in an HOI triplet gives direct clues to the verb to be predicted. In this paper, we aim to boost end-to-end models with object-guided statistical priors. Specifically, we propose to utilize a Verb Semantic Model (VSM) and semantic aggregation to profit from this object-guided hierarchy. A Similarity KL (SKL) loss is proposed to optimize the VSM to align with the HOI dataset's priors. To overcome the static semantic embedding problem, we propose to generate cross-modality-aware visual and semantic features by Cross-Modal Calibration (CMC). Combined, the above modules compose the Object-guided Cross-modal Calibration Network (OCN). Experiments conducted on two popular HOI detection benchmarks demonstrate the significance of incorporating statistical prior knowledge and produce state-of-the-art performance. Further analysis indicates that the proposed modules serve as a stronger verb predictor and a superior way of utilizing prior knowledge. The code is available at https://github.com/JacobYuan7/OCN-HOI-Benchmark.
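The abstract names the Similarity KL (SKL) loss without giving its formulation, so the snippet below is only a minimal, hypothetical sketch of how a similarity-based KL objective could align learned verb embeddings with dataset-derived priors. The function name, tensor shapes, temperature, and the placeholder prior are all assumptions for illustration, not the paper's actual implementation; refer to the linked repository for the authors' code.

```python
# Hedged sketch (not the paper's implementation): align pairwise verb-embedding
# similarities with a dataset-derived prior similarity matrix via KL divergence.
# All names, shapes, and the temperature below are illustrative assumptions.
import torch
import torch.nn.functional as F


def similarity_kl_loss(verb_embeddings: torch.Tensor,
                       prior_similarity: torch.Tensor,
                       temperature: float = 0.1) -> torch.Tensor:
    """verb_embeddings: (V, D) learned verb semantic vectors.
    prior_similarity: (V, V) prior derived from dataset statistics
    (e.g., verb co-occurrence under the same object), rows summing to 1."""
    # Cosine similarity between every pair of verb embeddings.
    normed = F.normalize(verb_embeddings, dim=-1)
    sim = normed @ normed.t() / temperature        # (V, V) scaled similarities
    log_pred = F.log_softmax(sim, dim=-1)          # predicted distribution (log)
    # KL(prior || predicted), averaged over verbs.
    return F.kl_div(log_pred, prior_similarity, reduction='batchmean')


# Usage sketch with random tensors standing in for real data.
if __name__ == "__main__":
    V, D = 117, 256                                # e.g., HICO-DET has 117 verbs
    emb = torch.randn(V, D, requires_grad=True)
    prior = F.softmax(torch.randn(V, V), dim=-1)   # placeholder prior matrix
    loss = similarity_kl_loss(emb, prior)
    loss.backward()
    print(float(loss))
```

Under these assumptions, the loss pushes the learned verb embedding space to reproduce the relational structure of the statistical prior rather than matching individual labels, which is one plausible reading of "optimize VSM to align with the HOI dataset's priors."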

Sessions where this paper appears

  • Poster Session 5 (Red 3)

  • Poster Session 12 (Red 3)