Visual Consensus Modeling for Video-Text Retrieval
Shuqiang Cao, Bairui Wang, Wei Zhang, Lin Ma
[AAAI-22] Main Track
Abstract:
In this paper, we propose a novel method to mine the commonsense knowledge shared between the video and text modalities for video-text retrieval, namely visual consensus modeling. Unlike existing works, which learn the video and text representations and their complicated relationships solely from pairwise video-text data, we make the first attempt to model the visual consensus by mining visual concepts from videos and exploiting their co-occurrence patterns within the video and text modalities, with no reliance on any additional concept annotations. Specifically, we build a shareable and learnable graph as the visual consensus, where the nodes denote the mined visual concepts and the edges connecting them represent the co-occurrence relationships between the visual concepts. Extensive experimental results on public benchmark datasets demonstrate that our proposed method, with the ability to effectively model the visual consensus, achieves state-of-the-art performance on the bidirectional video-text retrieval task. Our code is available at https://github.com/sqiangcao99/VCM.
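The abstract describes the visual consensus as a shareable, learnable graph whose nodes are mined visual concepts and whose edges encode co-occurrence relationships. The following is a minimal sketch of that idea, assuming concepts have already been mined and each training video is tagged with a set of concept indices; all names, shapes, and the propagation step are illustrative assumptions, not the authors' implementation (see the linked repository for the actual code).

```python
# Illustrative sketch: a learnable co-occurrence graph over mined visual concepts.
# Assumes `concept_sets` is a list of per-video concept index sets (hypothetical input).
import torch
import torch.nn as nn


def cooccurrence_init(concept_sets, num_concepts):
    """Count how often pairs of concepts appear in the same video, then row-normalize."""
    counts = torch.zeros(num_concepts, num_concepts)
    for concepts in concept_sets:
        for i in concepts:
            for j in concepts:
                if i != j:
                    counts[i, j] += 1.0
    row_sum = counts.sum(dim=1, keepdim=True).clamp(min=1.0)
    return counts / row_sum


class VisualConsensusGraph(nn.Module):
    """Shareable, learnable graph over mined visual concepts.

    Nodes are concept embeddings; edges start from co-occurrence statistics
    and are refined jointly with the rest of the retrieval model.
    """

    def __init__(self, num_concepts, dim, init_adj):
        super().__init__()
        self.concept_emb = nn.Embedding(num_concepts, dim)  # node features
        self.adj = nn.Parameter(init_adj.clone())            # learnable edge weights
        self.proj = nn.Linear(dim, dim)

    def forward(self):
        # One step of graph propagation: each concept aggregates its neighbors.
        nodes = self.concept_emb.weight                       # (K, dim)
        neighbors = torch.softmax(self.adj, dim=-1) @ nodes   # (K, dim)
        return torch.relu(self.proj(nodes + neighbors))       # consensus-aware concept features
```

In this sketch the co-occurrence statistics only initialize the edges; keeping the adjacency as a learnable parameter lets the graph be refined end-to-end, matching the "learnable" aspect emphasized in the abstract.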
Sessions where this paper appears
- Poster Session 4: Fri, February 25, 5:00 PM - 6:45 PM (+00:00), Red 2
- Poster Session 11: Mon, February 28, 12:45 AM - 2:30 AM (+00:00), Red 2