SepFusion: Finding Optimal Fusion Structures for Visual Sound Separation

Dongzhan Zhou, Xinchi Zhou, Di Hu, Hang Zhou, Lei Bai, Ziwei Liu, Wanli Ouyang

[AAAI-22] Main Track
Abstract: Multiple modalities can provide rich semantic information, and exploiting such information normally leads to better performance than the single-modality counterpart. However, it is not easy to devise an effective cross-modal fusion structure due to variations in feature dimensions and semantics, especially when the inputs come from different sensors, as in audio-visual learning. In this work, we propose SepFusion, a novel framework that can smoothly produce optimal fusion structures for visual sound separation. The framework is composed of two components, namely the model generator and the evaluator. To construct the generator, we devise a lightweight architecture space that can adapt to different input modalities. In this way, we can easily obtain audio-visual fusion structures according to our demands. For the evaluator, we adopt the idea of neural architecture search to select superior networks effectively. This automatic process significantly saves human effort while achieving competitive performance. Moreover, since SepFusion provides a family of strong models, we can use this model family for broader applications, such as further improving performance via model assembly or providing suitable architectures for separating particular instrument classes. These potential applications further enhance the competitiveness of our approach.
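The abstract only names the two components at a high level; the sketch below is a minimal, hypothetical illustration of that generate-then-evaluate loop, not the authors' code. All identifiers (FUSION_SPACE, sample_structures, evaluate_sdr, search) are assumptions, the fusion options are invented for illustration, and the real evaluator uses a neural architecture search strategy rather than the random sampling and dummy scoring shown here.

```python
# Hypothetical sketch of a generator/evaluator loop over a small
# audio-visual fusion space. Names and options are illustrative only.
import itertools
import random

# Toy "lightweight architecture space": where and how the visual feature
# is fused into the audio separation network (assumed options).
FUSION_SPACE = {
    "fusion_stage": ["early", "middle", "late"],
    "fusion_op": ["concat", "film", "gated_sum"],
    "visual_dim": [128, 256, 512],
}

def sample_structures(n, space=FUSION_SPACE):
    """Generator role: draw n candidate fusion structures from the space."""
    all_configs = [dict(zip(space, vals))
                   for vals in itertools.product(*space.values())]
    return random.sample(all_configs, min(n, len(all_configs)))

def evaluate_sdr(config):
    """Evaluator role: in practice, briefly train the candidate and return a
    separation metric (e.g., SDR) on a validation split. Stubbed with a
    random score so the sketch runs end to end."""
    return random.uniform(0.0, 10.0)

def search(n_candidates=8, top_k=3):
    """Keep the top-k structures by validation score; these would form the
    'model family' mentioned in the abstract (assembly, per-class choice)."""
    candidates = sample_structures(n_candidates)
    scored = [(evaluate_sdr(cfg), cfg) for cfg in candidates]
    scored.sort(key=lambda x: x[0], reverse=True)
    return scored[:top_k]

if __name__ == "__main__":
    for score, cfg in search():
        print(f"{score:.2f}  {cfg}")
```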


Sessions where this paper appears

  • Poster Session 5

    Red 3

  • Poster Session 12

    Red 3