SFSRNet: Super-Resolution for Single-Channel Audio Source Separation

Joel Rixen, Matthias Renz

[AAAI-22] Main Track
Abstract: Single-channel audio source separation aims to recover (separate) multiple audio sources that are mixed in a single-channel audio signal (e.g. people talking over each other). Some of the best-performing single-channel source separation methods use downsampling either to make the separation process faster or to allow larger neural networks and thereby increase accuracy. The problem with downsampling is that upsampling the audio source estimations back to the original sampling rate usually comes with information loss. In this paper, we tackle this problem by introducing SFSRNet, which incorporates a super-resolution (SR) network. The SR network is trained to reconstruct the missing information in the higher frequencies of the audio signal by operating on the spectrograms of the output audio source estimations and the input audio mixture. Any separation method in which the sequence length is a bottleneck for speed and memory can be made faster or more accurate by using the SR network.
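The information loss described in the abstract can be illustrated with a generic signal-processing sketch. This is not the SFSRNet pipeline: the 8 kHz sampling rate, the two test tones, and the helper `fft_resample` are assumptions chosen for illustration, and the resampling is a plain FFT-based band limit rather than anything learned.

```python
import numpy as np

SR = 8000                                   # assumed sampling rate in Hz
t = np.arange(SR) / SR                      # one second of audio
low = np.sin(2 * np.pi * 500 * t)           # component below the reduced Nyquist limit
high = 0.5 * np.sin(2 * np.pi * 3000 * t)   # component above the reduced Nyquist limit
x = low + high


def fft_resample(signal: np.ndarray, num_samples: int) -> np.ndarray:
    """Resample by truncating or zero-padding the one-sided spectrum."""
    spectrum = np.fft.rfft(signal)
    n_bins = num_samples // 2 + 1
    resized = np.zeros(n_bins, dtype=complex)
    keep = min(n_bins, spectrum.shape[0])
    resized[:keep] = spectrum[:keep]
    # Rescale so that amplitudes are preserved across the length change.
    return np.fft.irfft(resized, n=num_samples) * (num_samples / signal.shape[0])


down = fft_resample(x, SR // 2)      # 4 kHz: content above 2 kHz is discarded
restored = fft_resample(down, SR)    # back to 8 kHz: the discarded content stays missing

print("energy of original :", float(np.sum(x ** 2)))
print("energy of restored :", float(np.sum(restored ** 2)))
print("max error vs. low-frequency part:", float(np.max(np.abs(restored - low))))
```

In this toy case the restored signal matches only the 500 Hz component. The 3000 Hz component is exactly the kind of high-frequency information that plain upsampling cannot bring back.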

On the WSJ0-2mix benchmark, where estimations of the audio signals of two speakers must be extracted from the mixture, our experiments show that the proposed SFSRNet reaches a scale-invariant signal-to-noise-ratio improvement (SI-SNRi) of 23.4 dB, outperforming the state-of-the-art SepFormer, which reaches an SI-SNRi of 22.3 dB.
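For readers unfamiliar with the metric, the following is a minimal sketch of how SI-SNR and SI-SNRi are commonly computed. It uses the usual definition (zero-mean signals, projection of the estimate onto the target) and is not taken from the paper's code. The function names, the `eps` stabilizer, and the synthetic noise "sources" are assumptions for illustration.

```python
import numpy as np


def si_snr(estimate: np.ndarray, target: np.ndarray, eps: float = 1e-8) -> float:
    """Scale-invariant signal-to-noise ratio in dB for 1-D signals."""
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to isolate the target-aligned part.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    s_target = scale * target
    e_noise = estimate - s_target
    ratio = (np.dot(s_target, s_target) + eps) / (np.dot(e_noise, e_noise) + eps)
    return float(10.0 * np.log10(ratio))


def si_snr_improvement(estimate: np.ndarray, target: np.ndarray, mixture: np.ndarray) -> float:
    """SI-SNRi: gain of the estimate over using the unprocessed mixture."""
    return si_snr(estimate, target) - si_snr(mixture, target)


# Synthetic stand-ins for two speakers (random noise, not real speech).
rng = np.random.default_rng(0)
source_1 = rng.standard_normal(16000)
source_2 = rng.standard_normal(16000)
mixture = source_1 + source_2
estimate_1 = source_1 + 0.1 * source_2   # imperfect separation with some leakage

print("SI-SNR of mixture :", si_snr(mixture, source_1))
print("SI-SNR of estimate:", si_snr(estimate_1, source_1))
print("SI-SNRi           :", si_snr_improvement(estimate_1, source_1, mixture))
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate does not change the score, which is what "scale-invariant" refers to.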

Sessions where this paper appears

  • Poster Session 2

    Fri, February 25 12:45 AM - 2:30 AM (+00:00)
    Red 5

  • Poster Session 7

    Sat, February 26 4:45 PM - 6:30 PM (+00:00)
    Red 5