Energy-Based Generative Cooperative Saliency Prediction

Jing Zhang, Jianwen Xie, Zilong Zheng, Nick Barnes

[AAAI-22] Main Track
Abstract: Conventional saliency prediction models typically learn a deterministic mapping from images to the corresponding ground truth saliency maps. In this paper, we study the saliency prediction problem from the perspective of generative models by learning a conditional probability distribution over saliency maps given an image, and treating saliency prediction as a sampling process. Specifically, we propose a generative cooperative saliency prediction framework, where a conditional latent variable model and a conditional energy-based model are jointly trained to predict saliency in a cooperative manner. The latent variable model serves as a fast but coarse predictor that efficiently produces an initial prediction, which is then refined by the iterative Langevin revision of the energy-based model serving as a fine predictor. Such a coarse-to-fine cooperative saliency prediction strategy offers the best of both worlds. We further generalize our framework to weakly supervised saliency prediction with a "cooperative learning while recovering" strategy. Experimental results show that our generative model achieves state-of-the-art performance together with a reliable sampling process for exploring plausible saliency maps.
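The coarse-to-fine sampling described in the abstract can be illustrated on a toy example. The sketch below is not the paper's actual networks: the quadratic energy, step size, number of steps, and 4x4 "saliency map" are all illustrative assumptions. It only shows the generic Langevin revision update, s ← s − (δ²/2)·∇E(s|x) + δ·ε, applied to a coarse initial prediction.

```python
import numpy as np

def langevin_refine(energy_grad, s_init, step=0.1, n_steps=30, rng=None):
    """Refine a coarse saliency map s by Langevin dynamics on E(s | x):
    s <- s - (step^2 / 2) * dE/ds + step * noise, for n_steps iterations."""
    rng = np.random.default_rng(0) if rng is None else rng
    s = s_init.copy()
    for _ in range(n_steps):
        s = s - 0.5 * step**2 * energy_grad(s) + step * rng.normal(size=s.shape)
    return np.clip(s, 0.0, 1.0)  # saliency values stay in [0, 1]

# Toy stand-ins (hypothetical, for illustration only):
# the "energy" is quadratic with its minimum at a target map, and the
# "coarse predictor" output is an all-zeros map instead of a real network.
lam = 50.0                              # sharpness of the toy energy
target = np.full((4, 4), 0.8)           # stand-in ground-truth saliency map
grad = lambda s: lam * (s - target)     # gradient of E(s) = (lam/2)*||s - target||^2
coarse = np.zeros((4, 4))               # stand-in initial prediction
refined = langevin_refine(grad, coarse)
```

On this toy energy, the refined map moves from the coarse all-zeros initialization toward the energy minimum while the injected noise keeps the output a sample rather than a single deterministic answer, mirroring the paper's view of prediction as sampling.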

Sessions where this paper appears

  • Poster Session 2

    Fri, February 25 12:45 AM - 2:30 AM (+00:00)
    Red 2

  • Poster Session 9

    Sun, February 27 8:45 AM - 10:30 AM (+00:00)
    Red 2

  • Oral Session 9

    Sun, February 27 10:30 AM - 11:45 AM (+00:00)
    Red 2