Up to 100$\times$ Faster Data-Free Knowledge Distillation

Gongfan Fang, Kanya Mo, Xinchao Wang, Jie Song, Shitao Bei, Haofei Zhang, Mingli Song

[AAAI-22] Main Track
Abstract: Data-free knowledge distillation (DFKD) has recently been attracting increasing attention from research communities, owing to its capability of compressing a model using only synthetic data. Despite the encouraging results achieved, state-of-the-art DFKD methods still suffer from inefficient data synthesis, making the data-free training process extremely time-consuming and thus inapplicable to large-scale tasks. In this work, we introduce an efficacious scheme, termed FastDFKD, that accelerates DFKD by orders of magnitude. At the heart of our approach is a novel strategy to reuse the shared common features in training data so as to synthesize different data instances. Unlike prior methods that optimize a set of data independently, we propose to learn a meta-synthesizer that seeks common features as the initialization for fast data synthesis. As a result, FastDFKD achieves data synthesis within only a few steps, significantly enhancing the efficiency of data-free training. Experiments on CIFAR, NYUv2 and ImageNet demonstrate that the proposed FastDFKD achieves 10$\times$ and even 100$\times$ acceleration while preserving performance on par with the state of the art.
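To make the abstract's idea concrete, the sketch below illustrates one way a meta-learned synthesizer initialization could replace per-batch, from-scratch optimization: clone the meta-generator, adapt it for only a few inner steps against the teacher, then nudge the meta-weights toward the adapted ones. This is a minimal, hedged illustration; the generator architecture, the simplified inversion loss, the Reptile-style outer update, and all hyperparameters here are assumptions for exposition, not the paper's exact implementation.

```python
# Minimal PyTorch-style sketch of meta-initialized data synthesis for DFKD.
# Illustrative only: TinyGenerator, inversion_loss, and all hyperparameters
# are hypothetical stand-ins, not the FastDFKD reference code.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyGenerator(nn.Module):
    """Maps latent noise to images; stands in for the (meta-)synthesizer."""
    def __init__(self, nz=100, img_size=32, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(nz, 256), nn.ReLU(inplace=True),
            nn.Linear(256, channels * img_size * img_size), nn.Tanh(),
        )
        self.shape = (channels, img_size, img_size)

    def forward(self, z):
        return self.net(z).view(z.size(0), *self.shape)

def inversion_loss(teacher, images, targets):
    """Toy model-inversion objective: make the teacher confident on the
    requested classes. Real DFKD losses add feature/BN-statistic priors."""
    return F.cross_entropy(teacher(images), targets)

def fast_synthesize(meta_gen, teacher, targets, nz=100, inner_steps=5, lr=1e-2):
    """Clone the meta-synthesizer (the shared 'common features'), adapt it
    for a few steps only, and return the synthesized batch."""
    gen = copy.deepcopy(meta_gen)
    opt = torch.optim.Adam(gen.parameters(), lr=lr)
    z = torch.randn(targets.size(0), nz)
    for _ in range(inner_steps):           # few steps instead of full inversion
        opt.zero_grad()
        inversion_loss(teacher, gen(z), targets).backward()
        opt.step()
    return gen(z).detach(), gen

def meta_update(meta_gen, adapted_gen, meta_lr=0.1):
    """Reptile-style outer update: move meta-weights toward the adapted ones
    so future syntheses start closer to a good solution."""
    with torch.no_grad():
        for p_meta, p_new in zip(meta_gen.parameters(), adapted_gen.parameters()):
            p_meta.add_(meta_lr * (p_new - p_meta))
```

In use, each round would call `fast_synthesize` to produce a batch for distillation and then `meta_update` so that the shared initialization keeps absorbing features common across batches, which is what allows the inner loop to stay short.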


Sessions where this paper appears

  • Poster Session 3

    Fri, February 25 8:45 AM - 10:30 AM (+00:00)
    Blue 3

  • Poster Session 8

    Sun, February 27 12:45 AM - 2:30 AM (+00:00)
    Blue 3