Delving into Sample Loss Curve to Embrace Noisy and Imbalanced Data

Shenwang Jiang; Jianan Li; Ying Wang; Tingfa Xu; Bo Huang; Zhang Zhang

[AAAI-22] Main Track

Abstract: Corrupted labels and class imbalance are commonly encountered in practically collected training data, and both easily lead to over-fitting of deep neural networks (DNNs). Existing approaches alleviate these issues by adopting a sample re-weighting strategy, which designs a weighting function that maps training loss to sample weight. However, such a strategy is applicable only when the training data contain just one of the two types of bias.

In practice, biased samples with corrupted labels and samples from tail classes commonly co-exist in training data, and how to handle them simultaneously is a key but under-explored problem. In this paper, we find that these two types of biased samples, though they have similar transient loss, show distinguishable trends and characteristics in their loss curves, which can provide valuable priors for sample weight assignment. Motivated by this, we delve into the loss curves and propose a novel probe-and-allocate training strategy. In the probing stage, we train the network on the whole biased training data without intervention and record the loss curve of each sample as an additional attribute. In the allocating stage, we feed the resulting attribute to a newly designed curve-perception network, named CurveNet, which learns to identify the bias type of each sample and adaptively assign proper weights through meta-learning. To the best of our knowledge, we are the first to simultaneously handle corrupted labels and class imbalance in training data. Extensive experiments on synthetic and real-world data validate the proposed method, which achieves state-of-the-art performance on multiple challenging benchmarks.
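
The probing stage described in the abstract can be illustrated with a toy sketch. The code below is not the authors' implementation and does not include CurveNet or the meta-learning step. It assumes a plain softmax classifier trained with full-batch gradient descent in NumPy in place of a DNN, and a small synthetic two-class dataset with flipped labels. The names `probe_loss_curves` and `make_toy_data`, and all hyperparameters, are hypothetical and chosen only for illustration.

```python
import numpy as np

def probe_loss_curves(X, y, num_classes, epochs=20, lr=0.5, seed=0):
    """Train without any re-weighting and record each sample's loss per epoch.

    Assumption: a linear softmax classifier stands in for the DNN.
    Returns an array of shape (n_samples, epochs).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(scale=0.01, size=(d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[y]
    curves = np.zeros((n, epochs))
    for epoch in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        # Per-sample cross-entropy, recorded before the parameter update.
        curves[:, epoch] = -np.log(probs[np.arange(n), y] + 1e-12)
        grad = (probs - onehot) / n
        W -= lr * (X.T @ grad)
        b -= lr * grad.sum(axis=0)
    return curves

def make_toy_data(n=200, flip=20, seed=0):
    """Two well-separated Gaussian clusters with `flip` corrupted labels."""
    rng = np.random.default_rng(seed)
    y_true = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, 2)) + np.where(y_true[:, None] == 1, 2.0, -2.0)
    y = y_true.copy()
    noisy_idx = rng.choice(n, size=flip, replace=False)
    y[noisy_idx] = 1 - y[noisy_idx]  # flip the selected labels
    return X, y, noisy_idx

X, y, noisy_idx = make_toy_data()
curves = probe_loss_curves(X, y, num_classes=2)  # one loss curve per sample
```

Each row of `curves` is the kind of per-sample attribute the allocating stage would take as input. In this toy setting the rows for flipped labels stay high while the clean rows decay, which is only a rough illustration of the separability the abstract refers to.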

Sessions where this paper appears

Poster Session 2

Fri, February 25 12:45 AM - 2:30 AM (+00:00)

Blue 2

Poster Session 9

Sun, February 27 8:45 AM - 10:30 AM (+00:00)

Blue 2
