Gradient Flow in Sparse Neural Networks and How Lottery Tickets Win

Utku Evci; Yann Dauphin; Yani Ioannou; Cem Keskin

[AAAI-22] Main Track

Abstract: Sparse Neural Networks (NNs) can match the generalization of dense NNs using a fraction of the compute/storage for inference, and have the potential to enable efficient training. However, naively training unstructured sparse NNs from random initialization results in significantly worse generalization, with the notable exceptions of Lottery Tickets (LTs) and Dynamic Sparse Training (DST). In this work, we attempt to answer: (1) why training unstructured sparse networks from random initialization performs poorly; and (2) what makes LTs and DST the exceptions? We show that sparse NNs have poor gradient flow at initialization and propose a modified initialization for unstructured connectivity. Furthermore, we find that DST methods significantly improve gradient flow during training over traditional sparse training methods. Finally, we show that LTs do not improve gradient flow; rather, their success lies in re-learning the pruning solution they are derived from. However, this comes at the cost of learning novel solutions.

Sessions where this paper appears

Poster Session 1: Blue 2, 02-24-2022, 08:45 to 10:30 (US/Pacific)

Poster Session 11: Blue 2, 02-27-2022, 16:45 to 18:30 (US/Pacific)

Oral Session 1: Blue 2, 02-24-2022, 10:30 to 11:45 (US/Pacific)