Language Modelling via Learning to Rank

Arvid Frydenlund; Gagandeep Singh; Frank Rudzicz

[AAAI-22] Main Track

Keywords
Poster Session 1 @ Red 4, Poster Session 8 @ Red 4, Oral Session 8 @ Red 4

Abstract: We consider language modelling (LM) as a multi-label structured prediction task by re-framing training from solely predicting a single ground-truth word to ranking a set of words which could continue a given context. To avoid annotating top-k ranks, we generate them using pre-trained LMs: GPT-2, BERT, and Born-Again models. This leads to a rank-based form of knowledge distillation (KD). We also develop a method using $N$-grams to create a non-probabilistic teacher which generates the ranks without the need for a pre-trained LM.

We confirm two hypotheses: that LM can be treated as a ranking task, and that this can be done without the use of a pre-trained LM.

We show that rank-based KD generally gives a modest improvement in perplexity (PPL), though often with statistical significance, when compared to Kullback–Leibler-based (KL-based) KD. Surprisingly, given the naivety of the method, the $N$-grams act as competitive teachers and achieve performance similar to that of either a BERT or a Born-Again teacher. Unsurprisingly, GPT-2 always acts as the best teacher.

Using GPT-2 as the teacher and a Transformer-XL student on Wiki-02, rank-based KD reduces PPL from a cross-entropy baseline of 65.27 to 55.94, compared with 56.70 for KL-based KD.
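To make the idea of rank-based distillation concrete, here is a minimal sketch of how a teacher's scores could be turned into a top-k ranking and how a student could be penalised for disagreeing with it. The abstract does not specify the training objective, so the Plackett-Luce listwise loss used here is an assumption: it is one common learning-to-rank objective, not necessarily the one used in the paper. The function names `top_k_ranks` and `plackett_luce_nll` are hypothetical.

```python
import math


def top_k_ranks(teacher_scores, k):
    """Return the ids of the k highest-scoring words, best first.

    The scores only need to be comparable, so a non-probabilistic
    teacher can supply them as well.
    """
    order = sorted(range(len(teacher_scores)),
                   key=lambda i: teacher_scores[i], reverse=True)
    return order[:k]


def plackett_luce_nll(student_logits, ranked_ids):
    """Negative log-likelihood of a ranking under a Plackett-Luce model.

    Assumed objective (not confirmed by the abstract): at each rank,
    the target word competes against every word not yet placed.
    """
    remaining = set(range(len(student_logits)))
    nll = 0.0
    for word_id in ranked_ids:
        # Numerically stable log-sum-exp over the words still in play.
        m = max(student_logits[i] for i in remaining)
        log_z = m + math.log(sum(math.exp(student_logits[i] - m)
                                 for i in remaining))
        nll -= student_logits[word_id] - log_z
        remaining.remove(word_id)
    return nll


# Toy vocabulary of three words; the teacher prefers word 1, then word 2.
ranks = top_k_ranks([0.1, 0.6, 0.3], k=2)
loss = plackett_luce_nll([0.0, 2.0, 1.0], ranks)
```

With a ranking of length one, this loss reduces to the usual cross-entropy on a single target word, which is why single-word prediction can be seen as a special case of ranking.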

Sessions where this paper appears

Poster Session 1, Red 4: February 24, 2022, 08:45 to 10:30 (US/Pacific)
Poster Session 8, Red 4: February 26, 2022, 16:45 to 18:30 (US/Pacific)
Oral Session 8, Red 4: February 26, 2022, 18:30 to 19:45 (US/Pacific)