Fast and Data Efficient Reinforcement Learning from Pixels via Non-Parametric Value Approximation

Alexander Long; Alan Blair; Herke van Hoof

[AAAI-22] Main Track

Abstract: We present Nonparametric Approximation of Inter-Trace returns (NAIT), a Reinforcement Learning algorithm for discrete-action, pixel-based environments that is both highly sample- and computation-efficient. NAIT is a lazy-learning approach whose update is equivalent to episodic Monte-Carlo on episode completion, but that also allows the stable incorporation of rewards while an episode is ongoing. We make use of a fixed, domain-agnostic representation, simple distance-based exploration, and a proximity-graph-based lookup to facilitate extremely fast execution. We empirically evaluate NAIT on both the 26- and 57-game variants of ATARI100k where, despite its simplicity, it achieves competitive performance in the online setting with a greater than 100x speedup in wall-time.
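
To make the idea of lazy, non-parametric value approximation concrete, the following is a minimal sketch and not the authors' NAIT implementation. It assumes observations have already been mapped to fixed embedding vectors by some encoder, stores episodic Monte-Carlo returns per action, and estimates a value by averaging the returns of the k nearest stored embeddings. The names `KNNValueTable`, `discounted_returns`, and `sq_dist`, the default value of 0.0 for unseen actions, and the brute-force search (in place of a proximity-graph lookup) are all illustrative assumptions. Distance-based exploration and the within-episode reward update described in the abstract are not shown.

```python
def discounted_returns(rewards, gamma=0.99):
    """Monte-Carlo return G_t = r_t + gamma * G_{t+1} for a finished episode."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


class KNNValueTable:
    """Hypothetical per-action memory of (embedding, return) pairs.

    Values are read out lazily at query time with a brute-force k-NN search;
    a proximity-graph index would replace the sort for fast lookup.
    """

    def __init__(self, n_actions, k=3):
        self.k = k
        self.memory = [[] for _ in range(n_actions)]

    def add_episode(self, embeddings, actions, rewards, gamma=0.99):
        # Store one (embedding, Monte-Carlo return) pair per step of the episode.
        for z, a, g in zip(embeddings, actions, discounted_returns(rewards, gamma)):
            self.memory[a].append((tuple(z), g))

    def value(self, z, action):
        entries = self.memory[action]
        if not entries:
            return 0.0  # assumed default for actions with no stored data
        nearest = sorted(entries, key=lambda e: sq_dist(e[0], z))[: self.k]
        return sum(g for _, g in nearest) / len(nearest)

    def greedy_action(self, z):
        # Pick the action whose nearest stored returns average highest.
        return max(range(len(self.memory)), key=lambda a: self.value(z, a))
```

A table built this way needs no gradient updates: adding an episode is a list append, and all computation happens at lookup time, which is why the speed of the nearest-neighbour search dominates the cost.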

Sessions where this paper appears

Poster Session 6: Blue 1, 02-26-2022, 00:45 to 02:30 (US/Pacific)
Poster Session 12: Blue 1, 02-28-2022, 00:45 to 02:30 (US/Pacific)
Oral Session 6: Blue 1, 02-26-2022, 02:30 to 03:45 (US/Pacific)