Adapt to Environment Sudden Changes by Learning a Context Sensitive Policy

Fan-Ming Luo; Shengyi Jiang; Yang Yu; Zongzhang Zhang; Yi-Feng Zhang

[AAAI-22] Main Track



Abstract: When dealing with real-world reinforcement learning (RL) tasks, we should be aware that the environment may change suddenly. A robust policy is expected to handle such changes and adapt rapidly to the new environment.

Context-based meta reinforcement learning aims to learn policies that adapt to the environment. These methods use a context encoder to perceive the environment on the fly, and a contextual policy then makes environment-adaptive decisions based on the extracted context. However, previous methods exhibit lagged and unstable context extraction, which makes it difficult for them to handle sudden changes. This paper proposes an environment sensitive contextual policy learning (ESCP) approach to improve both the sensitivity and the robustness of context encoding. ESCP has three key components: variance minimization, which forces a rapid and stable encoding of the environment context; relational matrix determinant maximization, which avoids the trivial solutions that variance minimization alone could produce (such as mapping every environment to the same context); and a history-truncated recurrent neural network model, which avoids interference from old memory.
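The abstract names the first two components without giving their formulas. The following Python sketch shows one way such objectives could interact, assuming the encoder's context vectors are grouped by the environment that generated them and assuming the relational matrix is an RBF kernel over per-environment mean contexts. The names `variance_loss`, `relational_logdet`, `combined_loss`, `bandwidth`, and `det_weight` are hypothetical illustrations, not the paper's definitions or code.

```python
import numpy as np


def variance_loss(contexts_by_env):
    """Average, over environments, of the within-environment context variance.

    contexts_by_env: list of arrays, each of shape (n_i, d), holding the context
    vectors an encoder produced for samples from environment i (assumed grouping).
    """
    per_env = [
        np.mean(np.sum((c - c.mean(axis=0)) ** 2, axis=1)) for c in contexts_by_env
    ]
    return float(np.mean(per_env))


def relational_logdet(contexts_by_env, bandwidth=1.0):
    """Log-determinant of a hypothetical relational matrix.

    Assumption (not from the paper): K[i, j] = exp(-||m_i - m_j||^2 / bandwidth),
    where m_i is the mean context of environment i. K has a unit diagonal, so
    det(K) <= 1 (Hadamard's inequality) and the log-determinant is <= 0.
    It drops toward -inf as mean contexts collapse onto each other.
    """
    means = np.stack([c.mean(axis=0) for c in contexts_by_env])
    sq_dists = np.sum((means[:, None, :] - means[None, :, :]) ** 2, axis=-1)
    kernel = np.exp(-sq_dists / bandwidth)
    _sign, logdet = np.linalg.slogdet(kernel)
    return float(logdet)


def combined_loss(contexts_by_env, det_weight=1.0, bandwidth=1.0):
    """Minimize within-environment variance while maximizing the log-determinant."""
    return variance_loss(contexts_by_env) - det_weight * relational_logdet(
        contexts_by_env, bandwidth
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Tight contexts for two environments, well separated from each other.
    separated = [rng.normal(0.0, 0.01, size=(32, 2)), rng.normal(3.0, 0.01, size=(32, 2))]
    # Equally tight contexts, but both environments map to nearly the same point.
    collapsed = [rng.normal(0.0, 0.01, size=(32, 2)), rng.normal(0.0, 0.01, size=(32, 2))]
    print("separated:", combined_loss(separated))
    print("collapsed:", combined_loss(collapsed))
```

In this sketch, a constant encoder drives the variance term to zero, but it also makes the kernel matrix all ones and therefore singular, so the log-determinant becomes negative infinity and the combined loss becomes infinite. Only contexts that are tight within each environment and separated across environments score well on both terms.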

We empirically assess our algorithm on a grid-world task and five locomotion control tasks with changing parameters. Experimental results show that, in environments with both in-distribution and out-of-distribution parameter changes, ESCP not only recovers the environment encoding better but also adapts more rapidly to the post-change environment (10× faster in the grid world) while maintaining or improving return, compared with state-of-the-art meta RL methods.


Sessions where this paper appears

Poster Session 6

Sat, February 26 8:45 AM - 10:30 AM (+00:00)

Blue 1

Poster Session 12

Mon, February 28 8:45 AM - 10:30 AM (+00:00)

Blue 1

Oral Session 12

Mon, February 28 10:30 AM - 11:45 AM (+00:00)

Blue 1
