Reward-Weighted Regression Converges to a Global Optimum

Miroslav Štrupl, Francesco Faccio, Dylan R. Ashley, Rupesh Kumar Srivastava, Jürgen Schmidhuber

[AAAI-22] Main Track
Abstract: Reward-Weighted Regression (RWR) belongs to a family of widely known iterative Reinforcement Learning algorithms based on the Expectation-Maximization framework. In this family, learning at each iteration consists of sampling a batch of trajectories using the current policy and fitting a new policy to maximize the return-weighted log-likelihood of actions. Although RWR is known to yield monotonic improvement of the policy under certain circumstances, whether and under which conditions RWR converges to the optimal policy has remained an open question. In this paper, we provide the first proof that RWR converges to a global optimum when no function approximation is used, in a general compact setting. Furthermore, for the simpler case with finite state and action spaces, we prove R-linear convergence of the state-value function to the optimum.
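
To make the iteration concrete, below is a minimal tabular sketch of the exact RWR update described in the abstract (no sampling, no function approximation). In this setting, fitting a new policy to the return-weighted log-likelihood reduces to reweighting the current policy by its action-values and renormalizing; the normalizer is exactly the state-value function. The two-state MDP is a hypothetical illustration, not an example from the paper, and assumes non-negative rewards so the weights are well defined.

```python
import numpy as np

np.random.seed(0)  # reproducible hypothetical MDP

n_states, n_actions, gamma = 2, 2, 0.9
R = np.array([[1.0, 0.0],
              [0.0, 1.0]])                     # R[s, a] >= 0 (needed for valid weights)
P = np.random.dirichlet(np.ones(n_states),
                        size=(n_states, n_actions))  # P[s, a, s'] transition kernel

def policy_evaluation(pi):
    """Exactly solve V = r_pi + gamma * P_pi @ V for the current policy pi."""
    r_pi = (pi * R).sum(axis=1)                          # expected reward per state
    P_pi = np.einsum('sa,sat->st', pi, P)                # induced state transitions
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
    Q = R + gamma * np.einsum('sat,t->sa', P, V)
    return V, Q

def rwr_step(pi):
    """One exact RWR iteration: reweight each action by its Q-value and
    renormalize. Since sum_a pi(a|s) * Q(s, a) = V(s), this computes
    pi'(a|s) = pi(a|s) * Q(s, a) / V(s)."""
    _, Q = policy_evaluation(pi)
    new_pi = pi * Q
    return new_pi / new_pi.sum(axis=1, keepdims=True)

pi = np.full((n_states, n_actions), 1.0 / n_actions)     # uniform initial policy
for _ in range(200):
    pi = rwr_step(pi)
print("policy after 200 exact RWR iterations:\n", pi)
```

This exact, finite state-and-action setting is the one in which the paper proves monotonic improvement and R-linear convergence of the state-value function to the optimum.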

Sessions where this paper appears

  • Poster Session 3

    Fri, February 25 8:45 AM - 10:30 AM (+00:00)
    Blue 1

  • Poster Session 7

    Sat, February 26 4:45 PM - 6:30 PM (+00:00)
    Blue 1