Policy Optimization with Stochastic Mirror Descent
Long Yang, Yu Zhang, Gang Zheng, Qian Zheng, Pengfei Li, Jianhang Huang, Gang Pan
[AAAI-22] Main Track
Abstract:
Improving sample efficiency has been a longstanding goal in reinforcement learning.
This paper proposes the $\mathtt{VRMPO}$ algorithm: a sample efficient policy gradient method with stochastic mirror descent.
In $\mathtt{VRMPO}$, a novel variance-reduced policy gradient estimator is presented to improve sample efficiency.
Furthermore, we prove that our $\mathtt{VRMPO}$ needs only $\mathcal{O}(\epsilon^{-3})$ sample trajectories to achieve an $\epsilon$-approximate first-order stationary point,
which matches the best-known sample complexity for this setting.
Extensive experimental results demonstrate that our algorithm outperforms state-of-the-art policy gradient methods in various settings.
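To give a concrete sense of the two ingredients named in the abstract, the sketch below combines a stochastic-mirror-descent policy update with a recursive (SARAH/STORM-style) variance-reduced gradient estimator on a toy bandit problem. This is an illustrative sketch only: the function names (`softmax_policy`, `reinforce_grad`, `mirror_step`), the toy problem, and the simplified recursion are assumptions made for exposition and are not the paper's exact $\mathtt{VRMPO}$ algorithm, which in particular involves importance-weighted corrections omitted here.

```python
# Illustrative sketch, not the paper's VRMPO: stochastic mirror descent with a
# recursive variance-reduced policy-gradient estimator on a toy 3-armed bandit.
import numpy as np

rng = np.random.default_rng(0)
TRUE_REWARDS = np.array([0.2, 0.5, 0.8])  # expected reward of each arm

def softmax_policy(theta):
    z = theta - theta.max()
    p = np.exp(z)
    return p / p.sum()

def reinforce_grad(theta, batch_size):
    """Monte-Carlo REINFORCE gradient estimate on the toy bandit."""
    p = softmax_policy(theta)
    grad = np.zeros_like(theta)
    for _ in range(batch_size):
        a = rng.choice(len(p), p=p)
        r = TRUE_REWARDS[a] + 0.1 * rng.standard_normal()
        score = -p.copy()          # grad log pi(a) for softmax: e_a - p
        score[a] += 1.0
        grad += r * score
    return grad / batch_size

def mirror_step(theta, v, lr):
    """Mirror-descent ascent step; with the squared-Euclidean mirror map this
    reduces to a plain gradient step (other Bregman divergences would differ)."""
    return theta + lr * v

theta = np.zeros(3)
v = reinforce_grad(theta, batch_size=64)       # large-batch reference gradient
for t in range(200):
    theta_new = mirror_step(theta, v, lr=0.5)
    # Recursive correction of the running estimate with small-batch gradients.
    # (A faithful variance-reduced policy gradient would also importance-weight
    # the old-parameter term; that correction is omitted here for brevity.)
    g_new = reinforce_grad(theta_new, batch_size=8)
    g_old = reinforce_grad(theta, batch_size=8)
    v = g_new + (v - g_old)
    theta = theta_new

print("learned policy:", softmax_policy(theta))  # should concentrate on the last arm
```

The recursion reuses the previous estimate and only pays for small mini-batches per step, which is the mechanism behind the improved $\mathcal{O}(\epsilon^{-3})$ trajectory complexity claimed in the abstract.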
Sessions where this paper appears
- Poster Session 6: Sat, February 26, 8:45 AM - 10:30 AM (+00:00), Blue 1
- Poster Session 12: Mon, February 28, 8:45 AM - 10:30 AM (+00:00), Blue 1