
Exploration in policy optimization through multiple paths

  • 01-10-2021

Abstract

The article addresses insufficient exploration in on-policy reinforcement learning algorithms and introduces Multi-Path Policy Optimization (MP-PO), a method that maintains a population of policies to improve exploration. MP-PO improves value function estimation by collecting diverse samples, and it optimizes the picked policy using a shared value network. Extensive experiments on MuJoCo tasks show significant improvements in sample efficiency and final performance over state-of-the-art exploration methods. The article also discusses the theoretical guarantees and practical benefits of MP-PO and presents it as a promising approach for enhancing exploration in reinforcement learning.
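
The loop described in the abstract (a population of policies, one shared value estimate, an update applied only to the picked policy) can be illustrated with a toy sketch. This is not the authors' implementation. The one-state task, the scalar stand-in for the value network, the rule for picking a policy (highest average sampled reward), and every name and hyperparameter below are assumptions made purely for illustration.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences (logits) into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def run_toy_multipath(mean_reward=(0.1, 0.3, 0.9, 0.5), num_policies=3,
                      iterations=200, samples_per_policy=8, lr=0.1, seed=0):
    """Toy multi-policy loop on a hypothetical one-state task (assumption)."""
    rng = random.Random(seed)
    num_actions = len(mean_reward)
    # Population of policies: each policy is a list of action preferences.
    population = [[rng.gauss(0.0, 1.0) for _ in range(num_actions)]
                  for _ in range(num_policies)]
    shared_value = 0.0  # scalar stand-in for a shared value network
    picked = 0
    for _ in range(iterations):
        # Every policy in the population collects its own samples.
        batches = []
        for prefs in population:
            probs = softmax(prefs)
            batch = []
            for _ in range(samples_per_policy):
                action = rng.choices(range(num_actions), weights=probs)[0]
                reward = mean_reward[action] + rng.gauss(0.0, 0.1)
                batch.append((action, reward))
            batches.append(batch)
        # The shared value estimate is fitted on samples from ALL policies.
        all_rewards = [r for batch in batches for _, r in batch]
        mean_all = sum(all_rewards) / len(all_rewards)
        shared_value += 0.1 * (mean_all - shared_value)
        # Assumed picking rule: the policy with the highest average reward.
        avg = [sum(r for _, r in batch) / len(batch) for batch in batches]
        picked = max(range(num_policies), key=lambda i: avg[i])
        # Policy-gradient step on the picked policy only, with the shared
        # value estimate used as the baseline.
        prefs = population[picked]
        probs = softmax(prefs)
        grad = [0.0] * num_actions
        for action, reward in batches[picked]:
            advantage = reward - shared_value
            for a in range(num_actions):
                indicator = 1.0 if a == action else 0.0
                grad[a] += advantage * (indicator - probs[a])
        for a in range(num_actions):
            prefs[a] += lr * grad[a] / samples_per_policy
    return population, shared_value, picked
```

Calling `run_toy_multipath()` returns the final population, the shared value estimate, and the index of the policy picked in the last iteration. A full implementation would replace the scalar baseline with a learned value network and the one-state task with an environment such as a MuJoCo task.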

Title: Exploration in policy optimization through multiple paths
Authors: Ling Pan, Qingpeng Cai, Longbo Huang
Publication date: 01-10-2021
Publisher: Springer US
Published in: Autonomous Agents and Multi-Agent Systems / Issue 2/2021
Print ISSN: 1387-2532
Electronic ISSN: 1573-7454
DOI: https://doi.org/10.1007/s10458-021-09518-6