Prudent Policy Gradient with Auxiliary Actor in Multi-degree-of-freedom Robotic Tasks

2021 
Overestimation bias caused by function approximation error is a common problem in value-based reinforcement learning algorithms. The Twin Delayed Deep Deterministic policy gradient (TD3) algorithm adopts clipped Double Q-learning and delayed policy updates to reduce the impact of this problem. Although TD3 mitigates the bias to some extent, the problem is still not solved ideally. Thus, a novel algorithm based on TD3, named Prudent Policy Gradient (PPG), is proposed, in which an auxiliary actor prevents the main actor from selecting excessive actions, making the agent's behavior more prudent. This allows the proposed PPG to find a more efficient and stable policy. Experimental results illustrate that the proposed PPG outperforms TD3 on robotic tasks from several MuJoCo benchmarks and on path-exploration tasks.
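The clipped Double Q-learning mechanism that TD3 (and therefore PPG) builds on can be illustrated with a minimal sketch. The abstract does not give the details of PPG's auxiliary actor, so the code below shows only the standard TD3 target computation it extends: the bootstrap target takes the minimum of two target critics' estimates to counteract overestimation bias. The function names and signatures here are illustrative, not from the paper.

```python
def clipped_double_q_target(reward, done, gamma, q1_next, q2_next):
    """TD3-style clipped Double Q-learning target.

    Uses the minimum of the two target critics' value estimates for the
    next state-action pair, so a critic that overestimates cannot inflate
    the bootstrap target. `done` is 1.0 at terminal transitions, else 0.0.
    (Illustrative sketch; not the paper's PPG auxiliary-actor mechanism.)
    """
    return reward + gamma * (1.0 - done) * min(q1_next, q2_next)


# Example: the smaller critic estimate (4.0) is used, not the larger (5.0).
target = clipped_double_q_target(reward=1.0, done=0.0, gamma=0.99,
                                 q1_next=5.0, q2_next=4.0)
print(target)  # 1.0 + 0.99 * 4.0 = 4.96
```

Delayed policy updates, the other TD3 ingredient mentioned above, simply mean the actor (and target networks) are updated once every few critic updates.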