Self-guided deep deterministic policy gradient with multi-actor

2021 
Reinforcement learning algorithms have made substantial progress in recent years by leveraging deep neural networks. Despite this success, the performance of deep reinforcement learning algorithms depends heavily on how they explore. Some explore by injecting external noise into the action space or by adopting a Gaussian policy. This paper presents a deep reinforcement learning algorithm that requires no external noise, called self-guided deep deterministic policy gradient with multi-actor (SDDPGM), which combines deep deterministic policy gradient with generative adversarial networks (GANs). It uses the GAN generator, trained on excellent experiences, to guide the learning of the agent, and uses the discriminator to form a subjective reward. Moreover, to make learning more stable, SDDPGM applies a multi-actor mechanism in which distinct actors act in series, each assigned to one temporal phase of an episode. Finally, experiments show that SDDPGM is a promising deep reinforcement learning method.
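The multi-actor mechanism can be illustrated with a minimal sketch. The abstract does not say how an episode is divided into phases, so the equal-length split below is an assumption, and the names `select_actor_index` and `act` are hypothetical helpers for illustration, not the authors' implementation.

```python
from typing import Callable, List, Sequence

# An actor is modelled here as any callable mapping a state to an action.
Action = Sequence[float]
Actor = Callable[[Sequence[float]], Action]


def select_actor_index(step: int, episode_length: int, num_actors: int) -> int:
    """Map a time step to the index of the actor assigned to that phase.

    Assumption (not stated in the abstract): phases are contiguous and
    as close to equal in length as integer division allows.
    """
    if episode_length <= 0 or num_actors <= 0:
        raise ValueError("episode_length and num_actors must be positive")
    if not 0 <= step < episode_length:
        raise ValueError("step must lie in [0, episode_length)")
    return step * num_actors // episode_length


def act(actors: List[Actor], state: Sequence[float],
        step: int, episode_length: int) -> Action:
    """Query only the actor responsible for the current temporal phase."""
    index = select_actor_index(step, episode_length, len(actors))
    return actors[index](state)
```

With four actors and a 100-step episode, this split hands steps 0 to 24 to the first actor, steps 25 to 49 to the second, and so on. Other partitions, such as unequal phase lengths, would fit the same description equally well.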