A Priority Experience Replay Sampling Method Based on Upper Confidence Bound

2019 
With the development of deep learning and growth in computing power, the end-to-end learning mechanism of reinforcement learning has gradually shown its advantages in control, strategy, and other application fields. One key to the success of deep reinforcement learning is the establishment of a sample experience pool together with an experience replay algorithm. In the experience replay mechanism of reinforcement learning, the traditional uniform random sampling algorithm learns inefficiently and fails to make full use of the information carried by the samples themselves. This paper proposes a new sampling algorithm for experience replay that avoids the uniform random sampling of the traditional approach and instead devotes training time to repeatedly replaying informative samples. By integrating the characteristics of the UCB (upper confidence bound) algorithm with the sample diversity required during sampling, the algorithm is shown to converge quickly to the optimal solution in a continuous-motion robotic-arm grasping experiment.
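The paper itself is not reproduced here, but the idea it describes, scoring stored transitions with a UCB-style rule instead of drawing them uniformly, can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the exploitation term (here the absolute TD error), the exploration coefficient `c`, and the greedy top-k selection are all assumptions chosen to show the mechanism.

```python
import math


class UCBReplayBuffer:
    """Sketch of UCB-style prioritized experience replay.

    Each transition gets a score combining an exploitation term
    (its |TD error|, i.e. how informative it was last time) and a
    UCB exploration bonus that grows for rarely replayed samples,
    which preserves sample diversity.
    """

    def __init__(self, capacity, c=2.0):
        self.capacity = capacity
        self.c = c              # exploration coefficient (assumed value)
        self.data = []          # stored transitions
        self.td_error = []      # |TD error| per transition
        self.visits = []        # replay count per transition
        self.total_draws = 0    # total number of replays so far

    def add(self, transition, td_error=1.0):
        if len(self.data) >= self.capacity:
            # evict the oldest transition when the pool is full
            self.data.pop(0)
            self.td_error.pop(0)
            self.visits.pop(0)
        self.data.append(transition)
        self.td_error.append(abs(td_error))
        self.visits.append(0)

    def _score(self, i):
        # UCB rule: exploit high-TD-error samples, but add a bonus
        # that favors transitions replayed less often than average
        bonus = self.c * math.sqrt(
            math.log(self.total_draws + 1) / (self.visits[i] + 1))
        return self.td_error[i] + bonus

    def sample(self, batch_size):
        # greedily pick the batch_size highest-scoring transitions
        order = sorted(range(len(self.data)),
                       key=self._score, reverse=True)
        batch = order[:batch_size]
        for i in batch:
            self.visits[i] += 1
            self.total_draws += 1
        return [self.data[i] for i in batch]
```

On the first draw the exploration bonus is zero for every sample, so selection is driven purely by TD error; as replay counts diverge, the bonus pulls neglected transitions back into the batch, which is the diversity property the abstract emphasizes.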