Selective Transition Collection in Experience Replay

2021 
The experience replay method is widely used in off-policy reinforcement learning. As training progresses, the distribution of the collected transitions becomes increasingly concentrated, which leads to catastrophic forgetting and slow convergence. In this paper, we present the selective transition collection algorithm, a new design that addresses the concentrated distribution by selectively collecting transitions. We propose a method to estimate the similarity between transitions, together with a probability function that reduces the chance of collecting transitions that are highly similar to those already in the experience memory. We test our method on common reinforcement learning tasks, and the experimental results demonstrate that selective transition collection not only speeds up learning but also effectively prevents catastrophic forgetting.
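The abstract does not specify the similarity measure or the exact form of the collection probability, so the following is only a minimal sketch of the general idea: before a new transition enters the replay buffer, compare it against a subset of stored transitions and admit it with a probability that decreases as the similarity grows. The class name SelectiveReplayBuffer, the Euclidean-distance similarity, and the exponential acceptance function are all illustrative assumptions, not the paper's actual design.

```python
import numpy as np
from collections import deque


class SelectiveReplayBuffer:
    """Illustrative replay buffer that stochastically rejects transitions
    that are very similar to those already stored (sketch only)."""

    def __init__(self, capacity=100_000, temperature=1.0, sample_size=64, rng=None):
        self.buffer = deque(maxlen=capacity)
        self.temperature = temperature   # how strongly similarity suppresses collection (assumed)
        self.sample_size = sample_size   # number of stored transitions used for comparison (assumed)
        self.rng = rng or np.random.default_rng()

    @staticmethod
    def _features(transition):
        # Hypothetical featurisation: concatenate flattened state and action vectors.
        state, action, reward, next_state, done = transition
        return np.concatenate([np.ravel(state), np.ravel(action)])

    def _similarity(self, transition):
        # Assumed similarity: mean Euclidean distance to a random subset of
        # stored transitions, squashed into (0, 1] so that 1 means "identical".
        if not self.buffer:
            return 0.0
        idx = self.rng.choice(len(self.buffer),
                              size=min(self.sample_size, len(self.buffer)),
                              replace=False)
        f = self._features(transition)
        dists = [np.linalg.norm(f - self._features(self.buffer[i])) for i in idx]
        return float(np.exp(-np.mean(dists)))

    def add(self, transition):
        # Collection probability decreases as similarity to the memory increases.
        sim = self._similarity(transition)
        p_collect = np.exp(-sim / self.temperature)
        if self.rng.random() < p_collect:
            self.buffer.append(transition)
            return True
        return False

    def sample(self, batch_size):
        idx = self.rng.choice(len(self.buffer), size=batch_size, replace=False)
        return [self.buffer[i] for i in idx]
```

In use, an off-policy agent would call add() on every environment step instead of appending unconditionally, and sample() for minibatches as usual; the net effect of the rejection step is to keep the stored distribution broader than what the current policy alone would produce.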