Increased Reinforcement Learning Performance through Transfer of Representation Learned by State Prediction Model

2021 
Reinforcement Learning (RL) is known for its high sample complexity, particularly in sparse-reward settings. This is due in part to biased temporal-difference (TD) targets, especially in the early stages of training. Our work proposes using state-change predictions as an unbiased and non-sparse supplement to TD targets. By training a forward model that shares a Q-network's initial layers, we enable transfer learning from model-dynamics prediction to Q-value function approximation. We discuss two variants: one applies this only during the initial steps of training, the other throughout the training process. Both variants can be used as enhancements to state-of-the-art RL algorithms. Our results show that enhancing Double DQN (DDQN) and TD3 with this approach outperforms the vanilla versions on the Acrobot, MountainCar, and Cartpole benchmarks for DDQN and on the HalfCheetah and Walker2D benchmarks for TD3. This result is particularly significant as it shows the power of transfer learning even without a priori knowledge.
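
The abstract does not include an implementation, but the shared-layer idea can be sketched concretely. The following PyTorch snippet is a minimal illustration of one plausible reading: a forward-dynamics head and a Q-head share the network's initial layers, and the state-change prediction loss is added to the TD loss so that dynamics learning shapes the representation used for Q-value approximation. All module names, the auxiliary-loss weighting, and the simplified TD target (no separate target network) are assumptions for illustration, not the authors' code.

```python
# Hypothetical sketch of a Q-network and forward model sharing initial layers.
import torch
import torch.nn as nn

class SharedEncoderAgent(nn.Module):
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        # Initial layers shared by both heads (the transferred representation).
        self.encoder = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Q-head: state features -> Q-values for each discrete action (DDQN-style).
        self.q_head = nn.Linear(hidden, n_actions)
        # Forward-model head: state features + one-hot action -> predicted state change.
        self.dyn_head = nn.Sequential(
            nn.Linear(hidden + n_actions, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def q_values(self, state):
        return self.q_head(self.encoder(state))

    def predict_delta(self, state, action_onehot):
        z = self.encoder(state)
        return self.dyn_head(torch.cat([z, action_onehot], dim=-1))


def combined_loss(agent, batch, gamma=0.99, aux_weight=1.0):
    """TD loss supplemented by a non-sparse state-change prediction loss.

    Simplified: uses the online network for the bootstrap target instead of
    the target network a full DDQN implementation would use.
    """
    s, a, r, s_next, done, a_onehot = batch
    q = agent.q_values(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * (1 - done) * agent.q_values(s_next).max(dim=1).values
    td_loss = nn.functional.smooth_l1_loss(q, target)
    # Auxiliary objective: predict the state change s' - s.
    dyn_loss = nn.functional.mse_loss(agent.predict_delta(s, a_onehot), s_next - s)
    # Variant 1 would set aux_weight > 0 only for the first training steps;
    # variant 2 keeps the auxiliary term throughout training.
    return td_loss + aux_weight * dyn_loss
```

The same pattern would carry over to TD3 by attaching the dynamics head to the critic's shared layers; only the TD target and action encoding change.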