Model-based Ensemble Reinforcement Learning with Soft Proximal Policy Optimization

2021 
Model-free reinforcement learning is now widely used in games, robot control, and other fields, and has achieved strong results. However, these algorithms require a large number of samples to reach good performance, which limits their application in real-world domains. Model-based reinforcement learning needs far fewer samples, but typically requires careful tuning. In this article, we learn the environment dynamics with an ensemble of neural networks and use model predictive control as the basic control framework. The learned dynamics model is also used to pretrain a model-free policy network, combining the sample efficiency of model-based methods with the asymptotic performance of model-free ones. We evaluate our method on MuJoCo benchmark tasks. The results show that, compared with other model-free and model-based methods, our approach achieves better task performance while retaining excellent sample efficiency.
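The abstract describes the pipeline only at a high level. As an illustration of the two components it names, the following is a minimal sketch, assuming a PyTorch implementation (the abstract does not specify a framework), of an ensemble dynamics model queried by a random-shooting model predictive controller. The `reward_fn` argument, network sizes, planning horizon, and candidate count are hypothetical placeholders, not the paper's actual settings.

```python
import torch
import torch.nn as nn

class DynamicsModel(nn.Module):
    """One ensemble member: predicts the next state from (state, action)."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action):
        # Predict a state delta and add it to the current state.
        return state + self.net(torch.cat([state, action], dim=-1))

class EnsembleMPC:
    """Random-shooting MPC planner over an ensemble of learned dynamics models."""
    def __init__(self, models, reward_fn, horizon=15, n_candidates=500):
        self.models = models            # list of DynamicsModel
        self.reward_fn = reward_fn      # reward_fn(states, actions) -> rewards, assumed known
        self.horizon = horizon
        self.n_candidates = n_candidates

    @torch.no_grad()
    def plan(self, state, action_dim, low=-1.0, high=1.0):
        # Sample candidate action sequences uniformly at random.
        actions = torch.empty(
            self.n_candidates, self.horizon, action_dim).uniform_(low, high)
        returns = torch.zeros(self.n_candidates)
        s = state.unsqueeze(0).repeat(self.n_candidates, 1)
        for t in range(self.horizon):
            a = actions[:, t]
            returns += self.reward_fn(s, a)
            # Average next-state predictions across the ensemble.
            s = torch.stack([m(s, a) for m in self.models]).mean(dim=0)
        # Receding horizon: execute only the first action of the best sequence.
        return actions[returns.argmax(), 0]
```

In the full method the ensemble would be fit to real transitions (e.g., by minimizing one-step prediction error), and transitions imagined under the learned models would additionally be used to pretrain the model-free policy (the soft PPO component mentioned in the title); both of those stages are omitted from this sketch.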