Bayesian Reinforcement Learning: Real-World Learning Faster than Simulations

2021 
Deep Reinforcement Learning (DRL) experiments are commonly performed in simulated environments because deep neural networks demand enormous numbers of training samples. In contrast, model-based Bayesian learning allows a robot to learn good policies within a few trials in the real world. Although methods such as Deep PILCO have been applied to many single-robot tasks, here we apply Deep PILCO to the problem of winning a multi-robot combat game. We compare this deep Bayesian learning algorithm with a model-free DRL algorithm, Deep Q-Learning, by analyzing results collected from both simulations and real-world experiments. In this game, the RL algorithms' inputs are noisy and unstable because they come from a filtered LiDAR signal. Surprisingly, our experiments show that the sample-efficient Deep Bayesian RL outperforms DRL even when real-world Deep Bayesian RL results are compared against simulation-based Deep Q-Learning. Our results point to the advantage of learning directly in the real world: the reality gap is bypassed, and learning proceeds in far fewer trials than in simulation.
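
To make the comparison concrete, below is a minimal sketch of the model-based Bayesian training loop that Deep PILCO is built on: fit a Bayesian dynamics model (dropout kept active at prediction time, following the dropout-as-Bayesian-approximation idea) on a handful of real rollouts, then improve the policy by gradient descent on imagined particle rollouts through that model. Everything here is illustrative rather than the authors' implementation: the use of PyTorch, the toy_env_step stand-in for the real robot, and all dimensions and hyperparameters are assumptions, and the full algorithm's moment-matching step is omitted for brevity.

import torch
import torch.nn as nn

STATE_DIM, ACT_DIM, PARTICLES, HORIZON = 2, 1, 10, 25

class DynamicsModel(nn.Module):
    """Bayesian dynamics model: keeping dropout active at prediction
    time approximates sampling from a posterior over weights."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACT_DIM, 64), nn.ReLU(), nn.Dropout(0.1),
            nn.Linear(64, STATE_DIM))

    def forward(self, s, a):
        return s + self.net(torch.cat([s, a], dim=-1))  # predict state delta

policy = nn.Sequential(nn.Linear(STATE_DIM, 32), nn.Tanh(),
                       nn.Linear(32, ACT_DIM), nn.Tanh())
model = DynamicsModel()

def toy_env_step(s, a):
    """Hypothetical stand-in for one real-robot step (toy dynamics + noise)."""
    return s + 0.1 * torch.cat([a, -s[:, :1]], dim=-1) + 0.01 * torch.randn_like(s)

data = []  # (state, action, next_state) transitions from real rollouts
for trial in range(5):  # model-based learning needs only a few real trials
    # 1) Collect one real rollout with the current policy.
    s = torch.zeros(1, STATE_DIM)
    for _ in range(HORIZON):
        a = policy(s).detach()
        s_next = toy_env_step(s, a)
        data.append((s, a, s_next))
        s = s_next

    # 2) Fit the Bayesian dynamics model on all real data so far.
    S, A, S1 = (torch.cat(x) for x in zip(*data))
    opt_m = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(200):
        opt_m.zero_grad()
        loss = ((model(S, A) - S1) ** 2).mean()
        loss.backward()
        opt_m.step()

    # 3) Improve the policy on imagined rollouts: propagate particles
    #    through the model (the model stays in train mode, so dropout
    #    gives each forward pass a different sampled dynamics function)
    #    and minimize an illustrative quadratic cost. Only the policy's
    #    optimizer steps here; the model is held fixed.
    opt_p = torch.optim.Adam(policy.parameters(), lr=1e-3)
    for _ in range(100):
        opt_p.zero_grad()
        s = torch.zeros(PARTICLES, STATE_DIM)
        cost = 0.0
        for _ in range(HORIZON):
            s = model(s, policy(s))
            cost = cost + (s ** 2).sum(-1).mean()
        cost.backward()
        opt_p.step()

In Deep PILCO proper, each particle keeps a fixed dropout mask for the whole horizon and the particle distribution is moment-matched to a Gaussian after every step; both refinements are dropped here to keep the sketch short.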