Gradient-Based Versus Gradient-Free Algorithms for Reinforcement Learning

2021 
Despite ongoing improvements, gradient-based reinforcement learning (RL) algorithms that train neural networks (NNs) often fail to reach the expected behavior from gradient estimates alone. Evolution-based approaches offer an alternative for training NNs on RL tasks. This paper compares the gradient-based Deep Q-Network (DQN) and Double DQN algorithms with gradient-free, population-based Genetic Algorithms (GAs) on learning to play the Flappy Bird game, a task with complex sensory inputs. The results show the superiority of the GA-based approach and its ability to handle the high-dimensional search space. The paper also reports the training time of the agents under each scheme: the gradient-free GAs learned to play the game in only a few hours, even without GPUs, whereas DQN and Double DQN required more than 100 hours despite GPU-assisted training. This suggests that gradient-based routines can be poorly suited to such optimization problems, while the GA-based approach holds greater potential, especially when augmented by parallelization and GPU-accelerated evaluation.
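For context, the update target that the gradient-based agents optimize can be sketched in its standard form (this is the textbook Double DQN formulation, not a detail taken from the paper): the online network with parameters $\theta$ selects the next action, while a periodically synchronized target network with parameters $\theta^-$ evaluates it,

$$ y_t = r_t + \gamma\, Q_{\theta^-}\!\big(s_{t+1}, \arg\max_{a'} Q_{\theta}(s_{t+1}, a')\big), $$

and $\theta$ is then updated by gradient descent on the squared error $\big(y_t - Q_{\theta}(s_t, a_t)\big)^2$ over replayed transitions. Plain DQN differs only in using $\theta^-$ for both selection and evaluation of the next action.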
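The population-based alternative can be sketched as follows. This is a generic genetic-algorithm loop over flattened network weights, assuming a Gym-style environment and made-up network dimensions and hyperparameters; it is not the paper's implementation. Note that the fitness evaluations are independent of one another, which is what makes this scheme easy to parallelize across CPU cores or GPUs.

```python
# Hypothetical sketch (not the paper's code): evolving the weights of a small
# policy network with a simple genetic algorithm instead of gradient descent.
import numpy as np

OBS_DIM, HIDDEN, N_ACTIONS = 8, 16, 2  # assumed sizes for a Flappy-Bird-like task
POP_SIZE, N_ELITE, SIGMA = 50, 5, 0.1  # assumed GA hyperparameters

def init_genome(rng):
    """A genome is the flattened weight vector of a 2-layer policy network."""
    return rng.normal(0.0, 1.0, size=OBS_DIM * HIDDEN + HIDDEN * N_ACTIONS)

def act(genome, obs):
    """Forward pass: obs -> hidden (tanh) -> action scores -> greedy action."""
    w1 = genome[:OBS_DIM * HIDDEN].reshape(OBS_DIM, HIDDEN)
    w2 = genome[OBS_DIM * HIDDEN:].reshape(HIDDEN, N_ACTIONS)
    return int(np.argmax(np.tanh(obs @ w1) @ w2))

def fitness(genome, env):
    """Total episode reward; `env` is any Gym-style environment (assumption)."""
    obs, _ = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, terminated, truncated, _ = env.step(act(genome, obs))
        total += reward
        done = terminated or truncated
    return total

def evolve(env, generations=100, seed=0):
    rng = np.random.default_rng(seed)
    pop = [init_genome(rng) for _ in range(POP_SIZE)]
    for _ in range(generations):
        scores = [fitness(g, env) for g in pop]     # embarrassingly parallel
        order = np.argsort(scores)[::-1]
        elites = [pop[i] for i in order[:N_ELITE]]  # truncation selection
        # Next generation: keep the elites, fill the rest with mutated copies.
        pop = list(elites)
        while len(pop) < POP_SIZE:
            parent = elites[rng.integers(N_ELITE)]
            pop.append(parent + SIGMA * rng.normal(size=parent.shape))
    return elites[0]
```

Truncation selection with additive Gaussian mutation is one of the simplest GA variants; crossover and more elaborate selection schemes are common refinements, but even this bare loop illustrates why no backpropagation, replay buffer, or target network is needed.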