DQN Algorithm Based on Target Value Network Parameter Dynamic Update

2021 
In the deep Q-network (DQN) algorithm of deep reinforcement learning, the parameters of the action-value network are copied to the target value network every $N$ iterations, and this update interval $N$ is fixed. Updating the target value network in this rigid way causes the traditional DQN algorithm to converge slowly and to remain unstable after convergence. To address this issue, this paper proposes a DQN algorithm based on dynamic updating of the target value network parameters (TDU-DQN), which uses episode rewards to control how quickly the target value network parameters are updated, thereby accelerating convergence and improving stability. Both the TDU-DQN algorithm and the traditional DQN algorithm are applied to the Mountain Car and CartPole problems. The experimental results show that the TDU-DQN algorithm converges faster and is more stable than the traditional DQN algorithm, demonstrating its effectiveness.
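The core idea, replacing the fixed copy interval $N$ with an interval driven by episode rewards, can be illustrated with a minimal Python sketch. The class name, the moving-average window, the interval bounds, and the shrink/grow factors below are all illustrative assumptions, not the paper's actual rule.

```python
# Sketch: a reward-driven schedule for syncing the target network in DQN.
# In standard DQN the sync happens every fixed N steps; here the interval
# shrinks when episode rewards are improving and grows when they stagnate.
from collections import deque


class DynamicTargetUpdater:
    """Decides when to copy online-network weights into the target network."""

    def __init__(self, base_interval=100, min_interval=20, max_interval=500, window=10):
        self.interval = base_interval               # current copy interval (steps)
        self.min_interval = min_interval
        self.max_interval = max_interval
        self.recent_rewards = deque(maxlen=window)  # last few episode returns
        self.steps_since_copy = 0

    def end_episode(self, episode_reward):
        """Adjust the copy interval from the trend of episode rewards (assumed rule)."""
        if self.recent_rewards:
            avg = sum(self.recent_rewards) / len(self.recent_rewards)
            if episode_reward > avg:
                # Rewards improving: update the target network more often.
                self.interval = max(self.min_interval, int(self.interval * 0.9))
            else:
                # Rewards stagnating: keep the target network fixed longer.
                self.interval = min(self.max_interval, int(self.interval * 1.1))
        self.recent_rewards.append(episode_reward)

    def step(self):
        """Call once per training step; returns True when weights should be copied."""
        self.steps_since_copy += 1
        if self.steps_since_copy >= self.interval:
            self.steps_since_copy = 0
            return True
        return False


if __name__ == "__main__":
    updater = DynamicTargetUpdater()
    # In a real DQN loop:
    #     if updater.step():
    #         target_net.load_state_dict(online_net.state_dict())
    for episode, reward in enumerate([-200, -180, -150, -160, -120]):
        updater.end_episode(reward)
        print(f"episode {episode}: interval -> {updater.interval}")
```

Usage in a training loop would call `step()` once per gradient update and `end_episode()` at each episode boundary; the exact reward-to-interval mapping used by TDU-DQN is defined in the paper itself.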