An Adaptive Learning Rate Q-Learning Algorithm Based on Kalman Filter Inspired by Pigeon Pecking-Color Learning

2020 
The speed and accuracy of the Q-learning algorithm are critically affected by the learning rate. In most Q-learning applications, the learning rate is set as a constant or decayed according to a predetermined schedule, so it cannot meet the needs of dynamic and rapid learning. In this study, the learning process of a pigeon pecking-color task was analyzed. We observed an epiphany phenomenon during the pigeons' learning: the learning rate did not change gradually, but was large in the early stage and vanished in the middle and late stages. Inspired by these phenomena, an adaptive learning rate Q-learning algorithm based on a Kalman filter model (ALR-KF Q-learning) is proposed in this paper. Q-learning is represented in the framework of a Kalman filter model, and the learning rate is equated with the Kalman gain, which dynamically weighs the fluctuation of the environmental reward against the agent's cognitive uncertainty about the value of state-action pairs. The cognitive uncertainty in the model is determined by the variance of the measurement residual and the variance of the environmental reward, and is set to zero when it falls below the variance of the environmental reward. Results on a two-armed bandit task show that the proposed algorithm not only adaptively learns the statistical characteristics of the environmental rewards, but also quickly and accurately approximates the expected value of state-action pairs.
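The abstract does not give the algorithm's equations, but the described mechanism can be sketched: the learning rate is a Kalman gain K = P / (P + R), where R is the environmental reward variance and P is the agent's cognitive uncertainty, estimated from the measurement residual and zeroed once it falls below R. The following minimal Python sketch of a two-armed bandit test makes several assumptions not stated in the abstract: R is known, cognitive uncertainty is estimated from a sliding-window mean-square residual, and actions are chosen epsilon-greedily.

```python
import random
from collections import deque

def alr_kf_q_learning(arm_means, arm_std, steps=300, window=5, seed=1):
    """Sketch of an adaptive-learning-rate bandit learner in the spirit of
    ALR-KF Q-learning. Window size, exploration rate, and the residual-based
    uncertainty estimate are illustrative assumptions, not the paper's spec."""
    rng = random.Random(seed)
    n = len(arm_means)
    Q = [0.0] * n                                # value estimate per arm
    res = [deque(maxlen=window) for _ in range(n)]  # recent residuals per arm
    R = arm_std ** 2                             # environmental reward variance (assumed known)
    for _ in range(steps):
        # epsilon-greedy action selection (an illustrative choice)
        a = rng.randrange(n) if rng.random() < 0.1 else max(range(n), key=Q.__getitem__)
        r = rng.gauss(arm_means[a], arm_std)
        delta = r - Q[a]                         # measurement residual
        res[a].append(delta)
        # cognitive uncertainty: mean-square residual in excess of reward variance
        ms = sum(d * d for d in res[a]) / len(res[a])
        P = ms - R
        if P < R:                                # abstract's rule: zero out uncertainty
            P = 0.0                              # once it drops below the reward variance
        K = P / (P + R) if P > 0 else 0.0        # Kalman gain doubles as the learning rate
        Q[a] += K * delta
    return Q
```

Because P starts large (Q is far from the true mean, so residuals are large) and collapses to zero once the residuals are explained by reward noise alone, the effective learning rate is large early and disappears later, mirroring the epiphany-like profile the abstract reports.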