Natural gradient based reinforcement learning algorithm using active stimulating

2012 
The Episodic Natural Actor-Critic (eNAC) algorithm is an important direct policy search algorithm that guarantees an unbiased natural gradient estimate and has good theoretical learning properties. However, it has a major drawback: the system reset assumption. A novel algorithm, active stimulating based eNAC (AS-eNAC), is proposed to relax this restrictive assumption. AS-eNAC extends eNAC by introducing an active stimulating procedure into the interaction process so that informative episodes are generated automatically. Because the initial states of the generated episodes may differ, which violates a prerequisite of eNAC's natural gradient estimation method, a linear approximator of the initial-state value function is employed in the estimation process to improve the accuracy of the estimated natural gradient. Simulation results on the cart-pole balancing task demonstrate the efficiency of the proposed algorithm.
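The role of the initial-state value approximator can be illustrated with a minimal sketch. It assumes the commonly used least-squares form of eNAC, in which each episode contributes one regression row: the discounted sum of policy log-gradients is regressed against the discounted return, with a baseline term absorbing the value of the start state. The function name `enac_gradient`, the arrays `psi` and `phi0`, and the feature choice are hypothetical; the abstract does not specify how AS-eNAC builds its features or its stimulating procedure, so this is not the authors' implementation.

```python
import numpy as np


def enac_gradient(psi, returns, phi0=None):
    """Least-squares natural gradient estimate from a batch of episodes.

    psi     : (E, d) array; row e is sum_t gamma^t * grad_theta log pi(a_t | s_t)
              accumulated over episode e.
    returns : (E,) array of discounted episode returns.
    phi0    : (E, k) array of initial-state features (assumed form of the
              linear initial-state value approximator). If None, a single
              constant baseline is used, which presumes that every episode
              starts from the same state.
    """
    psi = np.asarray(psi, dtype=float)
    returns = np.asarray(returns, dtype=float)
    if phi0 is None:
        phi0 = np.ones((psi.shape[0], 1))
    phi0 = np.asarray(phi0, dtype=float)

    # Stack gradient features and baseline features into one design matrix.
    design = np.hstack([psi, phi0])
    solution, *_ = np.linalg.lstsq(design, returns, rcond=None)

    d = psi.shape[1]
    natural_gradient = solution[:d]   # estimated natural gradient w
    baseline_weights = solution[d:]   # weights of the start-state value model
    return natural_gradient, baseline_weights


if __name__ == "__main__":
    # Synthetic episodes whose start states differ (hypothetical data).
    rng = np.random.default_rng(0)
    psi = rng.normal(size=(50, 3))
    x0 = rng.normal(size=(50, 1))
    phi0 = np.hstack([np.ones((50, 1)), x0])
    returns = psi @ np.array([0.5, -1.0, 2.0]) + phi0 @ np.array([1.0, 3.0])
    w, v = enac_gradient(psi, returns, phi0)
    print(w, v)
```

With `phi0=None` the sketch reduces to the constant-baseline case; passing start-state features lets the regression account for returns that vary with the initial state rather than attributing that variation to the gradient.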