Lyapunov Methods for Reinforcement Learning: A PhD Thesis Proposal Draft

1999 
In this thesis, I propose to investigate a novel approach to injecting prior knowledge into reinforcement learning (RL) systems for minimum-cost-to-target control problems. The approach uses Lyapunov descent ideas from control theory to constrain the action choices of an RL controller. Such constraints can improve on-line performance and accelerate learning: the constrained controller reaches the target much more quickly than an unconstrained one, and learning is focused on "good" actions, those that drive the system state toward the target. Further, appropriately formulated constraints can provide theoretical guarantees of reaching the target on every trial, and sometimes even worst-case time bounds. These guarantees follow from the action constraints and hold independently of many details of the RL controller, including the method of function approximation. Even if learning fails entirely, the constraints maintain a certain level of performance.
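The core mechanism described above can be sketched in a few lines. The following is a minimal illustration, not the proposal's actual algorithm: the Lyapunov function, dynamics, and Q-values here are all hypothetical placeholders, and the fallback when no descent action exists is my own assumption. The sketch restricts an epsilon-greedy RL controller to actions whose successor state strictly decreases a Lyapunov function, so every permitted action moves the state toward the target.

```python
import random

def lyapunov(state):
    # Hypothetical Lyapunov function: distance of the state to the target at 0.
    return abs(state)

def step(state, action):
    # Hypothetical deterministic dynamics: the action shifts the state.
    return state + action

def constrained_action(state, actions, q_values, epsilon=0.1):
    """Epsilon-greedy selection restricted to Lyapunov-descent actions.

    Only actions whose successor strictly decreases the Lyapunov function
    are allowed, so any chosen action moves the state toward the target.
    """
    allowed = [a for a in actions
               if lyapunov(step(state, a)) < lyapunov(state)]
    if not allowed:
        # Fallback (an assumption of this sketch): if no action descends,
        # fall back to the unconstrained action set.
        allowed = list(actions)
    if random.random() < epsilon:
        return random.choice(allowed)
    # Greedy among allowed actions; unseen state-action pairs default to 0.
    return max(allowed, key=lambda a: q_values.get((state, a), 0.0))
```

For example, from state 3 with actions {-1, 0, 1}, only -1 decreases |state|, so the controller must choose it regardless of its learned Q-values; this is how the constraint guarantees progress toward the target even when learning is poor.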