Spatio-Temporal Abstractions in Reinforcement Learning Through Neural Encoding

2017 
Recent progress in the field of Reinforcement Learning (RL) has made it possible to tackle bigger and more challenging tasks. However, the increasing complexity of these problems, as well as the use of more sophisticated models such as Deep Neural Networks (DNN), impedes our understanding of artificial agents' behavior. In this work, we present the Semi-Aggregated Markov Decision Process (SAMDP) model. The purpose of SAMDP modeling is to describe and enable a better understanding of complex behaviors by identifying temporal and spatial abstractions. In contrast to other modeling approaches, the SAMDP is built in a transformed state-space that encodes the dynamics of the problem. We show that working with the \emph{right} state representation mitigates the problem of finding spatial and temporal abstractions. We describe the process of building the SAMDP model from observed trajectories and give examples of its use on a toy problem and on complex DQN policies. Finally, we show how the SAMDP can be used to monitor the policy at hand and make it more robust.
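To make the construction concrete, below is a minimal sketch of how an SAMDP-style model might be built from observed trajectories: encoded states are clustered into abstract states (spatial abstraction), and transitions between clusters are aggregated into an abstract transition matrix whose cluster-to-cluster moves play the role of temporally extended skills. The encoding step, clustering method, and all names here (build_samdp, n_clusters, the choice of k-means) are illustrative assumptions, not the paper's exact procedure; the paper assumes states are first mapped into a representation that encodes the dynamics, e.g. the activations of a trained DQN.

```python
# Illustrative sketch of SAMDP-style aggregation (assumptions, not the
# paper's exact algorithm). Inputs are trajectories of *encoded* states,
# e.g. last-hidden-layer activations of a trained DQN.
import numpy as np
from sklearn.cluster import KMeans

def build_samdp(trajectories, n_clusters=10, seed=0):
    """Cluster encoded states into abstract states (spatial abstraction)
    and estimate the transition matrix between clusters; off-diagonal
    transitions correspond to temporally extended behaviors (skills).

    trajectories: list of arrays, each of shape (T_i, d), holding the
    encoded states visited along one observed trajectory.
    """
    all_states = np.vstack(trajectories)
    km = KMeans(n_clusters=n_clusters, random_state=seed).fit(all_states)

    # Count transitions between consecutive cluster labels per trajectory.
    counts = np.zeros((n_clusters, n_clusters))
    for traj in trajectories:
        labels = km.predict(traj)
        for a, b in zip(labels[:-1], labels[1:]):
            counts[a, b] += 1

    # Row-normalize counts into an abstract transition-probability matrix.
    probs = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)
    return km, probs
```

Under these assumptions, the returned cluster model and transition matrix give a compact summary of the policy's behavior: inspecting which abstract transitions the agent takes, and with what probability, is one plausible way to monitor a policy as the abstract describes.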