Online Learning of Shaping Reward with Subgoal Knowledge

2021 
SARSA-RS is a reward shaping method that updates the shaping function through learning. Its bottleneck, however, is state aggregation: the designer must specify a mapping from every state to an abstract state. We propose dynamic trajectory aggregation, which uses a series of subgoals. The designer's effort is minimal because the only human input is the subgoal series, and this makes the method applicable to environments with high-dimensional observations. In a navigation task, we compared our method, using subgoal series provided by participants, with a baseline reinforcement learning algorithm and other subgoal-based methods. Our reward shaping outperformed all other methods in learning efficiency.
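
As a rough illustration of the idea, the sketch below treats "number of subgoals reached so far" as the abstract state, learns one value per abstract state online, and uses those values as the potential in a potential-based shaping term. This is a minimal sketch under stated assumptions, not the paper's implementation: the class name `SubgoalShaper`, the equality test for reaching a subgoal, and the TD(0) update on abstract values are all hypothetical choices made for this example.

```python
class SubgoalShaper:
    """Illustrative shaping helper (hypothetical; not the authors' code)."""

    def __init__(self, subgoals, gamma=0.99, alpha=0.1):
        self.subgoals = list(subgoals)  # ordered subgoal states from a human
        self.gamma = gamma              # discount factor
        self.alpha = alpha              # learning rate for abstract values
        # One learned potential per abstract state (0..len(subgoals)).
        self.values = [0.0] * (len(self.subgoals) + 1)
        self.index = 0                  # subgoals reached in this episode

    def reset(self):
        """Call at the start of every episode."""
        self.index = 0

    def step(self, next_state, reward):
        """Return the shaped reward for one environment transition."""
        z = self.index
        # Assumption: a subgoal counts as reached on exact state equality.
        if z < len(self.subgoals) and next_state == self.subgoals[z]:
            self.index += 1
        z_next = self.index
        # Potential-based shaping term: F = gamma * Phi(z') - Phi(z).
        shaping = self.gamma * self.values[z_next] - self.values[z]
        # Assumed TD(0) update of the abstract value used as the potential.
        td_error = reward + self.gamma * self.values[z_next] - self.values[z]
        self.values[z] += self.alpha * td_error
        return reward + shaping
```

In use, an agent would call `reset()` at the start of each episode and pass every transition's next state and environment reward through `step()`, then feed the returned shaped reward to its usual update (for example SARSA). Because the abstract state is derived from the trajectory's progress through the subgoal series, no hand-written mapping from raw states to abstract states is needed.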