Online Learning of Shaping Reward with Subgoal Knowledge

Takato Okudo, Seiji Yamada

2021

SARSA-RS is a reward shaping method that updates the shaping reward through learning. Its bottleneck is state aggregation: designers must define a mapping from every state to an abstract state. We propose dynamic trajectory aggregation, which uses a subgoal series instead. The designer's effort becomes minimal because the only human input is the subgoal series, which makes the method applicable to environments with high-dimensional observations. Using subgoal series provided by participants, we compared our method with a baseline reinforcement learning algorithm and with other subgoal-based methods on a navigation task. Our reward shaping outperformed all other methods in learning efficiency.
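The idea of learning a shaping reward over subgoal-defined abstract states can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes that the abstract state is simply the number of subgoals reached so far in the current episode, that the potential over abstract states is learned with a plain TD(0) update, and that the shaping reward takes the potential-based form F = gamma * V(z') - V(z). The class name `SubgoalShaper`, its parameters, and the equality test used to detect a subgoal are all hypothetical.

```python
from collections import defaultdict


class SubgoalShaper:
    """Hypothetical sketch: abstract state = number of subgoals reached so far."""

    def __init__(self, subgoals, gamma=0.99, alpha=0.1):
        self.subgoals = list(subgoals)    # ordered subgoal series given by a human
        self.gamma = gamma                # discount factor (assumed)
        self.alpha = alpha                # learning rate for the potential (assumed)
        self.value = defaultdict(float)   # learned potential V(z) per abstract state
        self.index = 0                    # current abstract state z

    def reset(self):
        """Start a new episode: no subgoal has been reached yet."""
        self.index = 0

    def shaping(self, next_state, reward):
        """Return the shaping reward for one transition, then update V."""
        z = self.index
        # Advance the abstract state when the next subgoal in the series is reached.
        if self.index < len(self.subgoals) and next_state == self.subgoals[self.index]:
            self.index += 1
        z_next = self.index
        # Potential-based shaping reward computed from the current potential.
        f = self.gamma * self.value[z_next] - self.value[z]
        # TD(0) update of the potential using the environment reward.
        target = reward + self.gamma * self.value[z_next]
        self.value[z] += self.alpha * (target - self.value[z])
        return f
```

An agent would add the returned value to the environment reward before its own SARSA or Q-learning update, and call `reset()` at the start of each episode. Only the subgoal series has to be supplied by hand; no mapping from every state to an abstract state is needed.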
