Maximum Entropy Inverse Reinforcement Learning Based on Behavior Cloning of Expert Examples

2021 
This study proposes a preprocessing framework for expert examples, based on behavior cloning (BC), to address the inaccuracy of inverse reinforcement learning (IRL) caused by noise in expert examples. To remove the noise, we first learn an approximate expert policy by supervised learning and then use this policy to clone new expert examples from the old ones; the idea behind this preprocessing framework is BC, and after preprocessing IRL obtains higher-quality expert examples. The IRL framework adopts the maximum-entropy formulation. Experiments demonstrate the effectiveness of the proposed approach: when the expert examples are noisy, the reward functions recovered after BC preprocessing are better than those recovered without preprocessing, and the improvement becomes especially pronounced as the noise level increases.
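The preprocessing step the abstract describes can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes discrete actions, a PyTorch setup, and a full-batch training loop; the names PolicyNet and preprocess_expert_examples, the network architecture, and all hyperparameters are illustrative assumptions.

    # Sketch of the BC preprocessing described in the abstract (assumptions:
    # discrete actions, PyTorch; architecture and hyperparameters illustrative).
    import torch
    import torch.nn as nn

    class PolicyNet(nn.Module):
        """Approximate expert policy learned by supervised learning (BC)."""
        def __init__(self, state_dim, n_actions, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, n_actions),
            )

        def forward(self, states):
            return self.net(states)  # action logits

    def preprocess_expert_examples(states, actions, n_actions, epochs=50):
        """Fit an approximate expert policy on the noisy demonstrations, then
        relabel every state with that policy's action, i.e. clone new expert
        examples from the old ones."""
        policy = PolicyNet(states.shape[1], n_actions)
        opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
        loss_fn = nn.CrossEntropyLoss()
        for _ in range(epochs):
            opt.zero_grad()
            loss = loss_fn(policy(states), actions)  # supervised BC loss
            loss.backward()
            opt.step()
        with torch.no_grad():
            cleaned_actions = policy(states).argmax(dim=1)  # denoised labels
        return states, cleaned_actions  # higher-quality input for MaxEnt IRL

The cleaned state-action pairs would then be fed to a standard maximum-entropy IRL solver, which models the likelihood of a trajectory as proportional to the exponential of its accumulated reward and fits the reward function to match the (now denoised) expert behavior.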