End-to-end robot manipulation using demonstration-guided goal strategies

2019 
In deep reinforcement learning, finding the optimal manipulation policy for a multi-DoF manipulator in 3D space typically requires intricate reward shaping. However, reward shaping involves cumbersome tuning of the reward function based on prior knowledge of the robotic task to be achieved. It is therefore desirable to learn various manipulation policies with a simple reward function. In this study, we propose a method that learns the manipulation policy of a manipulator in a sparse-reward setting. To this end, Hindsight Experience Replay (HER) is combined with Twin Delayed DDPG (TD3), using a goal strategy that incorporates demonstrations. It is shown that the policy can estimate the joint control commands of a 7-DoF manipulator from raw RGB video inputs in a sparse-reward setting in an end-to-end manner.
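To illustrate the sparse-reward idea behind HER, the following is a minimal sketch, not the authors' implementation. It assumes the standard "final" goal strategy as a stand-in, because the abstract does not specify the demonstration-guided strategy. The names `sparse_reward` and `her_relabel_final`, the transition dictionary layout, and the tolerance `tol` are all hypothetical.

```python
import numpy as np


def sparse_reward(achieved_goal, desired_goal, tol=0.05):
    """Sparse reward: 0.0 if the goal is reached within `tol`, otherwise -1.0."""
    diff = np.asarray(achieved_goal, dtype=float) - np.asarray(desired_goal, dtype=float)
    return 0.0 if np.linalg.norm(diff) <= tol else -1.0


def her_relabel_final(episode, tol=0.05):
    """Relabel an episode with the goal that was actually achieved at its end.

    `episode` is a list of dicts with the (assumed) keys "obs", "action",
    "next_obs", "achieved_goal" (the goal reached after the action) and
    "desired_goal". The returned transitions use the final achieved goal as
    the desired goal, so at least the last transition earns a success reward.
    """
    new_goal = episode[-1]["achieved_goal"]
    relabeled = []
    for transition in episode:
        reward = sparse_reward(transition["achieved_goal"], new_goal, tol)
        relabeled.append({**transition, "desired_goal": new_goal, "reward": reward})
    return relabeled
```

In an off-policy learner such as TD3, both the original and the relabeled transitions would typically be stored in the replay buffer, so that failed episodes still provide a learning signal under the sparse reward.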