Sparse Reward Based Manipulator Motion Planning by Using High Speed Learning from Demonstrations

2018 
This paper proposes a high-speed learning from demonstrations (LfD) method for the sparse-reward motion planning problem of manipulators, combining the hindsight experience replay (HER) mechanism with the deep deterministic policy gradient (DDPG) method. First, a demonstration replay buffer and an agent exploration replay buffer are created to store experience data, and the hindsight experience replay mechanism then draws experience from both buffers. Next, the deep deterministic policy gradient method learns from this experience and ultimately accomplishes the manipulator motion planning tasks under the sparse reward. Finally, experiments on pushing and pick-and-place tasks were conducted in the gym robotics environment. The results show that training is at least 10 times faster than with the deep deterministic policy gradient method alone, without demonstration data. In addition, the proposed method effectively exploits the sparse reward, and the agent completes the task quickly even when the demonstration data have a low success rate.
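The abstract is not accompanied by code, but the two-buffer replay scheme it describes can be illustrated with a minimal sketch: minibatches mix demonstration experience with the agent's own exploration experience, and HER relabels goals so that the sparse reward still yields a learning signal. The buffer class, the `future_p` relabeling probability, the `demo_frac` mixing ratio, and the `GOAL_TOLERANCE` success threshold below are illustrative assumptions, not the authors' implementation.

```python
import random
import numpy as np

GOAL_TOLERANCE = 0.05  # assumed distance threshold for task success

def sparse_reward(achieved_goal, goal):
    """Sparse reward: 0 on success, -1 otherwise (gym robotics convention)."""
    return 0.0 if np.linalg.norm(achieved_goal - goal) < GOAL_TOLERANCE else -1.0

class HindsightBuffer:
    """Stores whole episodes; samples transitions with hindsight-relabeled goals."""

    def __init__(self, capacity=10000, future_p=0.8):
        self.episodes = []          # each episode is a list of transition dicts
        self.capacity = capacity
        self.future_p = future_p    # probability of replacing the original goal

    def store_episode(self, episode):
        if len(self.episodes) >= self.capacity:
            self.episodes.pop(0)    # drop the oldest episode
        self.episodes.append(episode)

    def sample(self, batch_size):
        batch = []
        for _ in range(batch_size):
            ep = random.choice(self.episodes)
            t = random.randrange(len(ep))
            tr = dict(ep[t])        # shallow copy so relabeling is non-destructive
            if random.random() < self.future_p:
                # HER "future" strategy: pretend a state achieved later in the
                # same episode was the intended goal, and recompute the reward.
                future = random.randrange(t, len(ep))
                tr["goal"] = ep[future]["achieved_goal"]
                tr["reward"] = sparse_reward(tr["achieved_goal"], tr["goal"])
            batch.append(tr)
        return batch

def sample_mixed_batch(demo_buffer, agent_buffer, batch_size=128, demo_frac=0.25):
    """Mix demonstration and agent experience, as in the two-buffer scheme."""
    n_demo = int(batch_size * demo_frac)
    return (demo_buffer.sample(n_demo)
            + agent_buffer.sample(batch_size - n_demo))
```

In this sketch, each mixed batch would feed a standard DDPG actor-critic update; the demonstration buffer supplies successful experience early in training, which is the mechanism the paper credits for the reported speedup under sparse rewards.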