Data-efficient Deep Reinforcement Learning Method Toward Scaling Continuous Robotic Task with Sparse Rewards

2021 
Dealing with continuous robotic control under sparse rewards is a longstanding challenge in deep reinforcement learning (DRL). While existing DRL algorithms have demonstrated great progress in learning policies from visual observations, learning effective policies still requires an impractical number of real-world data samples. Moreover, some robotic tasks are naturally specified with sparse rewards, which makes data usage inefficient, slows down the learning process, and can render DRL infeasible. In addition, manually shaping reward functions is complex because it requires domain-specific knowledge and human intervention. To alleviate these issues, this paper proposes a model-free, off-policy DRL approach named TD3MHER to learn manipulation policies for continuous robotic tasks with sparse rewards. Specifically, TD3MHER combines the Twin Delayed Deep Deterministic policy gradient algorithm (TD3) with Model-driven Hindsight Experience Replay (MHER) to achieve highly sample-efficient training. While the agent is learning the policy, TD3MHER also helps it learn a physical model of the robot that is useful for solving the task, without requiring any additional robot-environment interactions. The performance of TD3MHER is assessed on a simulated robotic task with a 7-DOF manipulator, comparing the proposed technique against a prior DRL algorithm to verify its usefulness. Results of the experiments on the simulated robotic task show that the proposed approach successfully exploits previously stored samples with sparse rewards and achieves faster learning.
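The abstract does not give implementation details of TD3MHER, so the sketch below only illustrates the hindsight-relabeling idea that MHER builds on: relabeling stored transitions with goals that were actually achieved, so sparse-reward data still yields learning signal. The buffer class, field names, success threshold, and "future" relabeling strategy are assumptions for illustration, not the authors' code; the model-driven goal generation of MHER and the TD3 update itself are omitted.

```python
import numpy as np

class HindsightReplayBuffer:
    """Minimal HER-style buffer: relabels transitions with achieved goals so
    that episodes which never reach the desired goal still provide reward signal."""

    def __init__(self, capacity=100_000, k_relabel=4, success_threshold=0.05):
        self.capacity = capacity
        self.k_relabel = k_relabel            # hindsight goals sampled per transition
        self.success_threshold = success_threshold
        self.storage = []

    def store_episode(self, episode):
        # episode: list of dicts with keys
        # obs, action, next_obs, desired_goal, achieved_goal_next (numpy arrays)
        for t, tr in enumerate(episode):
            # store the original transition with its desired goal
            self._add(tr["obs"], tr["action"], tr["next_obs"],
                      tr["desired_goal"], tr["achieved_goal_next"])
            # "future" strategy: also store copies relabeled with goals
            # that were actually achieved later in the same episode
            future = episode[t:]
            for i in np.random.randint(len(future), size=self.k_relabel):
                new_goal = future[i]["achieved_goal_next"]
                self._add(tr["obs"], tr["action"], tr["next_obs"],
                          new_goal, tr["achieved_goal_next"])

    def _add(self, obs, action, next_obs, goal, achieved_next):
        # sparse reward: 0 if the (possibly relabeled) goal is reached, else -1
        reward = 0.0 if np.linalg.norm(achieved_next - goal) < self.success_threshold else -1.0
        if len(self.storage) >= self.capacity:
            self.storage.pop(0)
        self.storage.append((obs, goal, action, reward, next_obs))

    def sample(self, batch_size):
        # uniform minibatch sampling for the off-policy (e.g. TD3) update
        idx = np.random.randint(len(self.storage), size=batch_size)
        return [self.storage[i] for i in idx]
```

In the full method, the relabeled goals would be informed by the learned physical model rather than drawn only from achieved states, and the sampled minibatches would feed the TD3 critic and actor updates.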