Control and Simulation of a 6-DOF Biped Robot based on Twin Delayed Deep Deterministic Policy Gradient Algorithm

Phan Bui Khoi,Nguyen Truong Giang,Hoang Van Tan

Control and Simulation of a 6-DOF Biped Robot based on Twin Delayed Deep Deterministic Policy Gradient Algorithm

2021

Objectives: To study an algorithm to control a bipedal robot to walk so that it has a gait close to that of a human. It is known that the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm is a highly efficient algorithm with a few changes compared to the popular algorithm — the commonly used Deep Deterministic Policy Gradient (DDPG) in the continuous action space problem in Reinforcement Learning. Methods: Different from the usual sparse reward function model used, in this study, a reward model combined with a sparse reward function and dense reward function will be proposed. The application of the TD3 algorithm together with the proposed reward function model to control a bipedal robot model with 6 degrees of freedom will be presented. The training process is simulated in Gazebo/Robot Operating System (ROS) environment. Finding: The results show that, when choosing a reward model combined with a sparse reward function and a dense reward function suitable for the robot model, will help it learn faster and achieve better results. The biped robot can walk straight with an almost human-like gait. In the paper, the results from the TD3 algorithm combined with the proposed reward model are also compared with the results from other algorithms. Novelty: Applying the TD3 algorithm combined with the proposed reward model for the 6-DOF biped robot and simulating the robot’s gait in Gazebo/ROS environment, ROS is a middleware that can be used to control a robot in a real environment in the future. Keywords: TD3; biped robot; reinforcement learning; ROS; Gazebo

Keywords:

Correction
Source
Cite
Save
Machine Reading By IdeaReader

References

Citations