State Representation Learning for Minimax Deep Deterministic Policy Gradient

2019 
Recently, multi-agent reinforcement learning has developed rapidly, in particular the Minimax Deep Deterministic Policy Gradient (M3DDPG) algorithm, which improves agent robustness and addresses the problem that agents trained by deep reinforcement learning (DRL) are often fragile and sensitive to the training environment. However, agents in real environments may be unable to perceive certain important characteristics of the environment because of their limited perceptual capabilities, so they often fail to achieve the desired results. In this paper, we propose a novel algorithm, State Representation Learning for Minimax Deep Deterministic Policy Gradient (SRL_M3DDPG), which combines M3DDPG with a state representation learning neural network model that extracts the important characteristics of the raw data. We optimize the actor and critic networks using this state representation model, so that the actor and critic learn from the learned state representation instead of the raw observations. Simulation experiments show that the algorithm improves the final results.
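For concreteness, the sketch below illustrates one way a state representation encoder can sit between raw observations and the actor/critic networks, as the abstract describes. It is a minimal sketch in PyTorch (the paper does not specify a framework); the module names, layer sizes, and the reconstruction objective used as the representation loss are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): a state representation encoder whose
# latent output feeds the actor and critic instead of the raw observation.
import torch
import torch.nn as nn

class StateRepresentation(nn.Module):
    """Encoder-decoder that learns a compact latent state from raw observations."""
    def __init__(self, obs_dim, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim))
        self.decoder = nn.Sequential(  # reconstruction head used as the SRL loss here
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, obs_dim))

    def forward(self, obs):
        z = self.encoder(obs)
        return z, self.decoder(z)

class Actor(nn.Module):
    """Deterministic policy that acts on the learned latent state."""
    def __init__(self, latent_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.ReLU(),
            nn.Linear(64, act_dim), nn.Tanh())

    def forward(self, z):
        return self.net(z)

class Critic(nn.Module):
    """Q-function over the latent state and the joint action (MADDPG/M3DDPG style)."""
    def __init__(self, latent_dim, joint_act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + joint_act_dim, 64), nn.ReLU(),
            nn.Linear(64, 1))

    def forward(self, z, joint_action):
        return self.net(torch.cat([z, joint_action], dim=-1))

# Usage: encode raw observations once, then let the actor/critic read the latent state.
obs_dim, act_dim, n_agents = 16, 2, 3
srl = StateRepresentation(obs_dim)
actor = Actor(32, act_dim)
critic = Critic(32, act_dim * n_agents)

obs = torch.randn(8, obs_dim)                     # batch of raw observations
z, recon = srl(obs)
recon_loss = nn.functional.mse_loss(recon, obs)   # representation-learning objective
action = actor(z.detach())                        # policy reads the latent state, not raw input
```

In this sketch the representation network is trained with its own reconstruction loss, while the actor and critic consume the (detached) latent state; how the representation loss is combined with the M3DDPG actor/critic updates is a design choice the paper itself specifies.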