Asynchronous Methods for Multi-agent Deep Deterministic Policy Gradient

2018 
We propose a variant framework that optimizes the deep neural network controller with asynchronous gradient descent for the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm. Using multiple CPU cores, we create several parallel environments, and each thread interacts with its own environment replica. Each copy samples prioritized batch data. We adjust the critic's evaluation method, using the advantage as the evaluation of an action. The batch data processed by the copies is collected, the loss value of each copy is computed, and the batch with the maximum loss is used as the sample for updating the global network. In addition, we demonstrate a successful application of multi-agent collaboration based on these asynchronous methods. The results show that the mean episode reward is higher than the reward obtained by the previous algorithm.
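As a rough illustration of the asynchronous scheme the abstract describes (this is not the authors' code), the sketch below runs one thread per environment replica, computes a per-copy critic loss from an advantage-style evaluation, and selects the maximum-loss batch for the global update. All names (worker_rollout, the dimensions, the loss) are hypothetical; the real environments, prioritized replay, and MADDPG actor/critic networks are replaced by NumPy stubs.

```python
# Minimal sketch of asynchronous max-loss batch selection (assumptions labeled above).
import numpy as np
from concurrent.futures import ThreadPoolExecutor

N_WORKERS = 4        # one thread per CPU core / environment replica (assumption)
BATCH_SIZE = 32
OBS_DIM, ACT_DIM = 8, 2

def worker_rollout(worker_id):
    """Each thread interacts with its own environment replica and returns
    a (prioritized) batch plus the critic loss computed on that batch."""
    rng = np.random.default_rng(worker_id)  # per-thread RNG; stands in for a real env
    batch = {
        "obs": rng.normal(size=(BATCH_SIZE, OBS_DIM)),
        "act": rng.normal(size=(BATCH_SIZE, ACT_DIM)),
        "rew": rng.normal(size=(BATCH_SIZE,)),
    }
    # Advantage-style evaluation: A(s, a) ~ Q(s, a) - V(s); here a mean
    # baseline stands in for the learned value estimate.
    baseline = batch["rew"].mean()
    advantage = batch["rew"] - baseline
    loss = float(np.mean(advantage ** 2))   # stand-in for the critic loss
    return loss, batch

# Collect batches from all replicas in parallel.
with ThreadPoolExecutor(max_workers=N_WORKERS) as pool:
    results = list(pool.map(worker_rollout, range(N_WORKERS)))

# Use the batch with maximum loss as the sample for the global network.
max_loss, hardest_batch = max(results, key=lambda r: r[0])
print(f"updating global network with batch of loss {max_loss:.4f}")
```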