Inducing Cooperation through Reward Reshaping based on Peer Evaluations in Deep Multi-Agent Reinforcement Learning

2020 
We propose a deep reinforcement learning algorithm for semi-cooperative multi-agent tasks, in which agents have separate reward functions but some willingness to cooperate. Intuitively, defining and directly maximizing a global reward function induces cooperation, since there is no notion of selfishness among agents. However, this may not be the best way to induce cooperation, owing to problems that arise when training multiple agents with a single reward (e.g., credit assignment). Moreover, agents may intentionally be given separate reward functions to induce task prioritization, whereas a global reward function may be difficult to define without diluting the effects of different tasks and causing some of their reward factors to be disregarded. Our algorithm, Peer Evaluation-based Dual DQN (PED-DQN), has each agent send peer evaluation signals to observed agents, quantifying how it strategically values a given transition. This exchange of peer evaluations over time leads agents to gradually reshape their reward functions so that action choices from the myopic best response tend to result in more cooperative joint actions.
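The core mechanism described above, reshaping each agent's reward with evaluations received from peers, can be sketched minimally as follows. This is a hypothetical illustration, not the paper's implementation: the function names, the TD-style evaluation signal, and the mixing weight `beta` are all assumptions made for the example.

```python
# Hypothetical sketch of reward reshaping via peer evaluations (not the
# paper's exact algorithm). Each observing agent evaluates a transition,
# and the acting agent blends those evaluations into its own reward.

def peer_evaluation(value_fn, state, next_state):
    # An observer values a transition by the change in its own value
    # estimate (a TD-style proxy; the paper's precise signal may differ).
    return value_fn(next_state) - value_fn(state)

def reshaped_reward(own_reward, peer_evals, beta=0.5):
    # Mix the agent's own reward with the mean of received peer
    # evaluations; beta controls the weight placed on peers' opinions.
    if not peer_evals:
        return own_reward
    return own_reward + beta * sum(peer_evals) / len(peer_evals)

# Example: an agent receives evaluations from two peers for one transition.
evals = [peer_evaluation(lambda s: s, 0.0, 2.0),   # peer 1 valued it +2
         peer_evaluation(lambda s: s, 1.0, 1.0)]   # peer 2 was indifferent
r = reshaped_reward(1.0, evals)                    # 1.0 + 0.5 * (2 + 0)/2
```

Under this sketch, a transition that peers value highly increases the acting agent's effective reward, nudging its greedy (myopic best-response) policy toward joint actions that benefit others.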