Sub-AVG: Overestimation reduction for cooperative multi-agent reinforcement learning

2022 
Decomposing the centralized joint action value (JAV) into per-agent individual action values (IAVs) is attractive in cooperative multi-agent reinforcement learning (MARL). In such tasks, the IAVs, which depend only on local observations, support decentralized execution, while the JAV is used for end-to-end training with standard reinforcement learning methods, most commonly Q-learning. However, Q-learning-based methods suffer from overestimation, and the overestimated action values can lead to suboptimal policies. In this paper, we show that such overestimation also arises in the Q-learning-based decomposition methods described above. Our solution, Sub-AVG, constructs a lower update target by discarding the larger of the previously learned IAVs and averaging the retained ones, thereby removing the excessive overestimation error. Experiments in the StarCraft Multi-Agent Challenge (SMAC) environment show that Sub-AVG yields lower JAV estimates and better-performing policies.
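To illustrate the target-construction idea described in the abstract, the following is a minimal sketch, not the paper's implementation: it assumes K stored snapshots of each agent's IAV for the greedy next-state action (a hypothetical `iav_snapshots` array), sorts them per agent, drops the larger estimates, and averages the retained smaller ones to form a lower update target. The function name, array layout, and `keep_ratio` parameter are illustrative assumptions; the paper applies this to learned IAV networks rather than raw arrays.

```python
import numpy as np

def sub_avg_target(iav_snapshots: np.ndarray, keep_ratio: float = 0.5) -> np.ndarray:
    """Build a Sub-AVG-style update target from previously learned IAV estimates.

    iav_snapshots: shape (K, n_agents); entry [k, i] is agent i's IAV for the
        greedy next-state action under the k-th stored (older) target network.
        (Hypothetical layout for illustration only.)
    keep_ratio: fraction of the smallest estimates to retain per agent.
    """
    K, _ = iav_snapshots.shape
    k_keep = max(1, int(np.ceil(keep_ratio * K)))
    # Sort each agent's K estimates ascending, discard the larger ones,
    # and average the retained smaller ones to obtain a lower target.
    sorted_vals = np.sort(iav_snapshots, axis=0)
    return sorted_vals[:k_keep].mean(axis=0)  # shape (n_agents,)

# Toy usage: 5 stored snapshots of 3 agents' IAVs.
rng = np.random.default_rng(0)
snapshots = rng.normal(loc=1.0, scale=0.5, size=(5, 3))
print("Sub-AVG IAV targets:", sub_avg_target(snapshots))
print("Plain-average targets:", snapshots.mean(axis=0))
```

Because the retained subset excludes the largest estimates, the resulting per-agent targets sit below a plain average of all snapshots, which is the mechanism the abstract credits with reducing overestimation of the JAV.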