Sub-AVG: Overestimation reduction for cooperative multi-agent reinforcement learning

2022 
Decomposing the centralized joint action value (JAV) into per-agent individual action values (IAVs) is attractive in cooperative multi-agent reinforcement learning (MARL). In such tasks, the IAVs, which depend only on local observations, support decentralized execution, while the JAV is used for end-to-end training with standard reinforcement learning methods, most commonly Q-learning. However, Q-learning-based methods suffer from overestimation, and the overestimated action values can lead to suboptimal policies. In this paper, we show that such overestimation also arises in the Q-learning-based decomposition methods described above. Our solution, Sub-AVG, constructs a lower update target by discarding the larger of the previously learned IAVs and averaging the retained ones, thereby removing the excessive overestimation error. Experiments in the StarCraft Multi-Agent Challenge (SMAC) environment show that Sub-AVG yields lower JAV estimates and better-performing policies.
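To illustrate the target-construction idea described in the abstract, the following is a minimal sketch, not the paper's implementation: it assumes K stored snapshots of each agent's IAV for the greedy next-state action (a hypothetical `iav_snapshots` array), sorts them per agent, drops the larger estimates, and averages the retained smaller ones to form a lower update target. The function name, array layout, and `keep_ratio` parameter are illustrative assumptions; the paper applies this to learned IAV networks rather than raw arrays.

```python
import numpy as np

def sub_avg_target(iav_snapshots: np.ndarray, keep_ratio: float = 0.5) -> np.ndarray:
    """Build a Sub-AVG-style update target from previously learned IAV estimates.

    iav_snapshots: shape (K, n_agents); entry [k, i] is agent i's IAV for the
        greedy next-state action under the k-th stored (older) target network.
        (Hypothetical layout for illustration only.)
    keep_ratio: fraction of the smallest estimates to retain per agent.
    """
    K, _ = iav_snapshots.shape
    k_keep = max(1, int(np.ceil(keep_ratio * K)))
    # Sort each agent's K estimates ascending, discard the larger ones,
    # and average the retained smaller ones to obtain a lower target.
    sorted_vals = np.sort(iav_snapshots, axis=0)
    return sorted_vals[:k_keep].mean(axis=0)  # shape (n_agents,)

# Toy usage: 5 stored snapshots of 3 agents' IAVs.
rng = np.random.default_rng(0)
snapshots = rng.normal(loc=1.0, scale=0.5, size=(5, 3))
print("Sub-AVG IAV targets:", sub_avg_target(snapshots))
print("Plain-average targets:", snapshots.mean(axis=0))
```

Because the retained subset excludes the largest estimates, the resulting per-agent targets sit below a plain average of all snapshots, which is the mechanism the abstract credits with reducing overestimation of the JAV.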