An Efficient Multi-Agent Q-learning Method Based on Observing the Adversary Agent State Change

2006 
For tasks modeled as Markov decision processes, this paper presents a novel multi-agent reinforcement learning method based on observing the adversary agent's state changes. By treating the adversary agent's state changes as additional observations of the environment, the learning agents extend their learning episodes and gather more observations with fewer actions. In the extreme, the learning agents can treat the adversary agent's state changes as a substitute for their own exploration policy, which allows them to rely on exploitation to maximize reward during learning. Further, the paper discusses how the learning agents cooperate through both direct communication and indirect media communication, and describes the low cost of each within the proposed method: direct communication enhances the agents' ability to observe the task environment, while indirect media communication helps the agents derive the optimal action policy efficiently. Simulation results on the hunter game demonstrate the efficiency of the proposed method.
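The abstract's core mechanism can be illustrated with a short sketch. The following is a minimal tabular Q-learning illustration and not the authors' implementation: the relative-coordinate state encoding, the HunterAgent class, the observe_prey_move method, and all hyperparameters are assumptions made for this example. It shows one plausible reading of the idea that an observed adversary move yields an extra learning update without the agent acting.

    import random
    from collections import defaultdict

    # Sketch only (assumptions, not the paper's code): the state is the
    # hunter's position relative to the prey, so a prey move looks, from the
    # hunter's viewpoint, like the opposite hunter move. That equivalence
    # lets the hunter perform an extra Q-backup without acting itself.

    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # assumed hyperparameters
    MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
    OPPOSITE = {"up": "down", "down": "up", "left": "right", "right": "left"}

    class HunterAgent:
        def __init__(self):
            self.q = defaultdict(float)  # Q[(relative_state, action)] -> value

        def act(self, state):
            # Epsilon-greedy selection; per the abstract, observed adversary
            # moves can stand in for exploration, letting the agent lean
            # more heavily on exploitation.
            if random.random() < EPSILON:
                return random.choice(list(MOVES))
            return max(MOVES, key=lambda a: self.q[(state, a)])

        def update(self, s, a, r, s2):
            # Standard one-step Q-learning backup.
            target = r + GAMMA * max(self.q[(s2, a2)] for a2 in MOVES)
            self.q[(s, a)] += ALPHA * (target - self.q[(s, a)])

        def observe_prey_move(self, s, prey_move, r, s2):
            # Extra learning episode derived from the adversary's state
            # change: in relative coordinates the prey moving `prey_move`
            # is equivalent to the hunter moving OPPOSITE[prey_move],
            # so the same backup is reused at no action cost.
            self.update(s, OPPOSITE[prey_move], r, s2)

Under this encoding, every observed prey move yields a free Q-backup, which is one way to read the abstract's claim that the agents "derive more observation by less action".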