Multi-Agent Deep Reinforcement Learning for Solving Large-scale Air Traffic Flow Management Problem: A Time-Step Sequential Decision Approach

2021 
In this paper, we focus on the demand-capacity balancing (DCB) problem in air traffic flow management, which is considered as a fully cooperative multi-agent learning task. First, a rule-based time-step environment is designed to mimic the DCB process. In this environment, each agent ‘flight’ decides its action at valid time steps. Three different rules are defined, based on the remaining capacity and the number of cooperative flights in each sector, to ease the learning process. Second, a multi-agent reinforcement learning framework, built on the proximal policy optimization (MAPPO), is proposed by using the parameter sharing mechanism and the mean-field approximation method, where an inherent feature of all other agents is extracted to address the credit assignment problem. Moreover, a supervisor integrated MAPPO framework is proposed, where a supervisor is designed to generate supervised actions, in such a way to further improve the learning performance. In the experiments, two performance indices, Search Capability and Generalization Capability, are considered. Both indices are assessed with the evaluation of two toy cases and a real-world case study. Results suggest that, the supervisor integrated MAPPO with supervised actions achieves the best performance across the different cases; other proposed methods also show some promising Search Capability, but only prove an acceptable Generalization Capability in simpler cases than the training cases.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    25
    References
    0
    Citations
    NaN
    KQI
    []