On the diversity and robustness of parameterised multi-objective test suites

2021 
Abstract
The development of optimisation algorithms relies heavily on comparing performance across benchmark problem suites. In continuous unconstrained multi-objective optimisation, the most popular suites are ZDT, DTLZ and WFG. For each problem in these suites, construction parameters control characteristics such as the degree of multimodality or deceptiveness. Despite encouragement from the suites' authors to do otherwise, experiments are largely performed using only the original values of these parameters. It is important to understand the robustness of these test problems, and their potential to create a diversity of challenging problem landscapes to guide future algorithm testing and development. In this paper we propose a methodology for evaluating the robustness of the benchmark test problems by strategically varying construction parameters and exploring how problem difficulty and landscape characteristics are affected. Our methodology adopts both Latin Hypercube Sampling and a design and analysis of experiments model to construct more diverse problem instances within the benchmark problem classes. These problem variants are evaluated with eight diverse multi-objective optimisation algorithms to contribute to our understanding of problem robustness. We measure the robustness of problems indirectly, in terms of impacts on algorithm performance and rankings, and directly, in terms of Exploratory Landscape Analysis (ELA) metrics used to establish problem robustness from a landscape-characteristics perspective. Our results show that only eleven of the 21 benchmark problems are robust for algorithms in absolute terms, nine in relative terms, and seven provide evidence of both types of algorithm robustness. A further nine problems satisfy the requirements for landscape robustness. Only four of the 21 benchmark problem classes are robust across all measures. These results highlight the importance of diversity when selecting benchmark problems: if only the default construction parameters are considered, the majority of the test-suite problems do not support robust general conclusions about how algorithms perform in the presence of the constructed characteristics intended to challenge them. The existing benchmark test problems, certainly with the popularly used default parameters, are currently insufficient for understanding algorithm performance, and more effort in generating diverse problem instances would serve the research community well.
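As a rough illustration of the sampling step described in the abstract (not the authors' exact procedure), the sketch below uses Latin Hypercube Sampling via SciPy to draw construction-parameter settings for problem instances. The parameter names and bounds shown are hypothetical placeholders, not the ranges used in the paper.

```python
# Minimal sketch: Latin Hypercube Sampling over hypothetical construction
# parameters of a benchmark problem class. Parameter names and bounds are
# illustrative placeholders, not the settings studied in the paper.
import numpy as np
from scipy.stats import qmc

# Hypothetical construction parameters and their assumed ranges.
param_names = ["multimodality", "bias", "deceptiveness"]
lower = np.array([1.0, 0.5, 0.0])
upper = np.array([30.0, 5.0, 1.0])

# Draw 50 configurations in the unit hypercube, then scale to the bounds.
sampler = qmc.LatinHypercube(d=len(param_names), seed=42)
unit_samples = sampler.random(n=50)
configs = qmc.scale(unit_samples, lower, upper)

for i, cfg in enumerate(configs[:3]):
    settings = dict(zip(param_names, np.round(cfg, 3)))
    print(f"instance {i}: {settings}")
```

Each sampled configuration would then be used to instantiate a problem variant (e.g., a ZDT, DTLZ or WFG problem built with those construction-parameter values), on which the algorithms are run and ELA features are computed.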