A Biased-Randomized Learnheuristic for Solving the Team Orienteering Problem with Dynamic Rewards

2020 
Abstract. In this paper, we discuss the team orienteering problem (TOP) with dynamic inputs. In the static version of the TOP, a fixed reward is obtained after visiting each node. Hence, given a limited fleet of vehicles and a threshold time, the goal is to design the set of routes that maximizes the total reward collected. While this static version can be efficiently tackled using a biased-randomized heuristic (BR-H), dealing with the dynamic version requires extending the BR-H into a learnheuristic (BR-LH). For that purpose, a ‘learning’ (white-box) mechanism is incorporated into the heuristic in order to account for the variations in the observed rewards, which follow an unknown (black-box) pattern. In particular, we assume that: (i) each node in the network has a ‘base’ or standard reward value; and (ii) depending on the node’s position inside its route, the actual reward value might differ from the base one according to the aforementioned unknown pattern. As new observations of this black-box pattern are obtained, the white-box mechanism generates better estimates of the actual rewards after each new decision. Accordingly, better solutions can be generated by using this predictive mechanism. A set of numerical experiments illustrates these concepts.
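The interplay between the biased-randomized construction and the learning mechanism can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the white-box model is taken here to be a running mean of the observed actual-to-base reward ratio per route position, the black box is simulated by a made-up `hidden_reward` function, the bias uses a geometric distribution with a hypothetical parameter `beta`, and all names, instance sizes, and parameter values are invented for the example.

```python
import math
import random


def dist(a, b):
    """Euclidean travel time between two points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def geometric_index(n, beta, rng):
    """Pick an index in [0, n) skewed toward 0 (quasi-geometric bias)."""
    u = 1.0 - rng.random()  # u lies in (0, 1], so log(u) is defined
    return int(math.log(u) / math.log(1.0 - beta)) % n


class PositionLearner:
    """Assumed white-box model: mean observed (actual / base) ratio per position."""

    def __init__(self):
        self.sums, self.counts = {}, {}

    def observe(self, position, base_value, actual):
        self.sums[position] = self.sums.get(position, 0.0) + actual / base_value
        self.counts[position] = self.counts.get(position, 0) + 1

    def estimate(self, position, base_value):
        if position not in self.counts:
            return base_value  # no observations yet: fall back to the base reward
        return base_value * self.sums[position] / self.counts[position]


def build_solution(coords, base, n_vehicles, t_max, learner, beta, rng):
    """Build one route per vehicle, choosing nodes with a biased-random rule."""
    start, end = 0, len(coords) - 1
    unvisited = set(range(1, end))
    routes = []
    for _ in range(n_vehicles):
        route, current, elapsed = [start], start, 0.0
        while True:
            position = len(route)  # 1-based position of the next visited node
            candidates = []
            for j in unvisited:
                step = dist(coords[current], coords[j])
                # Keep j only if the vehicle can still reach the end depot in time.
                if elapsed + step + dist(coords[j], coords[end]) <= t_max:
                    score = learner.estimate(position, base[j]) / max(step, 1e-9)
                    candidates.append((score, j, step))
            if not candidates:
                break
            candidates.sort(reverse=True)  # best estimated reward per time first
            _, j, step = candidates[geometric_index(len(candidates), beta, rng)]
            route.append(j)
            unvisited.remove(j)
            elapsed += step
            current = j
        route.append(end)
        routes.append(route)
    return routes


def hidden_reward(base_value, position):
    """Made-up stand-in for the black box: later positions pay less."""
    return base_value * max(0.2, 1.0 - 0.15 * (position - 1))


def run(iterations=200, seed=0, n_nodes=15, n_vehicles=2, t_max=25.0, beta=0.3):
    rng = random.Random(seed)
    inner = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(n_nodes)]
    coords = [(0.0, 0.0)] + inner + [(10.0, 10.0)]
    base = [0.0] + [rng.uniform(5, 20) for _ in range(n_nodes)] + [0.0]
    learner = PositionLearner()
    best_routes, best_total = None, -1.0
    for _ in range(iterations):
        routes = build_solution(coords, base, n_vehicles, t_max, learner, beta, rng)
        total = 0.0
        for route in routes:
            for position, j in enumerate(route[1:-1], start=1):
                actual = hidden_reward(base[j], position)  # query the black box
                learner.observe(position, base[j], actual)  # update the white box
                total += actual
        if total > best_total:
            best_routes, best_total = routes, total
    return coords, base, learner, best_routes, best_total
```

Calling `run()` returns the instance, the trained learner, and the best set of routes found. Each iteration both produces a candidate solution and feeds new observations to the learner, so later constructions rank candidates by estimated rather than base rewards.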