Minimizing the Cost of Spatiotemporal Searches Based on Reinforcement Learning with Probabilistic States

2022 
Portraying the trajectories of certain vehicles effectively is of great significance for urban public safety. Specifically, we aim to determine the location of a vehicle at a specific past moment. In some situations, the waypoints of the vehicle's trajectory are not directly available, but images of the vehicle may be contained in massive camera video records. Since these records are indexed only by location and moment, rather than by content such as license plate numbers, finding the vehicle in these records is time-consuming. To minimize the cost of spatiotemporal search (a spatiotemporal search is the effort to check whether the vehicle appears at a specified location at a specified moment), this paper proposes a reinforcement learning algorithm called Quasi-Dynamic Programming (QDP), an improved form of Q-learning. QDP selects the search moment iteratively based on known past locations, considering both the cost efficiency of the current action and its potential impact on subsequent actions. Unlike traditional Q-learning, QDP has probabilistic states during training. To address the problem of probabilistic states, we make the following contributions: 1) we replace the single next state with multiple next states drawn from a probability distribution; 2) we estimate the expected cost of subsequent actions to calculate the value function; 3) we create a random state and action in each loop to train the value function progressively. Finally, experiments are conducted on real-world vehicle trajectories, and the results show that the proposed QDP outperforms previous greedy-based algorithms and other baselines.
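To make the three contributions concrete, the following is a minimal sketch, in Python, of a QDP-style tabular update under stated assumptions: it is not the authors' implementation, and every name here (`qdp_update`, `train`, `model`, `sample_state`, `sample_action`, the cost convention) is a hypothetical interface chosen for illustration. The sketch shows how a Bellman-style target can average over a distribution of next states rather than a single observed next state, and how random (state, action) pairs can drive progressive training of the value function.

```python
def qdp_update(V, state, transition_dist, search_cost, gamma=1.0, alpha=0.1):
    """One hypothetical QDP-style training step (sketch, not the paper's code).

    V               -- dict mapping state -> estimated expected search cost
    transition_dist -- list of (next_state, probability) pairs describing the
                       probabilistic next state (contribution 1)
    search_cost     -- immediate cost of this spatiotemporal search
    """
    # Contribution 2: estimate the expected cost of subsequent actions by
    # averaging the value function over the next-state distribution.
    expected_future = sum(p * V.get(s_next, 0.0) for s_next, p in transition_dist)
    target = search_cost + gamma * expected_future
    # Standard temporal-difference style mixing of old estimate and target.
    V[state] = (1 - alpha) * V.get(state, 0.0) + alpha * target
    return V[state]


def train(sample_state, sample_action, model, episodes=10_000):
    """Contribution 3: draw a random state and action each loop and refine
    the value function progressively.

    model(s, a) is assumed to return (transition_dist, search_cost): the
    probabilistic next states and the immediate cost of searching.
    """
    V = {}
    for _ in range(episodes):
        s = sample_state()                 # random state each loop
        a = sample_action(s)               # random action for that state
        dist, cost = model(s, a)           # probabilistic next states + cost
        qdp_update(V, s, dist, cost)
    return V
```

Since the objective is cost minimization, a deployed policy would pick the action whose immediate cost plus expected future cost (the `target` above) is smallest; the exact action set and transition model depend on how search moments and known past locations are encoded, which this sketch leaves abstract.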