Stochastic Subgradient Methods for Dynamic Programming in Continuous State and Action Spaces

2019 
In this paper, we propose a numerical method for dynamic programming in continuous state and action spaces. We first approximate the Bellman operator by a convex optimization problem with many constraints. This convex program is then solved using stochastic subgradient descent. To avoid a full projection onto the high-dimensional feasible set, we develop a novel algorithm that samples, in a coordinated fashion, one mini-batch for the subgradient and another for the projection. We establish several salient properties of this algorithm, including convergence and reductions in both the feasibility error and the variance of the stochastic subgradient.
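
The general idea of pairing a mini-batch subgradient step with a sampled, partial projection can be illustrated with a minimal sketch. This is not the paper's algorithm. The abstract does not specify the convex program or the coordinated sampling scheme, so the sketch assumes a toy objective (mean absolute residual), linear inequality constraints, and two independently drawn mini-batches. The function name and all parameters are hypothetical.

```python
import numpy as np

def subgradient_with_sampled_projection(A, b, C, d, steps=2000, batch=16, seed=0):
    """Approximately minimize mean(|A x - b|) subject to C x <= d.

    Toy stand-in, assumed for illustration only: each iteration looks at a
    mini-batch of objective terms and a mini-batch of constraints, and
    never projects onto the full feasible set.
    """
    rng = np.random.default_rng(seed)
    n_obj, dim = A.shape
    n_con = C.shape[0]
    x = np.zeros(dim)
    for k in range(1, steps + 1):
        # Mini-batch subgradient of the mean absolute residual.
        idx = rng.integers(0, n_obj, size=batch)
        residual = A[idx] @ x - b[idx]
        g = (np.sign(residual)[:, None] * A[idx]).mean(axis=0)
        x = x - g / np.sqrt(k)  # diminishing step size 1/sqrt(k)

        # Partial projection: sample constraints and project onto the
        # most violated sampled half-space {x : c @ x <= d_j}, if any.
        jdx = rng.integers(0, n_con, size=batch)
        violation = C[jdx] @ x - d[jdx]
        worst = int(np.argmax(violation))
        if violation[worst] > 0:
            c = C[jdx[worst]]
            x = x - (violation[worst] / (c @ c)) * c
    return x  # last iterate; it may still be slightly infeasible
```

Because only a few constraints are checked per iteration, the iterate is driven toward the feasible set rather than kept inside it at every step. The feasibility error mentioned in the abstract is the quantity such a scheme has to control.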