Differentiable MPC for End-to-end Planning and Control

2018 
In this paper we present foundations for using model predictive control (MPC) as a differentiable policy class in reinforcement learning. Specifically, we differentiate through MPC using the KKT conditions of the convex approximation at a fixed point of the solver. This strategy lets us learn the cost and dynamics of a controller end-to-end within a larger system. We present empirical results in an imitation learning setting, demonstrating that we can recover the underlying dynamics and cost more efficiently and reliably than with a generic neural network policy class.
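To make the KKT-based differentiation concrete, here is a minimal sketch, not the authors' implementation, of the core idea on the simplest case: an equality-constrained quadratic program, whose KKT conditions at the optimum reduce to a single linear system. Solving that system in JAX and letting autodiff propagate through the solve is equivalent to implicit differentiation of the KKT conditions at the fixed point. The function name `solve_eq_qp`, the toy imitation loss, and all problem data below are illustrative assumptions, not from the paper.

```python
import jax
import jax.numpy as jnp

def solve_eq_qp(Q, p, A, b):
    """Solve min_z 0.5 z'Qz + p'z  s.t.  Az = b via its KKT system.

    At the optimum the KKT conditions are linear:
        [Q  A'] [z ]   [-p]
        [A  0 ] [nu] = [ b]
    Differentiating through this solve implements implicit
    differentiation of the KKT conditions at the fixed point.
    """
    n, m = Q.shape[0], A.shape[0]
    K = jnp.block([[Q, A.T], [A, jnp.zeros((m, m))]])
    rhs = jnp.concatenate([-p, b])
    sol = jnp.linalg.solve(K, rhs)
    return sol[:n]  # optimal z*; sol[n:] are the multipliers nu*

def loss(p, Q, A, b, z_expert):
    # Toy imitation-style objective: learn the cost term p so the
    # QP's solution matches an "expert" action.
    z = solve_eq_qp(Q, p, A, b)
    return jnp.sum((z - z_expert) ** 2)

Q = jnp.eye(3)                       # illustrative cost Hessian
A = jnp.ones((1, 3))                 # illustrative constraint Az = b
b = jnp.array([1.0])
z_expert = jnp.array([0.2, 0.3, 0.5])
p = jnp.zeros(3)

grad_p = jax.grad(loss)(p, Q, A, b, z_expert)  # gradient through the QP
```

The paper's setting is richer: MPC is a sequence of such convex subproblems with inequality constraints over a horizon, and the gradients flow back into learned cost and dynamics parameters, but the mechanism is the same as in this sketch.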