Are Deep Policy Gradient Algorithms Truly Policy Gradient Algorithms?

Andrew Ilyas, Logan Engstrom, Shibani Santurkar, Dimitris Tsipras, Firdaus Janoos, Larry Rudolph, Aleksander Madry

2018

We study how the behavior of deep policy gradient algorithms reflects the conceptual framework motivating their development. We propose a fine-grained analysis of state-of-the-art methods based on key aspects of this framework: gradient estimation, value prediction, optimization landscapes, and trust region enforcement. We find that from this perspective, the behavior of deep policy gradient algorithms often deviates from what their motivating framework would predict. Our analysis suggests first steps towards solidifying the foundations of these algorithms, and in particular indicates that we may need to move beyond the current benchmark-centric evaluation methodology.

Keywords:

Machine learning
The Conceptual Framework
Trust region
Mathematics
Artificial intelligence
Enforcement
Algorithm
Mathematical optimization
Gradient estimation
