Learning optimal environments using projected stochastic gradient ascent

Adrien Bolland, Ioannis Boukas, François Cornet, Mathias Berger, Damien Ernst

2020

In this work, we generalize direct policy search algorithms to an algorithm we call Direct Environment Search with (projected stochastic) Gradient Ascent (DESGA). DESGA can be used to jointly learn a reinforcement learning (RL) environment and a policy with maximal expected return over a joint hypothesis space of environments and policies. We illustrate the performance of DESGA on two benchmarks. First, we consider a parametrized space of Mass-Spring-Damper (MSD) environments. Then, we use our algorithm to optimize both the sizing of the components and the operation of a small-scale autonomous energy system, i.e., a solar off-grid microgrid composed of photovoltaic panels, batteries, etc. The results highlight the excellent performance of the DESGA algorithm.
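As a rough illustration of the projected stochastic gradient ascent idea behind DESGA, the sketch below jointly updates a policy parameter theta and an environment (design) parameter psi on a hypothetical one-step toy problem. The reward function, the feasible box [0, 1] for psi, and all names (`reward`, `d_reward_d_psi`, `project`, `projected_sga`) are assumptions made for illustration, not the authors' implementation. The policy gradient uses a REINFORCE-style score-function estimator, and the environment gradient is taken pathwise because the toy reward is differentiable in psi and the action distribution does not depend on psi.

```python
import random


def reward(action: float, psi: float) -> float:
    """Hypothetical one-step reward: a tracking term plus a design bonus 2*psi.

    Toy stand-in for an environment parametrized by psi (not from the paper).
    """
    return -(action - psi) ** 2 + 2.0 * psi


def d_reward_d_psi(action: float, psi: float) -> float:
    """Analytic partial derivative of the toy reward with respect to psi."""
    return 2.0 * (action - psi) + 2.0


def project(psi: float, lower: float = 0.0, upper: float = 1.0) -> float:
    """Euclidean projection onto the (assumed) feasible box [lower, upper]."""
    return min(max(psi, lower), upper)


def projected_sga(steps=2000, batch=64, lr=0.05, sigma=0.5, seed=0):
    """Jointly ascend the expected return in policy mean theta and parameter psi."""
    rng = random.Random(seed)
    theta, psi = 0.0, 0.2  # initial policy and environment parameters
    for _ in range(steps):
        g_theta = 0.0
        g_psi = 0.0
        for _ in range(batch):
            a = rng.gauss(theta, sigma)  # sample from Gaussian policy N(theta, sigma^2)
            r = reward(a, psi)
            # Score-function term: r * d/dtheta log N(a; theta, sigma^2)
            g_theta += r * (a - theta) / sigma ** 2
            # Pathwise term: the action distribution does not depend on psi
            g_psi += d_reward_d_psi(a, psi)
        theta += lr * g_theta / batch  # theta is unconstrained in this toy
        psi = project(psi + lr * g_psi / batch)  # gradient ascent step, then projection
    return theta, psi
```

Calling `projected_sga()` should drive psi to the upper bound 1.0, where the projection stays active because the toy bonus 2*psi keeps increasing, while theta settles close to 1.0 up to estimator noise. Without the projection step, psi would grow without bound, which is why constraining the environment's hypothesis space matters in this setting.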

Keywords:

Microgrid
Energy system
Gradient descent
Photovoltaic system
Mathematical optimization
Search algorithm
Expected return
Reinforcement learning
Parametrization
Mathematics
