Pruning Dominated Policies in Multiobjective Pareto Q-Learning

Lawrence Mandow, José-Luis Pérez de-la Cruz

2018

The solution to a multiobjective reinforcement learning problem is a set of Pareto-optimal policies. MPQ-learning is a recent algorithm that approximates the whole set of Pareto-optimal deterministic policies by directly generalizing Q-learning to the multiobjective setting. In this paper we present a modification of MPQ-learning that avoids useless cyclical policies and thus reduces the number of training steps required for convergence.
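
As an illustration of what "Pareto optimal" and "dominated" mean here, the following minimal sketch filters a set of vector-valued return estimates down to its non-dominated members. It is not the MPQ-learning update or the pruning rule proposed in the paper; the names `dominates` and `pareto_front` are hypothetical, and the sketch assumes that every objective is to be maximized.

```python
from typing import List, Tuple

Vector = Tuple[float, ...]


def dominates(u: Vector, v: Vector) -> bool:
    """Return True if u Pareto-dominates v (assumption: all objectives maximized).

    u dominates v when it is at least as good in every objective
    and strictly better in at least one.
    """
    at_least_as_good = all(a >= b for a, b in zip(u, v))
    strictly_better = any(a > b for a, b in zip(u, v))
    return at_least_as_good and strictly_better


def pareto_front(vectors: List[Vector]) -> List[Vector]:
    """Keep only the non-dominated vectors, dropping exact duplicates."""
    unique = list(dict.fromkeys(vectors))  # preserves first-seen order
    return [v for v in unique if not any(dominates(u, v) for u in unique)]


if __name__ == "__main__":
    # Hypothetical two-objective value estimates for one state-action pair.
    estimates = [(1.0, 5.0), (3.0, 3.0), (2.0, 2.0), (5.0, 1.0), (3.0, 3.0)]
    print(pareto_front(estimates))
```

In this example `(2.0, 2.0)` is discarded because `(3.0, 3.0)` is better in both objectives, while the remaining vectors are mutually incomparable and are all kept.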

Keywords:

Convergence (routing)
Reinforcement learning
Pruning
Mathematical optimization
Q-learning
Pareto principle
Generalization
Mathematics
Pareto optimal
Computer science
