Pruning Dominated Policies in Multiobjective Pareto Q-Learning
2018
The solution for a Multi-Objetive Reinforcement Learning problem is a set of Pareto optimal policies. MPQ-learning is a recent algorithm that approximates the whole set of all Pareto-optimal deterministic policies by directly generalizing Q-learning to the multiobjective setting. In this paper we present a modification of MPQ-learning that avoids useless cyclical policies and thus improves the number of training steps required for convergence.
Keywords:
- Correction
- Source
- Cite
- Save
- Machine Reading By IdeaReader
8
References
1
Citations
NaN
KQI